ProxSkip: Yes! Local Gradient Steps Provably Lead to Communication Acceleration! Finally!

Konstantin Mishchenko, Grigory Malinovsky, Sebastian Stich, Peter Richtárik*

*Corresponding author for this work

Research output: Contribution to conferencePaperpeer-review

23 Scopus citations

Abstract

We introduce ProxSkip-a surprisingly simple and provably efficient method for minimizing the sum of a smooth (f) and an expensive nonsmooth proximable (ψ) function. The canonical approach to solving such problems is via the proximal gradient descent (ProxGD) algorithm, which is based on the evaluation of the gradient of f and the prox operator of ψ in each iteration. In this work we are specifically interested in the regime in which the evaluation of prox is costly relative to the evaluation of the gradient, which is the case in many applications. ProxSkip allows for the expensive prox operator to be skipped in most iterations: while its iteration complexity is O(κ log 1/ε), where κ is the condition number of f, the number of prox evaluations is O(√κ log 1/ε) only. Our main motivation comes from federated learning, where evaluation of the gradient operator corresponds to taking a local GD step independently on all devices, and evaluation of prox corresponds to (expensive) communication in the form of gradient averaging. In this context, ProxSkip offers an effective acceleration of communication complexity. Unlike other local gradient-type methods, such as FedAvg, SCAFFOLD, S-Local-GD and FedLin, whose theoretical communication complexity is worse than, or at best matching, that of vanilla GD in the heterogeneous data regime, we obtain a provable and large improvement without any heterogeneity-bounding assumptions.

Original languageEnglish (US)
Pages15750-15769
Number of pages20
StatePublished - 2022
Event39th International Conference on Machine Learning, ICML 2022 - Baltimore, United States
Duration: Jul 17 2022Jul 23 2022

Conference

Conference39th International Conference on Machine Learning, ICML 2022
Country/TerritoryUnited States
CityBaltimore
Period07/17/2207/23/22

ASJC Scopus subject areas

  • Artificial Intelligence
  • Software
  • Control and Systems Engineering
  • Statistics and Probability

Fingerprint

Dive into the research topics of 'ProxSkip: Yes! Local Gradient Steps Provably Lead to Communication Acceleration! Finally!'. Together they form a unique fingerprint.

Cite this