DASHA: DISTRIBUTED NONCONVEX OPTIMIZATION WITH COMMUNICATION COMPRESSION AND OPTIMAL ORACLE COMPLEXITY

Alexander Tyurin, Peter Richtárik

Research output: Contribution to conference › Paper › peer-review

6 Scopus citations

Abstract

We develop and analyze DASHA: a new family of methods for nonconvex distributed optimization problems. When the local functions at the nodes have a finite-sum or an expectation form, our new methods, DASHA-PAGE, DASHA-MVR and DASHA-SYNC-MVR, improve the theoretical oracle and communication complexity of the previous state-of-the-art method MARINA by Gorbunov et al. (2021). In particular, to achieve an ε-stationary point, and considering the random sparsifier RandK as an example, our methods compute the optimal number of gradients O(√m/(ε√n)) and O(σ/(ε^(3/2)n)) in the finite-sum and expectation form cases, respectively, while maintaining the SOTA communication complexity O(d/(ε√n)). Furthermore, unlike MARINA, the new methods DASHA, DASHA-PAGE and DASHA-MVR send compressed vectors only, which makes them more practical for federated learning. We extend our results to the case when the functions satisfy the Polyak-Łojasiewicz condition. Finally, our theory is corroborated in practice: we see a significant improvement in experiments with nonconvex classification and training of deep learning models.
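The abstract uses the RandK sparsifier as its running example of a communication compressor. For context, below is a minimal NumPy sketch of the standard unbiased RandK operator, which keeps k randomly chosen coordinates of a d-dimensional vector and rescales them by d/k so the compressed vector equals the input in expectation. This is an illustrative sketch only, not code from the paper.

```python
import numpy as np

def rand_k(x: np.ndarray, k: int, rng: np.random.Generator) -> np.ndarray:
    """Unbiased RandK sparsifier: keep k random coordinates, scale by d/k."""
    d = x.size
    idx = rng.choice(d, size=k, replace=False)   # coordinates that would be transmitted
    out = np.zeros_like(x)
    out[idx] = (d / k) * x[idx]                  # rescaling makes E[out] = x
    return out

# Example: compress a gradient-like vector, keeping only k of d entries.
rng = np.random.default_rng(0)
g = rng.standard_normal(10)
print(rand_k(g, k=3, rng=rng))
```

In a distributed setting, each node would send only the k selected indices and values rather than the full d-dimensional vector, which is the source of the communication savings the abstract refers to.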

Original language: English (US)
State: Published - 2023
Event: 11th International Conference on Learning Representations, ICLR 2023 - Kigali, Rwanda
Duration: May 1, 2023 – May 5, 2023

Conference

Conference: 11th International Conference on Learning Representations, ICLR 2023
Country/Territory: Rwanda
City: Kigali
Period: 05/01/23 – 05/05/23

ASJC Scopus subject areas

  • Language and Linguistics
  • Computer Science Applications
  • Education
  • Linguistics and Language
