TY - GEN
T1 - Local SGD: Unified Theory and New Efficient Methods
AU - Gorbunov, Eduard
AU - Hanzely, Filip
AU - Richtarik, Peter
N1 - KAUST Repository Item: Exported on 2021-08-31
Acknowledgements: This work was supported by the KAUST baseline research grant of P. Richtarik. Part of this work was done while E. Gorbunov was a research intern at KAUST. The research of E. Gorbunov in Lemmas E.1, E.3 was also supported by the Ministry of Science and Higher Education of the Russian Federation (Goszadaniye) 075-00337-20-03, and in Lemmas E.2, E.4 – by RFBR, project number 19-31-51001.
PY - 2021
Y1 - 2021
AB - We present a unified framework for analyzing local SGD methods in the convex and strongly convex regimes for distributed/federated training of supervised machine learning models. We recover several known methods as special cases of our general framework, including Local SGD/FedAvg, SCAFFOLD, and several variants of SGD not originally designed for federated learning. Our framework covers both the identical and heterogeneous data settings, supports both a random and a deterministic number of local steps, and can work with a wide array of local stochastic gradient estimators, including shifted estimators which are able to adjust the fixed points of local iterations for faster convergence. As an application of our framework, we develop multiple novel FL optimizers which are superior to existing methods. In particular, we develop the first linearly converging local SGD method which does not require any data homogeneity or other strong assumptions.
UR - http://hdl.handle.net/10754/665897
UR - http://proceedings.mlr.press/v130/gorbunov21a.html
M3 - Conference contribution
BT - Proceedings of the 24th International Conference on Artificial Intelligence and Statistics (AISTATS 2021)
PB - ML Research Press
ER -