TY - JOUR
T1 - A distributed flexible delay-tolerant proximal gradient algorithm
AU - Mishchenko, Konstantin
AU - Iutzeler, Franck
AU - Malick, Jérôme
N1 - Acknowledgements: We thank Robert Gower for valuable comments on the first versions of this paper.
PY - 2020/3/17
Y1 - 2020/3/17
N2 - We develop and analyze an asynchronous algorithm for distributed convex optimization when the objective can be written as a sum of smooth functions, local to each worker, and a nonsmooth function. Unlike many existing methods, our distributed algorithm is adjustable to various levels of communication cost, delays, machines' computational power, and functions' smoothness. A unique feature is that the step sizes do not depend on communication delays or the number of machines, which is highly desirable for scalability. We prove that the algorithm converges linearly in the strongly convex case and provide convergence guarantees for the non-strongly convex case. The obtained rates are the same as those of the vanilla proximal gradient algorithm, over an introduced epoch sequence that subsumes the delays of the system. We provide numerical results on large-scale machine learning problems to demonstrate the merits of the proposed method.
AB - We develop and analyze an asynchronous algorithm for distributed convex optimization when the objective can be written as a sum of smooth functions, local to each worker, and a nonsmooth function. Unlike many existing methods, our distributed algorithm is adjustable to various levels of communication cost, delays, machines' computational power, and functions' smoothness. A unique feature is that the step sizes do not depend on communication delays or the number of machines, which is highly desirable for scalability. We prove that the algorithm converges linearly in the strongly convex case and provide convergence guarantees for the non-strongly convex case. The obtained rates are the same as those of the vanilla proximal gradient algorithm, over an introduced epoch sequence that subsumes the delays of the system. We provide numerical results on large-scale machine learning problems to demonstrate the merits of the proposed method.
UR - http://hdl.handle.net/10754/662928
UR - https://epubs.siam.org/doi/10.1137/18M1194699
UR - http://www.scopus.com/inward/record.url?scp=85084680082&partnerID=8YFLogxK
U2 - 10.1137/18M1194699
DO - 10.1137/18M1194699
M3 - Article
SN - 1052-6234
VL - 30
SP - 933
EP - 959
JO - SIAM Journal on Optimization
JF - SIAM Journal on Optimization
IS - 1
ER -