Adding vs. averaging in distributed primal-dual optimization

Chenxin Ma, Virginia Smith, Martin Jaggi, Michael I. Jordan, Peter Richtárik, Martin Takáč

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

87 Scopus citations

Abstract

Distributed optimization methods for large-scale machine learning suffer from a communication bottleneck. It is difficult to reduce this bottleneck while still efficiently and accurately aggregating partial work from different machines. In this paper, we present a novel generalization of the recent communication-efficient primal-dual framework (CoCoA) for distributed optimization. Our framework, CoCoA+, allows for additive combination of local updates to the global parameters at each iteration, whereas previous schemes with convergence guarantees only allow conservative averaging. We give stronger (primal-dual) convergence rate guarantees for both CoCoA and our new variants, and generalize the theory for both methods to cover non-smooth convex loss functions. We provide an extensive experimental comparison that shows the markedly improved performance of CoCoA+ on several real-world distributed datasets, especially when scaling up the number of machines.
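A minimal sketch of the aggregation choice the abstract contrasts: K machines each compute a local update to their block of the dual variables, and the driver either averages those updates (the conservative CoCoA-style combination, gamma = 1/K) or adds them (the CoCoA+-style combination, gamma = 1). The function names and the random-direction local solver below are illustrative placeholders, not the paper's actual code; in CoCoA+ the additive step is made safe by solving a more conservative local subproblem, which this sketch does not model.

```python
import numpy as np

def local_update(alpha_k, rng):
    # Placeholder for the local solver (e.g., a few passes of SDCA over
    # machine k's data partition); here just a small random direction.
    return 0.01 * rng.standard_normal(alpha_k.shape)

def aggregate(alpha_blocks, deltas, gamma):
    # gamma = 1/K -> conservative averaging of local updates (CoCoA-style)
    # gamma = 1   -> additive combination of local updates (CoCoA+-style)
    return [a_k + gamma * d_k for a_k, d_k in zip(alpha_blocks, deltas)]

K = 4  # number of machines
rng = np.random.default_rng(0)
alpha_blocks = [np.zeros(10) for _ in range(K)]  # per-machine dual blocks

for _ in range(5):  # outer (communication) rounds
    deltas = [local_update(a_k, rng) for a_k in alpha_blocks]
    alpha_blocks = aggregate(alpha_blocks, deltas, gamma=1.0)      # adding
    # alpha_blocks = aggregate(alpha_blocks, deltas, gamma=1.0/K)  # averaging
```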

Original language: English (US)
Title of host publication: 32nd International Conference on Machine Learning, ICML 2015
Editors: Francis Bach, David Blei
Publisher: International Machine Learning Society (IMLS)
Pages: 1973-1982
Number of pages: 10
ISBN (Electronic): 9781510810587
State: Published - 2015
Externally published: Yes
Event: 32nd International Conference on Machine Learning, ICML 2015 - Lille, France
Duration: Jul 6 2015 - Jul 11 2015

Publication series

Name: 32nd International Conference on Machine Learning, ICML 2015
Volume: 3

Other

Other: 32nd International Conference on Machine Learning, ICML 2015
Country/Territory: France
City: Lille
Period: 07/6/15 - 07/11/15

ASJC Scopus subject areas

  • Human-Computer Interaction
  • Computer Science Applications
