Accelerated, Parallel, and PROXimal coordinate descent

Olivier Fercoq, Peter Richtárik

Research output: Contribution to journal › Article › peer-review

173 Scopus citations

Abstract

We propose a new randomized coordinate descent method for minimizing the sum of convex functions, each of which depends on a small number of coordinates only. Our method (APPROX) is simultaneously Accelerated, Parallel, and PROXimal; this is the first time such a method is proposed. In the special case when the number of processors is equal to the number of coordinates, the method converges at the rate $2\bar{\omega}\bar{L}R^2/(k+1)^2$, where $k$ is the iteration counter, $\bar{\omega}$ is a data-weighted average degree of separability of the loss function, $\bar{L}$ is the average of the Lipschitz constants associated with the coordinates and individual functions in the sum, and $R$ is the distance of the initial point from the minimizer. We show that the method can be implemented without the need to perform full-dimensional vector operations, which is the major bottleneck of accelerated coordinate descent. The fact that the method depends on the average degree of separability, and not on the maximum degree, can be attributed to the use of new safe large stepsizes, leading to improved expected separable overapproximation (ESO). These stepsizes are of independent interest and can be utilized in all existing parallel randomized coordinate descent algorithms based on the concept of ESO. In special cases, our method recovers several classical and recent algorithms, such as simple and accelerated proximal gradient descent, as well as serial, parallel, and distributed versions of randomized block coordinate descent. Our bounds match or improve on the best known bounds for these methods. © 2015 SIAM. Published by SIAM under the terms of the Creative Commons 4.0 license.
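To make the update pattern described in the abstract concrete, the following Python sketch shows an illustrative instance of accelerated proximal coordinate descent in the spirit of APPROX, applied to the lasso problem min_x 0.5*||Ax - b||^2 + lam*||x||_1. The names (approx_lasso, soft_threshold) and the serial-sampling stepsizes v_i = ||A_i||^2 are our assumptions for illustration, not the paper's notation; the sketch also performs the full-dimensional vector operations that the paper shows how to avoid, and it does not implement the ESO-based stepsizes needed for genuinely parallel sampling.

import numpy as np

def soft_threshold(t, a):
    # Proximal operator of a*|.| applied coordinate-wise (l1 penalty).
    return np.sign(t) * np.maximum(np.abs(t) - a, 0.0)

def approx_lasso(A, b, lam, tau=1, iters=1000, seed=0):
    # Illustrative accelerated proximal coordinate descent for
    #   min_x 0.5*||A x - b||^2 + lam*||x||_1,
    # loosely following the APPROX update pattern. Stepsizes use the
    # serial-sampling constants v_i = ||A_i||^2 (an assumption of this
    # sketch); the paper's ESO theory supplies safe larger stepsizes
    # for parallel sampling.
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    v = (A ** 2).sum(axis=0)                        # per-coordinate Lipschitz constants
    x = np.zeros(n)
    z = x.copy()
    theta = tau / n                                 # theta_0 = tau/n
    for _ in range(iters):
        y = (1.0 - theta) * x + theta * z           # interpolation step
        S = rng.choice(n, size=tau, replace=False)  # random coordinate block
        grad = A[:, S].T @ (A @ y - b)              # partial gradients at y
        step = n * theta * v[S] / tau
        z_new = soft_threshold(z[S] - grad / step, lam / step)  # prox step on S
        x = y.copy()                                # x_{k+1} agrees with y_k off S
        x[S] = y[S] + (n * theta / tau) * (z_new - z[S])
        z[S] = z_new
        theta = 0.5 * (np.sqrt(theta**4 + 4.0 * theta**2) - theta**2)
    return x

With tau = n (every coordinate updated at each iteration) the recursion collapses to a coordinate-wise scaled accelerated proximal gradient method, and with tau = 1 it is a serial accelerated coordinate descent, consistent with the special cases the abstract mentions.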

Original language: English (US)
Pages (from-to): 1997-2023
Number of pages: 27
Journal: SIAM Journal on Optimization
Volume: 25
Issue number: 4
DOIs
State: Published - 2015
Externally published: Yes

Keywords

  • Acceleration
  • Big data
  • Complexity
  • Convex optimization
  • Parallel methods
  • Partial separability
  • Proximal methods
  • Randomized coordinate descent

ASJC Scopus subject areas

  • Software
  • Theoretical Computer Science
