Random Reshuffling with Variance Reduction: New Analysis and Better Rates

Grigory Malinovsky, Alibek Sailanbayev, Peter Richtárik

Research output: Contribution to conferencePaperpeer-review

3 Scopus citations

Abstract

Virtually all state-of-the-art methods for training supervised machine learning models are variants of Stochastic Gradient Descent (SGD), enhanced with a number of additional tricks, such as minibatching, momentum, and adaptive stepsizes. However, one of the most basic questions in the design of successful SGD methods, one that is orthogonal to the aforementioned tricks, is the choice of the next training data point to be learning from. Standard variants of SGD employ a sampling with replacement strategy, which means that the next training data point is sampled from the entire data set, often independently of all previous samples. While standard SGD is well understood theoretically, virtually all widely used machine learning software is based on sampling without replacement as this is often empirically superior. That is, the training data is randomly shuffled/permuted, either only once at the beginning, strategy known as random shuffling (Rand-Shuffle), or before every epoch, strategy known as random reshuffling (Rand-Reshuffle), and training proceeds in the data order dictated by the shuffling. Rand-Shuffle and Rand-Reshuffle strategies have for a long time remained beyond the reach of theoretical analysis that would satisfactorily explain their success. However, very recently, Mishchenko et al. [2020] provided tight sublinear convergence rates through a novel analysis, and showed that these strategies can improve upon standard SGD in certain regimes. Inspired by these results, we seek to further improve the rates of shuffling-based methods. In particular, we show that it is possible to enhance them with a variance reduction mechanism, obtaining linear convergence rates. To the best of our knowledge, our linear convergence rates are the best for any method based on sampling without replacement.

Original languageEnglish (US)
Pages1347-1357
Number of pages11
StatePublished - 2023
Event39th Conference on Uncertainty in Artificial Intelligence, UAI 2023 - Pittsburgh, United States
Duration: Jul 31 2023Aug 4 2023

Conference

Conference39th Conference on Uncertainty in Artificial Intelligence, UAI 2023
Country/TerritoryUnited States
CityPittsburgh
Period07/31/2308/4/23

ASJC Scopus subject areas

  • Artificial Intelligence
  • Software
  • Control and Systems Engineering
  • Statistics and Probability

Fingerprint

Dive into the research topics of 'Random Reshuffling with Variance Reduction: New Analysis and Better Rates'. Together they form a unique fingerprint.

Cite this