Bridging the gap between stochastic gradient MCMC and stochastic optimization

Changyou Chen, David Carlson, Zhe Gan, Chunyuan Li, Lawrence Carin

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

42 Scopus citations


Stochastic gradient Markov chain Monte Carlo (SG-MCMC) methods are Bayesian analogs of popular stochastic optimization methods; however, this connection is not well studied. We explore this relationship by applying simulated annealing to an SG-MCMC algorithm. Furthermore, we extend recent SG-MCMC methods with two key components: i) adaptive preconditioners (as in AdaGrad or RMSprop), and ii) adaptive element-wise momentum weights. The zero-temperature limit gives a novel stochastic optimization method with adaptive element-wise momentum weights, whereas conventional optimization methods have only a shared, static momentum weight. Under certain assumptions, our theoretical analysis suggests that the proposed simulated annealing approach converges close to the global optimum. Experiments on several deep neural network models show state-of-the-art results compared to related stochastic optimization algorithms.
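The idea of annealing a preconditioned sampler toward an optimizer can be illustrated with a minimal sketch. It is not the algorithm proposed in the paper: it assumes a plain Langevin update with an RMSprop-style diagonal preconditioner and a polynomial annealing schedule, it omits momentum and the element-wise momentum weights entirely, and it drops the correction terms a proper sampler would need. The function name `annealed_psgld`, the schedule `beta0 * t**gamma`, and all default values are hypothetical choices made for illustration.

```python
import math
import random


def annealed_psgld(grad, x0, steps=2000, lr=1e-2, alpha=0.99, eps=1e-5,
                   beta0=1.0, gamma=1.0, seed=0):
    """Preconditioned Langevin updates with an annealed temperature (sketch).

    grad  : function mapping a list of floats to its (possibly noisy) gradient
    beta0 : initial inverse temperature; math.inf switches the noise off
    gamma : annealing exponent, inverse temperature is beta0 * t**gamma
    """
    rng = random.Random(seed)
    x = list(x0)
    v = [0.0] * len(x)  # running average of squared gradients, per element
    for t in range(1, steps + 1):
        beta = beta0 * t ** gamma  # grows with t, so injected noise shrinks
        g = grad(x)
        for i in range(len(x)):
            v[i] = alpha * v[i] + (1.0 - alpha) * g[i] ** 2
            precond = 1.0 / (eps + math.sqrt(v[i]))  # RMSprop-style scaling
            # Gaussian noise scaled by step size, preconditioner, temperature
            noise = math.sqrt(2.0 * lr * precond / beta) * rng.gauss(0.0, 1.0)
            x[i] += -lr * precond * g[i] + noise
    return x
```

With a finite `beta0` the early iterations inject noise and behave like a sampler; as the inverse temperature grows the noise term vanishes and the update reduces to an RMSprop-like descent step, which is the zero-temperature limit the abstract refers to. For example, `annealed_psgld(lambda x: [4 * x[0] * (x[0] ** 2 - 1) + 0.3], [1.0])` runs the sketch on a one-dimensional double-well objective (an assumed toy problem, not one from the paper).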
Original language: English (US)
Title of host publication: Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, AISTATS 2016
Number of pages: 10
State: Published - Jan 1 2016
Externally published: Yes

