Abstract
We revisit the theoretical properties of Hamiltonian stochastic differential equations (SDEs) for Bayesian posterior sampling, and we study the two types of error that arise from numerical SDE simulation: the discretization error and the error due to noisy gradient estimates in the context of data subsampling. Our main result is a novel analysis of the effect of mini-batches through the lens of differential operator splitting, revising previous results from the literature. The stochastic component of a Hamiltonian SDE is decoupled from the gradient noise, for which we make no normality assumptions. This leads to the identification of a convergence bottleneck: when considering mini-batches, the best achievable error rate is O(η²), with η being the integrator step size. Our theoretical results are supported by an empirical study on a variety of regression and classification tasks for Bayesian neural networks.
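To make the setting concrete, below is a minimal illustrative sketch (not the authors' exact scheme) of simulating a Hamiltonian SDE for posterior sampling with mini-batch gradient estimates, using a symmetric OBABO-style splitting of the Ornstein-Uhlenbeck (friction plus injected noise) part and the Hamiltonian (leapfrog) part. The toy Bayesian linear-regression posterior, all function names, and all hyperparameters are assumptions chosen for illustration.

```python
# Sketch only: mini-batch Hamiltonian SDE sampling via operator splitting.
# Assumed setup: Bayesian linear regression with a standard normal prior,
# identity mass matrix, and unit temperature. Names/values are illustrative.
import numpy as np

rng = np.random.default_rng(0)

def minibatch_grad_log_post(theta, X, y, batch_size):
    """Unbiased mini-batch estimate of the gradient of the log posterior."""
    n = X.shape[0]
    idx = rng.choice(n, size=batch_size, replace=False)
    resid = y[idx] - X[idx] @ theta
    # Rescale the likelihood term by n / batch_size so the estimate is unbiased.
    return (n / batch_size) * X[idx].T @ resid - theta

def hamiltonian_sde_step(theta, p, grad_fn, eta, gamma):
    """One OBABO splitting step: half OU step, leapfrog step, half OU step.
    The OU half steps are solved exactly; the mini-batch gradient noise
    enters only through the leapfrog (Hamiltonian) part."""
    c = np.exp(-gamma * eta / 2)
    s = np.sqrt(1.0 - c**2)
    p = c * p + s * rng.standard_normal(p.shape)   # O: exact OU half step
    p = p + (eta / 2) * grad_fn(theta)             # B: half momentum kick
    theta = theta + eta * p                        # A: position drift
    p = p + (eta / 2) * grad_fn(theta)             # B: half momentum kick
    p = c * p + s * rng.standard_normal(p.shape)   # O: exact OU half step
    return theta, p

# Toy data and sampling loop.
X = rng.standard_normal((500, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.standard_normal(500)
theta, p = np.zeros(3), np.zeros(3)
for _ in range(2000):
    theta, p = hamiltonian_sde_step(
        theta, p,
        lambda t: minibatch_grad_log_post(t, X, y, batch_size=50),
        eta=1e-3, gamma=1.0)
```

In this kind of splitting, the injected noise of the OU half steps is independent of the gradient estimate, mirroring the abstract's point that the stochastic component of the SDE can be decoupled from the (not necessarily Gaussian) mini-batch gradient noise.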
| Original language | English (US) |
|---|---|
| Pages | 6744-6778 |
| Number of pages | 35 |
| State | Published - 2022 |
| Event | 39th International Conference on Machine Learning, ICML 2022 - Baltimore, United States. Duration: Jul 17, 2022 → Jul 23, 2022 |
Conference
| Conference | 39th International Conference on Machine Learning, ICML 2022 |
|---|---|
| Country/Territory | United States |
| City | Baltimore |
| Period | 07/17/22 → 07/23/22 |
ASJC Scopus subject areas
- Artificial Intelligence
- Software
- Control and Systems Engineering
- Statistics and Probability