TY - JOUR
T1 - The Bayesian Learning Rule
AU - Khan, Mohammad Emtiyaz
AU - Rue, Håvard
N1 - Acknowledgements: M. E. Khan would like to thank many current and past colleagues at RIKEN-AIP, including W. Lin, D. Nielsen, X. Meng, T. Möllenhoff and P. Alquier, for many insightful discussions that helped shape parts of this paper. We also thank the anonymous reviewers for their feedback to improve the presentation.
PY - 2023/9/21
Y1 - 2023/9/21
AB - We show that many machine-learning algorithms are specific instances of a single algorithm called the Bayesian learning rule. The rule, derived from Bayesian principles, yields a wide range of algorithms from fields such as optimization, deep learning, and graphical models. This includes classical algorithms such as ridge regression, Newton's method, and the Kalman filter, as well as modern deep-learning algorithms such as stochastic-gradient descent, RMSprop, and Dropout. The key idea in deriving such algorithms is to approximate the posterior using candidate distributions estimated by using natural gradients. Different candidate distributions result in different algorithms, and further approximations to natural gradients give rise to variants of those algorithms. Our work not only unifies, generalizes, and improves existing algorithms, but also helps us design new ones.
UR - http://hdl.handle.net/10754/670192
UR - https://arxiv.org/pdf/2107.04562.pdf
M3 - Article
JO - Journal of Machine Learning Research
JF - Journal of Machine Learning Research
ER -