TY - GEN
T1 - Learning sigmoid belief networks via Monte Carlo expectation maximization
AU - Song, Zhao
AU - Henao, Ricardo
AU - Carlson, David
AU - Carin, Lawrence
N1 - Generated from Scopus record by KAUST IRTS on 2021-02-09
PY - 2016/1/1
Y1 - 2016/1/1
AB - Belief networks are commonly used generative models of data, but require expensive posterior estimation to train and test the model. Learning typically proceeds by posterior sampling, variational approximations, or recognition networks, combined with stochastic optimization. We propose using an online Monte Carlo expectation-maximization (MCEM) algorithm to learn the maximum a posteriori (MAP) estimator of the generative model or optimize the variational lower bound of a recognition network. The E-step in this algorithm requires posterior samples, which are already generated in current learning schema. For the M-step, we augment with Pólya-Gamma (PG) random variables to give an analytic updating scheme. We show relationships to standard learning approaches by deriving stochastic gradient ascent in the MCEM framework. We apply the proposed methods to both binary and count data. Experimental results show that MCEM improves the convergence speed and often improves hold-out performance over existing learning methods. Our approach is readily generalized to other recognition networks.
UR - http://www.scopus.com/inward/record.url?scp=85048438467&partnerID=8YFLogxK
M3 - Conference contribution
BT - Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, AISTATS 2016
PB - PMLR
ER -