TY - GEN
T1 - Stochastic Particle-Optimization Sampling and the Non-Asymptotic Convergence Theory
AU - Zhang, Jianyi
AU - Zhang, Ruiyi
AU - Carin, Lawrence
AU - Chen, Changyou
PY - 2020
Y1 - 2020
AB - Particle-optimization-based sampling (POS) is a recently developed effective sampling technique that interactively updates a set of particles to approximate a target distribution. A representative algorithm is Stein variational gradient descent (SVGD). We prove that, under certain conditions, SVGD suffers from a theoretical pitfall: particles tend to collapse. As a remedy, we generalize POS to a stochastic setting by injecting random noise into particle updates, yielding stochastic particle-optimization sampling (SPOS). Notably, for the first time we develop a non-asymptotic convergence theory for the SPOS framework (related to SVGD), characterizing algorithm convergence in terms of the 1-Wasserstein distance w.r.t. the numbers of particles and iterations. Somewhat surprisingly, with the same number of updates (not too large) for each particle, our theory suggests that adopting more particles does not necessarily lead to a better approximation of the target distribution, due to a limited computational budget and numerical errors. This phenomenon is also observed in SVGD and verified via a synthetic experiment. Extensive experimental results verify our theory and demonstrate the effectiveness of the proposed framework.
KW - Granular media equations
M3 - Conference contribution
T3 - Proceedings of Machine Learning Research
SP - 1877
EP - 1886
BT - International Conference on Artificial Intelligence and Statistics, Vol. 108
A2 - Chiappa, S
A2 - Calandra, R
PB - PMLR
T2 - 23rd International Conference on Artificial Intelligence and Statistics (AISTATS)
Y2 - 26 August 2020 through 28 August 2020
ER -