TY - GEN
T1 - Primal Dual Interpretation of the Proximal Stochastic Gradient Langevin Algorithm
AU - Salim, Adil
AU - Richtárik, Peter
N1 - KAUST Repository Item: Exported on 2021-08-10
PY - 2020
Y1 - 2020
AB - We consider the task of sampling with respect to a log-concave probability distribution. The potential of the target distribution is assumed to be composite, i.e., written as the sum of a smooth convex term and a nonsmooth convex term that may take infinite values. The target distribution can be seen as a minimizer of the Kullback-Leibler divergence defined on the Wasserstein space (i.e., the space of probability measures). In the first part of this paper, we establish a strong duality result for this minimization problem. In the second part, we use the duality gap arising from the first part to study the complexity of the Proximal Stochastic Gradient Langevin Algorithm (PSGLA), which can be seen as a generalization of the Projected Langevin Algorithm. Our approach relies on viewing PSGLA as a primal-dual algorithm and covers many cases where the target distribution is not fully supported. In particular, we show that if the potential is strongly convex, the complexity of PSGLA is O(1/ε²) in terms of the 2-Wasserstein distance. In contrast, the complexity of the Projected Langevin Algorithm is O(1/ε¹²) in terms of total variation when the potential is convex.
UR - http://hdl.handle.net/10754/666021
UR - https://proceedings.neurips.cc/paper/2020/hash/2779fda014fbadb761f67dd708c1325e-Abstract.html
UR - http://www.scopus.com/inward/record.url?scp=85108254678&partnerID=8YFLogxK
M3 - Conference contribution
BT - 34th Conference on Neural Information Processing Systems, NeurIPS 2020
PB - Neural Information Processing Systems Foundation
ER -