TY - GEN
T1 - High Dimensional Differentially Private Stochastic Optimization with Heavy-Tailed Data
AU - Hu, Lijie
AU - Ni, Shuo
AU - Xiao, Hanshen
AU - Wang, Di
N1 - Publisher Copyright:
© 2022 ACM.
PY - 2022/6/12
Y1 - 2022/6/12
N2 - As one of the most fundamental problems in machine learning, statistics and differential privacy, Differentially Private Stochastic Convex Optimization (DP-SCO) has been extensively studied in recent years. However, most of the previous work can only handle either regular data distributions or irregular data in the low dimensional case. To better understand the challenges arising from irregular data distributions, in this paper we provide the first study of the problem of DP-SCO with heavy-tailed data in the high dimensional space. In the first part we focus on the problem over some polytope constraint (such as the ℓ1-norm ball). We show that if the loss function is smooth and its gradient has bounded second-order moment, it is possible to get a (high probability) error bound (excess population risk) of Õ(log d/(nε)^{1/3}) in the ε-DP model, where n is the sample size and d is the dimension of the underlying space. Next, for LASSO, if the data distribution has bounded fourth-order moments, we improve the bound to Õ(log d/(nε)^{2/5}) in the (ε, δ)-DP model. In the second part of the paper, we study sparse learning with heavy-tailed data. We first revisit the sparse linear model and propose a truncated DP-IHT method whose output could achieve an error of Õ((s*^2 log^2 d)/(nε)), where s* is the sparsity of the underlying parameter. Then we study a more general problem over the sparsity (i.e., ℓ0-norm) constraint, and show that it is possible to achieve an error of Õ((s*^{3/2} log d)/(nε)), which is also near optimal up to a factor of Õ(√s*), if the loss function is smooth and strongly convex.
AB - As one of the most fundamental problems in machine learning, statistics and differential privacy, Differentially Private Stochastic Convex Optimization (DP-SCO) has been extensively studied in recent years. However, most of the previous work can only handle either regular data distributions or irregular data in the low dimensional case. To better understand the challenges arising from irregular data distributions, in this paper we provide the first study of the problem of DP-SCO with heavy-tailed data in the high dimensional space. In the first part we focus on the problem over some polytope constraint (such as the ℓ1-norm ball). We show that if the loss function is smooth and its gradient has bounded second-order moment, it is possible to get a (high probability) error bound (excess population risk) of Õ(log d/(nε)^{1/3}) in the ε-DP model, where n is the sample size and d is the dimension of the underlying space. Next, for LASSO, if the data distribution has bounded fourth-order moments, we improve the bound to Õ(log d/(nε)^{2/5}) in the (ε, δ)-DP model. In the second part of the paper, we study sparse learning with heavy-tailed data. We first revisit the sparse linear model and propose a truncated DP-IHT method whose output could achieve an error of Õ((s*^2 log^2 d)/(nε)), where s* is the sparsity of the underlying parameter. Then we study a more general problem over the sparsity (i.e., ℓ0-norm) constraint, and show that it is possible to achieve an error of Õ((s*^{3/2} log d)/(nε)), which is also near optimal up to a factor of Õ(√s*), if the loss function is smooth and strongly convex.
KW - differential privacy
KW - high dimensional statistics
KW - robust statistics
KW - stochastic convex optimization
UR - http://www.scopus.com/inward/record.url?scp=85133016718&partnerID=8YFLogxK
U2 - 10.1145/3517804.3524144
DO - 10.1145/3517804.3524144
M3 - Conference contribution
AN - SCOPUS:85133016718
T3 - Proceedings of the ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems
SP - 227
EP - 236
BT - PODS 2022 - Proceedings of the 41st ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems
PB - Association for Computing Machinery
T2 - 41st ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, PODS 2022
Y2 - 12 June 2022 through 17 June 2022
ER -
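
For readers who want a concrete feel for the truncated DP-IHT method named in the abstract, below is a minimal Python sketch of a differentially private iterative hard thresholding loop for the sparse linear model. It is an illustration under stated assumptions, not the authors' algorithm: the per-example ℓ2 clipping is a crude stand-in for the paper's truncation step, the noise is calibrated with the standard Gaussian mechanism, and privacy is accounted for by naive composition across iterations. The function and parameter names (dp_iht_sketch, clip, eta) are hypothetical.

import numpy as np

def hard_threshold(v: np.ndarray, s: int) -> np.ndarray:
    """Keep the s largest-magnitude coordinates of v, zero out the rest."""
    out = np.zeros_like(v)
    keep = np.argsort(np.abs(v))[-s:]
    out[keep] = v[keep]
    return out

def dp_iht_sketch(X, y, s, T=50, eta=0.1, clip=1.0, eps=1.0, delta=1e-6, seed=None):
    """Noisy IHT for sparse linear regression under (eps, delta)-DP (sketch).

    Per-example gradients are l2-clipped, Gaussian noise is added via the
    Gaussian mechanism, and the iterate is hard-thresholded to sparsity s
    each round. Naive composition splits the budget evenly over T rounds.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    theta = np.zeros(d)
    eps_t = eps / T  # naive composition; a tighter accountant would do better
    for _ in range(T):
        # Per-example gradients of the squared loss: x_i * (x_i . theta - y_i)
        per_ex = X * (X @ theta - y)[:, None]
        norms = np.linalg.norm(per_ex, axis=1, keepdims=True)
        per_ex *= np.minimum(1.0, clip / np.maximum(norms, 1e-12))
        grad = per_ex.mean(axis=0)
        # Gaussian mechanism: l2 sensitivity of the clipped mean is 2*clip/n
        sigma = (2 * clip / n) * np.sqrt(2 * np.log(1.25 / (delta / T))) / eps_t
        theta = hard_threshold(theta - eta * (grad + rng.normal(0.0, sigma, d)), s)
    return theta

As a usage note, running dp_iht_sketch(X, y, s=s_star) with n much larger than s*^2 log^2 d / eps mirrors the regime in which the abstract's Õ((s*^2 log^2 d)/(nε)) bound is meaningful; with heavy-tailed covariates, the paper's truncation step (rather than plain clipping) is what controls the bias.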