TY - GEN
T1 - Privacy-preserving Sparse Generalized Eigenvalue Problem
AU - Hu, Lijie
AU - Xiang, Zihang
AU - Liu, Jiabin
AU - Wang, Di
N1 - KAUST Repository Item: Exported on 2023-07-28
Acknowledged KAUST grant number(s): BAS/1/1689-01-01, FCC/1/1976-49-01, REI/1/4811-10-01, URF/1/4663-01-01
Acknowledgements: Lijie Hu, Zihang Xiang and Di Wang are supported in part by the baseline funding BAS/1/1689-01-01, funding from the CRG grant URF/1/4663-01-01, FCC/1/1976-49-01 from CBRC and funding from the AI Initiative REI/1/4811-10-01 of King Abdullah University of Science and Technology (KAUST). Di Wang was also supported by the funding of the SDAIA-KAUST Center of Excellence in Data Science and Artificial Intelligence (SDAIA-KAUST AI).
PY - 2023/6/4
Y1 - 2023/6/4
N2 - In this paper, we study the (sparse) Generalized Eigenvalue Problem (GEP), which arises in a number of modern statistical learning models, such as principal component analysis (PCA), canonical correlation analysis (CCA), Fisher's discriminant analysis (FDA), and sliced inverse regression (SIR). We provide the first study of GEP in the differential privacy (DP) model under both deterministic and stochastic settings. In the low-dimensional case, we provide a ρ-Concentrated DP (CDP) method, namely DP-Rayleigh Flow, and show that if the initial vector is close enough to the optimal vector, its output has an ℓ2-norm estimation error of Õ(d/n + d/(n²ρ)) (under some mild assumptions), where d is the dimension and n is the sample size. Next, we discuss how to find such an initial parameter privately. In the high-dimensional sparse case where d ≫ n, we propose the DP-Truncated Rayleigh Flow method, whose output can achieve an error of Õ(s log d/n + s log d/(n²ρ)) for various statistical models, where s is the sparsity of the underlying parameter. Moreover, we show that these errors in the stochastic setting are optimal up to a factor of Poly(log n) by providing lower bounds for PCA and SIR under the statistical setting and in the CDP model. Finally, to give a separation between ∊-DP and ρ-CDP for GEP, we also provide lower bounds of Ω(d/n + d²/(n²∊²)) and Ω(s log d/n + s² log²d/(n²∊²)) on the private minimax risk for PCA, under the statistical setting and ∊-DP model, in the low- and high-dimensional sparse cases respectively.
UR - http://hdl.handle.net/10754/693273
UR - https://proceedings.mlr.press/v206/hu23a.html
UR - http://www.scopus.com/inward/record.url?scp=85165177202&partnerID=8YFLogxK
M3 - Conference contribution
SP - 5052
EP - 5062
BT - 26th International Conference on Artificial Intelligence and Statistics, AISTATS 2023
PB - ML Research Press
ER -