TY - JOUR
T1 - Direct nonlinear acceleration
AU - Dutta, Aritra
AU - Bergou, El Houcine
AU - Xiao, Yunming
AU - Canini, Marco
AU - Richtárik, Peter
N1 - Funding Information:
Aritra Dutta acknowledges being an affiliated researcher at the Pioneer Centre for AI, Denmark.
Publisher Copyright:
© 2022 The Author(s)
PY - 2022/1
Y1 - 2022/1
N2 - Optimization acceleration techniques such as momentum play a key role in state-of-the-art machine learning algorithms. Recently, generic vector sequence extrapolation techniques, such as the regularized nonlinear acceleration (RNA) of Scieur et al. [22], were proposed and shown to accelerate fixed point iterations. In contrast to RNA, which computes extrapolation coefficients by (approximately) setting the gradient of the objective function to zero at the extrapolated point, we propose a more direct approach, which we call direct nonlinear acceleration (DNA). In DNA, we instead aim to minimize (an approximation of) the function value at the extrapolated point. We adopt a regularized approach, with regularizers designed to prevent the model from entering a region in which the functional approximation is less precise. While the computational cost of DNA is comparable to that of RNA, our direct approach significantly outperforms RNA on both synthetic and real-world datasets. While the focus of this paper is on convex problems, we obtain very encouraging results in accelerating the training of neural networks.
AB - Optimization acceleration techniques such as momentum play a key role in state-of-the-art machine learning algorithms. Recently, generic vector sequence extrapolation techniques, such as the regularized nonlinear acceleration (RNA) of Scieur et al. [22], were proposed and shown to accelerate fixed point iterations. In contrast to RNA, which computes extrapolation coefficients by (approximately) setting the gradient of the objective function to zero at the extrapolated point, we propose a more direct approach, which we call direct nonlinear acceleration (DNA). In DNA, we instead aim to minimize (an approximation of) the function value at the extrapolated point. We adopt a regularized approach, with regularizers designed to prevent the model from entering a region in which the functional approximation is less precise. While the computational cost of DNA is comparable to that of RNA, our direct approach significantly outperforms RNA on both synthetic and real-world datasets. While the focus of this paper is on convex problems, we obtain very encouraging results in accelerating the training of neural networks.
UR - http://www.scopus.com/inward/record.url?scp=85139592479&partnerID=8YFLogxK
U2 - 10.1016/j.ejco.2022.100047
DO - 10.1016/j.ejco.2022.100047
M3 - Article
AN - SCOPUS:85139592479
SN - 2192-4406
VL - 10
JO - EURO Journal on Computational Optimization
JF - EURO Journal on Computational Optimization
M1 - 100047
ER -