Smaller generalization error derived for a deep residual neural network compared with shallow networks

Aku Jaakko Alexis Kammonen, Jonas Kiessling, Petr Plecháč, Mattias Sandberg, Anders Szepessy, Raul Tempone

Research output: Contribution to journal › Article › peer-review

Abstract

Estimates of the generalization error are proved for a residual neural network with $L$ random Fourier features layers $\bar z_{\ell+1} = \bar z_\ell + \mathrm{Re}\sum_{k=1}^{K} \bar b_{\ell k}\, e^{i\omega_{\ell k}\bar z_\ell} + \mathrm{Re}\sum_{k=1}^{K} \bar c_{\ell k}\, e^{i\omega'_{\ell k}\cdot x}$. An optimal distribution for the frequencies $(\omega_{\ell k}, \omega'_{\ell k})$ of the random Fourier features $e^{i\omega_{\ell k}\bar z_\ell}$ and $e^{i\omega'_{\ell k}\cdot x}$ is derived. This derivation is based on the corresponding generalization error for the approximation of the function values $f(x)$. The generalization error turns out to be smaller than the estimate $\|\hat f\|^2_{L^1(\mathbb{R}^d)}/(KL)$ of the generalization error for random Fourier features with one hidden layer and the same total number of nodes $KL$, in the case where the $L^\infty$-norm of $f$ is much smaller than the $L^1$-norm of its Fourier transform $\hat f$. This understanding of an optimal distribution for random features is used to construct a new training method for a deep residual network. Promising performance of the proposed new algorithm is demonstrated in computational experiments.
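
The layer update in the abstract can be illustrated with a short forward-pass sketch. This is a minimal illustration assuming NumPy; the function and variable names (`forward`, `omega`, `omega_prime`, `b`, `c`), the array shapes, and the zero initialization of the hidden state are assumptions for exposition, not the authors' implementation.

```python
import numpy as np

def forward(x, omega, omega_prime, b, c):
    """Forward pass of a residual network with L random Fourier features layers.

    Illustrative sketch (shapes are assumptions, not taken from the paper):
      x           : (d,) input vector
      omega       : (L, K) scalar frequencies acting on the hidden state z
      omega_prime : (L, K, d) frequencies acting on the input x
      b, c        : (L, K) complex amplitudes

    Each layer applies
      z_{l+1} = z_l + Re sum_k b_{lk} exp(i * omega_{lk} * z_l)
                    + Re sum_k c_{lk} exp(i * omega'_{lk} . x)
    """
    L, K = omega.shape
    z = 0.0  # scalar hidden state, initialized at zero (assumption)
    for l in range(L):
        z = (z
             + np.real(np.sum(b[l] * np.exp(1j * omega[l] * z)))
             + np.real(np.sum(c[l] * np.exp(1j * (omega_prime[l] @ x)))))
    return z

# Usage with random frequencies and amplitudes, for illustration only.
rng = np.random.default_rng(0)
d, K, L = 3, 8, 4
x = rng.standard_normal(d)
omega = rng.standard_normal((L, K))
omega_prime = rng.standard_normal((L, K, d))
b = (rng.standard_normal((L, K)) + 1j * rng.standard_normal((L, K))) / (K * L)
c = (rng.standard_normal((L, K)) + 1j * rng.standard_normal((L, K))) / (K * L)
print(forward(x, omega, omega_prime, b, c))
```

The paper's contribution concerns how the frequencies $(\omega_{\ell k}, \omega'_{\ell k})$ should be distributed and trained; the sketch above only shows the architecture they enter.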
Original language: English (US)
Journal: IMA Journal of Numerical Analysis
State: Published - Sep 12, 2022

ASJC Scopus subject areas

  • Computational Mathematics
  • Applied Mathematics
  • General Mathematics
