TY - GEN
T1 - The Neural Data Router: Adaptive Control Flow in Transformers Improves Systematic Generalization
AU - Csordás, Róbert
AU - Irie, Kazuki
AU - Schmidhuber, Jürgen
N1 - KAUST Repository Item: Exported on 2022-12-21
Acknowledgements: We thank Imanol Schlag and Sjoerd van Steenkiste for helpful discussions and suggestions on an earlier version of the manuscript. This research was partially funded by ERC Advanced grant no. 742870, project AlgoRNN, and by Swiss National Science Foundation grant no. 200021_192356, project NEUSYM. We are thankful for hardware donations from NVIDIA & IBM. The resources used for the project were partially provided by the Swiss National Supercomputing Centre (CSCS), project s1023.
PY - 2022/5/5
Y1 - 2022/5/5
AB - Despite successes across a broad range of applications, Transformers have limited success in systematic generalization. The situation is especially frustrating in the case of algorithmic tasks, where they often fail to find intuitive solutions that route relevant information to the right node/operation at the right time in the grid represented by Transformer columns. To facilitate the learning of useful control flow, we propose two modifications to the Transformer architecture: copy gate and geometric attention. Our novel Neural Data Router (NDR) achieves 100% length generalization accuracy on the classic compositional table lookup task, as well as near-perfect accuracy on the simple arithmetic task and a new variant of ListOps testing for generalization across computational depth. NDR's attention and gating patterns tend to be interpretable as an intuitive form of neural routing. Our code is public.
UR - http://hdl.handle.net/10754/672918
UR - https://arxiv.org/pdf/2110.07732.pdf
M3 - Conference contribution
BT - The Tenth International Conference on Learning Representations (ICLR 2022)
PB - arXiv
ER -