TY - JOUR
T1 - Supervised Local Training with Backward Links for Deep Neural Networks
AU - Guo, Wenzhe
AU - Fouda, Mohamed E.
AU - Eltawil, Ahmed
AU - Salama, Khaled N.
N1 - Acknowledgements: This work was funded by the King Abdullah University of Science and Technology (KAUST) AI Initiative, Saudi Arabia.
PY - 2023/3/2
Y1 - 2023/3/2
AB - The restricted training pattern of standard backpropagation (BP) requires end-to-end error propagation, incurring large memory costs and prohibiting model parallelization. Existing local training methods resolve these obstacles by completely cutting off the backward path between modules and isolating their gradients; however, this prevents information exchange between modules and results in inferior performance. This work proposes a novel local training algorithm, BackLink, which introduces inter-module backward dependency and allows error information to flow backward along the network. To preserve the computational advantage of local training, BackLink restricts the error propagation length within each module. Extensive experiments on various deep convolutional neural networks demonstrate that our method consistently improves the classification performance of local training over other methods. For example, it surpasses the conventional greedy local training method by 6.45% in accuracy with ResNet32 on CIFAR100 and a recent method by 2.58% with ResNet110 on STL-10, at much lower complexity. Analysis of computational costs reveals that only small overheads are incurred in GPU memory and runtime on multiple GPUs. Compared with standard BP, our method yields up to a 79% reduction in memory cost and a 52% reduction in simulation runtime for ResNet110. Therefore, our method could create new opportunities for improving training algorithms toward better efficiency in real-time learning applications.
UR - http://hdl.handle.net/10754/690008
UR - https://ieeexplore.ieee.org/document/10058021/
U2 - 10.1109/tai.2023.3251313
DO - 10.1109/tai.2023.3251313
M3 - Article
SN - 2691-4581
SP - 1
EP - 14
JO - IEEE Transactions on Artificial Intelligence
JF - IEEE Transactions on Artificial Intelligence
ER -
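
Note: the abstract above describes BackLink only at a high level. The following PyTorch-style sketch is a rough, non-authoritative illustration of the general idea (local modules trained with auxiliary losses, plus a short backward dependency that lets the next module's error reach back into the current module). The LocalModule and train_step names, the auxiliary-head design, and the link_weight parameter are assumptions made for illustration, not the paper's implementation.

```python
# A minimal, illustrative sketch under the assumptions noted above; not the paper's code.
import torch.nn as nn
import torch.nn.functional as F


class LocalModule(nn.Module):
    """A small convolutional block with an auxiliary classifier for its local loss."""

    def __init__(self, in_ch, out_ch, num_classes):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(),
        )
        self.aux = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(out_ch, num_classes)
        )

    def forward(self, x):
        h = self.body(x)
        return h, self.aux(h)


def train_step(modules, optimizers, images, labels, link_weight=0.5):
    """One step of local training with a backward link between adjacent modules.

    optimizers[i] is assumed to hold only modules[i]'s parameters.
    """
    x = images
    for i, (module, opt) in enumerate(zip(modules, optimizers)):
        # Also clears any gradient left on this module by the previous link loss.
        opt.zero_grad()
        h, logits = module(x)                   # forward through module i
        loss = F.cross_entropy(logits, labels)  # module i's own local loss
        if i + 1 < len(modules):
            # Backward link: evaluate the next module's auxiliary loss on the
            # *attached* activation h, so downstream error reaches back into
            # module i. Because x was detached at the previous boundary, this
            # error propagation stops at module i's input.
            _, next_logits = modules[i + 1](h)
            loss = loss + link_weight * F.cross_entropy(next_logits, labels)
        loss.backward()
        opt.step()          # updates only module i's parameters
        x = h.detach()      # cut the graph before module i + 1
    return loss.item()
```

Usage under the same assumptions: build modules = [LocalModule(3, 64, 10), LocalModule(64, 64, 10), ...], create one optimizer per module over that module's parameters, and call train_step once per mini-batch. The detach at each module boundary is what keeps the backward path short, which is the property the abstract credits for the memory and runtime savings relative to end-to-end BP.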