TY - JOUR
T1 - Training-Free Stuck-At Fault Mitigation for ReRAM-Based Deep Learning Accelerators
AU - Quan, Chenghao
AU - Fouda, Mohamed E.
AU - Lee, Sugil
AU - Jung, Giju
AU - Lee, Jongeun
AU - Eltawil, Ahmed
AU - Kurdahi, Fadi
N1 - KAUST Repository Item: Exported on 2022-11-18
Acknowledgements: This work was supported by IITP grants (No. 2020-0-01336, Artificial Intelligence Graduate School Program (UNIST) and IITP-2021-0-02052, ITRC support program) and NRF grant (No. 2020R1A2C2015066) funded by MSIT of Korea, and by Free Innovative Research Fund of UNIST (1.170067.01). The EDA tool was supported by the IC Design Education Center (IDEC), Korea.
PY - 2022/11/15
Y1 - 2022/11/15
AB - Although Resistive RAMs can support highly efficient matrix-vector multiplication, which is very useful for machine learning and other applications, non-ideal hardware behavior such as stuck-at faults (SAFs) and IR drop is an important concern in building ReRAM crossbar array (RCA)-based deep learning accelerators. Previous work has addressed the nonideality problem through either redundancy in hardware, which requires a permanent increase in hardware cost, or software retraining, which may be even more costly or unacceptable due to its need for a training dataset as well as its high computation overhead. In this paper we propose a very lightweight method that can be applied on top of existing hardware or software solutions. Our method, called FPT (Forward-Parameter Tuning), takes advantage of a certain statistical property of the activation data of neural network layers, and can mitigate the impact of mild nonidealities in ReRAM crossbar arrays for deep learning applications without using any extra hardware, a dataset, or gradient-based training. Our experimental results using the MNIST, CIFAR-10, CIFAR-100, and ImageNet datasets on binary and multi-bit networks demonstrate that our technique is very effective, both alone and together with previous methods, up to a 20% fault rate, which is higher than even some of the previous remapping methods. We also evaluate our method in the presence of other nonidealities such as variability and IR drop. Further, we provide an analysis based on the concept of effective fault rate, which not only demonstrates that effective fault rate can be a useful tool to predict the accuracy of faulty RCA-based neural networks, but also explains why mitigating the SAF problem is more difficult for multi-bit neural networks.
UR - http://hdl.handle.net/10754/685828
UR - https://ieeexplore.ieee.org/document/9951395/
U2 - 10.1109/tcad.2022.3222288
DO - 10.1109/tcad.2022.3222288
M3 - Article
SN - 0278-0070
SP - 1
EP - 1
JO - IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
JF - IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
ER -