TY - GEN
T1 - GPU-Based Low-Precision Detection Approach for Massive MIMO Systems
AU - Dabah, Adel
AU - Ltaief, Hatem
AU - Rezki, Zouheir
AU - Alouini, Mohamed-Slim
AU - Keyes, David E.
N1 - KAUST Repository Item: Exported on 2023-06-19
PY - 2023/5/10
Y1 - 2023/5/10
N2 - Massive Multiple-Input Multiple-Output (M-MIMO) uses hundreds of antennas in mobile communications base stations to increase the amount of transmitted data and the number of connected devices in 5G and beyond. However, M-MIMO systems increase the complexity of recovering the transmitted data (detection phase). To address this challenge, we leverage low-precision arithmetic in recent NVIDIA GPUs to improve the latency/scalability/accuracy of M-MIMO detection. We propose a GPU tree-based detection algorithm that aggregates multiple tree levels and formulates the computation as a matrix multiplication operation followed by a square-norm calculation and sorting (reduction) phase. This process is repeated until reaching the last level of the detection tree. The obtained results show near-optimal data detection with a 10 × speedup compared to a two-socket 28-core IceLake CPU implementation. We further deploy low-precision arithmetic operations. We show that moving from single-precision 32-bit floating-point arithmetic (FP32) to half-precision 16-bit representation (FP16) does not affect the accuracy performance while translating into an additional 1.7 × speedup. In addition, exploiting 8-bit integer representation results in an acceptable error rate degradation that can be compensated by increasing the number of aggregated levels. In addition, we propose a multi-GPU version that computes the matrix-multiplication operation of subsequent iterations in parallel. This latter operation represents more than 80% of the elapsed time for dense constellations. Results with four A100 GPUs show an additional 2.3 × relative speedup compared to our single GPU version. The achieved accuracy/scalability balance may accelerate the deployment of this technology and promote low-precision GPU computations within the wireless communication community.
AB - Massive Multiple-Input Multiple-Output (M-MIMO) uses hundreds of antennas in mobile communications base stations to increase the amount of transmitted data and the number of connected devices in 5G and beyond. However, M-MIMO systems increase the complexity of recovering the transmitted data (detection phase). To address this challenge, we leverage low-precision arithmetic in recent NVIDIA GPUs to improve the latency/scalability/accuracy of M-MIMO detection. We propose a GPU tree-based detection algorithm that aggregates multiple tree levels and formulates the computation as a matrix multiplication operation followed by a square-norm calculation and sorting (reduction) phase. This process is repeated until reaching the last level of the detection tree. The obtained results show near-optimal data detection with a 10 × speedup compared to a two-socket 28-core IceLake CPU implementation. We further deploy low-precision arithmetic operations. We show that moving from single-precision 32-bit floating-point arithmetic (FP32) to half-precision 16-bit representation (FP16) does not affect the accuracy performance while translating into an additional 1.7 × speedup. In addition, exploiting 8-bit integer representation results in an acceptable error rate degradation that can be compensated by increasing the number of aggregated levels. In addition, we propose a multi-GPU version that computes the matrix-multiplication operation of subsequent iterations in parallel. This latter operation represents more than 80% of the elapsed time for dense constellations. Results with four A100 GPUs show an additional 2.3 × relative speedup compared to our single GPU version. The achieved accuracy/scalability balance may accelerate the deployment of this technology and promote low-precision GPU computations within the wireless communication community.
UR - http://hdl.handle.net/10754/692645
UR - https://link.springer.com/10.1007/978-3-031-32041-5_8
UR - http://www.scopus.com/inward/record.url?scp=85161161231&partnerID=8YFLogxK
U2 - 10.1007/978-3-031-32041-5_8
DO - 10.1007/978-3-031-32041-5_8
M3 - Conference contribution
SN - 9783031320408
SP - 144
EP - 163
BT - Lecture Notes in Computer Science
PB - Springer Nature Switzerland
ER -