TY - GEN
T1 - Scaling the “Memory Wall” for Multi-Dimensional Seismic Processing with Algebraic Compression on Cerebras CS-2 Systems
AU - Ltaief, Hatem
AU - Hong, Yuxi
AU - Wilson, Leighton
AU - Jacquelin, Mathias
AU - Ravasi, Matteo
AU - Keyes, David E.
N1 - KAUST Repository Item: Exported on 2023-09-12
Acknowledgements: For computer time, this research used the resources of the Ibex NVIDIA GPU cluster of the Supercomputing Laboratory (KSL) at King Abdullah University of Science and Technology (KAUST) in Thuwal, Saudi Arabia and the Condor Galaxy-1 CS-2 cluster provided by G42.
PY - 2023/9/11
Y1 - 2023/9/11
N2 - We exploit the high memory bandwidth of AI-customized Cerebras CS-2 systems for seismic processing. By leveraging low-rank matrix approximation, we fit memory-hungry seismic applications onto memory-austere SRAM wafer-scale hardware, thus addressing a challenge arising in many wave-equation-based algorithms that rely on Multi-Dimensional Convolution (MDC) operators. Exploiting sparsity inherent in seismic data in the frequency domain, we implement embarrassingly parallel tile low-rank matrix-vector multiplications (TLR-MVM), which account for most of the elapsed time in MDC operations, to successfully solve the Multi-Dimensional Deconvolution (MDD) inverse problem. By reducing memory footprint along with arithmetic complexity, we fit a standard seismic benchmark dataset into the small local memories of Cerebras processing elements. Deploying TLR-MVM execution onto 48 CS-2 systems in support of MDD gives a sustained memory bandwidth of 92.58 PB/s on 35,784,000 processing elements, a significant milestone that highlights the capabilities of AI-customized architectures to enable a new generation of seismic algorithms that will empower multiple technologies of our low-carbon future.
AB - We exploit the high memory bandwidth of AI-customized Cerebras CS-2 systems for seismic processing. By leveraging low-rank matrix approximation, we fit memory-hungry seismic applications onto memory-austere SRAM wafer-scale hardware, thus addressing a challenge arising in many wave-equation-based algorithms that rely on Multi-Dimensional Convolution (MDC) operators. Exploiting sparsity inherent in seismic data in the frequency domain, we implement embarrassingly parallel tile low-rank matrix-vector multiplications (TLR-MVM), which account for most of the elapsed time in MDC operations, to successfully solve the Multi-Dimensional Deconvolution (MDD) inverse problem. By reducing memory footprint along with arithmetic complexity, we fit a standard seismic benchmark dataset into the small local memories of Cerebras processing elements. Deploying TLR-MVM execution onto 48 CS-2 systems in support of MDD gives a sustained memory bandwidth of 92.58 PB/s on 35,784,000 processing elements, a significant milestone that highlights the capabilities of AI-customized architectures to enable a new generation of seismic algorithms that will empower multiple technologies of our low-carbon future.
UR - http://hdl.handle.net/10754/694388
M3 - Conference contribution
BT - ACM/IEEE International Conference for High Performance Computing, Networking, Storage, and Analysis (SC'23)
PB - ACM/IEEE
ER -