TY - GEN
T1 - High-Performance SVD Partial Spectrum Computation
AU - Keyes, David
AU - Ltaief, Hatem
AU - Nakatsukasa, Yuji
AU - Sukkari, Dalal
N1 - Publisher Copyright: © 2023 Owner/Author(s).
PY - 2023/11/12
Y1 - 2023/11/12
N2 - We introduce a new singular value decomposition (SVD) solver, QDWHpartial-SVD, based on the QR-based Dynamically Weighted Halley (QDWH) algorithm, for solving partial-spectrum SVD problems. By optimizing the rational function underlying the algorithms in the desired part of the spectrum only, the QDWHpartial-SVD algorithm efficiently computes a fraction (say 1-20%) of the leading singular values/vectors. We develop a high-performance implementation of QDWHpartial-SVD on distributed-memory manycore systems and demonstrate its numerical robustness. We perform a benchmarking campaign against counterparts from state-of-the-art numerical libraries across various matrix sizes using up to 36K MPI processes. Experimental results show performance speedups for QDWHpartial-SVD of up to 6X and 2X against vendor-optimized PDGESVD from ScaLAPACK and KSVD, respectively, on a Cray XC40 system using 1152 nodes with two-socket 16-core Intel Haswell CPUs. We also port our QDWHpartial-SVD software library to a system composed of 256 nodes with two-socket 64-core AMD EPYC Milan CPUs and achieve a performance speedup of up to 4X compared to vendor-optimized PDGESVD from ScaLAPACK. We also compare energy consumption for the two algorithms and demonstrate how QDWHpartial-SVD can further outperform PDGESVD in that regard by performing fewer memory-bound operations.
AB - We introduce a new singular value decomposition (SVD) solver, QDWHpartial-SVD, based on the QR-based Dynamically Weighted Halley (QDWH) algorithm, for solving partial-spectrum SVD problems. By optimizing the rational function underlying the algorithms in the desired part of the spectrum only, the QDWHpartial-SVD algorithm efficiently computes a fraction (say 1-20%) of the leading singular values/vectors. We develop a high-performance implementation of QDWHpartial-SVD on distributed-memory manycore systems and demonstrate its numerical robustness. We perform a benchmarking campaign against counterparts from state-of-the-art numerical libraries across various matrix sizes using up to 36K MPI processes. Experimental results show performance speedups for QDWHpartial-SVD of up to 6X and 2X against vendor-optimized PDGESVD from ScaLAPACK and KSVD, respectively, on a Cray XC40 system using 1152 nodes with two-socket 16-core Intel Haswell CPUs. We also port our QDWHpartial-SVD software library to a system composed of 256 nodes with two-socket 64-core AMD EPYC Milan CPUs and achieve a performance speedup of up to 4X compared to vendor-optimized PDGESVD from ScaLAPACK. We also compare energy consumption for the two algorithms and demonstrate how QDWHpartial-SVD can further outperform PDGESVD in that regard by performing fewer memory-bound operations.
KW - distributed-memory systems
KW - parallel numerical algorithms
KW - partial spectrum
KW - singular value decomposition
UR - http://www.scopus.com/inward/record.url?scp=85179548226&partnerID=8YFLogxK
U2 - 10.1145/3581784.3607109
DO - 10.1145/3581784.3607109
M3 - Conference contribution
AN - SCOPUS:85179548226
T3 - Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2023
BT - Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2023
PB - Association for Computing Machinery, Inc
T2 - 2023 International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2023
Y2 - 12 November 2023 through 17 November 2023
ER -