TY - GEN
T1 - Comparative study of one-sided factorizations with multiple software packages on multi-core hardware
AU - Agullo, Emmanuel
AU - Hadri, Bilel
AU - Ltaief, Hatem
AU - Dongarrra, Jack
PY - 2009
Y1 - 2009
N2 - The emergence and continuing use of multi-core architectures require changes in the existing software and sometimes even a redesign of the established algorithms in order to take advantage of now prevailing parallelism. The Parallel Linear Algebra for Scalable Multi-core Architectures (PLASMA) is a project that aims to achieve both high performance and portability across a wide range of multi-core architectures. We present in this paper a comparative study of PLASMA's performance against established linear algebra packages (LAPACK and ScaLAPACK), against new approaches at parallel execution (Task Based Linear Algebra Subroutines - TBLAS), and against equivalent commercial software offerings (MKL, ESSL and PESSL). Our experiments were conducted on one-sided linear algebra factorizations (LU, QR and Cholesky) and used multi-core architectures (based on Intel Xeon EMT64 and IBM Power6). A performance improvement of 67% was for instance obtained on the Cholesky factorization of a matrix of order 4000, using 32 cores.
AB - The emergence and continuing use of multi-core architectures require changes in the existing software and sometimes even a redesign of the established algorithms in order to take advantage of now prevailing parallelism. The Parallel Linear Algebra for Scalable Multi-core Architectures (PLASMA) is a project that aims to achieve both high performance and portability across a wide range of multi-core architectures. We present in this paper a comparative study of PLASMA's performance against established linear algebra packages (LAPACK and ScaLAPACK), against new approaches at parallel execution (Task Based Linear Algebra Subroutines - TBLAS), and against equivalent commercial software offerings (MKL, ESSL and PESSL). Our experiments were conducted on one-sided linear algebra factorizations (LU, QR and Cholesky) and used multi-core architectures (based on Intel Xeon EMT64 and IBM Power6). A performance improvement of 67% was for instance obtained on the Cholesky factorization of a matrix of order 4000, using 32 cores.
UR - http://www.scopus.com/inward/record.url?scp=74049090446&partnerID=8YFLogxK
U2 - 10.1145/1654059.1654080
DO - 10.1145/1654059.1654080
M3 - Conference contribution
AN - SCOPUS:74049090446
SN - 9781605587448
T3 - Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, SC '09
BT - Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, SC '09
T2 - Conference on High Performance Computing Networking, Storage and Analysis, SC '09
Y2 - 14 November 2009 through 20 November 2009
ER -