TY - JOUR
T1 - Power profiling of Cholesky and QR factorizations on distributed memory systems
AU - Bosilca, George
AU - Ltaief, Hatem
AU - Dongarra, Jack
N1 - KAUST Repository Item: Exported on 2020-10-01
PY - 2012/8/30
Y1 - 2012/8/30
AB - This paper presents the power profile of two high performance dense linear algebra libraries on distributed memory systems, ScaLAPACK and DPLASMA. From an algorithmic perspective, their methodologies are opposite. The former is based on block algorithms and relies on multithreaded BLAS and a two-dimensional block cyclic data distribution to achieve high parallel performance. The latter is based on tile algorithms running on top of a tile data layout and uses fine-grained task parallelism combined with a dynamic distributed scheduler (DAGuE) to leverage distributed memory systems. We present performance results (Gflop/s) as well as the power profile (Watts) of two common dense factorizations needed to solve linear systems of equations, namely Cholesky and QR. The reported numbers show that DPLASMA surpasses ScaLAPACK not only in terms of performance (up to 2x speedup) but also in terms of energy efficiency (up to 62%). © 2012 Springer-Verlag (outside the USA).
UR - http://hdl.handle.net/10754/562284
UR - http://link.springer.com/10.1007/s00450-012-0224-2
UR - http://www.scopus.com/inward/record.url?scp=84899618047&partnerID=8YFLogxK
U2 - 10.1007/s00450-012-0224-2
DO - 10.1007/s00450-012-0224-2
M3 - Article
SN - 1865-2034
VL - 29
SP - 139
EP - 147
JO - Computer Science - Research and Development
JF - Computer Science - Research and Development
IS - 2
ER -