Power profiling of Cholesky and QR factorizations on distributed memory systems

George Bosilca, Hatem Ltaief, Jack Dongarra

Research output: Contribution to journal, Article, peer-review

9 Scopus citations


This paper presents the power profile of two high-performance dense linear algebra libraries on distributed memory systems, ScaLAPACK and DPLASMA. From the algorithmic perspective, their methodologies are opposite. The former is based on block algorithms and relies on multithreaded BLAS and a two-dimensional block cyclic data distribution to achieve high parallel performance. The latter is based on tile algorithms running on top of a tile data layout and uses fine-grained task parallelism combined with a dynamic distributed scheduler (DAGuE) to leverage distributed memory systems. We present performance results (Gflop/s) as well as the power profile (Watts) of two common dense factorizations needed to solve linear systems of equations, namely Cholesky and QR. The reported numbers show that DPLASMA surpasses ScaLAPACK not only in terms of performance (up to 2x speedup) but also in terms of energy efficiency (up to 62%). © 2012 Springer-Verlag (outside the USA).
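
The two-dimensional block cyclic data distribution mentioned in the abstract can be illustrated with a minimal sketch. It is not taken from the paper or from ScaLAPACK's source; the function names `block_cyclic_owner` and `owner_map`, the block sizes, and the process grid shape are all assumptions chosen for illustration, and the first block is assumed to live on process (0, 0).

```python
def block_cyclic_owner(i, j, mb, nb, p_rows, p_cols):
    """Return the (process row, process column) owning global entry (i, j).

    Assumptions for this sketch: zero-based indices, blocks of size
    mb x nb, a p_rows x p_cols process grid, and the first block placed
    on process (0, 0).
    """
    block_row = i // mb   # index of the block row containing entry i
    block_col = j // nb   # index of the block column containing entry j
    # Blocks are dealt out to the process grid in round-robin fashion
    # along each dimension.
    return block_row % p_rows, block_col % p_cols


def owner_map(n_block_rows, n_block_cols, p_rows, p_cols):
    """Return a grid whose cell (bi, bj) holds the owner of block (bi, bj)."""
    return [
        [(bi % p_rows, bj % p_cols) for bj in range(n_block_cols)]
        for bi in range(n_block_rows)
    ]


if __name__ == "__main__":
    # Hypothetical layout: 4 x 6 blocks spread over a 2 x 3 process grid.
    for row in owner_map(4, 6, 2, 3):
        print(row)
```

Because consecutive blocks cycle over the process grid in both dimensions, each process keeps a share of the trailing submatrix as a factorization proceeds, which is the usual motivation for this layout.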
Original language: English (US)
Pages (from-to): 139-147
Number of pages: 9
Journal: Computer Science - Research and Development
Issue number: 2
State: Published - Aug 30 2012

ASJC Scopus subject areas

  • Computer Science (all)

