TY - GEN
T1 - Energy footprint of advanced dense numerical linear algebra using tile algorithms on multicore architectures
AU - Dongarra, Jack
AU - Ltaief, Hatem
AU - Luszczek, Piotr R.
AU - Weaver, Vincent M.
N1 - KAUST Repository Item: Exported on 2020-10-01
PY - 2012/11
Y1 - 2012/11
N2 - We propose to study the impact on the energy footprint of two advanced algorithmic strategies in the context of high-performance dense linear algebra libraries: (1) mixed-precision algorithms with iterative refinement, which run at the peak performance of single-precision floating-point arithmetic while achieving double-precision accuracy, and (2) tree reduction techniques, which expose more parallelism when factorizing tall and skinny matrices for solving overdetermined systems of linear equations or calculating the singular value decomposition. Integrated within the PLASMA library using tile algorithms, which will eventually supersede the block algorithms from LAPACK, both strategies further excel in performance in the presence of a dynamic task scheduler while targeting multicore architectures. Energy consumption measurements are reported along with parallel performance numbers on a dual-socket quad-core Intel Xeon as well as a quad-socket quad-core Intel Sandy Bridge chip, both providing component-based energy monitoring at all levels of the system, through the PowerPack framework and the Running Average Power Limit model, respectively. © 2012 IEEE.
UR - http://hdl.handle.net/10754/575808
UR - http://ieeexplore.ieee.org/document/6382829/
UR - http://www.scopus.com/inward/record.url?scp=84874590024&partnerID=8YFLogxK
U2 - 10.1109/CGC.2012.113
DO - 10.1109/CGC.2012.113
M3 - Conference contribution
SN - 9780769548647
SP - 274
EP - 281
BT - 2012 Second International Conference on Cloud and Green Computing
PB - Institute of Electrical and Electronics Engineers (IEEE)
ER -