TY - JOUR
T1 - Hierarchical algorithms on hierarchical architectures
AU - Keyes, David E.
AU - Ltaief, Hatem
AU - Turkiyyah, G.
N1 - KAUST Repository Item: Exported on 2020-10-01
Acknowledgements: Many projects have contributed to HiCMA over the period 2015–2019. For results contributing to this paper, the authors acknowledge the contributions of doctoral students Ahmad Abdelfattah, Wajih Boukaram, and Ali Charara and postdoctoral fellows Sameh Abdulah, Kadir Akbudak and Alexander Mikhalev. For computer time, this research used the resources of the Supercomputing Laboratory at King Abdullah University of Science & Technology
PY - 2020/1/20
Y1 - 2020/1/20
N2 - A traditional goal of algorithmic optimality, squeezing out flops, has been superseded by evolution in architecture. Flops no longer serve as a reasonable proxy for all aspects of complexity. Instead, algorithms must now squeeze memory, data transfers, and synchronizations, while extra flops on locally cached data represent only small costs in time and energy. Hierarchically low-rank matrices realize a rarely achieved combination of optimal storage complexity and high-computational intensity for a wide class of formally dense linear operators that arise in applications for which exascale computers are being constructed. They may be regarded as algebraic generalizations of the fast multipole method. Methods based on these hierarchical data structures and their simpler cousins, tile low-rank matrices, are well proportioned for early exascale computer architectures, which are provisioned for high processing power relative to memory capacity and memory bandwidth. They are ushering in a renaissance of computational linear algebra. A challenge is that emerging hardware architecture possesses hierarchies of its own that do not generally align with those of the algorithm. We describe modules of a software toolkit, hierarchical computations on manycore architectures, that illustrate these features and are intended as building blocks of applications, such as matrix-free higher-order methods in optimization and large-scale spatial statistics. Some modules of this open-source project have been adopted in the software libraries of major vendors. This article is part of a discussion meeting issue ‘Numerical algorithms for high-performance computational science’.
AB - A traditional goal of algorithmic optimality, squeezing out flops, has been superseded by evolution in architecture. Flops no longer serve as a reasonable proxy for all aspects of complexity. Instead, algorithms must now squeeze memory, data transfers, and synchronizations, while extra flops on locally cached data represent only small costs in time and energy. Hierarchically low-rank matrices realize a rarely achieved combination of optimal storage complexity and high-computational intensity for a wide class of formally dense linear operators that arise in applications for which exascale computers are being constructed. They may be regarded as algebraic generalizations of the fast multipole method. Methods based on these hierarchical data structures and their simpler cousins, tile low-rank matrices, are well proportioned for early exascale computer architectures, which are provisioned for high processing power relative to memory capacity and memory bandwidth. They are ushering in a renaissance of computational linear algebra. A challenge is that emerging hardware architecture possesses hierarchies of its own that do not generally align with those of the algorithm. We describe modules of a software toolkit, hierarchical computations on manycore architectures, that illustrate these features and are intended as building blocks of applications, such as matrix-free higher-order methods in optimization and large-scale spatial statistics. Some modules of this open-source project have been adopted in the software libraries of major vendors. This article is part of a discussion meeting issue ‘Numerical algorithms for high-performance computational science’.
UR - http://hdl.handle.net/10754/661111
UR - https://royalsocietypublishing.org/doi/10.1098/rsta.2019.0055
UR - http://www.scopus.com/inward/record.url?scp=85078003864&partnerID=8YFLogxK
U2 - 10.1098/rsta.2019.0055
DO - 10.1098/rsta.2019.0055
M3 - Article
C2 - 31955677
SN - 1364-503X
VL - 378
SP - 20190055
JO - Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences
JF - Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences
IS - 2166
ER -