TY - CHAP
T1 - Autotuning of Adaptive Mesh Refinement PDE Solvers on Shared Memory Architectures
AU - Nogina, Svetlana
AU - Unterweger, Kristof
AU - Weinzierl, Tobias
Acknowledged KAUST grant number(s): UK-c0020
Acknowledgements: This publication is partially based on work supported by Award No. UK-c0020, made by the King Abdullah University of Science and Technology (KAUST). Computing resources for the present work have also been provided by the Gauss Centre for Supercomputing under grant pr63no.
PY - 2012
Y1 - 2012
N2 - Many multithreaded, grid-based, dynamically adaptive solvers for partial differential equations continually have to traverse subgrids (patches) of different and changing sizes. The parallel efficiency of this traversal depends on the interplay of the patch size, the architecture used, the operations triggered throughout the traversal, and the grain size, i.e. the size of the subtasks the patch is broken into. We propose an oracle mechanism that delivers grain sizes on the fly. It takes historical runtime measurements for different patch and grain sizes as well as the traversal's operations into account, and it yields reasonable speedups. Neither magic configuration settings nor an expensive pre-tuning phase is necessary. It is an autotuning approach. © 2012 Springer-Verlag.
UR - http://hdl.handle.net/10754/597640
UR - http://link.springer.com/10.1007/978-3-642-31464-3_68
UR - http://www.scopus.com/inward/record.url?scp=84865210278&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-31464-3_68
DO - 10.1007/978-3-642-31464-3_68
M3 - Chapter
SN - 9783642314636
SP - 671
EP - 680
BT - Parallel Processing and Applied Mathematics
PB - Springer Nature
ER -