TY - GEN
T1 - Optimization Techniques for Dimensionally Truncated Sparse Grids on Heterogeneous Systems
AU - Deftu, A.
AU - Murarasu, A.
N1 - KAUST Repository Item: Exported on 2020-10-01
Acknowledged KAUST grant number(s): UK-C0020
Acknowledgements: This publication is based on work supported by Award No. UK-C0020, made by King Abdullah University of Science and Technology (KAUST).
This publication acknowledges KAUST support, but has no KAUST-affiliated authors.
PY - 2013/2
Y1 - 2013/2
N2 - Given the existing heterogeneous processor landscape dominated by CPUs and GPUs, topics such as programming productivity and performance portability have become increasingly important. In this context, an important question is how we can develop optimization strategies that cover both CPUs and GPUs. We answer this question for fastsg, a library that provides functionality for efficiently handling high-dimensional functions. Because it can be employed for compressing and decompressing large-scale simulation data, it lies at the core of a computational steering application which serves as our test case. We describe our experience with implementing fastsg's time-critical routines for Intel CPUs and Nvidia Fermi GPUs. We show the differences and, especially, the similarities between our optimization strategies for the two architectures. For our test case, in which achieving high speedups is a "must" for real-time visualization, we report a speedup of up to 6.2x compared to the state-of-the-art implementation of the sparse grid technique for GPUs. © 2013 IEEE.
AB - Given the existing heterogeneous processor landscape dominated by CPUs and GPUs, topics such as programming productivity and performance portability have become increasingly important. In this context, an important question is how we can develop optimization strategies that cover both CPUs and GPUs. We answer this question for fastsg, a library that provides functionality for efficiently handling high-dimensional functions. Because it can be employed for compressing and decompressing large-scale simulation data, it lies at the core of a computational steering application which serves as our test case. We describe our experience with implementing fastsg's time-critical routines for Intel CPUs and Nvidia Fermi GPUs. We show the differences and, especially, the similarities between our optimization strategies for the two architectures. For our test case, in which achieving high speedups is a "must" for real-time visualization, we report a speedup of up to 6.2x compared to the state-of-the-art implementation of the sparse grid technique for GPUs. © 2013 IEEE.
UR - http://hdl.handle.net/10754/599103
UR - http://ieeexplore.ieee.org/document/6498575/
UR - http://www.scopus.com/inward/record.url?scp=84877641577&partnerID=8YFLogxK
U2 - 10.1109/PDP.2013.57
DO - 10.1109/PDP.2013.57
M3 - Conference contribution
SN - 9781467353212
SP - 351
EP - 358
BT - 2013 21st Euromicro International Conference on Parallel, Distributed, and Network-Based Processing
PB - Institute of Electrical and Electronics Engineers (IEEE)
ER -