TY - JOUR
T1 - Assembly of finite element methods on graphics processors
AU - Cecka, Cris
AU - Lew, Adrian J.
AU - Darve, E.
N1 - KAUST Repository Item: Exported on 2020-10-01
Acknowledgements: This work was partially supported by a research grant from the Academic Excellence Alliance program between King Abdullah University of Science and Technology (KAUST) and Stanford University.
This publication acknowledges KAUST support, but has no KAUST affiliated authors.
PY - 2010/8/23
Y1 - 2010/8/23
N2 - Recently, graphics processing units (GPUs) have had great success in accelerating many numerical computations. We present their application to computations on unstructured meshes such as those in finite element methods. Multiple approaches in assembling and solving sparse linear systems with NVIDIA GPUs and the Compute Unified Device Architecture (CUDA) are created and analyzed. Multiple strategies for efficient use of global, shared, and local memory, methods to achieve memory coalescing, and optimal choice of parameters are introduced. We find that with appropriate preprocessing and arrangement of support data, the GPU coprocessor using single-precision arithmetic achieves speedups of 30 or more in comparison to a well-optimized, double-precision, single-core implementation. We also find that the optimal assembly strategy depends on the order of polynomials used in the finite element discretization. © 2010 John Wiley & Sons, Ltd.
UR - http://hdl.handle.net/10754/597607
UR - http://doi.wiley.com/10.1002/nme.2989
UR - http://www.scopus.com/inward/record.url?scp=78650691046&partnerID=8YFLogxK
U2 - 10.1002/nme.2989
DO - 10.1002/nme.2989
M3 - Article
SN - 0029-5981
VL - 85
SP - 640
EP - 669
JO - International Journal for Numerical Methods in Engineering
JF - International Journal for Numerical Methods in Engineering
IS - 5
ER -