Accelerating turbulent reacting flow simulations on many-core/GPUs using matrix-based kinetics

Harshavardhana Ashoka Uranakara, Shivam Barwey, Francisco Hernandez Perez, Vijayamanikandan Vijayarangan, Venkat Raman, Hong G. Im

Research output: Contribution to journal › Article › peer-review


Abstract

The present work assesses the impact - in terms of time to solution, throughput, and hardware scalability - of transferring computationally intensive tasks, found in compressible reacting flow solvers, to the GPU. Attention is focused on outlining the workflow and data transfer penalties associated with “plugging in” a recently developed GPU-based chemistry library into (a) a purely CPU-based solver and (b) a GPU-based solver, where, except for the chemistry, all other variables are computed on the GPU. This comparison quantifies the effect of host-to-device (and device-to-host) data transfer penalties on the overall solver speedup as a function of mesh and reaction mechanism size. To this end, a recently developed GPU-based chemistry library known as UMChemGPU is employed to treat the kinetics in the flow solver KARFS. UMChemGPU replaces conventional CPU-based Cantera routines with a matrix-based formulation. The impact of (i) data transfer times, (ii) chemistry acceleration, and (iii) hardware architecture is studied in detail in the context of GPU saturation limits. Hydrogen and dimethyl ether (DME) reaction mechanisms are used to assess the impact of the number of species/reactions on the overall and chemistry-only speedups. It was found that offloading the source term computation to UMChemGPU results in up to a 7X reduction in overall time to solution and source term computations that are four orders of magnitude faster than conventional CPU-based methods. Furthermore, the metrics for achieving maximum performance gain using GPU chemistry with an MPI + CUDA solver are explained using the Roofline model. Integrating UMChemGPU with an MPI + OpenMP solver does not improve the overall performance due to the associated data copy time between the device (GPU) and host (CPU) memory spaces. Performance portability was demonstrated using three different GPU architectures, and the findings are expected to translate to a wide variety of high-performance codes in the combustion community.
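The sketch below is a minimal illustration of the two ideas the abstract relies on: species production rates formed as a per-cell matrix-vector product of the stoichiometric matrix with the rates of progress (the matrix-based kinetics view), and the host-to-device and device-to-host copies whose cost dominates when only the chemistry lives on the GPU. It is not the UMChemGPU or KARFS API; the kernel name, array names, mechanism sizes (NS, NR), and the assumption that rates of progress are precomputed are all illustrative placeholders.

```cuda
// Hypothetical sketch (not the UMChemGPU/KARFS API): per-cell source terms as
// wdot = nu * q on the GPU, with timed H2D/D2H copies to expose the transfer cost
// paid by a CPU-resident solver that offloads only the chemistry.
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

constexpr int NS = 9;    // species count (illustrative, small H2-type mechanism)
constexpr int NR = 21;   // reaction count (illustrative)

// One thread per grid cell: matrix-vector product wdot_s = sum_r nu[s][r] * q[r].
__global__ void source_terms(const double* q,    // [ncells*NR] rates of progress (assumed precomputed)
                             const double* nu,   // [NS*NR] net stoichiometric coefficients
                             double* wdot,       // [ncells*NS] net molar production rates
                             int ncells)
{
    int cell = blockIdx.x * blockDim.x + threadIdx.x;
    if (cell >= ncells) return;
    for (int s = 0; s < NS; ++s) {
        double sum = 0.0;
        for (int r = 0; r < NR; ++r)
            sum += nu[s * NR + r] * q[cell * NR + r];
        wdot[cell * NS + s] = sum;
    }
}

int main()
{
    const int ncells = 1 << 20;   // cells resident on one GPU (illustrative)
    std::vector<double> h_q(size_t(ncells) * NR, 1.0e-3);
    std::vector<double> h_nu(size_t(NS) * NR, 1.0);
    std::vector<double> h_wdot(size_t(ncells) * NS);

    double *d_q, *d_nu, *d_wdot;
    cudaMalloc(&d_q,    h_q.size()    * sizeof(double));
    cudaMalloc(&d_nu,   h_nu.size()   * sizeof(double));
    cudaMalloc(&d_wdot, h_wdot.size() * sizeof(double));

    cudaEvent_t t0, t1, t2, t3;
    cudaEventCreate(&t0); cudaEventCreate(&t1);
    cudaEventCreate(&t2); cudaEventCreate(&t3);

    cudaEventRecord(t0);
    // Host -> device copy: incurred every step when the flow solver stays on the CPU.
    cudaMemcpy(d_q,  h_q.data(),  h_q.size()  * sizeof(double), cudaMemcpyHostToDevice);
    cudaMemcpy(d_nu, h_nu.data(), h_nu.size() * sizeof(double), cudaMemcpyHostToDevice);
    cudaEventRecord(t1);

    source_terms<<<(ncells + 255) / 256, 256>>>(d_q, d_nu, d_wdot, ncells);
    cudaEventRecord(t2);

    // Device -> host copy of the source terms back to the CPU-side solver.
    cudaMemcpy(h_wdot.data(), d_wdot, h_wdot.size() * sizeof(double), cudaMemcpyDeviceToHost);
    cudaEventRecord(t3);
    cudaEventSynchronize(t3);

    float ms_h2d, ms_kernel, ms_d2h;
    cudaEventElapsedTime(&ms_h2d,    t0, t1);
    cudaEventElapsedTime(&ms_kernel, t1, t2);
    cudaEventElapsedTime(&ms_d2h,    t2, t3);
    printf("H2D %.2f ms | kernel %.2f ms | D2H %.2f ms\n", ms_h2d, ms_kernel, ms_d2h);

    cudaFree(d_q); cudaFree(d_nu); cudaFree(d_wdot);
    return 0;
}
```

In a configuration analogous to the MPI + OpenMP case discussed in the abstract, the H2D and D2H copies above are repeated every time step and can offset the kernel speedup; in an MPI + CUDA configuration where the flow variables already reside in device memory, those copies largely disappear, which is the regime where the reported overall gains are achieved.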
Original language: English (US)
Journal: Proceedings of the Combustion Institute
State: Published - Sep 22 2022

ASJC Scopus subject areas

  • General Chemical Engineering
  • Mechanical Engineering
  • Physical and Theoretical Chemistry
