TY - GEN
T1 - Scalable fast multipole methods for vortex element methods
AU - Hu, Qi
AU - Gumerov, Nail A.
AU - Yokota, Rio
AU - Barba, Lorena A.
AU - Duraiswami, Ramani
N1 - KAUST Repository Item: Exported on 2020-10-01
PY - 2012/11
Y1 - 2012/11
N2 - We use a particle-based method to simulate incompressible flows, where the Fast Multipole Method (FMM) is used to accelerate the calculation of particle interactions. The most time-consuming kernels, the Biot-Savart equation and stretching term of the vorticity equation, are mathematically reformulated so that only two Laplace scalar potentials are used instead of six, while automatically ensuring divergence-free far-field computation. Based on this formulation, and on our previous work for a scalar heterogeneous FMM algorithm, we develop a new FMM-based vortex method capable of simulating general flows including turbulence on heterogeneous architectures, which distributes the work between multi-core CPUs and GPUs to best utilize the hardware resources and achieve excellent scalability. The algorithm also uses new data structures which can dynamically manage inter-node communication and load balance efficiently but with only a small parallel construction overhead. This algorithm can scale to large-sized clusters showing both strong and weak scalability. Careful error and timing trade-off analysis are also performed for the cutoff functions induced by the vortex particle method. Our implementation can perform one time step of the velocity+stretching for one billion particles on 32 nodes in 55.9 seconds, which yields 49.12 Tflop/s. © 2012 IEEE.
AB - We use a particle-based method to simulate incompressible flows, where the Fast Multipole Method (FMM) is used to accelerate the calculation of particle interactions. The most time-consuming kernels, the Biot-Savart equation and stretching term of the vorticity equation, are mathematically reformulated so that only two Laplace scalar potentials are used instead of six, while automatically ensuring divergence-free far-field computation. Based on this formulation, and on our previous work for a scalar heterogeneous FMM algorithm, we develop a new FMM-based vortex method capable of simulating general flows including turbulence on heterogeneous architectures, which distributes the work between multi-core CPUs and GPUs to best utilize the hardware resources and achieve excellent scalability. The algorithm also uses new data structures which can dynamically manage inter-node communication and load balance efficiently but with only a small parallel construction overhead. This algorithm can scale to large-sized clusters showing both strong and weak scalability. Careful error and timing trade-off analysis are also performed for the cutoff functions induced by the vortex particle method. Our implementation can perform one time step of the velocity+stretching for one billion particles on 32 nodes in 55.9 seconds, which yields 49.12 Tflop/s. © 2012 IEEE.
UR - http://hdl.handle.net/10754/564623
UR - http://ieeexplore.ieee.org/document/6496004/
UR - http://www.scopus.com/inward/record.url?scp=84876573131&partnerID=8YFLogxK
U2 - 10.1109/SC.Companion.2012.221
DO - 10.1109/SC.Companion.2012.221
M3 - Conference contribution
SN - 9780769549569
SP - 1408
EP - 1409
BT - 2012 SC Companion: High Performance Computing, Networking Storage and Analysis
PB - Institute of Electrical and Electronics Engineers (IEEE)
ER -