TY - CHAP
T1 - Communication Reducing Algorithms for Distributed Hierarchical N-Body Problems with Boundary Distributions
AU - AbdulJabbar, Mustafa Abdulmajeed
AU - Markomanolis, Georgios
AU - Ibeid, Huda
AU - Yokota, Rio
AU - Keyes, David E.
N1 - KAUST Repository Item: Exported on 2020-10-01
Acknowledgements: This work was supported by JSPS KAKENHI Grant-in-Aid for Young Scientists A Grant Number 16H05859. This work is partially supported by “Joint Usage/Research Center for Interdisciplinary Large-scale Information Infrastructures” and “High Performance Computing Infrastructure” in Japan. The authors are grateful to the KAUST Supercomputing Laboratory for the use of the Shaheen XC40 system.
PY - 2017/5/12
Y1 - 2017/5/12
N2 - Reduction of communication and efficient partitioning are key issues for achieving scalability in hierarchical N-Body algorithms like Fast Multipole Method (FMM). In the present work, we propose three independent strategies to improve partitioning and reduce communication. First, we show that the conventional wisdom of using space-filling curve partitioning may not work well for boundary integral problems, which constitute a significant portion of FMM’s application user base. We propose an alternative method that modifies orthogonal recursive bisection to relieve the cell-partition misalignment that has kept it from scaling previously. Secondly, we optimize the granularity of communication to find the optimal balance between a bulk-synchronous collective communication of the local essential tree and an RDMA per task per cell. Finally, we take the dynamic sparse data exchange proposed by Hoefler et al. [1] and extend it to a hierarchical sparse data exchange, which is demonstrated at scale to be faster than the MPI library’s MPI_Alltoallv that is commonly used.
AB - Reduction of communication and efficient partitioning are key issues for achieving scalability in hierarchical N-Body algorithms like Fast Multipole Method (FMM). In the present work, we propose three independent strategies to improve partitioning and reduce communication. First, we show that the conventional wisdom of using space-filling curve partitioning may not work well for boundary integral problems, which constitute a significant portion of FMM’s application user base. We propose an alternative method that modifies orthogonal recursive bisection to relieve the cell-partition misalignment that has kept it from scaling previously. Secondly, we optimize the granularity of communication to find the optimal balance between a bulk-synchronous collective communication of the local essential tree and an RDMA per task per cell. Finally, we take the dynamic sparse data exchange proposed by Hoefler et al. [1] and extend it to a hierarchical sparse data exchange, which is demonstrated at scale to be faster than the MPI library’s MPI_Alltoallv that is commonly used.
UR - http://hdl.handle.net/10754/624979
UR - http://link.springer.com/chapter/10.1007/978-3-319-58667-0_5
UR - http://www.scopus.com/inward/record.url?scp=85026253623&partnerID=8YFLogxK
U2 - 10.1007/978-3-319-58667-0_5
DO - 10.1007/978-3-319-58667-0_5
M3 - Chapter
SN - 9783319586663
SP - 79
EP - 96
BT - Lecture Notes in Computer Science
PB - Springer Nature
ER -