TY - JOUR
T1 - Topology-oblivious optimization of MPI broadcast algorithms on extreme-scale platforms
AU - Hasanov, Khalid
AU - Quintin, Jean-Noël
AU - Lastovetsky, Alexey
N1 - KAUST Repository Item: Exported on 2020-10-01
Acknowledgements: This work has emanated from research conducted with the financial support of IRCSET (Irish Research Council for Science, Engineering and Technology) and IBM, Grant No. EPSPG/2011/188 and Science Foundation Ireland, Grant No. 08/IN.1/I2054. Some of the experiments presented in this publication were carried out using the Grid’5000 experimental testbed, being developed under the INRIA ALADDIN development action with support from CNRS, RENATER and several Universities as well as other funding bodies (see https://www.grid5000.fr). Another part of the experiments were carried out using the resources of the Supercomputing Laboratory at King Abdullah University of Science & Technology (KAUST) in Thuwal, Saudi Arabia.
This publication acknowledges KAUST support, but has no KAUST affiliated authors.
PY - 2015/11
Y1 - 2015/11
N2 - © 2015 Elsevier B.V. All rights reserved. Significant research has been conducted in collective communication operations, in particular in MPI broadcast, on distributed memory platforms. Most of the research efforts aim to optimize the collective operations for particular architectures by taking into account either their topology or platform parameters. In this work we propose a simple but general approach to optimization of the legacy MPI broadcast algorithms, which are widely used in MPICH and Open MPI. The proposed optimization technique is designed to address the challenge of extreme scale of future HPC platforms. It is based on hierarchical transformation of the traditionally flat logical arrangement of communicating processors. Theoretical analysis and experimental results on IBM BlueGene/P and a cluster of the Grid'5000 platform are presented.
UR - http://hdl.handle.net/10754/600038
UR - https://linkinghub.elsevier.com/retrieve/pii/S1569190X15000465
UR - http://www.scopus.com/inward/record.url?scp=84944275700&partnerID=8YFLogxK
U2 - 10.1016/j.simpat.2015.03.005
DO - 10.1016/j.simpat.2015.03.005
M3 - Article
SN - 1569-190X
VL - 58
SP - 30
EP - 39
JO - Simulation Modelling Practice and Theory
JF - Simulation Modelling Practice and Theory
ER -