TY - GEN
T1 - Notified Access: Extending Remote Memory Access Programming Models for Producer-Consumer Synchronization
AU - Belli, Roberto
AU - Hoefler, Torsten
N1 - KAUST Repository Item: Exported on 2020-10-01
Acknowledgements: We thank Hatem Ltaief (KAUST) for providing the Cholesky example. We thank the GASPI team for inspiring discussions about RMA interfaces and Christian Simmendinger for numerous clarifications about the GASPI specification. We thank James Dinan (Intel), Jeff Hammond (Intel), Kathy Yelick (LBNL), Edgar Solomonik, Timo Schneider, and Salvatore Di Girolamo for helpful discussions, Larry Kaplan (Cray) for help with uGNI, and the Swiss National Supercomputing Centre (CSCS) for access to Piz Daint.
This publication acknowledges KAUST support, but has no KAUST affiliated authors.
PY - 2015/5
Y1 - 2015/5
N2 - Remote Memory Access (RMA) programming enables direct access to low-level hardware features to achieve high performance for distributed-memory programs. However, the design of RMA programming schemes focuses on the memory access and less on the synchronization. For example, in contemporary RMA programming systems, the widely used producer-consumer pattern can only be implemented inefficiently, incurring the overhead of an additional round-trip message. We propose Notified Access, a scheme where the target process of an access can receive a completion notification. This scheme enables direct and efficient synchronization with a minimum number of messages. We implement our scheme in an open-source MPI-3 RMA library and demonstrate lower overheads (two cache misses) than other point-to-point synchronization mechanisms for each notification. We also evaluate our implementation on three real-world benchmarks: a stencil computation, a tree computation, and a Cholesky factorization implemented with tasks. Our scheme always performs better than traditional message passing and other existing RMA synchronization schemes, providing up to 50% speedup on small messages. Our analysis shows that Notified Access is a valuable primitive for any RMA system. Furthermore, we provide guidance for the design of low-level network interfaces to support Notified Access efficiently.
AB - Remote Memory Access (RMA) programming enables direct access to low-level hardware features to achieve high performance for distributed-memory programs. However, the design of RMA programming schemes focuses on the memory access and less on the synchronization. For example, in contemporary RMA programming systems, the widely used producer-consumer pattern can only be implemented inefficiently, incurring the overhead of an additional round-trip message. We propose Notified Access, a scheme where the target process of an access can receive a completion notification. This scheme enables direct and efficient synchronization with a minimum number of messages. We implement our scheme in an open-source MPI-3 RMA library and demonstrate lower overheads (two cache misses) than other point-to-point synchronization mechanisms for each notification. We also evaluate our implementation on three real-world benchmarks: a stencil computation, a tree computation, and a Cholesky factorization implemented with tasks. Our scheme always performs better than traditional message passing and other existing RMA synchronization schemes, providing up to 50% speedup on small messages. Our analysis shows that Notified Access is a valuable primitive for any RMA system. Furthermore, we provide guidance for the design of low-level network interfaces to support Notified Access efficiently.
UR - http://hdl.handle.net/10754/599007
UR - http://ieeexplore.ieee.org/document/7161573/
UR - http://www.scopus.com/inward/record.url?scp=84971422104&partnerID=8YFLogxK
U2 - 10.1109/ipdps.2015.30
DO - 10.1109/ipdps.2015.30
M3 - Conference contribution
SN - 9781479986491
SP - 871
EP - 881
BT - 2015 IEEE International Parallel and Distributed Processing Symposium
PB - Institute of Electrical and Electronics Engineers (IEEE)
ER -