OmNICCL: Zero-cost Sparse AllReduce with Direct Cache Access and SmartNICs

Tongzhou Gu, Jiawei Fei, Marco Canini

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

AllReduce is a collective communication pattern commonly used in Distributed Deep Learning (DDL) and High Performance Computing (HPC). Sparse AllReduce, which compresses the transmitted data, can significantly accelerate certain workloads. However, compression itself introduces a non-negligible performance overhead. We therefore propose the OmNICreduce algorithm, an efficient inter-node sparse AllReduce method, together with its implementation, OmNICCL. OmNICCL uses Direct Cache Access (DCA) to achieve zero-overhead lossless compression and employs SmartNICs to perform aggregation on the data plane. We demonstrate that our method provides up to a 7.24× speedup over conventional dense AllReduce methods on a 100Gbps RoCEv2 network and a 1.76× to 17.37× performance improvement over state-of-the-art sparse AllReduce implementations.
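The abstract does not spell out how OmNICreduce works, so the following is only a rough illustration of what a lossless sparse AllReduce computes, not the authors' method. It assumes a block-sparse representation: each worker splits its gradient vector into fixed-size blocks and sends only the blocks that contain a non-zero value, and a single aggregator sums the received blocks by index. The block size, the single-aggregator topology, and the function names `to_sparse_blocks` and `sparse_allreduce` are hypothetical; DCA, SmartNIC data-plane aggregation, and RoCEv2 transport are not modeled.

```python
from typing import Dict, List

Block = List[float]


def to_sparse_blocks(dense: List[float], block_size: int) -> Dict[int, Block]:
    """Hypothetical lossless compression: keep only blocks with a non-zero value."""
    blocks: Dict[int, Block] = {}
    for start in range(0, len(dense), block_size):
        block = dense[start:start + block_size]
        if any(v != 0.0 for v in block):
            blocks[start // block_size] = block
    return blocks


def sparse_allreduce(workers: List[List[float]], block_size: int) -> List[float]:
    """Sum the workers' vectors while only ever touching non-zero blocks.

    A single aggregator stands in for the network; all workers are
    assumed to hold vectors of the same length.
    """
    length = len(workers[0])
    aggregated: Dict[int, Block] = {}
    for dense in workers:
        # Each worker "transmits" only its non-zero blocks.
        for idx, block in to_sparse_blocks(dense, block_size).items():
            acc = aggregated.setdefault(idx, [0.0] * len(block))
            for i, v in enumerate(block):
                acc[i] += v
    # Scatter the aggregated blocks back into a dense result; blocks that
    # no worker sent are all zeros, so the result equals the dense sum.
    result = [0.0] * length
    for idx, block in aggregated.items():
        start = idx * block_size
        result[start:start + len(block)] = block
    return result


w0 = [0.0, 0.0, 1.0, 2.0, 0.0, 0.0]
w1 = [3.0, 0.0, 0.0, 0.0, 0.0, 0.0]
print(sparse_allreduce([w0, w1], block_size=2))  # [3.0, 0.0, 1.0, 2.0, 0.0, 0.0]
```

In this toy run each worker sends one block out of three, yet the result matches the dense element-wise sum, which is what "lossless" means here. The cost the abstract refers to is the compression step itself (scanning for non-zero blocks), which this sketch performs on the CPU in plain Python.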

Original language: English (US)
Title of host publication: NAIC 2024 - Proceedings of the 2024 SIGCOMM Workshop on Networks for AI Computing
Publisher: Association for Computing Machinery, Inc
Pages: 75-83
Number of pages: 9
ISBN (Electronic): 9798400707131
State: Published - Aug 4 2024
Event: 1st Workshop on Networks for AI Computing, NAIC 2024 - Sydney, Australia
Duration: Aug 4 2024 to Aug 8 2024

Publication series

Name: NAIC 2024 - Proceedings of the 2024 SIGCOMM Workshop on Networks for AI Computing

Conference

Conference: 1st Workshop on Networks for AI Computing, NAIC 2024
Country/Territory: Australia
City: Sydney
Period: 08/4/24 to 08/8/24

Keywords

  • Collective Communication
  • DCA
  • DPU
  • In-Network Aggregation
  • SmartNIC

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Networks and Communications
  • Computer Science Applications
  • Information Systems
  • Signal Processing
