DONNA: Distributed Optimized Neural Network Allocation on CIM-Based Heterogeneous Accelerators

Mojtaba F. Alshams*, Kamilya S. Smagulova, Suhaib A. Fahmy, Mohammed E. Fouda, Ahmed M. Eltawil

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

The continued development of neural network architectures drives ever-growing demand for computing power. While data centers continue to scale, inference away from the cloud will increasingly rely on distributed execution across multiple devices. Most prior efforts have focused on optimizing single-device inference or partitioning models to improve inference throughput, while energy consumption grows in importance as a design consideration. This work proposes a framework that searches for optimal model splits and distributes the partitions across a combination of devices, taking both throughput and energy into account. Participating devices are strategically grouped into homogeneous and heterogeneous clusters consisting of general-purpose CPU and GPU architectures as well as emerging Compute-In-Memory (CIM) accelerators. By jointly optimizing inference throughput and energy consumption, the framework demonstrates up to 4× speedup with approximately 4× per-device energy reduction in a heterogeneous setup compared to single-GPU inference. The algorithm also traces a smooth Pareto-like curve in the energy-throughput space for CIM devices.
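The kind of split search the abstract describes can be sketched as a small brute-force Pareto search: given per-layer latency and energy estimates for each device type, enumerate contiguous partition points, score each assignment by pipeline throughput and total energy, and keep the non-dominated points. The cost numbers, device names, and function names below are illustrative assumptions, not the paper's actual cost model or algorithm.

```python
from itertools import combinations

# Hypothetical per-layer (latency ms, energy mJ) costs for each device type.
# These numbers are illustrative placeholders, not measurements from the paper.
LAYER_COSTS = {
    "gpu": [(2.0, 8.0), (3.0, 9.0), (1.5, 7.0), (2.5, 8.5)],
    "cim": [(4.0, 1.0), (5.0, 1.2), (3.0, 0.8), (4.5, 1.1)],
}
N_LAYERS = len(LAYER_COSTS["gpu"])

def evaluate_split(split_points, device_order):
    """Assign contiguous layer ranges to devices in order; return
    (pipeline throughput in inferences/s, energy per inference in mJ)."""
    bounds = [0, *split_points, N_LAYERS]
    stage_latencies, total_energy = [], 0.0
    for dev, lo, hi in zip(device_order, bounds, bounds[1:]):
        stage_latencies.append(sum(LAYER_COSTS[dev][i][0] for i in range(lo, hi)))
        total_energy += sum(LAYER_COSTS[dev][i][1] for i in range(lo, hi))
    # With pipelined execution, throughput is limited by the slowest stage.
    return 1000.0 / max(stage_latencies), total_energy

def pareto_front(candidates):
    """Keep (cut, throughput, energy) points not dominated by another point
    that is at least as good on both objectives and strictly better on one."""
    return [c for c in candidates
            if not any(o[1] >= c[1] and o[2] <= c[2]
                       and (o[1] > c[1] or o[2] < c[2])
                       for o in candidates)]

def search(device_order):
    """Enumerate all contiguous split points and return the Pareto set."""
    cuts = combinations(range(1, N_LAYERS), len(device_order) - 1)
    return pareto_front([(cut, *evaluate_split(cut, device_order))
                         for cut in cuts])
```

For instance, `search(("gpu", "cim"))` sweeps the single split point of a two-stage GPU-plus-CIM pipeline and returns the non-dominated (throughput, energy) configurations, tracing exactly the kind of energy-throughput trade-off curve the abstract mentions.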

Original language: English (US)
Title of host publication: Proceedings - 2024 IEEE International Conference on Edge Computing and Communications, EDGE 2024
Editors: Rong N. Chang, Carl K. Chang, Jingwei Yang, Zhi Jin, Michael Sheng, Jing Fan, Kenneth K. Fletcher, Qiang He, Nimanthi Atukorala, Hongyue Wu, Shiqiang Wang, Shuiguang Deng, Nirmit Desai, Gopal Pingali, Javid Taheri, K. V. Subramaniam, Feras Awaysheh, Kaouta El Maghaouri, Yingjie Wang
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 149-156
Number of pages: 8
ISBN (Electronic): 9798350368499
State: Published - 2024
Event: 8th IEEE International Conference on Edge Computing and Communications, EDGE 2024 - Shenzhen, China
Duration: Jul 7, 2024 to Jul 13, 2024

Publication series

Name: Proceedings - IEEE International Conference on Edge Computing
ISSN (Print): 2767-9918

Conference

Conference: 8th IEEE International Conference on Edge Computing and Communications, EDGE 2024
Country/Territory: China
City: Shenzhen
Period: 07/7/24 to 07/13/24

Keywords

  • CIM
  • Compute-in-memory
  • CPU
  • Distributed Inference
  • GPU
  • Heterogeneous Devices
  • Heterogeneous Hardware
  • Model splitting
  • ReRAM

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Hardware and Architecture
  • Artificial Intelligence
