Hardware-Aware Distributed Pipelined Neural Network Models Inference

  • Mojtaba Alshams

Student thesis: Master's Thesis

Abstract

Neural Network models got the attention of the scientific community for their increasing accuracy in predictions and good emulation of some human tasks. This led to extensive enhancements in their architecture, resulting in models with fast-growing memory and computation requirements. Due to hardware constraints such as memory and computing capabilities, the inference of a large neural network model can be distributed across multiple devices by a partitioning algorithm. The proposed framework finds the optimal model splits and chooses which device shall compute a corresponding split to minimize inference time and energy. The framework is based on PipeEdge algorithm and extends it by not only increasing inference throughput but also simultaneously minimizing inference energy consumption. Another thesis contribution is the augmentation of the emerging technology Compute-in-memory (CIM) devices to the system. To the best of my knowledge, no one studied the effect of including CIM, specifically DNN+NeuroSim simulator, devices in a distributed inference. My proposed framework could partition VGG8 and ResNet152 on ImageNet and achieve a comparable trade-off between inference slowest stage increase and energy reduction when it tried to decrease inference energy (e.g. 19% energy reduction with 34% time increase) and when CIM devices were augmenting the system (e.g. 34% energy reduction with 45% time increase).
Date of AwardJul 2023
Original languageEnglish (US)
Awarding Institution
  • Computer, Electrical and Mathematical Sciences and Engineering
SupervisorAhmed Eltawil (Supervisor)

Keywords

  • Distributed Inference
  • Neural Networks
  • Machined Learning
  • Compute-in-memory
  • throughput energy trade-off

Cite this

'