TY - GEN
T1 - A hybrid systolic-dataflow architecture for inductive matrix algorithms
AU - Weng, Jian
AU - Liu, Sihao
AU - Wang, Zhengrong
AU - Dadu, Vidushi
AU - Nowatzki, Tony
N1 - Publisher Copyright:
© 2020 IEEE.
PY - 2020/2
Y1 - 2020/2
N2 - Dense linear algebra kernels are critical for wireless, and the oncoming proliferation of 5G only amplifies their importance. Due to the inductive nature of many such algorithms, parallelism is difficult to exploit: parallel regions have fine-grain producer/consumer interaction with iteratively changing depen-dence distance, reuse rate, and memory access patterns. This makes multi-threading impractical due to fine-grain synchronization, and vectorization ineffective due to the non-rectangular iteration domain. CPUs, DSPs, and GPUs perform order-of-magnitude below peak. Our insight is that if the nature of inductive dependences and memory accesses were explicit in the hardware/software interface, then a spatial architecture could efficiently execute parallel code regions. To this end, we first develop a novel execution model, inductive dataflow, where inductive dependence patterns and memory access patterns (streams) are first-order primitives. Second, we develop a hybrid spatial architecture combining systolic and tagged dataflow execution to attain high utilization at low energy and area cost. Finally, we create a scalable design through a novel vector-stream control model which amortizes control overhead both in time and spatially across architecture lanes. We evaluate our design, REVEL, with a full stack (compiler, ISA, simulator, RTL). Across a suite of linear algebra kernels, REVEL outperforms equally-provisioned DSPs by 4.6×-37×. Compared to state-of-the-art spatial architectures, REVEL is mean 3× faster. Compared to a set of ASICs, REVEL is only 2× the power and half the area.
AB - Dense linear algebra kernels are critical for wireless, and the oncoming proliferation of 5G only amplifies their importance. Due to the inductive nature of many such algorithms, parallelism is difficult to exploit: parallel regions have fine-grain producer/consumer interaction with iteratively changing depen-dence distance, reuse rate, and memory access patterns. This makes multi-threading impractical due to fine-grain synchronization, and vectorization ineffective due to the non-rectangular iteration domain. CPUs, DSPs, and GPUs perform order-of-magnitude below peak. Our insight is that if the nature of inductive dependences and memory accesses were explicit in the hardware/software interface, then a spatial architecture could efficiently execute parallel code regions. To this end, we first develop a novel execution model, inductive dataflow, where inductive dependence patterns and memory access patterns (streams) are first-order primitives. Second, we develop a hybrid spatial architecture combining systolic and tagged dataflow execution to attain high utilization at low energy and area cost. Finally, we create a scalable design through a novel vector-stream control model which amortizes control overhead both in time and spatially across architecture lanes. We evaluate our design, REVEL, with a full stack (compiler, ISA, simulator, RTL). Across a suite of linear algebra kernels, REVEL outperforms equally-provisioned DSPs by 4.6×-37×. Compared to state-of-the-art spatial architectures, REVEL is mean 3× faster. Compared to a set of ASICs, REVEL is only 2× the power and half the area.
KW - Digital Signal Processor
KW - Reconfigurable Accelerator
KW - Software/Hardware Codesign
KW - Spatial Architecture
UR - http://www.scopus.com/inward/record.url?scp=85084182544&partnerID=8YFLogxK
U2 - 10.1109/HPCA47549.2020.00063
DO - 10.1109/HPCA47549.2020.00063
M3 - Conference contribution
AN - SCOPUS:85084182544
T3 - Proceedings - 2020 IEEE International Symposium on High Performance Computer Architecture, HPCA 2020
SP - 703
EP - 716
BT - Proceedings - 2020 IEEE International Symposium on High Performance Computer Architecture, HPCA 2020
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 26th IEEE International Symposium on High Performance Computer Architecture, HPCA 2020
Y2 - 22 February 2020 through 26 February 2020
ER -