TY - GEN
T1 - OverGen
T2 - 55th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2022
AU - Liu, Sihao
AU - Weng, Jian
AU - Kupsh, Dylan
AU - Sohrabizadeh, Atefeh
AU - Wang, Zhengrong
AU - Guo, Licheng
AU - Liu, Jiuyang
AU - Zhulin, Maxim
AU - Mani, Rishabh
AU - Zhang, Lucheng
AU - Cong, Jason
AU - Nowatzki, Tony
N1 - Publisher Copyright: © 2022 IEEE.
PY - 2022
Y1 - 2022
N2 - FPGAs have been proven to be powerful computational accelerators across many types of workloads. The mainstream programming approach is high-level synthesis (HLS), which maps high-level languages (e.g., C + #pragmas) to hardware. Unfortunately, HLS leaves a significant programmability gap in terms of reconfigurability, customization, and versatility: although HLS compilation is fast, the downstream physical design takes hours to days; FPGA reconfiguration time limits the time-multiplexing ability of hardware; and tools do not reason about cross-workload flexibility. Overlay architectures mitigate these problems by mapping a programmable design (e.g., a CPU or GPU) on top of FPGAs. However, the abstraction gap between overlay and FPGA leads to low efficiency and utilization. Our essential idea is to develop a hardware generation framework targeting a highly customizable overlay, so that the abstraction gap can be narrowed by tuning the design instance to applications of interest. We leverage and extend prior work on customizable spatial architectures, SoC generation, accelerator compilers, and design space explorers to create an end-to-end FPGA acceleration system. Our novel techniques address inefficient networks between on-chip memories and processing elements, and improve design space exploration (DSE) by reducing the amount of recompilation required. Our framework, OverGen, is highly competitive with fixed-function HLS-based designs, even though the generated designs are programmable with fast reconfiguration. We compare against a state-of-the-art DSE-based HLS framework, AutoDSE. Without kernel tuning for AutoDSE, OverGen achieves 1.2× geomean performance, and even with manual kernel tuning for the baseline, OverGen still achieves 0.55× geomean performance, all while providing runtime flexibility across workloads.
AB - FPGAs have been proven to be powerful computational accelerators across many types of workloads. The mainstream programming approach is high-level synthesis (HLS), which maps high-level languages (e.g., C + #pragmas) to hardware. Unfortunately, HLS leaves a significant programmability gap in terms of reconfigurability, customization, and versatility: although HLS compilation is fast, the downstream physical design takes hours to days; FPGA reconfiguration time limits the time-multiplexing ability of hardware; and tools do not reason about cross-workload flexibility. Overlay architectures mitigate these problems by mapping a programmable design (e.g., a CPU or GPU) on top of FPGAs. However, the abstraction gap between overlay and FPGA leads to low efficiency and utilization. Our essential idea is to develop a hardware generation framework targeting a highly customizable overlay, so that the abstraction gap can be narrowed by tuning the design instance to applications of interest. We leverage and extend prior work on customizable spatial architectures, SoC generation, accelerator compilers, and design space explorers to create an end-to-end FPGA acceleration system. Our novel techniques address inefficient networks between on-chip memories and processing elements, and improve design space exploration (DSE) by reducing the amount of recompilation required. Our framework, OverGen, is highly competitive with fixed-function HLS-based designs, even though the generated designs are programmable with fast reconfiguration. We compare against a state-of-the-art DSE-based HLS framework, AutoDSE. Without kernel tuning for AutoDSE, OverGen achieves 1.2× geomean performance, and even with manual kernel tuning for the baseline, OverGen still achieves 0.55× geomean performance, all while providing runtime flexibility across workloads.
KW - CGRA
KW - Design Automation
KW - Domain-specific Accelerators
KW - FPGA
KW - Reconfigurable architectures
UR - http://www.scopus.com/inward/record.url?scp=85138955850&partnerID=8YFLogxK
U2 - 10.1109/MICRO56248.2022.00018
DO - 10.1109/MICRO56248.2022.00018
M3 - Conference contribution
AN - SCOPUS:85138955850
T3 - Proceedings of the Annual International Symposium on Microarchitecture, MICRO
SP - 35
EP - 56
BT - Proceedings - 2022 55th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2022
PB - IEEE Computer Society
Y2 - 1 October 2022 through 5 October 2022
ER -