OverGen: Improving FPGA Usability through Domain-specific Overlay Generation

Sihao Liu, Jian Weng, Dylan Kupsh, Atefeh Sohrabizadeh, Zhengrong Wang, Licheng Guo, Jiuyang Liu, Maxim Zhulin, Rishabh Mani, Lucheng Zhang, Jason Cong, Tony Nowatzki

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

14 Scopus citations


FPGAs have proven to be powerful computational accelerators across many types of workloads. The mainstream programming approach is high-level synthesis (HLS), which maps high-level languages (e.g., C + #pragmas) to hardware. Unfortunately, HLS leaves a significant programmability gap in terms of reconfigurability, customization, and versatility: although HLS compilation is fast, the downstream physical design takes hours to days; FPGA reconfiguration time limits the time-multiplexing ability of hardware; and tools do not reason about cross-workload flexibility. Overlay architectures mitigate these problems by mapping a programmable design (e.g., CPU, GPU) on top of FPGAs. However, the abstraction gap between overlay and FPGA leads to low efficiency/utilization. Our essential idea is to develop a hardware generation framework targeting a highly customizable overlay, so that the abstraction gap can be narrowed by tuning the design instance to applications of interest. We leverage and extend prior work on customizable spatial architectures, SoC generation, accelerator compilers, and design space explorers to create an end-to-end FPGA acceleration system. Our novel techniques address inefficient networks between on-chip memories and processing elements, and improve design space exploration (DSE) by reducing the amount of recompilation required. Our framework, OverGen, is highly competitive with fixed-function HLS-based designs, even though the generated designs are programmable with fast reconfiguration. We compare against a state-of-the-art DSE-based HLS framework, AutoDSE. Without kernel tuning for AutoDSE, OverGen achieves 1.2× geomean performance, and even with manual kernel tuning for the baseline, OverGen still achieves 0.55× geomean performance, all while providing runtime flexibility across workloads.

Original language: English (US)
Title of host publication: Proceedings - 2022 55th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2022
Publisher: IEEE Computer Society
Number of pages: 22
ISBN (Electronic): 9781665462723
State: Published - 2022
Event: 55th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2022 - Chicago, United States
Duration: Oct 1 2022 to Oct 5 2022

Publication series

Name: Proceedings of the Annual International Symposium on Microarchitecture, MICRO
ISSN (Print): 1072-4451


Conference: 55th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2022
Country/Territory: United States


Keywords

  • CGRA
  • Design Automation
  • Domain-specific Accelerators
  • FPGA
  • Reconfigurable architectures

ASJC Scopus subject areas

  • Hardware and Architecture

