TY - GEN
T1 - Near-Stream Computing
T2 - 28th Annual IEEE International Symposium on High-Performance Computer Architecture, HPCA 2022
AU - Wang, Zhengrong
AU - Weng, Jian
AU - Liu, Sihao
AU - Nowatzki, Tony
N1 - Publisher Copyright: © 2022 IEEE.
PY - 2022
Y1 - 2022
N2 - Data movement and communication have become the primary bottlenecks in large multicore systems. The near-data computing paradigm provides a solution: move computation to where the data resides on-chip. Two challenges keep near-data computing from the mainstream: lack of programmer transparency and applicability. Programmer transparency requires providing sequential memory semantics with distributed computation, which requires burdensome coordination. Broad applicability requires support for combinations of address patterns (e.g. affine, indirect, multi-operand) and computation types (loads, stores, reductions, atomics). We find that streams - coarse grain memory access patterns - are a powerful ISA abstraction for near data offloading. Tracking data access at stream-granularity heavily reduces the burden of coordination for providing sequential semantics. Decomposing the problem using streams means that arbitrary combinations of address and computation patterns can be combined for broad generality. With this insight, we develop a paradigm called near-stream computing, comprising a compiler, CPU ISA extension, and a microarchitecture that facilitate programmer transparent computation offloading to shared caches. We evaluate our system on OpenMP kernels that stress broad addressing and compute behavior, and find that 46% of dynamic instructions can be offloaded to remote banks, reducing the network traffic by 76%. Overall it achieves 2.13× speedup over a state-of-the-art near-data computing technique, with a 1.90× energy efficiency gain.
AB - Data movement and communication have become the primary bottlenecks in large multicore systems. The near-data computing paradigm provides a solution: move computation to where the data resides on-chip. Two challenges keep near-data computing from the mainstream: lack of programmer transparency and applicability. Programmer transparency requires providing sequential memory semantics with distributed computation, which requires burdensome coordination. Broad applicability requires support for combinations of address patterns (e.g. affine, indirect, multi-operand) and computation types (loads, stores, reductions, atomics). We find that streams - coarse grain memory access patterns - are a powerful ISA abstraction for near data offloading. Tracking data access at stream-granularity heavily reduces the burden of coordination for providing sequential semantics. Decomposing the problem using streams means that arbitrary combinations of address and computation patterns can be combined for broad generality. With this insight, we develop a paradigm called near-stream computing, comprising a compiler, CPU ISA extension, and a microarchitecture that facilitate programmer transparent computation offloading to shared caches. We evaluate our system on OpenMP kernels that stress broad addressing and compute behavior, and find that 46% of dynamic instructions can be offloaded to remote banks, reducing the network traffic by 76%. Overall it achieves 2.13× speedup over a state-of-the-art near-data computing technique, with a 1.90× energy efficiency gain.
KW - Near-Data Computing
KW - Programmer-Transparent Acceleration
KW - Stream-Based ISAs
UR - http://www.scopus.com/inward/record.url?scp=85126396478&partnerID=8YFLogxK
U2 - 10.1109/HPCA53966.2022.00032
DO - 10.1109/HPCA53966.2022.00032
M3 - Conference contribution
AN - SCOPUS:85126396478
T3 - Proceedings - International Symposium on High-Performance Computer Architecture
SP - 331
EP - 345
BT - Proceedings - 2022 IEEE International Symposium on High-Performance Computer Architecture, HPCA 2022
PB - IEEE Computer Society
Y2 - 2 April 2022 through 6 April 2022
ER -