Out-of-core simulation systems often produce massive amounts of data that cannot
fit in the aggregate fast memory of the compute nodes, and they must also
read these data back for computation. As a result, I/O data movement can be a
bottleneck in large-scale simulations. Advances in memory architecture have made
it feasible and affordable to integrate hierarchical storage media on large-scale systems,
starting from the traditional Parallel File Systems (PFSs) to intermediate fast
disk technologies (e.g., node-local and remote-shared NVMe and SSD-based Burst
Buffers) and up to CPU main memory and GPU High Bandwidth Memory (HBM).
However, while adding faster storage media increases I/O bandwidth,
it puts pressure on the CPU, which becomes responsible for managing and moving data between
these layers of storage. Simulation systems are thus vulnerable to being blocked
by I/O operations. The Multilayer Buffer System (MLBS) proposed in this research
demonstrates a general and versatile method for overlapping I/O with computation
that helps to ameliorate the strain on the processors through asynchronous access.
The main idea consists in decoupling I/O operations from computational phases using
dedicated hardware resources to perform expensive context switches. MLBS monitors
I/O traffic in each storage layer, allowing fair utilization of shared resources. By
continually prefetching up and down across all hardware layers of the memory and
storage subsystems, MLBS transforms the original I/O-bound behavior of evaluated
applications and shifts it closer to a memory-bound or compute-bound regime.
The evaluation on the Cray XC40 Shaheen-2 supercomputer for a representative I/O-bound
application, seismic inversion, shows that MLBS outperforms state-of-the-art
PFSs, i.e., Lustre, Data Elevator and DataWarp by 6.06X, 2.23X, and 1.90X, respectively.
On the IBM-built Summit supercomputer, using 2048 compute nodes equipped
with a total of 12288 GPUs, MLBS achieves up to 1.4X performance speedup compared
to the reference PFS-based implementation. MLBS is also demonstrated on
applications from cosmology and combustion, and on classic out-of-core computational
physics and linear algebra routines.
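
The idea of overlapping I/O with computation can be illustrated with a minimal single-layer sketch. This is not the MLBS implementation: it assumes one background thread standing in for the dedicated I/O resource and one bounded in-memory buffer standing in for a fast storage layer. The names `prefetching_reader`, `load_block`, and `depth` are hypothetical and chosen only for illustration.

```python
import queue
import threading


def prefetching_reader(load_block, block_ids, depth=2):
    """Yield (block_id, data) in order while a background thread loads ahead.

    `load_block` is a caller-supplied function that performs the slow read.
    `depth` bounds how many loaded blocks may wait in the buffer at once.
    """
    buffer = queue.Queue(maxsize=depth)  # bounded staging buffer
    done = object()                      # sentinel marking end of stream

    def producer():
        # Runs on the I/O thread: reads ahead until the buffer is full,
        # then blocks in put() until the consumer frees a slot.
        try:
            for block_id in block_ids:
                buffer.put((block_id, load_block(block_id)))
            buffer.put(done)
        except Exception as exc:
            buffer.put(exc)  # hand the failure to the consumer

    # Daemon thread, so an abandoned reader does not keep the process alive.
    threading.Thread(target=producer, daemon=True).start()

    while True:
        item = buffer.get()
        if item is done:
            return
        if isinstance(item, Exception):
            raise item
        yield item


if __name__ == "__main__":
    import time

    def slow_read(block_id):
        time.sleep(0.01)  # stand-in for a read from a slower storage layer
        return [block_id] * 4

    total = 0
    for block_id, data in prefetching_reader(slow_read, range(8)):
        total += sum(data)  # compute on this block while the next one loads
    print(total)
```

The bounded queue is the design choice that matters here: it lets the reader run ahead of the computation without letting staged data grow beyond the capacity of the faster layer.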
|Date of Award||Sep 30 2020|
|Original language||English (US)|
- Computer, Electrical and Mathematical Sciences and Engineering
|Supervisor||David Keyes (Supervisor)|
- Burst Buffer
- Heterogeneous Computing