TY - GEN
T1 - Query-driven parallel exploration of large datasets
AU - Atanasov, Atanas
AU - Srinivasan, Madhusudhanan
AU - Weinzierl, Tobias
PY - 2012
Y1 - 2012
N2 - Recent advances in supercomputing capabilities pose a multi-faceted data retrieval challenge to the exploration and visualisation of the obtained results: the bandwidth between visualisation devices and the high-performance computing (HPC) clusters neither scales with the simulation data nor with the compute power, the total memory footprint of the data on the supercomputer often exceeds the aggregate memory on the visualisation, and the data has to be distributed among several visualisation nodes working in parallel to render a visual. In the present paper, we introduce an on-demand data exploration paradigm that leverages HPC capabilities and distributed visualisation without requiring a large memory footprint on the visualisation cluster. Regions of interest within the data are specified by the user in the form of queries. These queries, augmented by node identifiers on the visualisation cluster, are automatically distributed among multiple compute nodes of the HPC cluster. The compute nodes work in parallel to assemble and merge data in response to the user query until the data distribution matches the visualisation cluster's topology. Query results are then simultaneously streamed to the right visualisation nodes. Our approach allows for interactive exploration of data residing on HPC resources, irrespective of memory footprint. The streaming of data to the visualisation nodes scales with the bandwidth of the interconnecting network and the HPC cluster's domain decomposition, while the latter is hidden from the visualisation and can change dynamically. We demonstrate the capability of our query-driven approach with a turbulent mixing dataset, and show that it supports interactive data exploration on HPC systems.
KW - On-demand data exploration
KW - computational steering
KW - distributed visualisation
KW - large-scale data
UR - http://www.scopus.com/inward/record.url?scp=84872170687&partnerID=8YFLogxK
U2 - 10.1109/LDAV.2012.6378972
DO - 10.1109/LDAV.2012.6378972
M3 - Conference contribution
AN - SCOPUS:84872170687
SN - 9781467347334
T3 - IEEE Symposium on Large Data Analysis and Visualization 2012, LDAV 2012 - Proceedings
SP - 23
EP - 30
BT - IEEE Symposium on Large Data Analysis and Visualization 2012, LDAV 2012 - Proceedings
T2 - 2nd Symposium on Large-Scale Data Analysis and Visualization, LDAV 2012
Y2 - 14 October 2012 through 19 October 2012
ER -