Query-driven parallel exploration of large datasets

Atanas Atanasov*, Madhusudhanan Srinivasan, Tobias Weinzierl

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

4 Scopus citations

Abstract

Recent advances in supercomputing capabilities pose a multi-faceted data retrieval challenge to the exploration and visualisation of the obtained results: the bandwidth between visualisation devices and the high-performance computing (HPC) clusters scales neither with the simulation data nor with the compute power, the total memory footprint of the data on the supercomputer often exceeds the aggregate memory of the visualisation cluster, and the data has to be distributed among several visualisation nodes working in parallel to render an image. In the present paper, we introduce an on-demand data exploration paradigm that leverages HPC capabilities and distributed visualisation without requiring a large memory footprint on the visualisation cluster. Regions of interest within the data are specified by the user in the form of queries. These queries, augmented by node identifiers on the visualisation cluster, are automatically distributed among multiple compute nodes of the HPC cluster. The compute nodes work in parallel to assemble and merge data in response to the user query until the data distribution matches the visualisation cluster's topology. Query results are then simultaneously streamed to the right visualisation nodes. Our approach allows for interactive exploration of data residing on HPC resources, irrespective of memory footprint. The streaming of data to the visualisation nodes scales with the bandwidth of the interconnecting network and the HPC cluster's domain decomposition, while the latter is hidden from the visualisation and can change dynamically. We demonstrate the capability of our query-driven approach with a turbulent mixing dataset, and show that it supports interactive data exploration on HPC systems.
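
The query flow described in the abstract can be illustrated with a small single-process sketch. It is not the authors' implementation: the `Query` class, the axis-aligned box as region of interest, and the functions `answer_locally`, `merge`, and `run_queries` are all hypothetical names chosen for illustration, and plain Python lists stand in for the compute nodes' subdomains and for the network streams.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Query:
    """Region of interest tagged with the visualisation node that wants it.

    Assumption: the region is an axis-aligned box; the abstract does not
    say how regions are encoded.
    """
    vis_node: int   # hypothetical identifier of the requesting visualisation node
    lo: tuple       # lower corner of the box
    hi: tuple       # upper corner of the box


def inside(point, query):
    """True if the point lies in the query box (bounds inclusive)."""
    return all(l <= x <= h for x, l, h in zip(point, query.lo, query.hi))


def answer_locally(local_points, queries):
    """One compute node: pick the samples of its own subdomain per query."""
    partial = defaultdict(list)
    for q in queries:
        partial[q.vis_node].extend(p for p in local_points if inside(p, q))
    return partial


def merge(partials):
    """Combine partial answers so each visualisation node gets one result."""
    merged = defaultdict(list)
    for partial in partials:
        for vis_node, points in partial.items():
            merged[vis_node].extend(points)
    return {vis_node: sorted(points) for vis_node, points in merged.items()}


def run_queries(compute_nodes, queries):
    """Send every query to every compute node, then merge per visualisation node."""
    return merge(answer_locally(points, queries) for points in compute_nodes)


# Two compute nodes with their own subdomains, two visualisation nodes
# that each ask for one half of the unit square.
compute_nodes = [[(0.1, 0.1), (0.9, 0.9)], [(0.4, 0.6), (0.6, 0.4)]]
queries = [
    Query(vis_node=0, lo=(0.0, 0.0), hi=(0.5, 1.0)),
    Query(vis_node=1, lo=(0.5, 0.0), hi=(1.0, 1.0)),
]
result = run_queries(compute_nodes, queries)
print(result[0])  # [(0.1, 0.1), (0.4, 0.6)]
print(result[1])  # [(0.6, 0.4), (0.9, 0.9)]
```

In the sketch the visualisation side only ever sees results keyed by its own node identifier, which mirrors the abstract's statement that the HPC cluster's domain decomposition stays hidden from the visualisation.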

Original language: English (US)
Title of host publication: IEEE Symposium on Large Data Analysis and Visualization 2012, LDAV 2012 - Proceedings
Pages: 23-30
Number of pages: 8
State: Published - 2012
Event: 2nd Symposium on Large-Scale Data Analysis and Visualization, LDAV 2012 - Seattle, WA, United States
Duration: Oct 14 2012 – Oct 19 2012

Publication series

Name: IEEE Symposium on Large Data Analysis and Visualization 2012, LDAV 2012 - Proceedings

Other

Other: 2nd Symposium on Large-Scale Data Analysis and Visualization, LDAV 2012
Country/Territory: United States
City: Seattle, WA
Period: 10/14/12 – 10/19/12

Keywords

  • On-demand data exploration
  • computational steering
  • distributed visualisation
  • large-scale data

ASJC Scopus subject areas

  • Computer Vision and Pattern Recognition
  • Information Systems
