Parallel space-time likelihood optimization for air pollution prediction on large-scale systems

Mary Lai O. Salvaña, Sameh Abdulah, Hatem Ltaief, Ying Sun, Marc G. Genton, David E. Keyes

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

4 Scopus citations

Abstract

Gaussian geostatistical space-time modeling is an effective tool for performing statistical inference of field data evolving in space and time, generalizing spatial modeling alone at the cost of the greater complexity of operations and storage, and pushing geostatistical modeling even further into the arms of high-performance computing. It makes inferences for missing data by leveraging space-time measurements of one or more fields. We propose a high-performance implementation of a widely applied space-time model for large-scale systems using a two-level parallelization technique. At the inner level, we rely on state-of-the-art dense linear algebra libraries and parallel runtime systems to perform complex matrix operations required to evaluate the maximum likelihood estimation (MLE). At the outer level, we parallelize the optimization process using a distributed implementation of the particle swarm optimization (PSO) algorithm. At this level, parallelization is accomplished using MPI sub-communicators, such that the nodes in each sub-communicator perform a single MLE iteration at a time. To evaluate the effectiveness of the proposed methodology, we assess the accuracy of the newly implemented space-time model on a set of large-scale synthetic space-time datasets. Moreover, we use the proposed implementation to model two air pollution datasets from the Middle East and US regions with 550 spatial locations X730 time slots and 945 spatial locations X500 time slots, respectively. The evaluation shows that the proposed approach satisfies high prediction accuracy on both synthetic datasets and real particulate matter (PM) datasets in the context of the air pollution problem. We achieve up to 757.16 TFLOPS/s using 1024 nodes (75% of the peak performance) using 490K geospatial locations on Shaheen-II Cray XC40 system.

Original languageEnglish (US)
Title of host publicationProceedings of the Platform for Advanced Scientific Computing Conference, PASC 2022
PublisherAssociation for Computing Machinery, Inc
ISBN (Electronic)9781450394109
DOIs
StatePublished - Jun 27 2022
Event2022 Platform for Advanced Scientific Computing Conference, PASC 2022 - Basel, Switzerland
Duration: Jun 27 2022Jun 29 2022

Publication series

NameProceedings of the Platform for Advanced Scientific Computing Conference, PASC 2022

Conference

Conference2022 Platform for Advanced Scientific Computing Conference, PASC 2022
Country/TerritorySwitzerland
CityBasel
Period06/27/2206/29/22

Keywords

  • geostatistics
  • high-performance computing
  • space-time modeling

ASJC Scopus subject areas

  • Computer Science Applications

Fingerprint

Dive into the research topics of 'Parallel space-time likelihood optimization for air pollution prediction on large-scale systems'. Together they form a unique fingerprint.

Cite this