TY - GEN
T1 - Parallel space-time likelihood optimization for air pollution prediction on large-scale systems
AU - Salvaña, Mary Lai O.
AU - Abdulah, Sameh
AU - Ltaief, Hatem
AU - Sun, Ying
AU - Genton, Marc G.
AU - Keyes, David E.
N1 - Publisher Copyright:
© 2022 Owner/Author.
PY - 2022/6/27
Y1 - 2022/6/27
N2 - Gaussian geostatistical space-time modeling is an effective tool for performing statistical inference of field data evolving in space and time, generalizing spatial modeling alone at the cost of the greater complexity of operations and storage, and pushing geostatistical modeling even further into the arms of high-performance computing. It makes inferences for missing data by leveraging space-time measurements of one or more fields. We propose a high-performance implementation of a widely applied space-time model for large-scale systems using a two-level parallelization technique. At the inner level, we rely on state-of-the-art dense linear algebra libraries and parallel runtime systems to perform complex matrix operations required to evaluate the maximum likelihood estimation (MLE). At the outer level, we parallelize the optimization process using a distributed implementation of the particle swarm optimization (PSO) algorithm. At this level, parallelization is accomplished using MPI sub-communicators, such that the nodes in each sub-communicator perform a single MLE iteration at a time. To evaluate the effectiveness of the proposed methodology, we assess the accuracy of the newly implemented space-time model on a set of large-scale synthetic space-time datasets. Moreover, we use the proposed implementation to model two air pollution datasets from the Middle East and US regions with 550 spatial locations X730 time slots and 945 spatial locations X500 time slots, respectively. The evaluation shows that the proposed approach satisfies high prediction accuracy on both synthetic datasets and real particulate matter (PM) datasets in the context of the air pollution problem. We achieve up to 757.16 TFLOPS/s using 1024 nodes (75% of the peak performance) using 490K geospatial locations on Shaheen-II Cray XC40 system.
AB - Gaussian geostatistical space-time modeling is an effective tool for performing statistical inference of field data evolving in space and time, generalizing spatial modeling alone at the cost of the greater complexity of operations and storage, and pushing geostatistical modeling even further into the arms of high-performance computing. It makes inferences for missing data by leveraging space-time measurements of one or more fields. We propose a high-performance implementation of a widely applied space-time model for large-scale systems using a two-level parallelization technique. At the inner level, we rely on state-of-the-art dense linear algebra libraries and parallel runtime systems to perform complex matrix operations required to evaluate the maximum likelihood estimation (MLE). At the outer level, we parallelize the optimization process using a distributed implementation of the particle swarm optimization (PSO) algorithm. At this level, parallelization is accomplished using MPI sub-communicators, such that the nodes in each sub-communicator perform a single MLE iteration at a time. To evaluate the effectiveness of the proposed methodology, we assess the accuracy of the newly implemented space-time model on a set of large-scale synthetic space-time datasets. Moreover, we use the proposed implementation to model two air pollution datasets from the Middle East and US regions with 550 spatial locations X730 time slots and 945 spatial locations X500 time slots, respectively. The evaluation shows that the proposed approach satisfies high prediction accuracy on both synthetic datasets and real particulate matter (PM) datasets in the context of the air pollution problem. We achieve up to 757.16 TFLOPS/s using 1024 nodes (75% of the peak performance) using 490K geospatial locations on Shaheen-II Cray XC40 system.
KW - geostatistics
KW - high-performance computing
KW - space-time modeling
UR - http://www.scopus.com/inward/record.url?scp=85134851148&partnerID=8YFLogxK
U2 - 10.1145/3539781.3539800
DO - 10.1145/3539781.3539800
M3 - Conference contribution
AN - SCOPUS:85134851148
T3 - Proceedings of the Platform for Advanced Scientific Computing Conference, PASC 2022
BT - Proceedings of the Platform for Advanced Scientific Computing Conference, PASC 2022
PB - Association for Computing Machinery, Inc
T2 - 2022 Platform for Advanced Scientific Computing Conference, PASC 2022
Y2 - 27 June 2022 through 29 June 2022
ER -