A comparison of machine learning methods for ozone pollution prediction

Fouzi Harrou, Ying Sun, Qilong Pan

Research output: Contribution to journal › Article › peer-review

4 Scopus citations


Precise and efficient ozone (O3) concentration prediction is crucial for weather monitoring and environmental policymaking because high O3 pollution levels harm human health and ecosystems. However, the complexity of O3 formation mechanisms in the troposphere makes it difficult to model O3 accurately and quickly, especially in the absence of a process model. Data-driven machine learning techniques have shown promising performance in modeling air pollution, particularly when a process model is unavailable. This study evaluates the predictive performance of nineteen machine learning models for ozone pollution prediction. Specifically, we assess how selecting input features with Random Forest affects O3 concentration prediction, and we investigate whether time-lagged measurements improve prediction accuracy. The study uses air pollution and meteorological data collected at King Abdullah University of Science and Technology (KAUST). Results show that dynamic models using time-lagged data outperform both static and reduced machine learning models. Under the RMSE metric, incorporating time-lagged data improves the accuracy of the machine learning models by 300% compared with the static models and by 200% compared with the reduced models. Importantly, the best dynamic model with time-lagged information requires only 0.01 s of computation time, indicating that it is suitable for practical use. The Diebold-Mariano test, a statistical test that compares the forecasting accuracy of competing models, is also conducted.
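The difference between static and dynamic models lies in how the input vectors are assembled. The following is a minimal sketch, assuming that a dynamic model receives the current inputs plus the previous `n_lags` values of every input and of O3 itself. The function name, the lag layout, and the toy values are hypothetical illustrations, not the paper's exact design.

```python
from typing import List, Sequence, Tuple


def make_lagged_dataset(
    series: Sequence[Sequence[float]],
    target: Sequence[float],
    n_lags: int,
) -> Tuple[List[List[float]], List[float]]:
    """Build (X, y) pairs for a dynamic model (hypothetical layout).

    Each row of X holds the inputs at time t, followed, for each lag
    1..n_lags, by the inputs at t - lag and the target (O3) at t - lag.
    """
    if n_lags < 0:
        raise ValueError("n_lags must be non-negative")
    if len(series) != len(target):
        raise ValueError("series and target must have the same length")

    X: List[List[float]] = []
    y: List[float] = []
    # Start at t = n_lags so every requested lag exists.
    for t in range(n_lags, len(target)):
        row: List[float] = list(series[t])  # static part: inputs at time t
        for lag in range(1, n_lags + 1):
            row.extend(series[t - lag])     # lagged inputs
            row.append(target[t - lag])     # lagged O3 value
        X.append(row)
        y.append(target[t])
    return X, y


# Toy usage with hypothetical [temperature, humidity] inputs and O3 values.
met = [[20.0, 0.5], [21.0, 0.6], [22.0, 0.4], [23.0, 0.7]]
o3 = [30.0, 32.0, 35.0, 33.0]
X, y = make_lagged_dataset(met, o3, n_lags=1)
```

With `n_lags=0` the function returns the static design (current inputs only), so static and dynamic variants of the same regressor can be compared on identically constructed data.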
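The abstract reports RMSE-based accuracy gains of 300% and 200% but does not say how a percentage gain is computed. A reduction measured against the baseline's own RMSE cannot exceed 100%, because the model's RMSE is non-negative. This sketch therefore assumes, purely for illustration, that the gain is expressed relative to the dynamic model's RMSE. The function names, this definition, and the numbers in the usage note are hypothetical and are not taken from the paper.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error between observations and predictions."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


def relative_improvement(rmse_baseline: float, rmse_model: float) -> float:
    """Percentage by which the baseline RMSE exceeds the model RMSE.

    ASSUMED definition: (baseline - model) / model * 100, which is one
    reading under which gains above 100% are possible.
    """
    if rmse_model <= 0:
        raise ValueError("rmse_model must be positive")
    return 100.0 * (rmse_baseline - rmse_model) / rmse_model
```

Under this assumed definition, a baseline RMSE four times that of the dynamic model corresponds to a 300% gain, and three times corresponds to a 200% gain.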
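The Diebold-Mariano test compares two forecasts through the differences between their losses. The following is a minimal standard-library sketch, assuming squared-error loss, a long-run variance built from autocovariances up to lag h - 1 for h-step-ahead forecasts, and a standard normal reference distribution with no small-sample correction. The function name is hypothetical, and this is not the paper's implementation.

```python
import math
from typing import Sequence, Tuple


def diebold_mariano(
    e1: Sequence[float], e2: Sequence[float], h: int = 1
) -> Tuple[float, float]:
    """Return (DM statistic, two-sided p-value) for forecast errors e1, e2.

    A positive statistic means model 1 has larger squared-error loss.
    """
    T = len(e1)
    if T != len(e2) or T < 2:
        raise ValueError("need two error series of equal length >= 2")
    if h < 1:
        raise ValueError("h must be at least 1")

    # Loss differential under squared-error loss.
    d = [a * a - b * b for a, b in zip(e1, e2)]
    d_bar = sum(d) / T

    def autocov(k: int) -> float:
        """Sample autocovariance of d at lag k (divided by T)."""
        return sum((d[t] - d_bar) * (d[t - k] - d_bar) for t in range(k, T)) / T

    # Long-run variance: gamma_0 + 2 * sum of gamma_k for k = 1..h-1.
    lrv = autocov(0) + 2.0 * sum(autocov(k) for k in range(1, h))
    if lrv <= 0:
        raise ValueError("non-positive long-run variance; statistic undefined")

    stat = d_bar / math.sqrt(lrv / T)
    # Two-sided p-value: 2 * (1 - Phi(|stat|)) = erfc(|stat| / sqrt(2)).
    p_value = math.erfc(abs(stat) / math.sqrt(2.0))
    return stat, p_value
```

Swapping the two error series flips the sign of the statistic and leaves the p-value unchanged, so the test is symmetric in the two models being compared.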
Original language: English (US)
Journal: Journal of Big Data
Issue number: 1
State: Published (May 15, 2023)

