TY - JOUR
T1 - Explainable machine learning methods for predicting water treatment plant features under varying weather conditions
AU - Saleem, Mohammed Al
AU - Harrou, Fouzi
AU - Sun, Ying
N1 - Publisher Copyright: © 2024 The Author(s)
PY - 2024/3
Y1 - 2024/3
AB - Accurately predicting key features in wastewater treatment plants (WWTPs) is essential for optimizing plant performance and minimizing operational costs. This study assesses the potential of various machine learning models for predicting the inflow to anoxic sludge reactors. First, it comprehensively evaluates k-Nearest Neighbors (kNN), Random Forest (RF), XGBoost, CatBoost, LightGBM, and Decision Tree Regression (DTR) for predicting the flow into the anoxic section under three weather conditions (dry, rainy, and stormy). Second, it introduces parsimonious models guided by variable importance from the XGBoost algorithm. Third, it employs SHAP (SHapley Additive exPlanations) to explain model predictions by quantifying the contribution of each feature. Data from the COST Benchmark Simulation Model (BSM1) are used to verify the models' effectiveness. Each dataset consists of 14 days of influent data at 15-minute intervals, with 80% of the data used for model training. Results show that ensemble learning methods, particularly CatBoost and XGBoost, give satisfactory predictions of anoxic section flow despite the increased variability under rainy and stormy conditions. Notably, the CatBoost and XGBoost models achieve average Mean Absolute Percentage Error (MAPE) values of 1.33% and 1.59%, outperforming the other methods.
KW - Boosting approaches
KW - CatBoost
KW - Explainable machine learning
KW - Key feature prediction
KW - LightGBM
KW - SHAP
KW - Water treatment plants
KW - XGBoost
UR - http://www.scopus.com/inward/record.url?scp=85187208328&partnerID=8YFLogxK
U2 - 10.1016/j.rineng.2024.101930
DO - 10.1016/j.rineng.2024.101930
M3 - Article
AN - SCOPUS:85187208328
SN - 2590-1230
VL - 21
JO - Results in Engineering
JF - Results in Engineering
M1 - 101930
ER -