TY - JOUR
T1 - Monitoring influent conditions of wastewater treatment plants by nonlinear data-based techniques
AU - Cheng, Tuoyuan
AU - Dairi, Abdelkader
AU - Harrou, Fouzi
AU - Sun, Ying
AU - Leiknes, TorOve
N1 - KAUST Repository Item: Exported on 2020-10-01
PY - 2019/8/6
Y1 - 2019/8/6
N2 - To operate wastewater treatment plants (WWTPs) with optimized efficiency, influent conditions (ICs) as initial states of inflow fed to WWTPs were monitored to identify potential anomalies that would trigger adverse events or system crash. To employ voluminous measurements for data-driven decisions, the non-linear, non-Gaussian, non-stationary, auto-correlated, cross-correlated, hetero-skedastic, case-specific nature of multivariate environmental datasets must be considered. This research proposed kernel machine learning models, the kernel principal components analysis based one-class support vector machine (KPCA-OCSVM) with various kernels, to learn anomaly-free training set then classify the testing set. A seven-years multivariate ICs time series was introduced with exploratory analysis performed to reveal temporal behaviors and statistical properties. KPCA with polynomial kernels sufficiently output representative features, based on which OCSVM with Gaussian kernels sensitively and specifically identified anomalies in ICs that were previously omitted by WWTP operators. The proposed kernel algorithms surpassed previous linear PCA-based K-nearest-neighbors models, and improved outcomes with limited increase in computation cost. Without requiring linear, Gaussian, stationary, independent, and homo-skedastic qualities from data, the proposed flexible environmental data science approach could be transferred, rebuilt, and tuned conveniently for ICs from different WWTPs.
AB - To operate wastewater treatment plants (WWTPs) with optimized efficiency, influent conditions (ICs) as initial states of inflow fed to WWTPs were monitored to identify potential anomalies that would trigger adverse events or system crash. To employ voluminous measurements for data-driven decisions, the non-linear, non-Gaussian, non-stationary, auto-correlated, cross-correlated, hetero-skedastic, case-specific nature of multivariate environmental datasets must be considered. This research proposed kernel machine learning models, the kernel principal components analysis based one-class support vector machine (KPCA-OCSVM) with various kernels, to learn anomaly-free training set then classify the testing set. A seven-years multivariate ICs time series was introduced with exploratory analysis performed to reveal temporal behaviors and statistical properties. KPCA with polynomial kernels sufficiently output representative features, based on which OCSVM with Gaussian kernels sensitively and specifically identified anomalies in ICs that were previously omitted by WWTP operators. The proposed kernel algorithms surpassed previous linear PCA-based K-nearest-neighbors models, and improved outcomes with limited increase in computation cost. Without requiring linear, Gaussian, stationary, independent, and homo-skedastic qualities from data, the proposed flexible environmental data science approach could be transferred, rebuilt, and tuned conveniently for ICs from different WWTPs.
UR - http://hdl.handle.net/10754/656724
UR - https://ieeexplore.ieee.org/document/8789409/
UR - http://www.scopus.com/inward/record.url?scp=85071236068&partnerID=8YFLogxK
U2 - 10.1109/ACCESS.2019.2933616
DO - 10.1109/ACCESS.2019.2933616
M3 - Article
SN - 2169-3536
VL - 7
SP - 108827
EP - 108837
JO - IEEE Access
JF - IEEE Access
ER -