Predictive modeling of per- and polyfluoroalkyl substances (PFAS) in surface water using machine learning approaches
Details
Chemical water quality
Reports
“Contamination of per- and polyfluoroalkyl substances (PFAS) in surface water is a critical concern, posing risks to both public health and wildlife. PFAS exposure has been linked to various health issues including liver damage, decreased birth weight, and increased risk of cancer. Traditional monitoring methods have fallen short in accurately assessing the extent of PFAS contamination in the Netherlands due to cost and time constraints, as well as variations in analysis methods leading to different detection limits. Consequently, monitoring data gaps have potentially led to unrecognized risk areas. This study investigates the application of supervised machine learning models, including XGBoost and random forest regressor, for continuous prediction of perfluorooctanoic acid (PFOA), a prominent PFAS compound. These models are compared with a dummy baseline model for their relative predictive ability by leveraging a dataset that combines water contaminant measurements with meteorological, geological, and hydrological site characteristics.
In general, the XGBoost model demonstrates superior results compared to the baseline dummy model and the random forest model, achieving an adjusted R² score exceeding 0.70. Feature importance analysis reveals strong interrelations among PFOA and other PFAS compounds, and underscores the significance of non-PFAS substances, like nickel, in model prediction. Furthermore, the study demonstrates the effectiveness of a pared-down feature subset in enhancing the model’s applicability. This streamlined model bridges gaps in historical monitoring data and generates hazard maps pinpointing potential high-risk areas across the Netherlands. The current study proposes integrating the XGBoost model into risk-based monitoring to prioritize future testing initiatives and mitigate the risks posed by PFOA in aquatic systems. Consequently, this approach holds promise in reinforcing water quality management strategies and enhancing the understanding of contamination events in the context of environmental forensics.”
(Citation: Pan, X. – Predictive modeling of per- and polyfluoroalkyl substances (PFAS) in surface water using machine learning approaches – Master Thesis (27 August 2023) – Universiteit van Amsterdam, Faculteit der Natuurwetenschappen, Wiskunde en Informatica (FNWI))
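The abstract compares models by adjusted R² against a dummy baseline. As a minimal sketch of how that metric and such a baseline are commonly computed: the code below is not taken from the thesis. It assumes the baseline always predicts the training mean (the abstract does not state the strategy), and all function names and numbers are hypothetical and for illustration only.

```python
from statistics import mean


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def adjusted_r2(y_true, y_pred, n_features):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    n = len(y_true)
    if n - n_features - 1 <= 0:
        raise ValueError("need more samples than features + 1")
    r2 = r2_score(y_true, y_pred)
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_features - 1)


def dummy_mean_predictions(y_train, n_predictions):
    """Assumed baseline: ignore all features, always predict the training mean."""
    return [mean(y_train)] * n_predictions


# Made-up concentrations for illustration only (not thesis data).
y_train = [2.0, 4.0, 6.0, 8.0]
y_test = [1.0, 3.0, 5.0, 7.0, 9.0]

baseline_pred = dummy_mean_predictions(y_train, len(y_test))
model_pred = [1.5, 2.5, 5.0, 7.5, 8.5]  # hypothetical output of a fitted model

baseline_r2 = r2_score(y_test, baseline_pred)
model_adj_r2 = adjusted_r2(y_test, model_pred, n_features=2)

print(f"baseline R^2:       {baseline_r2:.3f}")
print(f"model adjusted R^2: {model_adj_r2:.3f}")
```

Because adjusted R² penalizes each added predictor, a reduced feature subset of the kind the abstract mentions can score comparably to a larger one if the dropped features carried little signal.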