Comparing machine learning models and physics-based models in groundwater science




Boerman, Thomas Christiaan




The use of machine learning techniques to tackle hydrological problems has increased significantly over the last decade. Machine learning tools can provide alternatives or surrogates to complex and comprehensive methodologies such as physics-based numerical models. In hydrology, machine learning algorithms have been used to estimate streamflow, runoff and water table fluctuations, and to calculate the impacts of climate change on nutrient loading, among many other applications. In recent years we have also seen arguments for, and advances in, combining physics-based models and machine learning algorithms for mutual benefit. This thesis contributes to these advances by addressing two different groundwater problems, in each case developing a machine learning approach and comparing it with previously developed physics-based models: i) estimating groundwater and surface water depletion caused by groundwater pumping using artificial neural networks and ii) estimating a global steady-state map of water table depth using random forests.

The first chapter outlines the purpose of this thesis and how it contributes to the overall scientific knowledge on the topic. The results of this research contribute to three of the twenty-three major unsolved problems in hydrology, as summarized by a collective of hundreds of hydrologists.

In the second chapter, we tested the potential of artificial neural networks (ANNs), a deep learning tool, as an alternative to conventional methods (analytical solutions and numerical models) for estimating the source water of groundwater abstraction. Surrogate ANN models of three previously calibrated numerical groundwater models were developed using hydrologically meaningful input parameters (e.g., well-stream distance and hydraulic diffusivity) selected by predictor parameter optimization, which combined hydrological expertise with statistical methodologies (ANCOVA).
The output parameters were three transient sources of groundwater abstraction (shallow and deep storage release, and local surface-water depletion). We found that the optimized ANNs have a predictive skill of up to 0.84 (R², 2σ = ± 0.03) when predicting water sources compared to physics-based numerical (MODFLOW) models. Optimal ANN skill was obtained when using between five and seven predictor parameters, with hydraulic diffusivity and mean aquifer thickness being the most important predictor parameters. Even though initial results are promising and computationally frugal, we found that the deep learning models were not yet a sufficient substitute for, and did not outperform, numerical model simulations.

The third chapter used random forests to map steady-state water table depth on a global scale (0.1° spatial resolution) and integrated the results to improve our understanding of scale and perceptual modeling of global water table depth. In this study we used a spatially biased ~1.5-million-point database of water table depth observations together with a variety of globally distributed above- and below-ground predictor variables with causal relationships to steady-state water table depth. We mapped water table depth globally as well as at regional to continental scales to interrogate performance, feature importance and hydrologic process across scales and regions with varying hydrogeological landscapes and climates. The global water table depth map has a cross-validation correlation of R² = 0.72, while our best-performing continental map (Australia) has a correlation of R² = 0.86. Surprisingly, the results of this study show that above-ground variables such as surface elevation, slope, drainage density and precipitation are among the most important predictor parameters, while subsurface parameters such as permeability and porosity are notably less important.
This is contrary to conventional thought among hydrogeologists, who would assume that subsurface parameters are very important. Overall, the machine learning results underestimate water table depth, as do existing global physics-based groundwater models, and they differ from those models by amounts comparable to the differences among the physics-based models themselves. The feature importance derived from our random forest models was used to develop alternative perceptual models that highlight different water table depth controls between areas with low relief and high relief. We also considered the representativeness of the prediction domain and the predictor database and found that 90% of the prediction domain has a dissimilarity index lower than 0.75. We conclude that our random forest models have good extrapolation potential to regions with unknown water table depth, except for some high-elevation regions.

Finally, in chapter four, the most important findings of chapters two and three are considered as contributions to the unresolved questions in hydrology. Overall, this thesis has contributed to advancing hydrological sciences through: i) mapping global steady-state water table depth using machine learning; ii) advancing hybrid modeling by using synthetic data derived from physics-based models to train an artificial neural network for estimating storage depletion; and iii) contributing to answering three unsolved problems in hydrology involving themes of parameter scaling across temporal and spatial scales, extracting hydrological insight from data, the use of innovative modeling techniques to estimate hydrological fluxes/states, and extrapolation of models to no-data regions.
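
The skill scores quoted above (R² with a 2σ spread) can be illustrated with a minimal sketch. It assumes that R² denotes the coefficient of determination between reference values (for example, numerical model output) and surrogate predictions, and that 2σ is twice the sample standard deviation of the score across repeated runs; the thesis may define either quantity differently, for instance as a squared Pearson correlation. The function names and all numbers below are hypothetical and are not taken from the thesis.

```python
from statistics import mean, stdev


def r_squared(reference, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(reference) != len(predicted):
        raise ValueError("inputs must have the same length")
    ref_mean = mean(reference)
    # Residual sum of squares between reference and surrogate values.
    ss_res = sum((r - p) ** 2 for r, p in zip(reference, predicted))
    # Total sum of squares of the reference values around their mean.
    ss_tot = sum((r - ref_mean) ** 2 for r in reference)
    if ss_tot == 0:
        raise ValueError("reference values have zero variance")
    return 1.0 - ss_res / ss_tot


def skill_summary(scores):
    """Mean score and twice the sample standard deviation across runs."""
    return mean(scores), 2 * stdev(scores)


# Hypothetical values for illustration only (not data from the thesis).
reference = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 5.0]
score = r_squared(reference, predicted)

# Hypothetical scores from three repeated training runs.
mean_score, two_sigma = skill_summary([0.8, 0.9, 1.0])
```

Under this definition, a score of 1 means the surrogate reproduces the reference values exactly, and a score of 0 means it does no better than predicting the reference mean.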



machine learning, hydrology, groundwater, numerical models, random forests, neural networks, data science, earth science, artificial intelligence