Journal Paper Digests

Journal Paper Digests 2026 #15

NASA’s EMIT hyperspectral observations of phytoplankton ecology in estuarine waters
Bayesian uncertainty analysis of soil organic carbon stocks and stock changes from croplands in the U.S. Midwest
The role of soil structure for the response of CO2 emissions to soil moisture and its relevance for modelling soil carbon dynamics
SMART-Soil: Satellite-based machine learning approach for reliable tracking of soil moisture across CONUS
Coupling near infrared spectroscopy with machine learning algorithms for simultaneously detecting multiple microplastics in soil
Causal Discovery Methods for Functional Performance of Evapotranspiration Models
An independent evaluation of global 1 km soil moisture products using in-situ and airborne observations
Interpolation of large-scale airborne geophysical data with uncertainty quantification

Interpolation of large-scale airborne geophysical data with uncertainty quantification

The collection of airborne geophysical data, ranging from ice sheet thickness measurements to magnetic anomaly detection for critical mineral exploration, continues to grow in importance. Interpolation of airborne geophysical data is essential because of the wide gaps between survey flight lines. Traditional interpolations are deterministic, meaning they produce a single map of predicted values between the data locations. This substantially limits the utility of the interpolation, because a single map does not acknowledge that nothing was sampled, and so nothing is truly known, at these locations. Deterministic interpolation thus constrains downstream use, hindering decision making and obscuring the underlying uncertainty. Geostatistical tools, such as sequential Gaussian simulation, have been deployed to quantify spatial uncertainty via stochastic rastering, but suffer from interpolation artifacts due to the geometric anisotropy of airborne survey data acquisition (e.g., dense along-track, sparse across-track). Existing methods that aim to reduce interpolation artifacts have been developed for deterministic interpolation schemes, but have yet to be carefully considered for stochastic schemes, leaving a clear gap in the interpolation toolbox: a method that properly quantifies spatial uncertainty while simultaneously suppressing unphysical interpolation artifacts. To address this gap, we introduce a new multigrid-based simulation method. This multigrid simulation is a stochastic geostatistical approach that mitigates interpolation artifacts while generating an ensemble of many realizations by treating the domain as a random field. Multigrid simulation is able to handle interpolation across large-scale domains, including non-stationary domains. We demonstrate its effectiveness on two datasets, one synthetic and one from a real field survey.

An independent evaluation of global 1 km soil moisture products using in-situ and airborne observations

High-resolution soil moisture (SM) data are essential for applications in agriculture, hydrology, and disaster management. Four global daily SM products at 1 km resolution have recently been developed: the Seamless Soil Moisture (SSM), Global Surface Soil Moisture (GSSM), Global Land Surface Satellite (GLASS), and a downscaled SMAP product (DSMAP). These products rely on either machine learning or empirical regression models, offering significant potential but raising concerns regarding their generalization capability and spatial fidelity. Previous evaluations of these high-resolution products have relied predominantly on point-scale comparisons using the same in-situ networks employed for model training. Consequently, this study provides an independent evaluation using 1545 global in-situ stations excluded from product development and airborne passive microwave measurements from five field campaigns across North America and Australia. Results reveal that none of the evaluated products met the target unbiased Root Mean Square Error (ubRMSE) of 0.04–0.06 m3/m3, with observed values ranging from 0.097 to 0.104 m3/m3. All products exhibited narrower dynamic ranges (0.10–0.30 m3/m3) than those of in-situ observations (0.05–0.40 m3/m3), particularly underestimating wet and overestimating dry extremes. GLASS (R = 0.576) and DSMAP (R = 0.556) generally outperformed GSSM (R = 0.504) and SSM (R = 0.399) in capturing temporal dynamics relative to ground measurements. Spatially, airborne-based evaluation highlighted limitations in capturing fine-scale heterogeneity, particularly for SSM (mean R = 0.19) and GSSM (mean R = 0.31), which showed a narrow dynamic range and nearly static spatial pattern with weak response to regional rainfall. In contrast, DSMAP effectively captured the temporal dynamics of airborne data (mean R = 0.57) but retained coarse resolution artifacts from its downscaling process.
Expanding training datasets, enhancing the generalization capability of the machine learning methods employed, and conducting rigorous spatial evaluations are identified as critical steps to ensure the reliability of high-resolution soil moisture products for operational applications.
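For readers unfamiliar with the ubRMSE and R scores quoted above, the following is a minimal sketch of how such metrics are commonly computed for a paired product and reference series. The function name `evaluation_metrics` and the sample values are illustrative assumptions, not the study's code or data; the sample product series is deliberately compressed in range to mimic the narrow dynamic range described in the abstract.

```python
import math

def evaluation_metrics(product, reference):
    """Return bias, RMSE, ubRMSE and Pearson R for paired soil moisture series (m3/m3)."""
    if len(product) != len(reference) or len(product) < 2:
        raise ValueError("need two equally long series with at least two pairs")
    n = len(product)
    errors = [p - r for p, r in zip(product, reference)]
    bias = sum(errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    # ubRMSE removes the mean bias: ubRMSE^2 = RMSE^2 - bias^2
    ubrmse = math.sqrt(max(rmse * rmse - bias * bias, 0.0))
    mean_p = sum(product) / n
    mean_r = sum(reference) / n
    cov = sum((p - mean_p) * (r - mean_r) for p, r in zip(product, reference))
    var_p = sum((p - mean_p) ** 2 for p in product)
    var_r = sum((r - mean_r) ** 2 for r in reference)
    r_value = cov / math.sqrt(var_p * var_r)
    return {"bias": bias, "rmse": rmse, "ubrmse": ubrmse, "r": r_value}

# Made-up illustrative values (assumption), not data from the study.
in_situ = [0.05, 0.12, 0.20, 0.31, 0.40]
product = [0.13, 0.16, 0.20, 0.25, 0.29]  # wet extremes too dry, dry extremes too wet
metrics = evaluation_metrics(product, in_situ)
print(metrics)
```

In this toy case R is close to 1 even though the errors at both extremes are large, which illustrates why a correlation score alone can hide a compressed dynamic range.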

Causal Discovery Methods for Functional Performance of Evapotranspiration Models

Evapotranspiration (ET) plays a key role in agricultural water resources management. However, it is challenging to predict as it is driven by water and energy availability as well as soil, vegetation, and meteorological factors, and models vary widely in complexity and assumptions. Causal discovery methods can identify drivers and interactions based on time-series data from both observations and models, and can be used as metrics of model “functional performance” that evaluate how models capture source-target relationships. With many approaches to causal discovery, it is important to compare how functional performance metrics align with predictive accuracy and behave across temporal scales. We compare four methods (Granger causality, Transfer Entropy, PCMCI, and Convergent Cross Mapping) to analyze the functional performance of ET models in a corn-soybean agricultural landscape based on 7 years of eddy covariance measurements, which we use as an empirical reference benchmark. We identify causal sources, among observed weather and soil variables, for Priestley-Taylor (PT), Surface Flux Equilibrium (SFE), Soil Water Balance (SWB), and satellite-based ET products from OpenET, and evaluate how closely model-derived and observation-based causal structures align. Methods consistently identify model forcings as sources, but otherwise vary widely in terms of sources and strengths across sub-hourly to weekly timescales. OpenET products have high functional performance, indicating that they capture key processes although they are not forced by tower observations. Finally, some functional metrics align better with predictive performance than others, which highlights the importance of selecting robust metrics that both capture interactions and align with predictive accuracy.
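To make one of the four methods concrete, here is a minimal sketch of lag-1 transfer entropy on discrete series, estimated from simple frequency counts. It is an illustration under stated assumptions, not the authors' implementation: the function name `transfer_entropy`, the one-step lag, and the synthetic binary series are all hypothetical, and real-valued ET or weather data would first need discretization (for example by binning) and a significance test.

```python
import math
import random
from collections import Counter

def transfer_entropy(source, target):
    """Lag-1 transfer entropy (bits) from source to target for discrete series."""
    # Each triple is (target at t+1, target at t, source at t).
    triples = list(zip(target[1:], target[:-1], source[:-1]))
    n = len(triples)
    c_full = Counter(triples)
    c_next_now = Counter((y_next, y_now) for y_next, y_now, _ in triples)
    c_now_src = Counter((y_now, x_now) for _, y_now, x_now in triples)
    c_now = Counter(y_now for _, y_now, _ in triples)
    te = 0.0
    for (y_next, y_now, x_now), count in c_full.items():
        p_joint = count / n
        p_with_source = count / c_now_src[(y_now, x_now)]    # p(y_next | y_now, x_now)
        p_self_only = c_next_now[(y_next, y_now)] / c_now[y_now]  # p(y_next | y_now)
        te += p_joint * math.log2(p_with_source / p_self_only)
    return te

# Synthetic example (assumption): the response copies the driver one step later.
rng = random.Random(0)
driver = [rng.randint(0, 1) for _ in range(5000)]
response = [0] + driver[:-1]
te_forward = transfer_entropy(driver, response)
te_backward = transfer_entropy(response, driver)
print(te_forward, te_backward)
```

Because the response is fully determined by the driver's previous value, the forward estimate is close to 1 bit while the backward estimate stays near zero, which is the kind of source-target asymmetry a functional performance metric looks for.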

Coupling near infrared spectroscopy with machine learning algorithms for simultaneously detecting multiple microplastics in soil

Purpose: Coexisting microplastics (MPs) in soil produce overlapping spectra, and traditional detection methods are inefficient. This study explores the feasibility of combining near-infrared (NIR) spectroscopy with machine learning (ML) for efficient, simultaneous qualitative and quantitative analysis of multiple MPs in soil.

Methods: Taking polypropylene (PP), polyethylene terephthalate (PET) and polyvinyl chloride (PVC) as target pollutants, NIR spectra were collected for eight types of samples (MPs-free, and contaminated with one, two, or three types of MPs). After spectral preprocessing with the multivariate scatter correction + standard normal variate (MSC + SNV) method, four ML models, namely partial least squares (PLS), random forest (RF), support vector machine (SVM), and extreme gradient boosting (XGBoost), were developed for the qualitative and quantitative analysis of soil MPs, and their performance was systematically compared.

Results: In qualitative classification, all ML models achieved excellent performance with overall accuracies higher than 94%. Among them, PLS performed best, with a macro-averaged F1-score of 98.50 ± 1.16% and an accuracy of 98.55 ± 0.61%, followed by Linear-SVM with 98.02 ± 1.20% accuracy. XGBoost and RF yielded accuracies of 95.00 ± 1.95% and 94.14 ± 1.99%, respectively. In quantitative prediction, model performance was significantly affected by the type and number of coexisting MPs. For single-type MPs, SVM and PLS showed optimal accuracy and stability; for two-type MPs, SVM outperformed other models with higher RPD, lower limit of detection (LOD), and better low-concentration prediction; for three-type MPs, RF and PLS were more suitable for PP and PET, while SVM achieved the best performance for PVC (RPD = 4.6503). Overall, model prediction accuracy and cross-validation stability decreased gradually, while LOD increased slightly, as the number of coexisting MP types increased.

Conclusion: The combination of NIR spectroscopy and ML algorithms can reliably achieve simultaneous qualitative and quantitative analysis of multiple MPs in soil. Different models show distinct adaptability to the complexity of mixed MPs systems: linear models (PLS) excel in single-type and simple mixture scenarios, while SVM presents strong robustness in both binary and ternary mixtures. The established NIR-ML framework offers a simple, efficient, and low-cost strategy for rapid screening of multi-component MPs pollution in soil, with good application potential in environmental monitoring and risk assessment.
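The abstract reports RPD without defining it. In chemometrics, RPD (ratio of performance to deviation) is commonly taken as the standard deviation of the reference values divided by the prediction RMSE, so higher is better. The sketch below assumes that common definition, which may differ in detail from the one the authors used; the function name `rpd` and the sample concentrations are hypothetical.

```python
import math
import statistics

def rpd(reference, predicted):
    """Ratio of performance to deviation: SD of reference values / prediction RMSE."""
    if len(reference) != len(predicted) or len(reference) < 2:
        raise ValueError("need two equally long series with at least two pairs")
    squared_errors = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    rmse = math.sqrt(sum(squared_errors) / len(reference))
    if rmse == 0.0:
        raise ValueError("RMSE is zero; RPD is undefined for a perfect fit")
    # statistics.stdev uses the sample (n - 1) standard deviation.
    return statistics.stdev(reference) / rmse

# Hypothetical MP contents (% w/w) and model predictions, for illustration only.
measured = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
estimated = [0.3, 0.8, 2.4, 2.7, 4.2, 4.6]
rpd_value = rpd(measured, estimated)
print(round(rpd_value, 3))
```

Under this definition, an RPD of about 4.65 as reported for PVC means the prediction error is several times smaller than the natural spread of the reference concentrations.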

SMART-Soil: Satellite-based machine learning approach for reliable tracking of soil moisture across CONUS

Estimation of daily soil moisture (SM) is crucial because it directly affects agricultural productivity, irrigation scheduling, and drought risk assessment. However, most existing satellite-based SM products suffer from two key limitations: (i) strong reliance on radar retrievals, which restricts temporal coverage to the post-2015 period and therefore prevents long-term historical analysis; and (ii) fine-resolution SM maps that are typically obtained via statistical or dynamical downscaling of coarse products, introducing systematic errors and scale-dependent biases. To overcome these limitations, this study proposes a novel framework that estimates daily SM at multiple depths using only land and atmospheric variables from the Moderate Resolution Imaging Spectroradiometer (MODIS). By fully bypassing radar-based inputs and avoiding downscaling of coarse-resolution SM products, our approach natively generates 1-km SM estimates extending back to the early 2000s, enabling consistent, long-term, high-resolution monitoring over the Contiguous United States (CONUS). A Light Gradient Boosting Machine (LightGBM) optimized with the Covariance Matrix Adaptation Evolution Strategy (CMA-ES) was selected to estimate SM at four depths: 5, 10, 20, and 50 cm. Accuracy drops slightly with depth, but LightGBM-CMA-ES remains accurate throughout: from the surface to the deepest layer, RMSE increases from 0.029 to 0.037 m3/m3 while R2 decreases from 0.86 to 0.79. Spatial analysis of the estimates and their associated errors reveals that, despite the models’ good accuracy, fine-textured soils show negative biases and coarse-textured soils show positive biases.
Temporal evaluation showed that the model estimates monthly mean SM levels very accurately but tends to underestimate monthly maxima, particularly in spring months. The framework outputs SM maps at the MODIS reference spatial resolution of 1 km, which can support more sustainable agricultural productivity, better water resources management, and more efficient environmental monitoring and disaster risk management.

The role of soil structure for the response of CO2 emissions to soil moisture and its relevance for modelling soil carbon dynamics

Most soil carbon (C) models describe the effects of soil moisture on C mineralization rates using empirical response functions derived from laboratory incubations carried out on sieved soils. This pre-treatment alters the pore space structure controlling solute and oxygen diffusion and may also disturb the spatial distribution of microbial activity. Our objective was therefore to investigate the effects of disruption of the natural soil structure on the soil moisture response function for C mineralization. We measured CO2 emissions at soil water pressure heads ranging from zero to −600 cm for both sieved and intact soil samples taken from tilled and untilled soil horizons at a field site in northern France. The derived soil moisture response curves were then combined with a simple analytical water balance model to predict CO2 emissions in contrasting rainfall climates. We also explored the relationships between the parameters of the moisture response function and soil physical properties as well as metrics of soil structure quantified by X-ray scanning. Sieving significantly affected the shape of the moisture response function. In particular, the optimal degree of saturation for soil CO2 emissions lay much closer to saturation (> 0.8) in the case of intact soil structures. The effects of repeated tillage were like those of sieving, although less pronounced. We identified relationships between some indicators of soil structure and the optimal degree of saturation, which suggests that it may be possible to derive pedotransfer functions linking soil properties to this critical model parameter. Finally, our modelling demonstrated that the use of moisture response functions derived from sieved soils in soil C models can lead to an underestimation of CO2 emissions in wet climates.

Bayesian uncertainty analysis of soil organic carbon stocks and stock changes from croplands in the U.S. Midwest

Quantifying uncertainty in process-based model predictions is essential for evaluating confidence in model predictions and identifying priorities for model improvements. This study applied a Bayesian model analysis framework to quantify and partition multiple sources of uncertainty in DayCent model estimates of soil organic carbon (SOC) stocks and stock changes (0–30 cm) for croplands across the U.S. Midwest from 1990 to 2020. The region gained SOC at an average rate of 10.38 (95% prediction interval (PI) of 4.37–17.83) Tg C year−1, equivalent to 0.27 (95% PI of 0.11–0.46) t C ha−1 year−1 or a relative increase of 0.46% (95% PI of 0.42%–0.57%). Using Monte Carlo simulation, total predictive uncertainty was decomposed into four components: composite structural uncertainty, parameter uncertainty, input uncertainty associated with management practice adoption, and spatial scaling uncertainty associated with National Resources Inventory (NRI) sample design. At the point level, uncertainties for SOC stocks and stock changes were 29.5 and 11.7 t C ha−1, respectively, while at the regional scale they were 10.6 and 0.09 t C ha−1. Uncertainty decomposition showed that the composite structural component was the dominant source of uncertainty for regional SOC stock changes (57.5%), followed by parameters (38.8%), inputs (3.2%), and scaling (0.5%). For SOC stocks, parameter uncertainty dominated at the regional scale (69.8%), followed by composite structural uncertainty (26.0%), inputs (3.7%), and scaling (0.6%). Furthermore, temporal aggregation substantially reduced uncertainty, stabilizing the level of reduction after approximately five years of averaging, whereas spatial uncertainty required aggregation of a relatively large number of sites, about 5,000 to 10,000 sites, to reduce the uncertainty to a stable level. 
These findings highlight two approaches to reduce parameter and composite structural uncertainties in model-based assessments: advance model development to improve process representation and expand the quantity and quality of SOC observations.
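As a reader-side consistency check on the figures quoted above, the short sketch below converts the regional rate (Tg C per year) and the per-hectare rate (t C per hectare per year) into the cropland area they jointly imply, and sums the reported uncertainty shares. The implied area is an inference from rounded numbers and is not a value stated in the abstract; all variable names are illustrative.

```python
TONNES_PER_TG = 1.0e6  # 1 Tg = 10^12 g = 10^6 t

# Values copied from the abstract: mean and 95% prediction interval bounds.
regional_rate_tg = {"mean": 10.38, "pi_low": 4.37, "pi_high": 17.83}  # Tg C / year
per_ha_rate_t = {"mean": 0.27, "pi_low": 0.11, "pi_high": 0.46}       # t C / ha / year

# Area (million ha) implied by each pair of numbers; differences reflect rounding.
implied_area_mha = {
    key: regional_rate_tg[key] * TONNES_PER_TG / per_ha_rate_t[key] / 1.0e6
    for key in regional_rate_tg
}

# Reported shares (%) of regional uncertainty by component.
stock_change_shares = {"structural": 57.5, "parameters": 38.8, "inputs": 3.2, "scaling": 0.5}
stock_shares = {"parameters": 69.8, "structural": 26.0, "inputs": 3.7, "scaling": 0.6}

print({key: round(value, 1) for key, value in implied_area_mha.items()})
print(round(sum(stock_change_shares.values()), 1), round(sum(stock_shares.values()), 1))
```

The three pairs imply a broadly similar area of roughly 38 to 40 million hectares, and both sets of shares sum to about 100%, so the reported numbers are mutually consistent within rounding.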

NASA’s EMIT hyperspectral observations of phytoplankton ecology in estuarine waters

NASA’s EMIT hyperspectral spectrometer provides high spectral (∼7.4 nm) and spatial (60 m) resolution, capabilities that have not been routinely applied to water quality monitoring in shallow aquatic systems, where the bio-optical properties are highly complex. In this study, we developed Hyper-MoE-VAE, a deep-learning inversion framework integrating a mixture-of-experts architecture with variational autoencoders for globally applicable hyperspectral water-quality retrievals. The model accommodates diverse water types and addresses one-to-many inversions to retrieve chlorophyll-a (Chl a) and phytoplankton absorption coefficient (a_ph) from hyperspectral remote sensing reflectance (R_rs). Hyper-MoE-VAE was trained on a global bio-optical dataset and applied to EMIT imagery over Lake Pontchartrain. Same-day field matchups acquired on 14 April 2025 provide the first validation of EMIT-derived Chl a (MAPE = 7.66 %; MAE = 1.13 in log10 space, unitless) and a_ph across all EMIT bands, with representative performance (NRMSE = 0.09–0.11 m−1; ε = 29–36 %), revealing a muted Chl a response under highly turbid freshwater conditions and a clear shift toward chlorophyte dominance following fresh inputs. In addition, PACE-OCI, also hyperspectral with lower spatial resolution, was compared with EMIT for Chl a retrievals. Both sensors show comparable spatial patterns despite differences in spectral and spatial resolution, supporting the Hyper-MoE-VAE’s cross-mission applicability. These findings demonstrate EMIT’s strong potential for characterizing phytoplankton dynamics, both in abundance and community composition, and for monitoring harmful algal blooms (HABs) in aquatic systems.