Journal Paper Digests 2017 #16

Bayesian modelling of Dupuytren disease by using Gaussian copula graphical models
Quantification of Soil Permanganate Oxidizable C (POXC) Using Infrared Spectroscopy
Construction of Membership Functions for Soil Mapping using the Partial Dependence of Soil on Environmental Covariates Calculated by Random Forest
Neighborhood Size of Training Data Influences Soil Map Disaggregation

Bayesian modelling of Dupuytren disease by using Gaussian copula graphical models

Authors: Mohammadi, A; Abegaz, F; van den Heuvel, E; Wit, EC

Source: JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES C-APPLIED STATISTICS, 66 (3):629-645; APR 2017

Abstract: Dupuytren disease is a fibroproliferative disorder with unknown aetiology that often progresses and eventually can cause permanent contractures of the fingers affected. We provide a computationally efficient Bayesian framework to discover potential risk factors and investigate which fingers are jointly affected. Our Bayesian approach is based on Gaussian copula graphical models, which provide a way to discover the underlying conditional independence structure of variables in multivariate data of mixed types. In particular, we combine the semiparametric Gaussian copula with extended rank likelihood to analyse multivariate data of mixed types with arbitrary marginal distributions. For structural learning, we construct a computationally efficient search algorithm by using a transdimensional Markov chain Monte Carlo algorithm based on a birth-death process. In addition, to make our statistical method easily accessible to other researchers, we have implemented our method in C++ and provide an interface with R software as an R package BDgraph, which is freely available from http://CRAN.R-project.org/package=BDgraph.
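To make the copula idea concrete, here is a minimal sketch of a rank-based normal-scores transform, which maps a variable with any marginal distribution onto a Gaussian scale using only its ranks. This is a simplified plug-in stand-in, not the extended rank likelihood or the birth-death MCMC sampler described in the abstract, and it does not use the BDgraph package. The function name `normal_scores` and the `rank / (n + 1)` scaling are assumptions chosen for illustration.

```python
from statistics import NormalDist


def normal_scores(values):
    """Map observations to Gaussian scores using their average ranks.

    Illustrative assumption: ranks are scaled by (n + 1) so that the
    probabilities stay strictly inside (0, 1).
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied values starting at sorted position i.
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank for the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    std_normal = NormalDist()
    return [std_normal.inv_cdf(r / (n + 1)) for r in ranks]
```

Because only ranks enter the transform, the same function can be applied to continuous, count, or ordinal columns. Tied values (common in discrete data) receive identical scores, which is one reason the paper's rank-likelihood treatment is preferable for a real analysis.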

Quantification of Soil Permanganate Oxidizable C (POXC) Using Infrared Spectroscopy

Authors: Calderon, FJ; Culman, S; Six, J; Franzluebbers, AJ; Schipanski, M; Beniston, J; Grandy, S; Kong, AYY

Source: SOIL SCIENCE SOCIETY OF AMERICA JOURNAL, 81 (2):277-288; MAR-APR 2017

Abstract: Labile soil carbon is an important component of soil organic matter because it embodies the mineralizable material that is associated with short-term fertility. Permanganate-oxidizable C (POXC) is a widely used method for the study of labile C dynamics in soils. Rapid methods are needed to measure labile C, and better understand how this pool varies with soil C at regional scales. Infrared spectroscopy is an inexpensive way to quantify SOC and observe fluctuations in C functional groups. Using a sample set that encompassed several soil types and plant communities (seven different research projects, n = 496), soils were analyzed via diffuse reflectance Fourier transformed mid-infrared (MidIR, 4000-400 cm⁻¹) and near-infrared (NIR, 10000-4000 cm⁻¹) spectroscopy. Spectral data were used to develop calibrations for POXC, soil organic C (SOC), and total N (TN) using partial least squares (PLS) regression. The MidIR predicted POXC slightly better than the NIR, with calibration and/or validation R² values ranging from 0.77 to 0.81 depending on spectral pretreatments. Predictions for POXC were better than SOC and TN, but site variability influenced the calibration quality for SOC and TN. Using a selected MidIR region, which included bands correlated to POXC (3225-2270 cm⁻¹), reduced the calibration quality, but still gave acceptable R² values of 0.76 to 0.77 for the calibration and validation sets. We show that POXC can be predicted using NIR and MidIR spectra. Selecting informative spectral bands offers an alternative to using full spectra for PLS regressions.
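The calibration step in this abstract can be sketched as a PLS regression scored by R². The block below is a minimal illustration that assumes a single latent variable and a single response (PLS1); the paper does not state the number of components or the spectral pretreatments in the abstract, so the function names and the one-component choice are hypothetical.

```python
import math


def fit_pls1_one_component(X, y):
    """Fit a one-latent-variable PLS1 model and return a predict function.

    X is a list of spectra (rows = samples, columns = wavenumber bands),
    y is the measured response (for example POXC). Illustrative sketch only.
    """
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]

    # Weight vector: the direction in band space with maximal covariance with y.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = math.sqrt(sum(v * v for v in w))
    w = [v / norm for v in w]

    # Scores of each sample on that direction, then regress y on the scores.
    t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    q = sum(ti * yi for ti, yi in zip(t, yc)) / sum(ti * ti for ti in t)

    def predict(X_new):
        return [
            y_mean + q * sum((row[j] - x_mean[j]) * w[j] for j in range(p))
            for row in X_new
        ]

    return predict


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot
```

In practice one would fit on a calibration subset, call `predict` on a held-out validation subset, and report `r_squared` for both, which mirrors the calibration and validation R² values quoted in the abstract. Restricting the columns of `X` to a chosen band range corresponds to the selected-region calibration.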

Construction of Membership Functions for Soil Mapping using the Partial Dependence of Soil on Environmental Covariates Calculated by Random Forest

Authors: Zeng, CY; Yang, L; Zhu, AX

Source: SOIL SCIENCE SOCIETY OF AMERICA JOURNAL, 81 (2):341-353; MAR-APR 2017

Abstract: Partial dependence plots generated by Random Forest (RF) imply an association between soil and environmental variables. This study develops a method to construct membership functions representing knowledge of soil-environment relationships from partial dependence. Key parameters were obtained from normalized partial dependence to define class limits and membership gradation. Seven environmental variables were selected on the basis of the variable’s importance within RF. Two cases were conducted to test the effectiveness of our method using different training samples. Case 1 used 33 representative locations as training samples and 50 locations as validations. Case 2 randomly split all 83 samples into training and validation subsets at a proportion of 2:1; the splits were repeated seven times. For each case, the generated membership functions were used for mapping soil subgroups in Heshan, China, under the Soil Landscape Inference Model framework; RF was conducted for comparison. The results showed that mapping accuracy based on the membership functions (78%) was much higher than that of RF only (60%) in Case 1. In Case 2, the mapping accuracies using membership functions (an average of 67%, SD = 6.5%) were not always higher than those by RF (an average of 67%, SD = 8.0%). The constructed membership functions were impacted by the training samples. Use of representative training samples is recommended when applying the proposed method. However, training samples (including representative samples and other samples) with good coverage in the environmental feature space would allow RF to obtain more accurate soil maps than using representative samples.
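As a rough illustration of turning partial dependence into a membership function, the sketch below min-max normalizes a partial dependence curve for one environmental variable and interpolates it linearly, so that the output can be read as a membership grade in [0, 1]. The abstract does not spell out which key parameters the authors extract, so this direct use of the normalized curve, the function names, and the clamping at the grid edges are all assumptions.

```python
def normalize_partial_dependence(pd_values):
    """Rescale partial dependence values to the range [0, 1] (min-max)."""
    lo, hi = min(pd_values), max(pd_values)
    if hi == lo:
        raise ValueError("partial dependence is constant; nothing to normalize")
    return [(v - lo) / (hi - lo) for v in pd_values]


def make_membership_function(grid, pd_values):
    """Build a membership function for one covariate.

    grid: strictly increasing covariate values where partial dependence
          was evaluated (hypothetical input, e.g. elevation in metres).
    pd_values: partial dependence of one soil class at those grid points.
    """
    norm = normalize_partial_dependence(pd_values)

    def membership(x):
        # Outside the evaluated range, hold the edge value (an assumption).
        if x <= grid[0]:
            return norm[0]
        if x >= grid[-1]:
            return norm[-1]
        # Linear interpolation between the two surrounding grid points.
        for k in range(len(grid) - 1):
            x0, x1 = grid[k], grid[k + 1]
            if x0 <= x <= x1:
                m0, m1 = norm[k], norm[k + 1]
                return m0 + (m1 - m0) * (x - x0) / (x1 - x0)

    return membership
```

One such function would be built per soil class and per selected covariate. How the per-variable grades are combined into a final class assignment is left to the inference framework and is not shown here.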

Neighborhood Size of Training Data Influences Soil Map Disaggregation

Authors: Levi, MR

Source: SOIL SCIENCE SOCIETY OF AMERICA JOURNAL, 81 (2):354-368; MAR-APR 2017

Abstract: Soil class mapping relies on the ability of sample locations to represent portions of the landscape with similar soil types; however, most digital soil mapping (DSM) approaches intersect sample locations with one raster pixel per covariate layer regardless of pixel size. This approach does not take the variability of covariate information adjacent to the training data into account. The objective here was to disaggregate a soil map in a semiarid Arizona rangeland (78,569 ha) by exploring different neighborhood sizes for extracting covariate data to points. Eight machine learning algorithms were compared to assess the influence of summarizing covariate data in 0-, 15-, 30-, 60-, 90-, 120-, 150-, and 180-m circular neighborhoods and a multiscale model. Kappa (κ) values of all models ranged between 0.24 and 0.44 and increased with neighborhood size up to 150 m. Support vector machine and random forest algorithms performed best across all scales. The radial support vector machine model using a 150-m neighborhood had the highest κ and produced a more generalized map compared with the best multiscale model (random forest), which resulted in a mix of general and detailed soil features. Evaluating a range of neighborhood sizes for aggregating covariate data provides a method of accounting for multiscale processes that are important for predicting soil patterns without modifying the pixel size of the final maps. Incorporating concepts from traditional soil surveys with DSM approaches can strengthen ties between them and optimize the extraction of landscape information for predicting soil properties.
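The neighborhood extraction described in this abstract can be sketched as follows: instead of reading the single pixel under a sample point, average all pixels whose centers fall within a circular radius. The abstract does not give the pixel size or the summary statistic, so the mean, the `pixel_size_m` parameter, and the function name are assumptions for illustration.

```python
import math


def neighborhood_mean(raster, row, col, radius_m, pixel_size_m):
    """Mean of raster cells whose centers lie within radius_m of cell (row, col).

    raster is a list of rows of numeric covariate values. A radius of 0
    reduces to the usual single-pixel extraction.
    """
    reach = int(radius_m // pixel_size_m)  # how many cells the radius can span
    total, count = 0.0, 0
    for r in range(max(0, row - reach), min(len(raster), row + reach + 1)):
        for c in range(max(0, col - reach), min(len(raster[0]), col + reach + 1)):
            # Distance between cell centers, converted to metres.
            dist = math.hypot(r - row, c - col) * pixel_size_m
            if dist <= radius_m:
                total += raster[r][c]
                count += 1
    return total / count
```

Calling this once per covariate layer and per radius (0, 15, 30, and so on up to 180 m) would give one feature set per neighborhood size. Concatenating the features from several radii is one plausible way to form a multiscale input, though the abstract does not say how the authors built theirs.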