Using R for Digital Soil Mapping: Book materials, data and R scripts.
This book describes and provides many detailed examples of implementing Digital Soil Mapping (DSM) using R. The work adheres to Digital Soil Mapping theory and places a strong focus on how to apply it. DSM exercises are also included and cover procedures for handling and manipulating soil and spatial data in R. The book also introduces the basic concepts and practices for building spatial soil prediction functions and, ultimately, producing digital soil maps.
For hands-on experience with the material, the following sections contain documents and R code for topics covered in the book. Topics include:
- R Literacy for digital soil mapping
- Getting spatial in R
- Preparatory and exploratory analysis for digital soil mapping
- Continuous soil attribute modelling and mapping
- Categorical soil attribute modelling and mapping
- Some methods for uncertainty quantification relevant for digital soil mapping
- Soil map disaggregation
- Combining continuous and categorical soil attribute modelling and mapping
- Digital soil assessments
The ithir R package
The ithir R package must be installed to run the R for digital soil mapping examples below, because it contains all the required data sets and some of the required R functions.
Once you have installed a version of R on your computer (see part 1 of the R literacy for digital soil mapping section for instructions on how to do this if needed), you will need to run the following lines of R code:
> install.packages("devtools")
> library(devtools)
> install_bitbucket("brendo1001/ithir/pkg") #ithir package
An example script can also be downloaded from here
There are many places on the internet to get a crash course in R. You can also try what is located here. The R examples use soil data, which is helpful if you want to learn in context. The course is broken into 8 sections.
- R basics: commands, expressions, assignments, operators, objects
- R Data Types
- R data structures
- Functions, arguments, and packages
- Getting help.
- Vectors, matrices, and arrays
- Vector arithmetic, some common functions and vectorised formats
- Matrices and arrays.
- Data frames, data import, and data export
- Creating data frames manually
- Working with data frames
- Graphics: the basics
- Manipulating data. Modes, classes, attributes, length, and coercion
- Indexing, sub-setting, sorting and locating data
- Combining data
- Exploratory data analysis
- Summary statistics
- Histograms and boxplots
- Normal quantile and cumulative probability plots
- The basics of linear models
This section introduces how to construct a function. Functions are at the heart of R, and each one is akin to a set of instructions for running a particular task. Functions are incredibly powerful, and the fact that you can create your own leaves the door open for some very creative thinking about how to solve a particular problem or conduct a nuanced task. The example here is about how one would go about designing soil sampling along a toposequence. The starting point could be the top of a hill, the bottom of a hill, or anywhere in between. This seems pretty intuitive to do in your mind, but coding it in R or any other language requires a bit of logical thought and creativity. Both are important for learning R and for doing digital soil mapping.
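As a minimal sketch of the kind of function described above, the following base R code traces a path downhill across a gridded elevation surface by repeatedly stepping to the lowest neighbouring cell. The function name `walk_downhill`, the toy elevation matrix, and the stopping rule are assumptions made for illustration; this is not the algorithm from the book.

```r
# Hypothetical helper (not from the book): trace a downhill path on a
# gridded elevation surface, always stepping to the lowest neighbour.
walk_downhill <- function(dem, i, j) {
  stopifnot(is.matrix(dem), i >= 1, i <= nrow(dem), j >= 1, j <= ncol(dem))
  path <- matrix(c(i, j), ncol = 2, dimnames = list(NULL, c("row", "col")))
  repeat {
    # 3 x 3 neighbourhood around the current cell, clipped at the grid edge
    rows <- max(1, i - 1):min(nrow(dem), i + 1)
    cols <- max(1, j - 1):min(ncol(dem), j + 1)
    window <- dem[rows, cols, drop = FALSE]
    # Position of the lowest cell in the window (first one if there are ties)
    lowest <- which(window == min(window), arr.ind = TRUE)[1, ]
    next_i <- rows[lowest[1]]
    next_j <- cols[lowest[2]]
    # Stop when no neighbour is strictly lower than the current cell
    if (dem[next_i, next_j] >= dem[i, j]) break
    i <- next_i
    j <- next_j
    path <- rbind(path, c(i, j))
  }
  path
}

# Toy elevation surface (assumed): highest at cell (5, 5), lowest at (1, 1)
dem <- outer(1:5, 1:5, "+")
path <- walk_downhill(dem, 5, 5)
elev_along_path <- dem[path]
```

Starting from anywhere on the hillslope works the same way, and calling `walk_downhill(-dem, i, j)` would trace the uphill leg instead, so the two paths together would span a full toposequence.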
This section introduces some basic concepts of using R for GIS operations. It is nothing over the top, just a simple introduction to converting data to spatial objects, doing some spatial transformations, and importing and exporting GIS data.
Also useful in the GIS context are the procedures for resampling and reprojection. Such tasks help to align spatial data to a common resolution and extent.
This chapter covers some common methods for soil data preparation and exploration. Of note is the fitting of soil profile depth functions to harmonise soil profile data. This is followed by statistical summaries of soil data and some basic geostatistical analysis of soil property data.
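The three operations mentioned here (conversion to a spatial object, transformation, and export or import) might look like the sketch below. It assumes the sf package, which this page does not name, and uses made-up site coordinates; the book materials may rely on different packages. The code is wrapped in a `requireNamespace()` check so that it does nothing if sf is not installed.

```r
# Assumption: the sf package is used for illustration only.
if (requireNamespace("sf", quietly = TRUE)) {
  # Made-up site data with longitude and latitude columns
  sites <- data.frame(
    id  = 1:3,
    lon = c(151.20, 151.25, 151.30),
    lat = c(-32.70, -32.72, -32.75),
    ph  = c(6.1, 5.8, 7.2)
  )

  # Convert the data frame to a spatial object (WGS84, EPSG:4326)
  pts <- sf::st_as_sf(sites, coords = c("lon", "lat"), crs = 4326)

  # Transform to a projected coordinate system (UTM zone 56S, EPSG:32756)
  pts_utm <- sf::st_transform(pts, 32756)

  # Export to a GeoPackage in a temporary location, then read it back
  out_file <- tempfile(fileext = ".gpkg")
  sf::st_write(pts_utm, out_file, quiet = TRUE)
  pts_back <- sf::st_read(out_file, quiet = TRUE)
}
```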
Want to map soil properties such as soil carbon and pH? Such properties are continuous soil variables. A number of model types can be considered for this task. In this chapter we will run through a few common model types:
- Multiple linear regression. This section also has a bit about how to apply models spatially. There are a few different ways to do this. Code is here and documentation is here.
- Decision trees. Code is here and documentation is here.
- Cubist Model. Code is here and documentation is here.
- Random Forest model. Code is here and documentation is here.
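To give a flavour of fitting a model and then applying it spatially, here is a minimal base R sketch using multiple linear regression. All data are simulated, and the covariate names (`elev`, `twi`) and the 20 x 20 grid are assumptions for illustration; the linked code works with real data and may apply models differently.

```r
set.seed(1)

# Simulated calibration data: two hypothetical covariates and a soil property
obs <- data.frame(elev = runif(100, 50, 300), twi = runif(100, 5, 15))
obs$soc <- 3 - 0.005 * obs$elev + 0.1 * obs$twi + rnorm(100, sd = 0.2)

# Fit the multiple linear regression
fit <- lm(soc ~ elev + twi, data = obs)

# Covariate "grid": one row per cell, with covariate values for each cell
grid <- expand.grid(x = 1:20, y = 1:20)
grid$elev <- 50 + 250 * grid$x / 20
grid$twi  <- 5 + 10 * grid$y / 20

# Apply the model to every cell, then reshape the predictions into a map
grid$soc_pred <- predict(fit, newdata = grid)
soc_map <- matrix(grid$soc_pred, nrow = 20, ncol = 20)  # soc_map[x, y]
```

The same pattern (a table of covariates per cell passed to `predict()`) carries over to the other model types in the list, since most R model objects provide a `predict()` method.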
Another useful tool for digital soil mapping is the caret R package. It provides access to a number of model types that could be of use in addition to the ones specifically investigated above. Code is here and documentation is here.
Lastly we can then look at regression kriging. We can do this in two main ways:
- Formally via Universal kriging. Code is here and documentation is here.
- Hybrid approach where the trend is modelled separately from the autocorrelated residuals. Code is here and documentation is here.
Soil classes are considered categorical variables. There are numerous other categorical variables within the purview of soil science too. The spatial prediction of categorical variables calls for a different suite of models from those used for continuous variables. This chapter assesses a few of these. There is also some discussion about how one might validate models where the target variable is categorical.
- Model validation where the target variable is categorical. Code is here and documentation is here.
- Multinomial logistic regression. Code is here and documentation is here.
- C5 decision trees. Code is here and documentation is here.
- Random Forest models. Code is here and documentation is here.
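For validation where the target variable is categorical, a common starting point is a confusion matrix and the summary measures derived from it. The base R sketch below uses made-up observed and predicted classes; the measures shown (overall accuracy, producer's and user's accuracy, and the kappa coefficient) are standard ones, and whether the linked code reports exactly these is an assumption.

```r
# Made-up observed and predicted soil classes at 10 validation sites
classes   <- c("A", "B", "C")
observed  <- factor(c("A", "A", "A", "B", "B", "C", "C", "C", "C", "B"),
                    levels = classes)
predicted <- factor(c("A", "A", "B", "B", "B", "C", "C", "A", "C", "C"),
                    levels = classes)

# Confusion matrix: rows are predictions, columns are observations
cm <- table(predicted, observed)

# Proportion of sites classified correctly
overall_accuracy <- sum(diag(cm)) / sum(cm)

# Producer's accuracy: correct predictions per observed class
producers_accuracy <- diag(cm) / colSums(cm)

# User's accuracy: correct predictions per predicted class
users_accuracy <- diag(cm) / rowSums(cm)

# Kappa: agreement beyond what would be expected by chance
expected_agreement <- sum(rowSums(cm) * colSums(cm)) / sum(cm)^2
kappa_hat <- (overall_accuracy - expected_agreement) / (1 - expected_agreement)
```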
Quantifying the uncertainty of the digital soil maps that are created is a crucial exercise. Both producers and users need to know how reliable the maps are. Analytical approaches to quantifying uncertainty are rarely encountered in digital soil mapping because the models themselves are usually highly parameterised. Numerical approaches appear to be growing in use, and some of these are covered in this chapter.
Specifically, the uncertainty approaches that are considered include:
- Universal kriging prediction variance. Code is here and documentation is here.
- Bootstrapping. Code is here and documentation is here.
- Data partitioning and cross validation approach. Code is here and documentation is here.
- Empirical uncertainty quantification through fuzzy clustering and cross validation. Code is here and documentation is here.
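As an illustration of the bootstrapping idea from the list above, the base R sketch below refits a simple model on resampled data and summarises the spread of the resulting predictions. The data are simulated, and the model form, the number of bootstrap replicates, and the 90% limits are all assumptions for illustration rather than the settings used in the linked code.

```r
set.seed(42)

# Simulated calibration data with one hypothetical covariate
n <- 150
dat <- data.frame(elev = runif(n, 50, 300))
dat$ph <- 7 - 0.004 * dat$elev + rnorm(n, sd = 0.3)

# Locations (covariate values) where predictions are wanted
new_sites <- data.frame(elev = c(100, 200))

# Refit the model on bootstrap samples and store each set of predictions
n_boot <- 200
preds <- matrix(NA_real_, nrow = n_boot, ncol = nrow(new_sites))
for (b in seq_len(n_boot)) {
  idx <- sample(n, replace = TRUE)           # resample rows with replacement
  fit_b <- lm(ph ~ elev, data = dat[idx, ])
  preds[b, ] <- predict(fit_b, newdata = new_sites)
}

# Mean prediction and empirical 5th and 95th percentiles per site
pred_mean <- colMeans(preds)
pred_lims <- apply(preds, 2, quantile, probs = c(0.05, 0.95))
```

Note that the spread of bootstrap predictions reflects only the uncertainty of the fitted model. A full prediction interval would also need to account for the residual error that the model does not explain.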
This chapter largely introduces the DSMART algorithm. This was designed for the purposes of disaggregating soil mapping units into unitised soil classes or soil series with the ultimate purpose of extracting spatially explicit soil attribute information. Some notes about the DSMART algorithm can be found at the software page. Some further notes can be found here. An associated R script can be found here.
Sometimes the target variable under consideration necessitates a non-conventional modelling approach. Examples of such variables are soil depth and soil horizon depth. For soil horizons, you may first want to determine whether the horizon actually exists and then, if it does, determine its depth. In that case you may want to consider a hybrid approach where both categorical and continuous modelling approaches are used.
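The two-step logic described here can be sketched in base R as follows: a logistic regression predicts whether the horizon is present, and a linear regression fitted only to sites where it is present predicts its depth. The simulated data, the single `slope` covariate, and the 0.5 probability cut-off are assumptions for illustration, and the book may use different model types for each step.

```r
set.seed(7)

# Simulated data: the horizon is less likely, and shallower, on steep slopes
n <- 300
dat <- data.frame(slope = runif(n, 0, 30))
dat$present <- rbinom(n, 1, plogis(2 - 0.15 * dat$slope))
dat$depth <- ifelse(dat$present == 1,
                    40 - 0.8 * dat$slope + rnorm(n, sd = 3),
                    NA)

# Step 1 (categorical): does the horizon exist?
presence_fit <- glm(present ~ slope, family = binomial, data = dat)

# Step 2 (continuous): how deep is it, using only sites where it exists?
depth_fit <- lm(depth ~ slope, data = dat[dat$present == 1, ])

# Combine: report a depth only where the horizon is predicted to be present
new_sites <- data.frame(slope = c(2, 15, 28))
p_present <- predict(presence_fit, newdata = new_sites, type = "response")
depth_hat <- predict(depth_fit, newdata = new_sites)
new_sites$depth_pred <- ifelse(p_present >= 0.5, depth_hat, NA)
```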
Digital soil assessment is the inference of difficult-to-measure soil variables, such as soil functions, from readily available digital soil maps. An example is provided here about crop suitability given available soil and climatic information. An associated R script of this example is provided here. Data can be found here.
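One simple way to frame a crop suitability assessment is a most-limiting-factor rule: each soil or climate variable is rated separately, and the worst rating determines the overall class. The base R sketch below shows that idea only. The variables, the thresholds, and the three-class scheme are entirely hypothetical and are not taken from the example linked above.

```r
# Made-up mapped values at three locations
sites <- data.frame(
  ph         = c(6.2, 5.0, 7.9),
  drainage   = c(4, 2, 5),      # hypothetical scale: 1 = very poor, 5 = well drained
  frost_days = c(5, 30, 12)     # hypothetical frost days per year
)

# Hypothetical rating rules: 1 = suitable, 2 = marginal, 3 = unsuitable
rate_ph <- function(ph) {
  ifelse(ph >= 5.5 & ph <= 7.5, 1, ifelse(ph >= 5.0 & ph <= 8.0, 2, 3))
}
rate_drainage <- function(d) ifelse(d >= 4, 1, ifelse(d == 3, 2, 3))
rate_frost <- function(f) ifelse(f <= 10, 1, ifelse(f <= 20, 2, 3))

# Most-limiting-factor rule: the worst individual rating sets the class
sites$suitability <- pmax(rate_ph(sites$ph),
                          rate_drainage(sites$drainage),
                          rate_frost(sites$frost_days))
```

Applied to every cell of a set of digital soil and climate maps rather than three sites, the same rule would yield a suitability map.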
While not strictly a digital soil assessment approach, the Homosoil approach is one where we want to determine the soil pattern in areas of minimal data. Using a soil-forming-factor likeness procedure, the Homosoil approach exploits data-rich areas to determine soil homologues for areas with relatively minimal data. Some further notes can be found here. An associated R script can be found here.