### Using R for Digital Soil Mapping: Book materials, data and R scripts.

This book describes Digital Soil Mapping (DSM) in R and provides many detailed examples of how to implement it. The work adheres to Digital Soil Mapping theory, with a strong focus on how to apply it. DSM exercises are also included and cover procedures for handling and manipulating soil and spatial data in R. The book also introduces the basic concepts and practices for building spatial soil prediction functions and, ultimately, producing digital soil maps.

For hands-on experience with the material, the following sections provide documents and R code for the topics covered in the book.

## The ithir R package

The ithir R package must be installed to run the digital soil mapping examples below, because it contains all of the required data sets and some of the required R functions.

Once you have installed R on your computer (see Part 1 of the R literacy for digital soil mapping section for instructions if needed), run the following lines of R code:

> install.packages("devtools")
> library(devtools)
> install_bitbucket("brendo1001/ithir/pkg") #ithir package


## R Literacy for digital soil mapping

There are many places on the internet to get a crash course in R. You can also try what is located here. The R examples are illustrated using soil data, which is helpful if you prefer to learn in context. The course is broken into 8 parts.

#### Part 1

• R basics: commands, expressions, assignments, operators, objects
• R Data Types
• R data structures
• Functions, arguments, and packages
• Getting help.

Code is here and documentation is here

#### Part 2

• Vectors, matrices, and arrays
• Vector arithmetic, some common functions and vectorised formats
• Matrices and arrays.

Code is here and documentation is here
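
As a rough illustration of the Part 2 topics, the sketch below uses base R only. The texture values are made up for the example and are not taken from the ithir data sets.

```r
# Hypothetical clay and sand contents (%) for four soil samples
clay <- c(12, 25, 31, 18)
sand <- c(70, 50, 40, 62)

# Vector arithmetic is element-wise: silt is what remains of 100 %
silt <- 100 - clay - sand

# Common vectorised functions
mean(clay)
range(sand)

# Bind the vectors column-wise into a matrix
texture <- cbind(clay, sand, silt)
dim(texture)       # rows, columns
rowSums(texture)   # each row should sum to 100

# Stack two copies of the matrix into a 3-D array
texture_arr <- array(c(texture, texture), dim = c(4, 3, 2))
dim(texture_arr)
```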

#### Part 3

• Data frames, data import, and data export
• Creating data frames manually
• Working with data frames

Code is here and documentation is here
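
A minimal sketch of the Part 3 topics follows. The data frame is invented for illustration, and the file is written to a temporary location rather than to any path used in the book.

```r
# Create a data frame manually (hypothetical soil profile data)
soil <- data.frame(
  id   = c("P1", "P2", "P3"),
  pH   = c(5.6, 6.3, 7.1),
  clay = c(12, 25, 31)
)
str(soil)

# Work with the data frame: add a column, then subset rows
soil$sand <- c(70, 50, 40)
soil[soil$pH > 6, ]

# Export to CSV and import it again
out_file <- tempfile(fileext = ".csv")
write.csv(soil, out_file, row.names = FALSE)
soil_in <- read.csv(out_file)
head(soil_in)
```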

#### Part 4

• Graphics: the basics

Code is here and documentation is here

#### Part 5

• Manipulating data. Modes, classes, attributes, length, and coercion
• Indexing, sub-setting, sorting and locating data
• Factors
• Combining data

Code is here and documentation is here
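
The Part 5 topics can be sketched with a few lines of base R. All values and object names below are hypothetical.

```r
ph <- c(5.6, 6.3, 7.1, 4.9)

# Modes, classes, length, and coercion
mode(ph)
class(ph)
length(ph)
ph_chr <- as.character(ph)   # numeric -> character
as.integer("12")             # character -> integer

# Indexing, sub-setting, sorting, and locating data
ph[2]                        # second element
ph[ph < 6]                   # values below 6
acid_idx <- which(ph < 6)    # positions of values below 6
sort(ph)
order(ph, decreasing = TRUE)

# Factors
horizon <- factor(c("A", "B", "A", "C"), levels = c("A", "B", "C"))
table(horizon)

# Combining data: stack rows, then join on a shared key
site_a    <- data.frame(id = 1:2, ph = ph[1:2])
site_b    <- data.frame(id = 3:4, ph = ph[3:4])
all_sites <- rbind(site_a, site_b)
texture   <- data.frame(id = 1:4, clay = c(12, 25, 31, 18))
merged    <- merge(all_sites, texture, by = "id")
merged
```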

#### Part 6

• Exploratory data analysis
• Summary statistics
• Histograms and boxplots
• Normal quantile and cumulative probability plots

Code is here and documentation is here
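
The sketch below touches each Part 6 topic using base R. The organic carbon values are simulated from a log-normal distribution purely for illustration; they are not real measurements.

```r
# Simulated (not real) soil organic carbon values, right-skewed
set.seed(1)
oc <- rlnorm(100, meanlog = 0, sdlog = 0.5)

# Summary statistics
summary(oc)
sd(oc)
quantile(oc, probs = c(0.05, 0.5, 0.95))

# Histogram and boxplot
hist(oc, main = "Simulated organic carbon", xlab = "OC (%)")
boxplot(oc, ylab = "OC (%)")

# Normal quantile plot of the raw data, then of the log-transformed data
qqnorm(oc)
qqline(oc)
log_oc <- log(oc)
qqnorm(log_oc)
qqline(log_oc)

# Empirical cumulative probability plot
plot(ecdf(oc), main = "Empirical cumulative probability", xlab = "OC (%)")
```

Comparing the two quantile plots is one quick way to judge whether a log transform brings skewed data closer to normality.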

#### Part 7

• The basics of linear models

Code is here and documentation is here
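
A minimal linear model sketch is given below. The clay and CEC values are simulated with a known relationship, so the variable names and coefficients are assumptions made for the example.

```r
# Simulated data: CEC increases linearly with clay content, plus noise
set.seed(10)
clay <- runif(50, 5, 60)
cec  <- 2 + 0.4 * clay + rnorm(50, sd = 2)
soil <- data.frame(clay, cec)

# Fit and inspect a simple linear model
fit <- lm(cec ~ clay, data = soil)
summary(fit)
coef(fit)

# Predict at new clay contents
new_pred <- predict(fit, newdata = data.frame(clay = c(10, 30, 50)))
new_pred

# Plot the data with the fitted line
plot(soil$clay, soil$cec, xlab = "Clay (%)", ylab = "CEC")
abline(fit)
```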

#### Part 8

This section introduces how to construct a function. Functions are at the heart of R: a function is akin to a set of instructions for running a particular task. Functions are incredibly powerful, and being able to write your own opens the door to creative thinking about how to solve a particular problem or carry out a nuanced task. The example here is about how one would go about designing a soil sampling scheme along a toposequence. The starting point could be the top of a hill, the bottom of a hill, or anywhere in between. This seems intuitive to do in your mind, but coding it in R or any other language requires some logical thought and creativity. Both are important for learning R and for doing digital soil mapping.

Code is here and documentation is here
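
To give a flavour of the problem, here is one possible sketch, not the function developed in the book. It assumes the toposequence is a vector of elevations at equally spaced positions and that a new sample is taken whenever elevation has changed by at least `step` metres since the last sample. The function name `topo_sample` and its arguments are hypothetical.

```r
# elev:  elevations (m) at equally spaced positions along a transect
# start: index of the starting position (top, bottom, or anywhere between)
# step:  elevation change (m) that triggers a new sample
topo_sample <- function(elev, start, step) {
  stopifnot(start >= 1, start <= length(elev), step > 0)

  # Walk through the positions in 'idx', picking a position each time
  # elevation differs from the last picked one by at least 'step'
  walk <- function(idx) {
    picked <- integer(0)
    last <- elev[start]
    for (i in idx) {
      if (abs(elev[i] - last) >= step) {
        picked <- c(picked, i)
        last <- elev[i]
      }
    }
    picked
  }

  # Walk away from the start in both directions along the transect
  forward  <- if (start < length(elev)) walk((start + 1):length(elev)) else integer(0)
  backward <- if (start > 1) walk((start - 1):1) else integer(0)

  sort(c(backward, start, forward))
}

# Hypothetical hillslope: elevation falls from 100 m to 50 m over 51 positions
elev <- seq(100, 50, by = -1)
topo_sample(elev, start = 26, step = 10)  # start mid-slope
topo_sample(elev, start = 1, step = 10)   # start at the top of the hill
```

Walking in both directions from the start is what lets the same function handle a start at the top, the bottom, or mid-slope.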

## Getting spatial in R

This section introduces some basic concepts of using R for GIS operations. It is a simple introduction to converting data to spatial objects, performing spatial transformations, and importing and exporting GIS data.

Code is here and documentation is here.

Resampling and reprojection are other useful GIS procedures to carry out in R. Such tasks are needed when aligning spatial data to common resolutions and extents.

Code is here. The specific data with which to do the exercises is found here.

## Preparatory and exploratory analysis for digital soil mapping

This chapter covers some common methods for soil data preparation and exploration. Of note is the fitting of soil profile depth functions to harmonise soil profile data. This is followed by statistical summaries of soil data and some basic geostatistical analysis of soil property data.

Code is here and documentation is here.

## Continuous soil attribute modelling and mapping

Want to map soil properties such as soil carbon and pH? Such properties are continuous soil variables, and a number of model types can be considered for this task. In this chapter we will run through a few common model types:

1. Multiple linear regression. This section also has a bit about how to apply models spatially. There are a few different ways to do this. Code is here and documentation is here.
2. Decision trees. Code is here and documentation is here.
3. Cubist Model. Code is here and documentation is here.
4. Random Forest model. Code is here and documentation is here.

Before diving into these, it is probably useful to get grounded in the concepts behind model evaluation and validation. Code is here and documentation is here.

Another useful tool for digital soil mapping is the caret R package. It hosts a number of model types that could be of use in addition to the ones specifically investigated above. Code is here and documentation is here.
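
As a minimal sketch of hold-out validation, the example below fits a multiple linear regression to a calibration subset and computes a few common goodness-of-fit measures on the remaining data. The data are simulated, the covariate names (`elev`, `twi`) are assumptions, and the measures are computed by hand in base R rather than with any function from the ithir package.

```r
# Simulated (not real) soil pH with two hypothetical covariates
set.seed(42)
n   <- 200
dat <- data.frame(elev = runif(n, 0, 500), twi = runif(n, 2, 20))
dat$ph <- 5 + 0.002 * dat$elev + 0.05 * dat$twi + rnorm(n, sd = 0.3)

# Random 70/30 split into calibration and validation sets
cal_idx <- sample(n, size = round(0.7 * n))
cal <- dat[cal_idx, ]
val <- dat[-cal_idx, ]

# Fit on calibration data, predict on validation data
fit  <- lm(ph ~ elev + twi, data = cal)
pred <- predict(fit, newdata = val)
obs  <- val$ph

# Goodness-of-fit measures on the validation set
rmse <- sqrt(mean((obs - pred)^2))   # root mean square error
bias <- mean(pred - obs)             # mean error
r2   <- cor(obs, pred)^2             # squared correlation
# Lin's concordance correlation coefficient (sample-variance form)
ccc  <- 2 * cov(obs, pred) /
  (var(obs) + var(pred) + (mean(obs) - mean(pred))^2)

round(c(RMSE = rmse, bias = bias, R2 = r2, concordance = ccc), 3)
```

Measures computed on the calibration data alone tend to be optimistic, which is why the predictions here are scored only on data the model has not seen.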

Lastly, we can look at regression kriging. We can do this in two main ways:

1. Formally via universal kriging. Code is here and documentation is here.
2. A hybrid approach where the trend is modelled separately from the autocorrelated residuals. Code is here and documentation is here.

## Categorical soil attribute modelling and mapping

Soil classes are categorical variables, and numerous other categorical variables fall within the purview of soil science too. The spatial prediction of categorical variables calls for a different suite of models than those used for continuous variables. This chapter assesses a few of these. There is also some discussion about how one might validate models where the target variable is categorical.

1. Model validation where the target variable is categorical. Code is here and documentation is here.
2. Multinomial logistic regression. Code is here and documentation is here.
3. C5 decision trees. Code is here and documentation is here.
4. Random Forest models. Code is here and documentation is here.
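
For the validation step, a confusion matrix and the measures derived from it can be computed in base R. The sketch below uses ten made-up observed and predicted soil classes; it is an illustration of the general technique, not the validation function used in the book.

```r
# Hypothetical observed and predicted soil classes at ten sites
classes <- c("A", "B", "C")
obs  <- factor(c("A", "A", "A", "B", "B", "C", "C", "C", "C", "B"), levels = classes)
pred <- factor(c("A", "A", "B", "B", "B", "C", "C", "A", "C", "C"), levels = classes)

# Confusion matrix: rows are predictions, columns are observations
cm <- table(predicted = pred, observed = obs)
cm

# Overall accuracy: proportion of sites classified correctly
overall <- sum(diag(cm)) / sum(cm)

# User's accuracy (per predicted class) and producer's accuracy (per observed class)
users     <- diag(cm) / rowSums(cm)
producers <- diag(cm) / colSums(cm)

# Cohen's kappa: agreement corrected for chance agreement
p_chance  <- sum(rowSums(cm) * colSums(cm)) / sum(cm)^2
kappa_hat <- (overall - p_chance) / (1 - p_chance)

overall
users
producers
kappa_hat
```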

## Some methods for uncertainty quantification relevant for digital soil mapping

Quantifying the uncertainty of the digital soil maps that are created is a crucial exercise. Both map producers and users need to know how reliable the maps they are using are. Analytic approaches to quantifying uncertainty are rarely encountered in digital soil mapping because the models themselves are usually highly parameterised. Numerical approaches appear to be growing in use, and some of these are covered in this chapter.

Some introductory notes can be found here. There is also some additional data that will aid in running through the various uncertainty approaches here.

Specifically, the uncertainty approaches that are considered include:

1. Universal kriging prediction variance. Code is here and documentation is here.
2. Bootstrapping. Code is here and documentation is here.
3. Data partitioning and cross validation approach. Code is here and documentation is here.
4. Empirical uncertainty quantification through fuzzy clustering and cross validation. Code is here and documentation is here.
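
To illustrate the bootstrapping idea in its simplest form, the sketch below refits a linear model to resampled data and summarises the spread of the resulting predictions. The data and the covariate name `elev` are simulated assumptions. Note that the limits obtained this way reflect only the variability of the fitted model; a full prediction interval would also need to account for residual error.

```r
# Simulated (not real) soil carbon data with one hypothetical covariate
set.seed(123)
n   <- 100
dat <- data.frame(elev = runif(n, 0, 500))
dat$carbon <- 1 + 0.004 * dat$elev + rnorm(n, sd = 0.4)

# Locations where predictions are wanted
newdat <- data.frame(elev = c(100, 250, 400))

# Refit the model on B bootstrap samples and store the predictions
B <- 200
boot_pred <- matrix(NA_real_, nrow = B, ncol = nrow(newdat))
for (b in seq_len(B)) {
  idx   <- sample(n, replace = TRUE)            # resample rows with replacement
  fit_b <- lm(carbon ~ elev, data = dat[idx, ])
  boot_pred[b, ] <- predict(fit_b, newdata = newdat)
}

# Bootstrap mean and 5th / 95th percentiles at each location
pred_mean <- colMeans(boot_pred)
pred_lim  <- apply(boot_pred, 2, quantile, probs = c(0.05, 0.95))

pred_mean
pred_lim
```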

## Soil map disaggregation

This chapter largely introduces the DSMART algorithm. This was designed for the purposes of disaggregating soil mapping units into unitised soil classes or soil series with the ultimate purpose of extracting spatially explicit soil attribute information. Some notes about the DSMART algorithm can be found at the software page. Some further notes can be found here. An associated R script can be found here.

## Combining continuous and categorical soil attribute modelling and mapping

Sometimes the target variable under consideration necessitates a non-conventional modelling approach. Examples include soil depth and soil horizon depth. For soil horizons, you may first want to determine whether the horizon actually exists and then, if it does, determine its depth. In such cases you may want to consider a hybrid approach in which both categorical and continuous modelling approaches are used.

Some further notes can be found here. An associated R script can be found here. Data can be found here.
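
One way to picture the two-stage idea is sketched below with simulated data. It assumes a logistic regression for horizon presence and a linear regression for depth where the horizon exists; the covariate `slope`, the 0.5 probability threshold, and the model choices are all assumptions made for the example and are not necessarily those used in the book.

```r
# Simulated (not real) data: horizon presence and depth both depend on slope
set.seed(7)
n   <- 300
dat <- data.frame(slope = runif(n, 0, 30))
p_true      <- plogis(2 - 0.15 * dat$slope)
dat$present <- rbinom(n, 1, p_true)
dat$depth   <- ifelse(dat$present == 1,
                      60 - 1.2 * dat$slope + rnorm(n, sd = 5),
                      NA)

# Stage 1 (categorical): does the horizon exist?
pres_fit <- glm(present ~ slope, family = binomial, data = dat)

# Stage 2 (continuous): how deep is it, using only sites where it exists
depth_fit <- lm(depth ~ slope, data = dat[dat$present == 1, ])

# Combine both stages for new locations
newdat <- data.frame(slope = c(2, 15, 28))
p_hat  <- predict(pres_fit, newdata = newdat, type = "response")
d_hat  <- predict(depth_fit, newdata = newdat)

# Report a depth only where the horizon is predicted to be present
pred_depth <- ifelse(p_hat >= 0.5, d_hat, NA)
data.frame(newdat, p_present = p_hat, depth = pred_depth)
```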