Intersecting soil point observations with environmental covariates
In order to carry out digital soil mapping in terms of evaluating the significance of environmental variables in explaining the spatial variation of the target soil variable under investigation, we need to link both sets of data together and extract the values of the covariates at the locations of the soil point data.
The first task is to bring in to our working environment some soil point
data. We will be using a data set of soil samples collected from the
Hunter Valley, NSW Australia. The area in question is an approximately
22k**m2 sub-area of the Hunter Wine Country Private
Irrigation District (HWCPID), situated in the Lower Hunter Valley, NSW
(32.8329S,151.354E).
The HWCPID is approximately 140 km north of Sydney, NSW, Australia. The
target variable is soil pH. The data was preprocessed such that the
predicted values are outputs of the mass-preserving depth
function.
You will find that most examples in this section and other sections of
this course that this data set is used quite a bit. The data is loaded
in from the ithir
package with the following script:
library(ithir)
data(HV_subsoilpH)
str(HV_subsoilpH)
## 'data.frame': 506 obs. of 14 variables:
## $ X : num 340386 340345 340559 340483 340734 ...
## $ Y : num 6368690 6368491 6369168 6368740 6368964 ...
## $ pH60_100cm : num 4.47 5.42 6.26 8.03 8.86 ...
## $ Terrain_Ruggedness_Index: num 1.34 1.42 1.64 1.04 1.27 ...
## $ AACN : num 1.619 0.281 2.301 1.74 3.114 ...
## $ Landsat_Band1 : int 57 47 59 52 62 53 47 52 53 63 ...
## $ Elevation : num 103.1 103.7 99.9 101.9 99.8 ...
## $ Hillshading : num 1.849 1.428 0.934 1.517 1.652 ...
## $ Light_insolation : num 1689 1701 1722 1688 1735 ...
## $ Mid_Slope_Positon : num 0.876 0.914 0.844 0.848 0.833 ...
## $ MRVBF : num 3.85 3.31 3.66 3.92 3.89 ...
## $ NDVI : num -0.143 -0.386 -0.197 -0.14 -0.15 ...
## $ TWI : num 17.5 18.2 18.8 18 17.8 ...
## $ Slope : num 1.79 1.42 1.01 1.49 1.83 ...
As you will note, there are 506 observations of soil pH. These data are associated with a spatial coordinate and also have associated environmental covariate data that have been intersected with the point data. The environmental covariates have been sourced from a digital elevation model and Landsat 7 satellite data. For the demonstration purposes of the exercise, we will firstly remove this already intersected data and start fresh - essentially this is an opportunity to recall earlier work on dataframe manipulation and indexing.
# round pH data to 2 decimal places
HV_subsoilpH$pH60_100cm <- round(HV_subsoilpH$pH60_100cm, 2)
# remove already intersected data
HV_subsoilpH <- HV_subsoilpH[, 1:3]
# add an id column
HV_subsoilpH$id <- seq(1, nrow(HV_subsoilpH), by = 1)
# re-arrange order of columns
HV_subsoilpH <- HV_subsoilpH[, c(4, 1, 2, 3)]
# Change names of coordinate columns
names(HV_subsoilpH)[2:3] <- c("x", "y")
Also in the ithir
package are a collection of rasters that correspond
to environmental covariates that cover the extent of the just loaded
soil point data. These can be loaded using the script:
data(hunterCovariates_sub)
hunterCovariates_sub
## class : RasterStack
## dimensions : 249, 210, 52290, 11 (nrow, ncol, ncell, nlayers)
## resolution : 25, 25 (x, y)
## extent : 338422.3, 343672.3, 6364203, 6370428 (xmin, xmax, ymin, ymax)
## crs : +proj=utm +zone=56 +south +ellps=WGS84 +datum=WGS84 +units=m +no_defs
## names : Terrain_Ruggedness_Index, AACN, Landsat_Band1, Elevation, Hillshading, Light_insolation, Mid_Slope_Positon, MRVBF, NDVI, TWI, Slope
## min values : 0.194371, 0.000000, 26.000000, 72.217499, 0.000677, 1236.662840, 0.000009, 0.000002, -0.573034, 8.224325, 0.001708
## max values : 15.945321, 106.665482, 140.000000, 212.632507, 32.440960, 1934.199950, 0.956529, 4.581594, 0.466667, 20.393652, 21.809752
This data set is a stack of 11 rasterLayers
which correspond to mainly
indices derived from a digital elevation model: elevation(elevation
),
terrain wetness index (TWI
), altitude above channel network (AACN
),
terrain ruggedness index (Terrain_Ruggedness_Index
), hillshading
(Hillshading
), incoming solar radiation (Light_insolation
),
mid-slope position (Mid_Slope_Positon
), multi-resolution valley bottom
flatness index (MRVBF
), and slope gradient (Slope
). Landsat 7
satellite spectral data, specifically band 1 (Landsat_Band1
} and
normalised difference vegetation index (NDVI
) are provided too, and
give some indication of the spatial variation of vegetation patterns
across the study area.
You may notice that there is a commonality between these rasterlayers
in terms of their CRS, dimensions and resolution. When you have multiple
rasterlayers
as individual rasters, but they are common to each other
in terms of resolution and extent, rather than working with each raster
independently it is much more efficient to stack them all into a single
object. The stack
function from the raster
package is ready-made for
this, of which the hunterCovariates_sub
stack is an output. This
harmony is an ideal situation for DSM where there may often be instances
where rasters from the some area under investigation may have different
resolutions, extents and even CRSs. It these situations it is common to
reproject and or resample to a common projection and resolution. The
functions from the raster
package which may be of use in these
situations are projectRaster
and resample
. The Reprojections,
resampling and rasterisation
page takes you through a number of different scenarios and workflows
using these functions and others.
While the example is a little contrived, it is useful to always determine whether or not the available covariates have complete coverage of the soil point data. This might be done with the following script which will produce a map like in the figure below.
# plot raster
plot(hunterCovariates_sub[["Elevation"]], main = "Hunter Valley elevation map with overlayed point locations")
## plot points
coordinates(HV_subsoilpH) <- ~x + y
plot(HV_subsoilpH, add = T)

With the soil point data and covariates prepared, it is time to perform the intersection between the soil observations and covariate layers using the script:
coordinates(HV_subsoilpH) <- ~x + y
DSM_data <- extract(x = hunterCovariates_sub, y = HV_subsoilpH, sp = 1, method = "simple")
The extract
function is quite useful. Essentially the function ingests
the rasterStack
object, together with the SpatialPointsDataFrame
object HV_subsoilpH
. The sp
parameter set to 1 means that the
extracted covariate data gets appended to the existing
SpatialPointsDataFrame
object. While the method
object specifies the
extraction method which in our case is simple which likened to get the
covariate value nearest to the points i.e it is likened to drilling
down.
A good practice is to then export the soil and covariate data intersect
object to file for later use. First we convert the spatial object to a
data.frame
, then export as a comma separated file.
DSM_data <- as.data.frame(DSM_data)
write.csv(DSM_data, "hunterValley_SoilCovariates_pH.csv", row.names = FALSE)
In the previous example the rasters we wanted to use are available data
from the ithir
package. More generally we will have the raster data we
need sitting on our computer, a USB or even on the cloud somewhere. The
steps for intersecting the soil observation data with the covariates are
the same as before, except we now need to specify the location where our
raster covariate data is located. We need not even have to load in the
rasters to memory, just point R
to where they are, and then run the
raster extract
function. This utility is obviously a very handy
feature when we are dealing with an inordinately large or large number
of rasters. The workhorse function we need is list.files
. First you
will need to download the testGrids.zip
folder, and unzip them into a directory of your choosing. Now try the
following script:
list.files(path = "~/testGrids", pattern = "\\.tif$", full.names = TRUE)
## [1] "~/AACN.tif"
## [2] "~/Elevation.tif"
## [3] "~/Hillshading.tif"
## [4] "~/Landsat_Band1.tif"
## [5] "~/Light_insolation.tif"
## [6] "~/Mid_Slope_Positon.tif"
## [7] "~/MRVBF.tif"
## [8] "~/NDVI.tif"
## [9] "~/Slope.tif"
## [10] "~/Terrain_Ruggedness_Index.tif"
## [11] "~/TWI.tif"
The parameter path
is essentially the directory location where the
raster files are sitting. If needed, we may also want to do recursive
listing into directories that are within that path directory. We want
list.files()
to return all the files (in our case) that have the
.tif extension. This criteria is set via the pattern
parameter, such
that $
at the end means that this is end of string, and but adding
\\
. ensures that you match only files with extension .tif, otherwise
it may list (if they exist), files that end in .atif as an example.
You may guess that any other type of pattern matching criteria could be
used to suit your own specific data. The full.names
logical parameter
is just a question of whether we want to return the full pathway address
of the raster file, in which case, we do.
All we then need to do is perform a raster stack of these individual rasters, then perform the intersection. This is really the handy feature where to perform the stack, we still need not require the loading of the rasters into the R memory. They are still on file!.
files <- list.files(path = "~/testGrids", pattern = "\\.tif$", full.names = TRUE)
# stack rasters
r1 <- raster(files[1])
for (i in 2:length(files)) {
r1 <- stack(r1, files[i])}
r1
## class : RasterStack
## dimensions : 249, 210, 52290, 11 (nrow, ncol, ncell, nlayers)
## resolution : 25, 25 (x, y)
## extent : 338422.3, 343672.3, 6364203, 6370428 (xmin, xmax, ymin, ymax)
## crs : +proj=utm +zone=56 +south +datum=WGS84 +units=m +no_defs +ellps=WGS84 +towgs84=0,0,0
## names : AACN, Elevation, Hillshading, Landsat_Band1, Light_insolation, Mid_Slope_Positon, MRVBF, NDVI, Slope, Terrain_Ruggedness_Index, TWI
## min values : 0.000000, 72.217499, 0.000677, 26.000000, 1236.662840, 0.000009, 0.000002, -0.573034, 0.001708, 0.194371, 8.224325
## max values : 106.665482, 212.632507, 32.440960, 140.000000, 1934.199950, 0.956529, 4.581594, 0.466667, 21.809752, 15.945321, 20.393652
Note that the stacking of rasters can only be possible if they are all equivalent in terms of resolution and extent. The Reprojections, resampling and rasterisation page takes you through a number of different scenarios and workflows for getting your data layers all harmonised. With the stacked rasters, we can perform the soil point data intersection as done previously.
DSM_data <- extract(r1, HV_subsoilpH, sp = 1, method = "simple")