Applying models spatially

Here we provide a brief overview of how to apply a fitted model to unseen data. The general idea is that one has fitted a model to predict some soil phenomenon using a suite of environmental covariates. That model now needs to be applied to the same covariates, but where those covariates have continuous coverage across the mapping area. Essentially, you want to create a soil map. There are a couple of ways to go about this, but first we need to prepare some data and fit a simple model.

Data preparation

Recall from before in the data preparatory exercises that we were working with the soil point data and environmental covariates for the Hunter Valley area. These data are stored in the HV_subsoilpH and hunterCovariates_sub objects from the ithir package.

library(ithir)
library(raster)

## Loading required package: sp

library(rgdal)

## rgdal: version: 1.4-8, (SVN revision 845)
##  Geospatial Data Abstraction Library extensions to R successfully loaded
##  Loaded GDAL runtime: GDAL 2.2.3, released 2017/11/20
##  Path to GDAL shared files: /usr/share/gdal/2.2
##  GDAL binary built with GEOS: TRUE 
##  Loaded PROJ.4 runtime: Rel. 4.9.3, 15 August 2016, [PJ_VERSION: 493]
##  Path to PROJ.4 shared files: (autodetected)
##  Linking to sp version: 1.3-2

library(sp)

# point data
data(HV_subsoilpH)

# start afresh: round the pH data to 2 decimal places
HV_subsoilpH$pH60_100cm <- round(HV_subsoilpH$pH60_100cm, 2)

# remove already intersected data
HV_subsoilpH <- HV_subsoilpH[, 1:3]

# add an id column
HV_subsoilpH$id <- seq(1, nrow(HV_subsoilpH), by = 1)

# re-arrange order of columns
HV_subsoilpH <- HV_subsoilpH[, c(4, 1, 2, 3)]

# Change names of coordinate columns
names(HV_subsoilpH)[2:3] <- c("x", "y")

# grids (covariate raster)
data(hunterCovariates_sub)

Perform the covariate intersection.

coordinates(HV_subsoilpH) <- ~x + y

# extract
DSM_data <- extract(hunterCovariates_sub, HV_subsoilpH, sp = 1, method = "simple")
DSM_data <- as.data.frame(DSM_data)
str(DSM_data)

## 'data.frame':    506 obs. of  15 variables:
##  $ id                      : num  1 2 3 4 5 6 7 8 9 10 ...
##  $ x                       : num  340386 340345 340559 340483 340734 ...
##  $ y                       : num  6368690 6368491 6369168 6368740 6368964 ...
##  $ pH60_100cm              : num  4.47 5.42 6.26 8.03 8.86 7.28 4.95 5.61 5.39 3.44 ...
##  $ Terrain_Ruggedness_Index: num  1.34 1.42 1.64 1.04 1.27 ...
##  $ AACN                    : num  1.619 0.281 2.301 1.74 3.114 ...
##  $ Landsat_Band1           : num  57 47 59 52 62 53 47 52 53 63 ...
##  $ Elevation               : num  103.1 103.7 99.9 101.9 99.8 ...
##  $ Hillshading             : num  1.849 1.428 0.934 1.517 1.652 ...
##  $ Light_insolation        : num  1689 1701 1722 1688 1735 ...
##  $ Mid_Slope_Positon       : num  0.876 0.914 0.844 0.848 0.833 ...
##  $ MRVBF                   : num  3.85 3.31 3.66 3.92 3.89 ...
##  $ NDVI                    : num  -0.143 -0.386 -0.197 -0.14 -0.15 ...
##  $ TWI                     : num  17.5 18.2 18.8 18 17.8 ...
##  $ Slope                   : num  1.79 1.42 1.01 1.49 1.83 ...


Model fitting

hv.MLR.Full <- lm(pH60_100cm ~ +Terrain_Ruggedness_Index + AACN + Landsat_Band1 + 
    Elevation + Hillshading + Light_insolation + Mid_Slope_Positon + MRVBF + NDVI + 
    TWI + Slope, data = DSM_data)
summary(hv.MLR.Full)

## 
## Call:
## lm(formula = pH60_100cm ~ +Terrain_Ruggedness_Index + AACN + 
##     Landsat_Band1 + Elevation + Hillshading + Light_insolation + 
##     Mid_Slope_Positon + MRVBF + NDVI + TWI + Slope, data = DSM_data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.2380 -0.7843 -0.1225  0.7057  3.4641 
## 
## Coefficients:
##                           Estimate Std. Error t value Pr(>|t|)    
## (Intercept)               5.372452   2.147673   2.502 0.012689 *  
## Terrain_Ruggedness_Index  0.075084   0.054893   1.368 0.171991    
## AACN                      0.034747   0.007241   4.798 2.12e-06 ***
## Landsat_Band1            -0.037712   0.009355  -4.031 6.42e-05 ***
## Elevation                -0.013535   0.005550  -2.439 0.015079 *  
## Hillshading               0.152819   0.053655   2.848 0.004580 ** 
## Light_insolation          0.001329   0.001178   1.127 0.260081    
## Mid_Slope_Positon         0.928823   0.268625   3.458 0.000592 ***
## MRVBF                     0.324041   0.084942   3.815 0.000154 ***
## NDVI                      4.982413   0.887322   5.615 3.28e-08 ***
## TWI                       0.085150   0.045976   1.852 0.064615 .  
## Slope                    -0.102262   0.062391  -1.639 0.101838    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.178 on 494 degrees of freedom
## Multiple R-squared:  0.2501, Adjusted R-squared:  0.2334 
## F-statistic: 14.97 on 11 and 494 DF,  p-value: < 2.2e-16


Applying the model spatially using data frames

The traditional approach has been to collate a grid table with two columns for the coordinates, followed by a column for each of the available covariates. This was seen as an efficient way to organise all the covariate data, as it ensured that a common grid was used, which in turn meant that all the covariates shared the same resolution and extent. We can simulate the covariate table approach using the hunterCovariates_sub object as below.

data(hunterCovariates_sub)
tempD <- data.frame(cellNos = 1:ncell(hunterCovariates_sub))
vals <- as.data.frame(getValues(hunterCovariates_sub))
tempD <- cbind(tempD, vals)
tempD <- tempD[complete.cases(tempD), ]
cellNos <- c(tempD$cellNos)
gXY <- data.frame(xyFromCell(hunterCovariates_sub, cellNos, spatial = FALSE))
tempD <- cbind(gXY, tempD)
str(tempD)

## 'data.frame':    33252 obs. of  14 variables:
##  $ x                       : num  340935 340960 340985 341010 341035 ...
##  $ y                       : num  6370416 6370416 6370416 6370416 6370416 ...
##  $ cellNos                 : int  101 102 103 104 105 106 107 108 109 110 ...
##  $ Terrain_Ruggedness_Index: num  0.745 0.632 0.535 0.472 0.486 ...
##  $ AACN                    : num  9.78 9.86 10.04 10.27 10.53 ...
##  $ Landsat_Band1           : num  68 63 59 62 56 54 59 62 54 56 ...
##  $ Elevation               : num  103 103 102 102 102 ...
##  $ Hillshading             : num  0.94 0.572 0.491 0.515 0.568 ...
##  $ Light_insolation        : num  1712 1706 1701 1699 1697 ...
##  $ Mid_Slope_Positon       : num  0.389 0.387 0.386 0.386 0.386 ...
##  $ MRVBF                   : num  0.376 0.765 1.092 1.54 1.625 ...
##  $ NDVI                    : num  -0.178 -0.18 -0.164 -0.169 -0.172 ...
##  $ TWI                     : num  16.9 17.2 17.2 17.2 17.2 ...
##  $ Slope                   : num  0.968 0.588 0.503 0.527 0.581 ...

The result shown above is that the covariate table contains 33252 rows and 14 variables. The coordinate columns are always necessary, but some memory could be saved if only the covariates required by the model are appended to the table. It will quickly become obvious, however, that the covariate table approach can become limiting when the mapping extent gets very large, when the grid resolution becomes finer, or both.
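
As a minimal sketch of that memory saving (the helper objects model.covs and tempD.small are just illustrative names), the table could be trimmed to the coordinates plus only the covariates that appear in the fitted model:

# a minimal sketch: retain only the coordinates and the covariates used in
# the fitted model, to reduce the size of the covariate table
model.covs <- all.vars(formula(hv.MLR.Full))[-1]  # drop the response variable
tempD.small <- tempD[, c("x", "y", model.covs)]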

With the covariate table arranged, it then becomes a matter of using the predict function with the fitted MLR model.

map.MLR <- predict(hv.MLR.Full, newdata = tempD)
map.MLR <- cbind(data.frame(tempD[, c("x", "y")]), map.MLR)
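
An optional quick check (not part of the original workflow) is to look at the first few rows of the prediction table:

# optional check: inspect the first few rows of the prediction table
head(map.MLR)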

Now we can rasterise the predictions for mapping and grid export. In the example below we set the CRS to WGS 84 / UTM Zone 56 South before exporting the raster as a GeoTIFF file.

map.MLR.r <- rasterFromXYZ(as.data.frame(map.MLR[, 1:3]))
plot(map.MLR.r, main = "MLR predicted soil pH (60-100cm)")
# set the projection
crs(map.MLR.r) <- "+proj=utm +zone=56 +south +ellps=WGS84 +datum=WGS84 +units=m +no_defs"
writeRaster(map.MLR.r, "soilpH_60_100_MLR.tif", format = "GTiff", datatype = "FLT4S", 
    overwrite = TRUE)
# check working directory for presence of raster
MLR predicted soil pH (60-100cm) across the Hunter Valley.

Some of the parameters used within the writeRaster function are worth noting. format is the raster format that we want to write to; here GTiff is specified, and the writeFormats function lists the other raster formats that can be used. The datatype parameter is specified as FLT4S, which indicates that 4-byte, signed, floating-point values are to be written to file. Look at the dataType function for other alternatives, for example for categorical data where we may be interested in logical or integer values.
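
As a quick illustrative sketch (assuming the GeoTIFF written above is present in the working directory), the available formats and the data type of the exported grid can be inspected as follows:

# list the raster formats that writeRaster can produce
writeFormats()

# inspect the data type of the exported grid; an integer type such as
# 'INT2S' could be used instead of 'FLT4S' for categorical maps
dataType(raster("soilpH_60_100_MLR.tif"))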

Applying the model spatially using raster predict

Probably a more efficient way of applying the fitted model is to apply it directly to the rasters themselves. This avoids the step of arranging all the covariates into a table. If multiple rasters are being used, it is necessary to have them arranged as a rasterStack object, which also ensures that all the rasters share the same extent and resolution. Here we use the raster predict function, as below, with the hunterCovariates_sub raster stack as input.

map.MLR.r1 <- predict(hunterCovariates_sub, hv.MLR.Full, "soilpH_60_100_MLR.tif", 
    format = "GTiff", datatype = "FLT4S", overwrite = TRUE)
# check working directory for presence of raster
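
As a quick follow-up check (a sketch, not part of the original script), we can confirm the file was written and plot the raster returned by predict:

# confirm the GeoTIFF was written to the working directory
file.exists("soilpH_60_100_MLR.tif")

# plot the prediction returned by the raster predict function
plot(map.MLR.r1, main = "MLR predicted soil pH (60-100cm)")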


Applying the model spatially using parallel processing

An extension of using the raster predict function is to apply the model to the rasters across multiple compute nodes. This is akin to breaking a job up into smaller pieces and then processing those pieces in parallel rather than sequentially. Most desktop computers these days have up to 8 cores, which can deliver excellent gains in efficiency when applying models across large extents and/or at fine resolutions. The raster package integrates with other R packages to facilitate parallel processing; for example, it works with the parallel package for setting up and controlling the compute node processes. The script below is an example of using 4 compute nodes to apply the hv.MLR.Full model to the hunterCovariates_sub raster stack.

library(parallel)
beginCluster(4)
clusterMLR.pred <- clusterR(hunterCovariates_sub, predict, args = list(hv.MLR.Full), 
    filename = "soilpH_60_100_MLR_pred.tif", format = "GTiff", progress = FALSE, 
    overwrite = T)
endCluster()

To set up the compute nodes, you use the beginCluster function and specify inside it how many compute nodes you want to use. If the brackets are left empty, the function will attempt to use all of the available compute resources. The clusterR function is the workhorse that then applies the model in parallel to the rasters. Its parameters and options are similar to those of the raster predict function, although it helps to consult the help file for more detailed explanations. It is always important, after the prediction is completed, to shut down the nodes using the endCluster function.
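
As a small illustrative sketch (not part of the original script), the parallel package can report how many cores are available before deciding how many nodes to request:

library(parallel)

# report the number of cores available on this machine
detectCores()

# beginCluster() with no argument will attempt to use the available cores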

The relative ease of setting up parallel processing for our mapping needs has really opened up the potential for performing DSM with very large data sets and rasters. Moreover, parallel processing together with the file-pointing ability of raster (discussed earlier) has made big DSM a reality and, importantly, practicable.
