CODE:

Get the code used for this section here

Hybrid Regression Kriging

Introduction

The following example will provide the steps one might use to perform regression kriging that incorporates a complex model structure such as a data mining algorithm.

Here we will use the Cubist model that was also used earlier. Lets start from the top.

Back to top

Data preparation

So lets firstly get the data organized. Recall from before in the data preparatory exercises that we were working with the soil point data and environmental covariates for the Hunter Valley area. These data are stored in the HV_subsoilpH object together with associated rasters supplied with the ithir package.

For the succession of models examined in these various pages, we will concentrate on modelling and mapping the soil pH for the 60-100cm depth interval. To refresh, lets load the data in, then intersect the data with the available covariates.

# libraries
library(ithir)
library(MASS)
library(terra)
library(sf)
library(gstat)
library(Cubist)

# point data
data(HV_subsoilpH)

# Start afresh round pH data to 2 decimal places
HV_subsoilpH$pH60_100cm <- round(HV_subsoilpH$pH60_100cm, 2)

# remove already intersected data
HV_subsoilpH <- HV_subsoilpH[, 1:3]

# add an id column
HV_subsoilpH$id <- seq(1, nrow(HV_subsoilpH), by = 1)

# re-arrange order of columns
HV_subsoilpH <- HV_subsoilpH[, c(4, 1, 2, 3)]

# Change names of coordinate columns
names(HV_subsoilpH)[2:3] <- c("x", "y")
# save a copy of coordinates
HV_subsoilpH$x2 <- HV_subsoilpH$x
HV_subsoilpH$y2 <- HV_subsoilpH$y

# convert data to sf object
HV_subsoilpH <- sf::st_as_sf(x = HV_subsoilpH, coords = c("x", "y"))

# grids (covariate rasters from ithir package)
hv.sub.rasters <- list.files(path = system.file("extdata/", package = "ithir"), pattern = "hunterCovariates_sub",
full.names = TRUE)

# read them into R as SpatRaster objects
hv.sub.rasters <- terra::rast(hv.sub.rasters)

Back to top

Perform the covariate intersection

# extract covariate data
DSM_data <- terra::extract(x = hv.sub.rasters, y = HV_subsoilpH, bind = T, method = "simple")
DSM_data <- as.data.frame(DSM_data)

Often it is handy to check to see whether there are missing values both in the target variable and of the covariates. It is possible that a point location does not fit within the extent of the available covariates. In these cases the data should be excluded. A quick way to assess whether there are missing or NA values in the data is to use the complete.cases function.

which(!complete.cases(DSM_data))

## integer(0)

DSM_data <- DSM_data[complete.cases(DSM_data), ]

There do not appear to be any missing data as indicated by the integer(0) output above i.e there are zero rows with missing information.

Back to top

Model Fitting

For the structural part of the regression kriging model, the cubist model algorithm is used and is effectively the same as what was covered here).

## Cubist modeling of environmental trend
set.seed(875)
training <- sample(nrow(DSM_data), 0.7 * nrow(DSM_data))
mDat <- DSM_data[training, ]
names(mDat)[3:4] <- c("x", "y")

# fit the model
hv.cub.Exp <- cubist(x = mDat[, c("AACN", "Landsat_Band1", "Elevation", "Hillshading", "Mid_Slope_Positon", "MRVBF", "NDVI", "TWI")], y = mDat$pH60_100cm, cubistControl(rules = 100, extrapolation = 15), committees = 1)

Now derive the model residual which is the model prediction subtracted from the residual.

## Model residuals
mDat$residual <- mDat$pH60_100cm - predict(hv.cub.Exp, newdata = mDat)
mean(mDat$residual)

## [1] 0.1086524

hist(mDat$residual)

If you check the histogram of these residuals you will find that the mean is around zero and the data seems normally distributed. Now we can assess the residuals for any spatial structure using a variogram model.

## calculate variogram
vgm1 <- variogram(residual ~ 1, ~x + y, mDat, width = 200, cutoff = 3000)

# initialise variogram model
mod <- vgm(psill = var(mDat$residual), "Sph", range = 3000, nugget = 0)

# fit model
model_1 <- fit.variogram(vgm1, mod)
model_1

##   model     psill    range
## 1   Nug 0.8030644    0.000
## 2   Sph 0.5676551 1111.159

plot(vgm1, model = model_1)

# Residual kriging model
gRK <- gstat(NULL, "RKresidual", residual ~ 1, locations = ~x + y, mDat, model = model_1)

Back to top

Out-of-bag model evaluation

With the two model components together, we can now compare the model evaluations of using the Cubist model only to the hybrid estimate based on Cubist model predictions together with modeled residuals.

## Model evaluation on out-of-bag data Cubist model predictions
Cubist.pred.V <- predict(hv.cub.Exp, newdata = DSM_data[-training, ])

# Cubist model with residual variogram
vDat <- DSM_data[-training, ]
names(vDat)[3:4] <- c("x", "y")

# make the residual predictions
RK.preds.V <- krige(residual ~ 1, mDat, locations = ~x + y, model = model_1, newdata = vDat)

# Sum the two components together
RK.preds.fin <- Cubist.pred.V + RK.preds.V[, 3]

# evaluation based on cubist model only
goof(observed = DSM_data$pH60_100cm[-training], predicted = Cubist.pred.V, plot.it = T)

##          R2 concordance      MSE     RMSE        bias
## 1 0.1978859     0.33224 1.506298 1.227313 -0.05218972

# validation regression kriging with cubist model
goof(observed = DSM_data$pH60_100cm[-training], predicted = RK.preds.fin, plot.it = T)

##          R2 concordance      MSE     RMSE       bias
## 1 0.3372345   0.5236584 1.258334 1.121755 0.09227384

These results confirm that there to be some advantage in performing regression kriging with these soil data.

Back to top

Spatial prediction

To apply the regression kriging model spatially, there are 3 basic steps.

  1. First apply the Cubist model,
  2. Then apply the residual kriging,
  3. Then finally add both maps together.

The script below illustrates how this is done.

## Spatial prediction
par(mfrow = c(3, 1))

# Cubist model
map.RK1 <- terra::predict(object = hv.sub.rasters, model = hv.cub.Exp, filename = "soilpH_60_100_cubistRK.tif", datatype = "FLT4S", overwrite = TRUE)
plot(map.RK1, main = "Cubist model predicted soil pH")

## Interpolated residuals (note index of 1..get prediction only)
map.RK2 <- terra::interpolate(object = hv.sub.rasters, model = gRK, xyOnly = TRUE, index = 1, filename = "soilpH_60_100_residualRK.tif", datatype = "FLT4S", overwrite = TRUE)
plot(map.RK2, main = "Kriged residual")

## Sum of cubist predictions and residuals #Stack prediction and kriged
## residuals
pred.stack <- c(map.RK1, map.RK2)
map.RK3 <- sum(pred.stack)
plot(map.RK3, main = "Regression kriging prediction")
rconsole
Regression kriging predictions with cubist models. Hunter Valley soil pH (60-100cm).

Back to top

Updated on Smart Digital Agriculture