Some R packages that are useful for digital soil mapping

Beyond the rich statistical and analytical resources provided by base R, the following R packages (and the functions they contain) are, in my view, invaluable for doing digital soil mapping (DSM).

There are four main groups of tasks that are critical for implementing DSM in general. These are:

  1. Soil science and pedometric type tasks
  2. Using GIS tools and related GIS tasks
  3. Calibrating models
  4. Making maps, plotting etc.

The following are short introductions to the packages that fall into each of these categories.

Soil science and pedometrics

GIS stuff

  • sp is a highly useful package that provides classes and methods for spatial data. The classes document where the spatial location information resides, for 2D or 3D data. Utility functions are provided for tasks such as plotting data as maps and spatial selection, along with methods for retrieving coordinates, subsetting, printing, summarizing, etc.

  • raster supports reading, writing, manipulating, analyzing and modeling gridded spatial data. The package implements both basic and high-level functions, and it supports processing of very large files.

  • rgdal provides bindings to Frank Warmerdam’s Geospatial Data Abstraction Library (GDAL) (>= 1.6.3) and access to projection/transformation operations from the PROJ.4 library. Both GDAL raster and OGR vector map data can be imported into R, and GDAL raster data and OGR vector data exported. Use is made of classes defined in the sp package.

  • RSAGA provides access to geocomputing and terrain analysis functions of SAGA GIS from within R by running the command line version of SAGA. RSAGA furthermore provides several R functions for handling ASCII grids, including a flexible framework for applying local functions (including predict methods of fitted models) and focal functions to multiple grids.
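As a rough illustration of how sp and raster fit together, the sketch below promotes a plain data frame to an sp point object and then reads grid values at those points. The observations, the `clay` column and the placeholder covariate grid are all made up for the example; they are not from any real survey.

```r
library(sp)
library(raster)

# Hypothetical soil observations: coordinates plus one measured property
set.seed(1)
obs <- data.frame(x = runif(20, 0, 100),
                  y = runif(20, 0, 100),
                  clay = runif(20, 10, 40))

# Promote the data frame to a SpatialPointsDataFrame (an sp class)
coordinates(obs) <- ~ x + y

# Build a 10 x 10 grid over the same extent and fill it with
# placeholder values standing in for an environmental covariate
r <- raster(nrows = 10, ncols = 10, xmn = 0, xmx = 100, ymn = 0, ymx = 100)
values(r) <- seq_len(ncell(r))

# Read the grid value under each observation point
obs$covariate <- raster::extract(r, obs)

summary(obs)
```

The result is a point data set with the covariate attached to each observation, which is the usual starting table for calibrating a model.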

Modelling

  • caret has an extensive range of functions for training and plotting classification and regression models. See the caret website for more detailed information.

  • Cubist does regression modeling using rules with added instance-based corrections. Cubist models were developed by Ross Quinlan. Further information can be found at the RuleQuest website.

  • C50 (through its C5.0 function) fits C5.0 decision trees and rule-based models for pattern recognition. This is another model structure developed by Ross Quinlan.

  • gam has functions for fitting and working with generalized additive models.

  • nnet fits feed-forward neural networks with a single hidden layer, as well as multinomial log-linear models.

  • gstat is for doing geostatistics: variogram modelling; simple, ordinary and universal point or block (co)kriging; sequential Gaussian or indicator (co)simulation; and utility functions for plotting variograms and variogram maps. A related and useful package is automap, which automates interpolation by estimating the variogram automatically and then calling gstat.
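To give a feel for the gstat workflow described above, here is a minimal sketch of ordinary kriging using the meuse data set that ships with sp. The starting values passed to `vgm()` are only rough initial guesses chosen for this example, and the choice of a spherical model is an assumption, not a recommendation.

```r
library(sp)
library(gstat)

# Example data shipped with sp: topsoil heavy metal concentrations
data(meuse)
coordinates(meuse) <- ~ x + y

# Prediction grid shipped with sp
data(meuse.grid)
coordinates(meuse.grid) <- ~ x + y
gridded(meuse.grid) <- TRUE

# Empirical variogram of log-transformed zinc (constant mean)
v <- variogram(log(zinc) ~ 1, meuse)

# Fit a spherical model; the starting values are rough guesses
vm <- fit.variogram(v, vgm(psill = 0.6, model = "Sph",
                           range = 900, nugget = 0.05))

# Ordinary kriging onto the grid using the fitted model
ok <- krige(log(zinc) ~ 1, meuse, meuse.grid, model = vm)

# var1.pred holds predictions, var1.var the kriging variance
names(ok)
```

The same three steps (empirical variogram, fitted model, kriging) are what automap wraps into a single call.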

Mapping and plotting

  • Both raster and sp have handy functions for plotting spatial data. Beyond the base plotting functionality, another useful plotting package is ggplot2, an implementation of the grammar of graphics in R. It combines the advantages of both base and lattice graphics: conditioning and shared axes are handled automatically, and you can still build up a plot step by step from multiple data sources. It also implements a sophisticated multidimensional conditioning system and a consistent interface for mapping data to aesthetic attributes. See the ggplot2 website for more information, documentation and examples.
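The step-by-step, multi-source style of ggplot2 can be sketched as follows. Both data frames here are invented for illustration: `soil` stands in for a gridded prediction and `sites` for sampling locations.

```r
library(ggplot2)

# Hypothetical gridded map: one clay value per 10 x 10 grid cell
set.seed(42)
soil <- data.frame(x = rep(1:10, times = 10),
                   y = rep(1:10, each = 10),
                   clay = runif(100, 10, 40))

# Hypothetical sampling sites, held in a separate data source
sites <- data.frame(x = c(2, 5, 8), y = c(3, 7, 4))

# Build the plot layer by layer: grid first, then the sites on top
p <- ggplot(soil, aes(x = x, y = y)) +
  geom_raster(aes(fill = clay)) +
  geom_point(data = sites, colour = "white", size = 3) +
  coord_equal() +
  scale_fill_viridis_c(name = "Clay (%)") +
  labs(title = "Hypothetical clay content with sampling sites")
```

Calling `print(p)` draws the plot. Each `+` adds one layer or setting, so the map layer and the point layer can come from different data frames while sharing the same axes.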
