---
title: "Predictive Modeling Engine"
author: "rassta: Raster-based Spatial Stratification Algorithms"
output: 
  rmarkdown::html_vignette:
    dev: "jpeg"
vignette: >
  %\VignetteIndexEntry{Predictive Modeling Engine}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

--------------------------------------------------------------------------------

Get the data required for this vignette
```{r data, fig.show = 'hold', fig.height = 2.5, fig.width = 2.8, message = FALSE, warning = FALSE, error = TRUE}
# Compressed folder with files from rassta’s installation folder
wasoil <- system.file("exdat/wasoil.zip", package = "rassta")
# Directory for temporary files
o <- tempdir()
# Copy compressed folder to directory for temporary files
file.copy(from = wasoil, to = o)
# Extract files to subfolder
d <- paste(o, "/rassta", sep = "")
unzip(paste(o, "/wasoil.zip", sep = ""), exdir = d)
```
  
--------------------------------------------------------------------------------

The approach of **rassta** for predictive modeling of response phenomena is
based on the landscape similarity to stratification units. If each
stratification unit across geographic space represents a distinct landscape
configuration, and if each configuration influences a phenomenon in a
distinctive way, then the spatial variability of that phenomenon can be assessed
across space by relating each geographic location to each distinct landscape
configuration. Therefore, the more similar a geographic location is to the
landscape configuration represented by a given stratification unit, then also
the more similar the response of a phenomenon will be at that location to the
typical response for conditions within the given stratification unit. Both
continuous and categorical response variables are supported. For categorical
responses, each category must be identified by an integer value.

There are three main inputs that **rassta** requires to perform predictive
modeling of response phenomena:

- A multi-layer SpatRaster of landscape similarity to stratification units.
- A data frame with *representative* observations/measurements of response
phenomena (multiple responses of the same type are allowed). One row (i.e.,
observation/measurement) per landscape similarity layer is required.
- A polygon SpatVector representing the boundaries of the area of interest. This
SpatVector can be split into tiles to speed-up modeling through parallel
processing.

The code below demonstrates the preparation of inputs required for the
prediction of soil organic carbon (SOC; continuous response) across a landscape.
```{r in, message = FALSE, warning = FALSE, error = TRUE}
# Load rassta and terra packages
library(rassta)
library(terra)
# Single-layer SpatRaster of stratification units
su <- rast(paste(d, "/su.tif", sep = ""))
# SpatVector with the boundaries of the area of interest
aoi <- vect(paste(d, "/aoi.shp", sep = "")) # Single area, no tiles
# Multi-layer SpatRaster with spatial signatures of classification units
clim.sig <- rast(list.files(d, pattern = "climate_", full.names = TRUE)) # For climatic units
mat.sig <- rast(list.files(d, pattern = "material_", full.names = TRUE)) # For material units
terr.sig <- rast(list.files(d, pattern = "terrain_", full.names = TRUE)) # For terrain units
# Landscape similarity to stratification units
su.ls <- similarity(su.rast = su, sig.rast = c(clim.sig, mat.sig, terr.sig),
                    su.code = list(climate = c(1, 1),
                                   material = c(2, 2),
                                   terrain = c(3, 3)
                                   )
                  )
# SpatVector with SOC observations for stratification units
socobs <- vect(paste(d, "/soc.shp", sep = ""))
# Representative SOC observation for each stratification unit
su.obs <- observation(su.rast = su, obs = socobs, col.id = 1, col.resp = 2,
                      method = "mls", ls.rast = su.ls$landsim
                      )
# Data frame with numeric codes of stratification units and representative SOC values
su.soc <- su.obs$su_repobs[, c("SU", "soc")]
```

The code below demonstrates the use of *engine()* to perform the prediction of
SOC values.
```{r pred, message = FALSE, warning = FALSE, error = TRUE}
# Prediction of SOC across the landscape based on 3 winning stratification units
soc <- engine(ls.rast = su.ls$landsim,
              n.win = 3, # n highest landscape similarity values
              su.repobs = su.soc,
              tiles = aoi,
              outdir = d, # engine() writes results directly on disk
              overwrite = TRUE
            )
```

The output SpatRaster with predicted SOC values and the statistical evaluation
of prediction performance can be plotted as demonstrated in the code below.
```{r ev, fig.height = 4, fig.width = 7, message = FALSE, warning = FALSE, error = TRUE}
# Set graphics arrangement
par(mfrow = c(1,2))
# Plot modeled soil organic carbon
plot(soc, col = hcl.colors(100, "Fall", rev = TRUE),
     main = "Modeled Soil Organic Carbon (%)", mar = c(3, 1.7, 3, 2)
    )
# Evaluate modeling performance
evalobs <- vect(paste(d, "/soc_valid.shp", sep = "")) # independent SOC measurements
evalmodel <- extract(soc, evalobs) # modeled SOC to independent sample locations 
names(evalmodel)[2] <- "soc_model"
evalobs <- cbind(evalobs, evalmodel[2]) # Table with measured and modeled SOC
eval.rmse <- sqrt(mean((evalobs$soc-evalobs$soc_model)^2)) # RMSE 
eval.mae <- mean(abs(evalobs$soc-evalobs$soc_model)) # MAE
# Set new graphics arrangement
par(mgp=c(1.2, 0.2, 0), mar = c(2.5, 1.7, 3.4, 1))
# Plot measured versus modeled SOC
plot(evalobs$soc, evalobs$soc_model, cex.axis = 0.83,
     xlab = "Measured Soil Organic Carbon (%)", ylab = "", yaxt = "n"
    )
axis(2, at = c(3, 3.5, 4, 4.5, 5, 5.5), labels = NA)
abline(0, 1, lty = "dashed")
text(x = 4, y = 5.8, cex = 0.8,
     paste("Root mean squared error: ", round(eval.rmse,2), "%", sep = "")
    )
text(x = 3.85, y = 5.6, cex = 0.8,
     paste("Mean Absolute Error: ", round(eval.mae,2), "%", sep = "")
    )
```

For each cell of the SpatRaster, the predicted response value is obtained
through a weighted average of the representative values from the *n*
stratification units with the highest landscape similarity values. The units
with the highest landscape similarity values at a given cell are termed *winning
stratification units*. The weight of a winning unit's representative value is
proportional to the winning unit’s landscape similarity value at the cell. For
categorical responses, the modal response value of the winning stratification
units replaces the weighted average.

---

Clean files from temporary directory
```{r clean, message = FALSE, warning = FALSE, error = TRUE}
unlink(c(paste(o, "/wasoil.zip", sep = ""), d), recursive = TRUE)
```

---

## References

- B.A. Fuentes, M.J. Dorantes, and J.R. Tipton. rassta: Raster-based Spatial
Stratification Algorithms. *EarthArXiv*, 2021.
https://doi.org/10.31223/X50S57