---
title: "Stratified Non-Probability Sampling"
author: "rassta: Raster-based Spatial Stratification Algorithms"
output: 
  rmarkdown::html_vignette:
    dev: "jpeg"
vignette: >
  %\VignetteIndexEntry{Stratified Non-Probability Sampling}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

--------------------------------------------------------------------------------

Get the data required for this vignette
```{r data, fig.show = 'hold', fig.height = 2.5, fig.width = 2.8, message = FALSE, warning = FALSE, error = TRUE}
# Compressed folder with files from rassta’s installation folder
wasoil <- system.file("exdat/wasoil.zip", package = "rassta")
# Directory for temporary files
o <- tempdir()
# Copy compressed folder to directory for temporary files
file.copy(from = wasoil, to = o)
# Extract files to subfolder
d <- paste(o, "/rassta", sep = "")
unzip(paste(o, "/wasoil.zip", sep = ""), exdir = d)
```
  
--------------------------------------------------------------------------------

The use of stratification units for sampling allows adequate representations of
the feature space, improved modeling performance, and cost-efficient field work.
Accordingly, **rassta** allows the stratified selection of observations from an
existing sample and the selection of *XY* locations to create a new sample. The
stratified sampling with **rassta** can include the landscape similarity layers.
Accounting for the landscape similarity will ensure that the
observation/location selected for a given stratification unit is the one which
best reflects the landscape configuration represented by the unit. The use of
landscape similarity layers in the sampling process results in a stratified,
non-probability sample.

## Selection of representative observations

The selection of observations from an existing sample with **rassta** results in
one observation per stratification unit in a set. This observation is termed the
*representative response observation*. The representative response observation
is assumed to best reflect the control that the landscape configuration
represented by a stratification unit exerts on response phenomena. Currently,
**rassta** allows to select a representative response observation for each
stratification unit in a set based on the following methods:

- *mls*: selects the observation at the location with the maximum landscape
similarity value.
- *mrv*: selects the observation whose response value is the median of all the
values.
- *random*: selects an observation at random.

The first two methods represent a stratified non-probability sampling, while the
third method represents a stratified random sampling. Moreover, the first method
is biased towards maximizing the representativeness of landscape configurations
because it uses landscape similarity layers. The codes below demonstrate the
selection of representative response observations with *observation()*. The
observations will be selected from a sample of *n* = 195 based on the *mls*
method.

- Creation of multi-layer SpatRaster with landscape similarity to stratification
units.
```{r sim, message = FALSE, warning = FALSE, error = TRUE}
# Load rassta and terra packages
library(rassta)
library(terra)
# Single-layer SpatRaster of stratification units
su <- rast(paste(d, "/su.tif", sep = ""))
# Multi-layer SpatRaster with spatial signatures of classification units
clim.sig <- rast(list.files(d, pattern = "climate_", full.names = TRUE)) # For climatic units
mat.sig <- rast(list.files(d, pattern = "material_", full.names = TRUE)) # For material units
terr.sig <- rast(list.files(d, pattern = "terrain_", full.names = TRUE)) # For terrain units
# Landscape similarity to stratification units
su.ls <- similarity(su.rast = su, sig.rast = c(clim.sig, mat.sig, terr.sig),
                    su.code = list(climate = c(1, 1),
                                   material = c(2, 2),
                                   terrain = c(3, 3)
                                   )
                  )
```
- Selection of the representative response observation for each stratification
unit. The response variable in this example is soil organic carbon (SOC).
```{r obs, message = FALSE, warning = FALSE, error = TRUE}
# SpatVector with SOC observations for stratification units
socobs <- vect(paste(d, "/soc.shp", sep = ""))
# Representative SOC observation for each stratification unit
su.obs <- observation(su.rast = su, obs = socobs, col.id = 1, col.resp = 2,
                      method = "mls", ls.rast = su.ls$landsim
                      )
# Information about sample and representative SOC observations
socobs
su.obs$su_repobs.sp
```

- The spatial and statistical distribution of the sample and representative
response observations can be plotted as demonstrated in the code below.
```{r dist, fig.show = 'hold', fig.height = 4, fig.width = 7, message = FALSE, warning = FALSE, error = TRUE}
# Set graphics arrangement
par(mfrow = c(1,2))
# Plot stratification units and response observations
plot(su, type = "classes", col = hcl.colors(56, "spectral"), legend = FALSE,
     mar = c(3, 2, 3, 1.5), main = paste("Soil Organic Carbon Observations"),
     fun = function() c(points(socobs, pch = 21, bg = rgb(0,1,0,1)),
                        points(su.obs$su_repobs.sp, pch = 21, bg = rgb(0,0,1,1))
                      )
    )
# Set new graphics arrangement
par(mar = c(2, 1.5, 1.5, 1.5))
# Plot histogram of soil organic carbon values from all observations
hist(socobs$soc, 4, col = rgb(0,1,0,0.8), main = "Soil Organic Carbon (%)", xlab = "")
# Plot histogram of soil organic carbon values from representative observations
hist(su.obs$su_repobs.sp$soc, 4, add = T, col = rgb(0,0,1,0.9))
# Add legend
legend("topright", legend = c("Sample", "Representative Observations"),
       col = c(rgb(0,1,0,0.8), rgb(0,0,1,0.9)), pch = 20, bty = "n", pt.cex = 2,
       cex = 0.6, text.col = "black", horiz = F, inset = c(0, 0.05)
      )
```

## Selection of representative sampling locations

For those cases when a sample with observations of a response phenomenon is not
available, **rassta** allows to define *XY* locations that can be used to
establish field sampling campaigns. The selection of representative sampling
locations is performed for each stratification unit in a set, and it utilizes
the landscape similarity layers to define the *XY* locations. Currently,
**rassta** allows to select representative sampling locations based on two
approaches:

- *buffer*: selects the *XY* location(s) within spatially contiguous areas with
landscape similarity values above a user-defined threshold.

- *absolute*: selects the *XY* location(s) with the *n* highest landscape
similarity values.

The code below demonstrates the selection of representative sampling locations
with the function *locations()*, based on the *buffer* method.

```{r locs, message = FALSE, warning = FALSE, error = TRUE}
# Representative sampling location and its buffer area for each stratification unit
su.samp <- locations(ls.rast = su.ls$landsim, su.rast = su, method = "buffer")
# Information about representative sampling locations and corresponding buffer areas
su.samp$locations
su.samp$buffers
```

The spatial distribution of representative sampling locations and the
statistical distribution of landscape similarity can be plotted as demonstrated
in the code below.
```{r spat, fig.show = 'hold', fig.height = 4, fig.width = 7, message = FALSE, warning = FALSE, error = TRUE}
# Set graphics arrangement
par(mfrow = c(1,2))
# Plot stratification units, representative sampling locations and buffer areas
plot(su, type = "classes", col = hcl.colors(56, "spectral"), legend = FALSE,
     mar = c(3, 2, 3, 1.5), main = "Representative Sampling Locations",
     fun = function() c(polys(su.samp$buffers, col = rgb(0,1,0,0.5)),
                        points(su.samp$locations, pch = 21, col = "black",
                               bg = rgb(0,1,0,1)
                              )
                      )
    )
# Set new graphics arrangement
par(mar = c(2, 1.5, 1.5, 1.5))
# Plot histogram of landscape similarity values at sampling locations
hist(su.samp$locations$land_sim, 4, main = "Landscape Similarity")
```

---

Clean files from temporary directory
```{r clean, message = FALSE, warning = FALSE, error = TRUE}
unlink(c(paste(o, "/wasoil.zip", sep = ""), d), recursive = TRUE)
```

---

## References

- B.A. Fuentes, M.J. Dorantes, and J.R. Tipton. rassta: Raster-based Spatial
Stratification Algorithms. *EarthArXiv*, 2021.
https://doi.org/10.31223/X50S57