--- title: "Stratified Non-Probability Sampling" author: "rassta: Raster-based Spatial Stratification Algorithms" output: rmarkdown::html_vignette: dev: "jpeg" vignette: > %\VignetteIndexEntry{Stratified Non-Probability Sampling} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` -------------------------------------------------------------------------------- Get the data required for this vignette ```{r data, fig.show = 'hold', fig.height = 2.5, fig.width = 2.8, message = FALSE, warning = FALSE, error = TRUE} # Compressed folder with files from rassta’s installation folder wasoil <- system.file("exdat/wasoil.zip", package = "rassta") # Directory for temporary files o <- tempdir() # Copy compressed folder to directory for temporary files file.copy(from = wasoil, to = o) # Extract files to subfolder d <- paste(o, "/rassta", sep = "") unzip(paste(o, "/wasoil.zip", sep = ""), exdir = d) ``` -------------------------------------------------------------------------------- The use of stratification units for sampling allows adequate representations of the feature space, improved modeling performance, and cost-efficient field work. Accordingly, **rassta** allows the stratified selection of observations from an existing sample and the selection of *XY* locations to create a new sample. The stratified sampling with **rassta** can include the landscape similarity layers. Accounting for the landscape similarity will ensure that the observation/location selected for a given stratification unit is the one which best reflects the landscape configuration represented by the unit. The use of landscape similarity layers in the sampling process results in a stratified, non-probability sample. ## Selection of representative observations The selection of observations from an existing sample with **rassta** results in one observation per stratification unit in a set. This observation is termed the *representative response observation*. The representative response observation is assumed to best reflect the control that the landscape configuration represented by a stratification unit exerts on response phenomena. Currently, **rassta** allows to select a representative response observation for each stratification unit in a set based on the following methods: - *mls*: selects the observation at the location with the maximum landscape similarity value. - *mrv*: selects the observation whose response value is the median of all the values. - *random*: selects an observation at random. The first two methods represent a stratified non-probability sampling, while the third method represents a stratified random sampling. Moreover, the first method is biased towards maximizing the representativeness of landscape configurations because it uses landscape similarity layers. The codes below demonstrate the selection of representative response observations with *observation()*. The observations will be selected from a sample of *n* = 195 based on the *mls* method. - Creation of multi-layer SpatRaster with landscape similarity to stratification units. ```{r sim, message = FALSE, warning = FALSE, error = TRUE} # Load rassta and terra packages library(rassta) library(terra) # Single-layer SpatRaster of stratification units su <- rast(paste(d, "/su.tif", sep = "")) # Multi-layer SpatRaster with spatial signatures of classification units clim.sig <- rast(list.files(d, pattern = "climate_", full.names = TRUE)) # For climatic units mat.sig <- rast(list.files(d, pattern = "material_", full.names = TRUE)) # For material units terr.sig <- rast(list.files(d, pattern = "terrain_", full.names = TRUE)) # For terrain units # Landscape similarity to stratification units su.ls <- similarity(su.rast = su, sig.rast = c(clim.sig, mat.sig, terr.sig), su.code = list(climate = c(1, 1), material = c(2, 2), terrain = c(3, 3) ) ) ``` - Selection of the representative response observation for each stratification unit. The response variable in this example is soil organic carbon (SOC). ```{r obs, message = FALSE, warning = FALSE, error = TRUE} # SpatVector with SOC observations for stratification units socobs <- vect(paste(d, "/soc.shp", sep = "")) # Representative SOC observation for each stratification unit su.obs <- observation(su.rast = su, obs = socobs, col.id = 1, col.resp = 2, method = "mls", ls.rast = su.ls$landsim ) # Information about sample and representative SOC observations socobs su.obs$su_repobs.sp ``` - The spatial and statistical distribution of the sample and representative response observations can be plotted as demonstrated in the code below. ```{r dist, fig.show = 'hold', fig.height = 4, fig.width = 7, message = FALSE, warning = FALSE, error = TRUE} # Set graphics arrangement par(mfrow = c(1,2)) # Plot stratification units and response observations plot(su, type = "classes", col = hcl.colors(56, "spectral"), legend = FALSE, mar = c(3, 2, 3, 1.5), main = paste("Soil Organic Carbon Observations"), fun = function() c(points(socobs, pch = 21, bg = rgb(0,1,0,1)), points(su.obs$su_repobs.sp, pch = 21, bg = rgb(0,0,1,1)) ) ) # Set new graphics arrangement par(mar = c(2, 1.5, 1.5, 1.5)) # Plot histogram of soil organic carbon values from all observations hist(socobs$soc, 4, col = rgb(0,1,0,0.8), main = "Soil Organic Carbon (%)", xlab = "") # Plot histogram of soil organic carbon values from representative observations hist(su.obs$su_repobs.sp$soc, 4, add = T, col = rgb(0,0,1,0.9)) # Add legend legend("topright", legend = c("Sample", "Representative Observations"), col = c(rgb(0,1,0,0.8), rgb(0,0,1,0.9)), pch = 20, bty = "n", pt.cex = 2, cex = 0.6, text.col = "black", horiz = F, inset = c(0, 0.05) ) ``` ## Selection of representative sampling locations For those cases when a sample with observations of a response phenomenon is not available, **rassta** allows to define *XY* locations that can be used to establish field sampling campaigns. The selection of representative sampling locations is performed for each stratification unit in a set, and it utilizes the landscape similarity layers to define the *XY* locations. Currently, **rassta** allows to select representative sampling locations based on two approaches: - *buffer*: selects the *XY* location(s) within spatially contiguous areas with landscape similarity values above a user-defined threshold. - *absolute*: selects the *XY* location(s) with the *n* highest landscape similarity values. The code below demonstrates the selection of representative sampling locations with the function *locations()*, based on the *buffer* method. ```{r locs, message = FALSE, warning = FALSE, error = TRUE} # Representative sampling location and its buffer area for each stratification unit su.samp <- locations(ls.rast = su.ls$landsim, su.rast = su, method = "buffer") # Information about representative sampling locations and corresponding buffer areas su.samp$locations su.samp$buffers ``` The spatial distribution of representative sampling locations and the statistical distribution of landscape similarity can be plotted as demonstrated in the code below. ```{r spat, fig.show = 'hold', fig.height = 4, fig.width = 7, message = FALSE, warning = FALSE, error = TRUE} # Set graphics arrangement par(mfrow = c(1,2)) # Plot stratification units, representative sampling locations and buffer areas plot(su, type = "classes", col = hcl.colors(56, "spectral"), legend = FALSE, mar = c(3, 2, 3, 1.5), main = "Representative Sampling Locations", fun = function() c(polys(su.samp$buffers, col = rgb(0,1,0,0.5)), points(su.samp$locations, pch = 21, col = "black", bg = rgb(0,1,0,1) ) ) ) # Set new graphics arrangement par(mar = c(2, 1.5, 1.5, 1.5)) # Plot histogram of landscape similarity values at sampling locations hist(su.samp$locations$land_sim, 4, main = "Landscape Similarity") ``` --- Clean files from temporary directory ```{r clean, message = FALSE, warning = FALSE, error = TRUE} unlink(c(paste(o, "/wasoil.zip", sep = ""), d), recursive = TRUE) ``` --- ## References - B.A. Fuentes, M.J. Dorantes, and J.R. Tipton. rassta: Raster-based Spatial Stratification Algorithms. *EarthArXiv*, 2021. https://doi.org/10.31223/X50S57