--- title: "Overview of unmarked: an R Package for the Analysis of Data from Unmarked Animals" author: - name: Ian Fiske - name: Richard Chandler date: February 5, 2019 bibliography: unmarked.bib csl: ecology.csl output: rmarkdown::html_vignette: fig_width: 5 fig_height: 3.5 number_sections: true toc: true vignette: > %\VignetteIndexEntry{Overview of unmarked} %\VignetteEngine{knitr::rmarkdown} \usepackage[utf8]{inputenc} --- ```{r,echo=FALSE} options(rmarkdown.html_vignette.check_title = FALSE) ``` # Abstract `unmarked` aims to be a complete environment for the statistical analysis of data from surveys of unmarked animals. Currently, the focus is on hierarchical models that separately model a latent state (or states) and an observation process. This vignette provides a brief overview of the package - for a more thorough treatment see @fiskeChandler_2011. # Overview of unmarked Unmarked provides methods to estimate site occupancy, abundance, and density of animals (or possibly other organisms/objects) that cannot be detected with certainty. Numerous models are available that correspond to specialized survey methods such as temporally replicated surveys, distance sampling, removal sampling, and double observer sampling. These data are often associated with metadata related to the design of the study. For example, in distance sampling, the study design (line- or point-transect), distance class break points, transect lengths, and units of measurement need to be accounted for in the analysis. Unmarked uses S4 classes to store data and metadata in a way that allows for easy data manipulation, summarization, and model specification. Table 1 lists the currently implemented models and their associated fitting functions and data classes. ```{r, echo=FALSE} tab1 <- data.frame( Model=c("Occupancy", "Royle-Nichols", "Point Count", "Distance-sampling", "Generalized distance-sampling", "Arbitrary multinomial-Poisson", "Colonization-extinction", "Generalized multinomial-mixture"), `Fitting Function`=c("occu","occuRN","pcount","distsamp","gdistsamp", "multinomPois","colext","gmultmix"), Data=c("unmarkedFrameOccu","unmarkedFrameOccu","unmarkedFramePCount", "unmarkedFrameDS","unmarkedFrameGDS","unmarkedFrameMPois", "unmarkedMultFrame","unmarkedFrameGMM"), Citation=c("@mackenzie_estimating_2002","@royle_estimating_2003", "@royle_n-mixture_2004","@royle_modeling_2004", "@chandlerEA_2011","@royle_generalized_2004", "@mackenzie_estimating_2003","@royle_generalized_2004"), check.names=FALSE) knitr::kable(tab1, format='markdown', align="lccc", caption="Table 1. Models handled by unmarked.") ``` Each data class can be created with a call to the constructor function of the same name as described in the examples below. # Typical unmarked session The first step is to import the data into R, which we do below using the `read.csv` function. Next, the data need to be formatted for use with a specific model fitting function. This can be accomplished with a call to the appropriate type of `unmarkedFrame`. For example, to prepare the data for a single-season site-occupancy analysis, the function `unmarkedFrameOccu` is used. ## Importing and formatting data ```{r} library(unmarked) wt <- read.csv(system.file("csv","widewt.csv", package="unmarked")) y <- wt[,2:4] siteCovs <- wt[,c("elev", "forest", "length")] obsCovs <- list(date=wt[,c("date.1", "date.2", "date.3")], ivel=wt[,c("ivel.1", "ivel.2", "ivel.3")]) wt <- unmarkedFrameOccu(y = y, siteCovs = siteCovs, obsCovs = obsCovs) summary(wt) ``` Alternatively, the convenience function `csvToUMF` can be used ```{r} wt <- csvToUMF(system.file("csv","widewt.csv", package="unmarked"), long = FALSE, type = "unmarkedFrameOccu") ``` If not all sites have the same numbers of observations, then manual importation of data in long format can be tricky. `csvToUMF` seamlessly handles this situation. ```{r} pcru <- csvToUMF(system.file("csv","frog2001pcru.csv", package="unmarked"), long = TRUE, type = "unmarkedFrameOccu") ``` To help stabilize the numerical optimization algorithm, we recommend standardizing the covariates. ```{r} obsCovs(pcru) <- scale(obsCovs(pcru)) ``` ## Fitting models Occupancy models can then be fit with the occu() function: ```{r} fm1 <- occu(~1 ~1, pcru) fm2 <- occu(~ MinAfterSunset + Temperature ~ 1, pcru) fm2 ``` Here, we have specified that the detection process is modeled with the `MinAfterSunset` and `Temperature` covariates. No covariates are specified for occupancy here. See `?occu` for more details. ## Back-transforming parameter estimates `unmarked` fitting functions return `unmarkedFit` objects which can be queried to investigate the model fit. Variables can be back-transformed to the unconstrained scale using `backTransform`. Standard errors are computed using the delta method. ```{r} backTransform(fm2, 'state') ``` The expected probability that a site was occupied is 0.823. This estimate applies to the hypothetical population of all possible sites, not the sites found in our sample. For a good discussion of population-level vs finite-sample inference, see @royle_dorazio:2008 page 117. Note also that finite-sample quantities can be computed in `unmarked` using empirical Bayes methods as demonstrated at the end of this document. Back-transforming the estimate of $\psi$ was easy because there were no covariates. Because the detection component was modeled with covariates, $p$ is a function, not just a scalar quantity, and so we need to be provide values of our covariates to obtain an estimate of $p$. Here, we request the probability of detection given a site is occupied and all covariates are set to 0. ```{r} backTransform(linearComb(fm2, coefficients = c(1,0,0), type = 'det')) ``` Thus, we can say that the expected probability of detection was 0.552 when time of day and temperature are fixed at their mean value. A `predict` method also exists, which can be used to obtain estimates of parameters at specific covariate values. ```{r} newData <- data.frame(MinAfterSunset = 0, Temperature = -2:2) round(predict(fm2, type = 'det', newdata = newData, appendData=TRUE), 2) ``` Confidence intervals are requested with `confint`, using either the asymptotic normal approximation or profiling. ```{r, eval=FALSE} confint(fm2, type='det') confint(fm2, type='det', method = "profile") ``` ```{r, echo=FALSE} confint(fm2, type='det') nul <- capture.output(ci <- confint(fm2, type='det', method = "profile")) ci ``` ## Model selection and model fit Model selection and multi-model inference can be implemented after organizing models using the `fitList` function. ```{r} fms <- fitList('psi(.)p(.)' = fm1, 'psi(.)p(Time+Temp)' = fm2) modSel(fms) predict(fms, type='det', newdata = newData) ``` The parametric bootstrap can be used to check the adequacy of model fit. Here we use a $\chi^2$ statistic appropriate for binary data. ```{r, warning=FALSE} chisq <- function(fm) { umf <- fm@data y <- umf@y y[y>1] <- 1 sr <- fm@sitesRemoved if(length(sr)>0) y <- y[-sr,,drop=FALSE] fv <- fitted(fm, na.rm=TRUE) y[is.na(fv)] <- NA sum((y-fv)^2/(fv*(1-fv)), na.rm=TRUE) } (pb <- parboot(fm2, statistic=chisq, nsim=100, parallel=FALSE)) ``` We fail to reject the null hypothesis, and conclude that the model fit is adequate. ## Derived parameters and empirical Bayes methods The `parboot` function can be also be used to compute confidence intervals for estimates of derived parameters, such as the proportion of $N$ sites occupied $\mbox{PAO} = \frac{\sum_i z_i}{N}$ where $z_i$ is the true occurrence state at site $i$, which is unknown at sites where no individuals were detected. The `colext` vignette shows examples of using `parboot` to obtain confidence intervals for such derived quantities. An alternative way achieving this goal is to use empirical Bayes methods, which were introduced in `unmarked` version 0.9-5. These methods estimate the posterior distribution of the latent variable given the data and the estimates of the fixed effects (the MLEs). The mean or the mode of the estimated posterior distibution is referred to as the empirical best unbiased predictor (EBUP), which in `unmarked` can be obtained by applying the `bup` function to the estimates of the posterior distributions returned by the `ranef` function. The following code returns an estimate of PAO using EBUP. ```{r} re <- ranef(fm2) EBUP <- bup(re, stat="mode") sum(EBUP) / numSites(pcru) ``` Note that this is similar, but slightly lower than the population-level estimate of $\psi$ obtained above. A plot method also exists for objects returned by `ranef`, but distributions of binary variables are not so pretty. Try it out on a fitted abundance model instead. # References