--- title: "entropart" subtitle: "Entropy Partitioning to Measure Diversity" bibliography: entropart.bib output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Introduction to entropart} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r global_options, include=FALSE} set.seed(97310) ``` _entropart_ is a package for _R_ designed to estimate diversity based on HCDT entropy or similarity-based entropy. This is a short introduction to its use. The _entropart_ package allows estimating biodiversity according to the framework based on HCDT entropy, the correction of its estimation-bias [@Grassberger1988; @Chao2003; @Chao2015] and its transformation into equivalent numbers of species [@Hill1973; @Jost2006; @Marcon2014a]. Estimation of diversity at arbitrary levels of sampling, requiring interpolation or extrapolation [@Chao2014] is available Phylogenetic or functional diversity [@Marcon2014b] can be estimated, considering phyloentropy as the average species-neutral diversity over slices of a phylogenetic or functional tree [@Pavoine2009]. Similarity-based diversity [@Leinster2012] can be used to estimate [@Marcon2014e] functional diversity from a similarity or dissimilarity matrix between species without requiring building a dendrogram and thus preserving the topology of species [@Pavoine2005a; @Podani2007]. The classical diversity estimators (Shannon and Simpson entropy) can be found in many R packages. _vegetarian_ [@Charney2009] allows calculating Hill numbers and partitioning them according to Jost's framework. Bias correction is rarely available except in the _EntropyEstimation_ [@Cao2014] package which provides the Zhang and Grabchak's estimators of entropy and diversity and their asymptotic variance (not included in _entropart_). # Estimating the diversity of a community ## Community data Community data is a numeric vector containing abundances of species (the number of individual of each species) or their probabilities (the proportion of individuals of each species, summing to 1). Example data is provided in the dataset `paracou618`. Let's get the abundances of tree species in the 1-ha tropical forest plot #18 from Paracou forest station in French Guiana: ```{r LoadParacou18} library("entropart") data("Paracou618") N18 <- Paracou618.MC$Nsi[, "P018"] ``` The data in `Paracou618.MC` is a `MetaCommunity`, to be discovered later. `N18` is a vector containing the abundances of 425 tree species, among them some zero values. This is the most simple and common format to provide data to estimate diversity. It can be used directly by the functions presented here, but it may be declared explicitly as an abundance vector to plot it, and possibly fit a well-known, e.g. log-normal [@Preston1948], distribution of species abundance (the red curve): ```{r PlotN18} Abd18 <- as.AbdVector(N18) autoplot(Abd18, Distribution="lnorm") ``` Abundance vectors can also be converted to probability vectors, summing to 1: ```{r PN18} P18 <- as.ProbaVector(N18) ``` The `rCommunity` function allows drawing random communities: ```{r rCommunity} rc <- rCommunity(1, size=10000, Distribution = "lseries", alpha = 30) autoplot(rc, Distribution="lseries") ``` The Whittaker plot of a random log-series [@Fisher1943] distribution of 10000 individuals simulated with parameter $\alpha=30$ is produced. ## Diversity estimation The classical indices of diversity are richness (the number of species), Shannon's and Simpson's entropies: ```{r IndicesP} Richness(P18) Shannon(P18) Simpson(P18) ``` When applied to a probability vector (created with `as.ProbaVector` or a numeric vector summing to 1), no estimation-bias correction is applied: this means that indices are just calculated by applying their definition function to the probabilities (that is the plugin estimator). When abundances are available (a numeric vector of integer values or an object created by `as.ProbaVector`), several estimators are available [@Marcon2015a] to address unobserved species and the non-linearity of the indices: ```{r IndicesAbd} Richness(Abd18) Shannon(Abd18) Simpson(Abd18) ``` The best available estimator is chosen by default: its name is returned. Those indices are special cases of the Tsallis entropy [-@Tsallis1988] or order $q$ (respectively $q=0,1,2$ for richness, Shannon, Simpson): ```{r Tsallis} Tsallis(Abd18, q=1) ``` Entropy should be converted to its effective number of species, i.e. the number of species with equal probabilities that would yield the observed entropy, called @Hill1973 numbers or simply diversity [@Jost2006]. ```{r Diversity} Diversity(Abd18, q=1) ``` Diversity is the deformed exponential of order $q$ of entropy, and entropy is the deformed logarithm of of order $q$ of diversity: ```{r lnq} (d2 <- Diversity(Abd18,q=2)) lnq(d2, q=2) (e2 <-Tsallis(Abd18,q=2)) expq(e2, q=2) ``` Diversity can be plotted against its order to provide a diversity profile. Order 0 corresponds to richness, 1 to Shannon's and 2 to Simpson's diversities: ```{r DiversityProfile} DP <- CommunityProfile(Diversity, Abd18) autoplot(DP) ``` If an ultrametric dendrogram describing species' phylogeny (here, a mere taxonomy with family, genus and species) is available, phylogenetic entropy and diversity [@Marcon2014b] can be calculated: ```{r PhyloDiversity} summary(PhyloDiversity(Abd18,q=1,Tree=Paracou618.Taxonomy)) ``` With a Euclidian distance matrix between species, similarity-based diversity [@Leinster2012; @Marcon2014e] is available: ```{r SBDiversity} # Prepare the similarity matrix DistanceMatrix <- as.matrix(Paracou618.dist) # Similarity can be 1 minus normalized distances between species Z <- 1 - DistanceMatrix/max(DistanceMatrix) # Calculate diversity of order 2 Dqz(Abd18, q=2, Z) ``` Profiles of phylogenetic diversity and similarity-based diversity are obtained the same way. `PhyloDiversity` is an object with a lot of information so an intermediate function is necessary to extract its `$Total` component: ```{r PDiversityProfile} sbDP <- CommunityProfile(Dqz, Abd18, Z=Z) pDP <- CommunityProfile(function(X, ...) PhyloDiversity(X, ...)$Total, Abd18, Tree=Paracou618.Taxonomy) autoplot(pDP) ``` # Estimating the diversity of a meta-community ## Meta-community data A meta-community is an object defined by the package. It is a set of communities, each of them decribed by the abundance of their species and their weight. Species probabilities in the meta-community are by definition the weighted average of their probabilities in the communities. The easiest way to build a meta-community consists of preparing a dataframe whose columns are communities and lines are species, and define weights in a vector (by default, all weights are equal): ```{r MetaCommunitydf} library("entropart") (df <- data.frame(C1 = c(10, 10, 10, 10), C2 = c(0, 20, 35, 5), C3 = c(25, 15, 0, 2), row.names = c("sp1", "sp2", "sp3", "sp4"))) w <- c(1, 2, 1) ``` The `MetaCommunity` function creates the meta-community. It can be plotted: ```{r MetaCommunityMC} MC <- MetaCommunity(Abundances = df, Weights = w) plot(MC) ``` Each shade of grey represents a species. Heights correspond to the probability of species and the width of each community is its weight. `Paracou618.MC` is an example meta-community provided by the package. It is made of two 1-ha communities (plots #6 and #18) of tropical forest. ## Diversity estimation High level functions allow computing diversity of all communities ($\alpha$ diversity), of the meta-community ($\gamma$ diversity), and $\beta$ diversity, i.e.\ the number of effective communities (the number of communities with equal weights and no common species that would yield the observed $\beta$ diversity). The `DivPart` function calculates everything at once, for a given order of diversity $q$: ```{r DivPart} p <- DivPart(q = 1, MC = Paracou618.MC) summary(p) ``` The $\alpha$ diversity of communities is `r round(p$TotalAlphaDiversity, 0)` effective species. $\gamma$ diversity of the meta-community is `r round(p$GammaDiversity, 0)` effective species. $\beta$ diversity is `r round(p$TotalBetaDiversity, 2)` effective communities, i.e. the two actual communities are as different from each other as `r round(p$TotalBetaDiversity, 2)` ones with equal weights and no species in common. The `DivEst` function decomposes diversity and estimates confidence interval of $\alpha$, $\beta$ and $\gamma$ diversity following @Marcon2012a. If the observed species frequencies of a community are assumed to be a realization of a multinomial distribution, they can be drawn again to obtain a distribution of entropy. ```{r DivEst, fig.width=6, fig.height=6} de <- DivEst(q = 1, Paracou618.MC, Simulations = 50) # Margin adjustment required for this vignette html output par(mar=c(1,1,2.2,1)) autoplot(de) ``` The result is a `Divest` object which can be summarized and plotted. `DivProfile` calculates diversity profiles. The result is a `DivProfile` object which can be summarized and plotted. ```{r DivProfile, fig.width=6, fig.height=6} dp <- DivProfile(q.seq = seq(0, 2, 0.1), Paracou618.MC) autoplot(dp) ``` Plot #18 can be considered more diverse than plot #6 because their profiles (top right figure, plot #18 is the dotted red line, plot #6, the solid black one) do not cross [@Tothmeresz1995]: its diversity is systematically higher. The shape of the $\beta$ diversity profile shows that the communities are more diverse when their dominant species are considered. The bootstrap confidence intervals of the values of diversity [@Marcon2012a; @Marcon2014a] are calculated if `NumberOfSimulations` is not 0. `DivPart`, `DivEst` and `DivProfile` use plugin estimators by default. To force them to apply the same estimators as community functions, the argument `Biased = FALSE` must be entered. They compute Tsallis entropy and Hill numbers by default. A dendrogram in the argument `Tree` or a similarity matrix in the argument `Z` will make them calculate phylogenetic diversity or similarity-based diversity. # Full documentation https://ericmarcon.github.io/entropart/ # References