CRAN Task View: Analysis of Ecological and Environmental Data

Maintainer: Gavin L. Simpson
Contact: ucfagls at gmail.com
Version: 2023-12-18
URL: https://CRAN.R-project.org/view=Environmetrics
Source: https://github.com/cran-task-views/Environmetrics/
Contributions: Suggestions and improvements for this task view are very welcome and can be made through issues or pull requests on GitHub or via e-mail to the maintainer address. For further details see the Contributing guide.
Citation: Gavin L. Simpson (2023). CRAN Task View: Analysis of Ecological and Environmental Data. Version 2023-12-18. URL https://CRAN.R-project.org/view=Environmetrics.
Installation: The packages from this task view can be installed automatically using the ctv package. For example, ctv::install.views("Environmetrics", coreOnly = TRUE) installs all the core packages, and ctv::update.views("Environmetrics") installs all packages that are not yet installed or not up to date. See the CRAN Task View Initiative for more details.

Introduction

This Task View contains information about using R to analyse ecological and environmental data.

The base version of R ships with a wide range of functions for use within the field of environmetrics. This functionality is complemented by a plethora of packages available via CRAN, which provide specialist methods such as ordination and cluster analysis techniques. A brief overview of the available packages is provided in this Task View, grouped by topic or type of analysis. As a testament to the popularity of R for the analysis of environmental and ecological data, a special volume of the Journal of Statistical Software was produced in 2007.

Those interested in environmetrics should also consult the Spatial task view. Complementary information is available in the Cluster and SpatioTemporal task views.

If you have any comments or suggestions for additions or improvements, then please contact the maintainer or submit an issue or pull request in the GitHub repository linked above.

A list of available packages and functions is presented below, grouped by analysis type.

General packages

These packages are general, having wide applicability to the environmetrics field.

Modelling species responses and other data

Analysing species response curves or modelling other data often involves the fitting of standard statistical models to ecological data and includes simple (multiple) regression, Generalized Linear Models (GLM), extended regression (e.g. Generalized Least Squares [GLS]), Generalized Additive Models (GAM), and mixed effects models, amongst others.
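As a minimal sketch of fitting such models, the R code below simulates presence/absence of a hypothetical species along a hypothetical pH gradient and fits a unimodal response curve in two ways: a binomial GLM with a quadratic term on the logit scale (the Gaussian logit curve) and a binomial GAM from mgcv. The data, the variable names and the parameter values are assumptions made purely for illustration.

```r
## Minimal sketch: a unimodal species response curve fitted as a binomial GLM
## (Gaussian logit curve) and as a binomial GAM. All data are simulated; the
## gradient "ph" and the parameter values are hypothetical.
library(mgcv)  # gam() and s(); mgcv ships with standard R installations

set.seed(1)
n <- 200
ph <- runif(n, 4, 9)                       # hypothetical environmental gradient
eta <- 2 - 1.5 * (ph - 6.5)^2              # true linear predictor, optimum 6.5
present <- rbinom(n, size = 1, prob = plogis(eta))
dat <- data.frame(present, ph)

## GLM: a quadratic on the logit scale gives a Gaussian logit response curve
m_glm <- glm(present ~ ph + I(ph^2), family = binomial, data = dat)
b <- unname(coef(m_glm))
optimum   <- -b[2] / (2 * b[3])            # gradient value of maximum probability
tolerance <- 1 / sqrt(-2 * b[3])           # curve width; defined only if b[3] < 0

## GAM: a penalized smooth lets the data determine the shape of the curve
m_gam <- gam(present ~ s(ph), family = binomial, data = dat, method = "REML")

## Fitted probabilities of occurrence along the gradient from both models
newd <- data.frame(ph = seq(4, 9, by = 0.5))
newd$p_glm <- predict(m_glm, newd, type = "response")
newd$p_gam <- predict(m_gam, newd, type = "response")
print(round(c(optimum = optimum, tolerance = tolerance), 2))
```

The GLM imposes a symmetric bell shape, so the optimum and tolerance follow directly from its coefficients; the GAM makes no such assumption, which helps when responses are skewed or plateau.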

Tree-based models

Tree-based models are increasingly used in ecology, particularly for their ability to fit flexible models to complex data sets and for the simple, intuitive output of the tree structure. Ensemble methods such as bagging, boosting and random forests are advocated for improving predictions from tree-based models and for providing information on uncertainty in regression models or classifiers.
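The idea behind bagging can be sketched in a few lines of R. The example below, which assumes simulated data with hypothetical variables depth, temp and abund, fits a single regression tree with rpart and then averages the predictions of trees grown on bootstrap resamples of the data; the spread of those predictions gives a rough indication of uncertainty. It illustrates the principle only and is not how ipred or randomForest implement their methods.

```r
## Minimal sketch of a regression tree and of bagging written by hand with
## rpart (a recommended package shipped with R). Illustrative only.
library(rpart)

set.seed(42)
n <- 300
dat <- data.frame(depth = runif(n, 0, 50), temp = runif(n, 5, 25))
## Hypothetical abundance with a threshold response to depth
dat$abund <- ifelse(dat$depth < 20, 10, 3) + 0.3 * dat$temp + rnorm(n)

## A single CART-style regression tree
tree <- rpart(abund ~ depth + temp, data = dat)

## Bagging: fit B trees to bootstrap resamples and average their predictions
B <- 50
newd <- data.frame(depth = c(10, 40), temp = c(15, 15))
preds <- sapply(seq_len(B), function(b) {
  idx <- sample(n, replace = TRUE)                 # bootstrap resample of rows
  fit <- rpart(abund ~ depth + temp, data = dat[idx, ])
  predict(fit, newdata = newd)
})                                                 # 2 x B matrix of predictions
bagged <- rowMeans(preds)                          # ensemble prediction
spread <- apply(preds, 1, sd)                      # variability across trees
print(round(rbind(single = predict(tree, newd), bagged, spread), 2))
```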

Univariate trees

Tree-structured models for regression, classification and survival analysis, following the ideas in the CART book, are implemented in

Multivariate trees

Multivariate trees are available in

Ensembles of trees

Ensemble techniques for trees:

Graphical tools for the visualization of trees are available in package maptree.

Packages mda and earth implement Multivariate Adaptive Regression Splines (MARS), a technique which provides a more flexible, tree-based approach to regression than the piecewise constant functions used in regression trees.
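To illustrate the difference, the sketch below fits the pair of hinge functions max(0, x - c) and max(0, c - x) that MARS uses as basis functions, with the knot c fixed by hand at 4, and compares the residual sum of squares with that of an rpart regression tree on the same simulated data. Real MARS fits, as in earth or mda, search over knots and interactions and then prune the model; the fixed knot, the helper functions hinge_pos() and hinge_neg(), and the data are assumptions for illustration only.

```r
## Minimal sketch: the hinge (piecewise linear) basis used by MARS versus a
## piecewise-constant regression tree. The knot is fixed by hand; MARS itself
## chooses knots and terms automatically.
library(rpart)

set.seed(7)
x <- sort(runif(200, 0, 10))
y <- 2 + 1.5 * pmax(0, x - 4) + rnorm(200, sd = 0.5)  # true hinge at x = 4
dat <- data.frame(x, y)

hinge_pos <- function(x, knot) pmax(0, x - knot)      # max(0, x - knot)
hinge_neg <- function(x, knot) pmax(0, knot - x)      # max(0, knot - x)

m_hinge <- lm(y ~ hinge_pos(x, 4) + hinge_neg(x, 4), data = dat)
m_tree  <- rpart(y ~ x, data = dat)

## Residual sum of squares of each fit on the training data
rss <- function(m) sum((dat$y - predict(m, dat))^2)
print(round(c(hinge = rss(m_hinge), tree = rss(m_tree)), 1))
```

A handful of hinge terms capture a linear trend beyond a threshold exactly, whereas a tree must approximate the same trend with a staircase of constant steps.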

Ordination

R and add-on packages provide a wide range of ordination methods, many of which are specialized techniques particularly suited to the analysis of species data. The two main packages are ade4 and vegan. ade4 derives from the traditions of the French school of “Analyse des Données” and is based on the use of the duality diagram. vegan follows the approach of Mark Hill, Cajo ter Braak and others, though the implementation owes more to that presented in Legendre & Legendre (1998) Numerical Ecology, 2nd English Edition, Elsevier. Where the two packages provide duplicate functionality, the user should choose whichever framework best suits their background.
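Because vegan and ade4 are add-on packages, the sketch below sticks to functions shipped with R to show two common ordinations of a dissimilarity matrix: principal coordinates analysis (PCoA) with cmdscale() and non-metric multidimensional scaling (NMDS) with MASS::isoMDS(). The community data are simulated from hypothetical unimodal species responses along a single gradient, and the Hellinger transformation is one reasonable choice among many; vegan offers more ecologically oriented tools for the same tasks, such as metaMDS() and rda().

```r
## Minimal sketch: PCoA and NMDS of simulated species data using base R + MASS.
library(MASS)  # isoMDS(); MASS ships with standard R installations

set.seed(3)
n_sites <- 30
n_spp   <- 12
grad <- seq(0, 6, length.out = n_sites)      # hypothetical gradient
opt  <- seq(-2, 8, length.out = n_spp)       # hypothetical species optima
## Unimodal expected abundances (tolerance 2), then Poisson counts
mu   <- outer(grad, opt, function(g, u) 20 * exp(-(g - u)^2 / 8))
comm <- matrix(rpois(length(mu), mu), nrow = n_sites)
keep <- rowSums(comm) > 0                    # drop any empty sites
comm <- comm[keep, ]
grad <- grad[keep]

## Hellinger transformation: square root of relative abundances. Euclidean
## distance on the transformed data is the Hellinger distance.
hel <- sqrt(comm / rowSums(comm))
d   <- dist(hel)

pcoa <- cmdscale(d, k = 2, eig = TRUE)       # metric ordination (PCoA)
nmds <- isoMDS(d, k = 2, trace = FALSE)      # non-metric MDS; stress in percent

## How closely does the first PCoA axis follow the simulated gradient?
rho <- cor(pcoa$points[, 1], grad, method = "spearman")
print(round(c(rho_axis1 = abs(rho), nmds_stress = nmds$stress), 2))
```

With a single underlying gradient the first axis should track it closely; with real data the choice of transformation and dissimilarity matters as much as the choice of ordination method.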

Model-based multivariate analysis

Multivariate model-based methods follow typical statistical modelling principles, but for multivariate responses. Model-based ordination methods reduce the dimensionality of a model component (usually the predictor effects or a random-effect covariance matrix), so that they share features with both ordination methods (the ordination itself) and regression (e.g., information criteria and residual diagnostics). They therefore require specifying a response distribution and link function, instead of a dissimilarity measure. Unlike “classical” ordination methods, they usually require the number of ordination axes to be specified prior to fitting the model. The following packages have different features and functionalities, but most support creating ordinations.

Dissimilarity coefficients

Much ecological analysis proceeds from a matrix of dissimilarities between samples. Considerable effort has been expended formulating a wide range of dissimilarity coefficients suitable for ecological data. A selection of the more useful coefficients is available in R and various contributed packages.
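The Bray-Curtis coefficient is one of the most widely used of these. The following minimal R sketch computes it directly from its definition for every pair of rows of a small, hypothetical site-by-species count matrix; the helper bray_curtis() exists only for this illustration, and in practice vegan::vegdist(), whose default method is "bray", does the same job far more efficiently.

```r
## Minimal sketch: Bray-Curtis dissimilarity from its definition,
##   d_jk = sum_i |x_ij - x_ik| / sum_i (x_ij + x_ik),
## for every pair of samples (rows) of a site-by-species abundance matrix.
## Undefined (NaN) if both samples are empty.
bray_curtis <- function(x) {
  x <- as.matrix(x)
  n <- nrow(x)
  d <- matrix(0, n, n, dimnames = list(rownames(x), rownames(x)))
  for (j in seq_len(n)) {
    for (k in seq_len(n)) {
      d[j, k] <- sum(abs(x[j, ] - x[k, ])) / sum(x[j, ] + x[k, ])
    }
  }
  as.dist(d)   # keep the lower triangle only, like dist()
}

## Hypothetical counts of three species at three sites
comm <- rbind(site1 = c(1, 0, 3),
              site2 = c(0, 2, 1),
              site3 = c(1, 0, 3))
bc <- bray_curtis(comm)
print(round(bc, 3))   # site1-site2: 5/7; site1-site3: 0 (identical samples)
```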

Standard functions that produce square, symmetric matrices of pair-wise dissimilarities include:

Function distance() in package analogue can be used to calculate dissimilarity between samples of one matrix and those of a second matrix. The same function can be used to produce pair-wise dissimilarity matrices, though the other functions listed above are faster. distance() can also be used to generate matrices based on Gower’s coefficient for mixed data (mixtures of binary, ordinal/nominal and continuous variables). Function daisy() in package cluster provides a faster implementation of Gower’s coefficient for mixed-mode data than distance() if a standard dissimilarity matrix is required. Function gowdis() in package FD also computes Gower’s coefficient and implements extensions to ordinal variables.
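The following sketch, using a small hypothetical table of mixed environmental variables, computes Gower's coefficient with daisy() from cluster and checks it against a direct implementation of the definition. The helper gower_pair() is hypothetical, written only for this illustration; it handles numeric and unordered factor variables without missing values and does not cover the ordinal extensions implemented by gowdis().

```r
## Minimal sketch: Gower's coefficient for mixed-mode data with cluster::daisy(),
## checked against the definition (range-scaled absolute differences for
## numeric variables, simple mismatch for factors, averaged with equal weights).
library(cluster)  # daisy(); cluster ships with standard R installations

env <- data.frame(
  depth     = c(2, 5, 10, 20),                            # numeric (m)
  substrate = factor(c("sand", "sand", "mud", "rock")),   # nominal
  shaded    = factor(c("yes", "no", "no", "yes"))         # binary, as a factor
)

g <- daisy(env, metric = "gower")

## Hypothetical helper: Gower dissimilarity between rows i and j of `data`
gower_pair <- function(data, i, j) {
  s <- vapply(data, function(v) {
    if (is.numeric(v)) abs(v[i] - v[j]) / diff(range(v))
    else as.numeric(v[i] != v[j])
  }, numeric(1))
  mean(s)
}

print(round(as.matrix(g), 3))
gower_pair(env, 1, 2)   # (3/18 + 0 + 1) / 3 = 7/18
```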

Cluster analysis

Cluster analysis aims to identify groups of samples within multivariate data sets. A large range of approaches to this problem have been suggested, but the main techniques are hierarchical cluster analysis, partitioning methods such as k-means, and finite mixture models or model-based clustering. In the machine learning literature, cluster analysis is an unsupervised learning problem.
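A minimal sketch of the first two approaches, using only functions shipped with R: average-linkage (UPGMA) hierarchical clustering with hclust(), cut into two groups with cutree(), and k-means partitioning with kmeans(). The two well-separated groups of sites are simulated, so both methods should recover them; real data are rarely so obliging.

```r
## Minimal sketch: hierarchical and k-means clustering in base R.
set.seed(10)
## Hypothetical data: 10 sites in each of two groups, two environmental variables
x <- rbind(matrix(rnorm(20, mean = 0), ncol = 2),
           matrix(rnorm(20, mean = 4), ncol = 2))
colnames(x) <- c("var1", "var2")
xs <- scale(x)                               # standardize the variables

hc <- hclust(dist(xs), method = "average")   # UPGMA on Euclidean distances
grp_hc <- cutree(hc, k = 2)                  # cut the dendrogram into 2 groups

km <- kmeans(xs, centers = 2, nstart = 25)   # best of 25 random starts
grp_km <- km$cluster

## Cross-tabulate the two classifications (group labels are arbitrary)
table(hierarchical = grp_hc, kmeans = grp_km)
```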

The Cluster task view provides a more detailed discussion of available cluster analysis methods and appropriate R functions and packages.

Hierarchical cluster analysis:

Partitioning methods:

Mixture models and model-based cluster analysis:

Ecological theory

There is a growing number of packages and books that focus on the use of R for theoretical ecological models.

Population dynamics

This section concerns estimation of population parameters (population size, density, survival probability, site occupancy etc.) by methods that allow for incomplete detection. Many of these methods use data on marked animals, variously called ‘capture-recapture’, ‘mark-recapture’ or ‘capture-mark-recapture’ data.
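The simplest example of estimating abundance under incomplete detection is the two-sample Lincoln-Petersen estimator. The R sketch below implements its Chapman (bias-corrected) form and checks it by simulation. The counts, the true population size and the capture probability are hypothetical, and the estimator assumes a closed population in which every animal is equally catchable; many of the models in packages such as Rcapture, marked or secr relax these assumptions in various ways.

```r
## Minimal sketch: Chapman form of the Lincoln-Petersen estimator,
##   N_hat = (n1 + 1) * (n2 + 1) / (m2 + 1) - 1,
## where n1 animals are marked in sample 1, n2 are caught in sample 2 and
## m2 of those carry marks. Assumes closure and equal catchability.
chapman <- function(n1, n2, m2) {
  (n1 + 1) * (n2 + 1) / (m2 + 1) - 1
}

chapman(n1 = 50, n2 = 40, m2 = 10)   # about 189.1

## Simulation check: hypothetical true N = 400, capture probability 0.5
set.seed(99)
N <- 400
est <- replicate(2000, {
  s1 <- rbinom(N, 1, 0.5)            # caught (and marked) in sample 1
  s2 <- rbinom(N, 1, 0.5)            # caught in sample 2
  chapman(sum(s1), sum(s2), sum(s1 * s2))
})
mean(est)                            # should be close to the true N
```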

Package secr can also be used to simulate data from the models it fits.

See also the SpatioTemporal task view for the analysis of animal tracking data, under “Moving objects, trajectories”.

Modelling population growth rates:
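For matrix population models, the asymptotic growth rate (lambda) is the dominant eigenvalue of the projection matrix. The base R sketch below computes it, together with the stable stage distribution, for a hypothetical three-stage matrix and confirms the value by projecting a population forward; popbio provides ready-made functions for such analyses (for example lambda()).

```r
## Minimal sketch: growth rate of a stage-structured matrix model.
## The three-stage projection matrix is hypothetical.
A <- matrix(c(0,   1.2, 3.0,    # fecundities of stages 2 and 3
              0.4, 0,   0,      # survival from stage 1 to stage 2
              0,   0.6, 0.8),   # transition 2 -> 3 and stasis in stage 3
            nrow = 3, byrow = TRUE)

ev <- eigen(A)
lambda <- Re(ev$values[1])                 # eigen() sorts by decreasing modulus
w <- Re(ev$vectors[, 1])
stable_stage <- w / sum(w)                 # stable stage distribution

## Power iteration: repeated projection converges to the same growth rate
n <- c(10, 0, 0)
for (t in 1:200) {
  n_next <- A %*% n
  growth <- sum(n_next) / sum(n)           # one-step growth of total numbers
  n <- n_next / sum(n_next)                # rescale to avoid overflow
}
print(round(c(lambda = lambda, projected = growth), 4))
```

A lambda above 1 indicates a growing population and below 1 a declining one, provided the vital rates stay constant.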

Environmental time series

Additionally, a fuller description of available packages for time series analysis can be found in the TimeSeries task view.
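A common first step with an environmental time series is to separate trend and seasonal signals. As a minimal sketch, the code below applies STL decomposition (stl() in the stats package) to the monthly Mauna Loa atmospheric CO2 record, which ships with R as the co2 dataset, and estimates the average annual rise of the trend component with a simple linear regression.

```r
## Minimal sketch: STL decomposition of the monthly Mauna Loa CO2 series
## (datasets::co2) into seasonal, trend and remainder components.
fit  <- stl(co2, s.window = "periodic")
comp <- fit$time.series                    # columns: seasonal, trend, remainder

## Average annual increase of the trend component (ppm per year)
co2_trend <- as.numeric(comp[, "trend"])
yr        <- as.numeric(time(co2))         # decimal years
slope     <- unname(coef(lm(co2_trend ~ yr))[2])
print(round(slope, 2))

## The three components add back up to the original series
max(abs(rowSums(comp) - as.numeric(co2)))
```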

Spatial data analysis

See the Spatial CRAN Task View for an overview of spatial analysis in R.

Extreme values

ismev provides functions for fitting extreme value models and is support software for Coles (2001) An Introduction to Statistical Modelling of Extreme Values, Springer, New York. Other packages for extreme value theory include

See also the ExtremeValue task view for further information.
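As a sketch of what these packages do, the code below fits the generalized extreme value (GEV) distribution to simulated block maxima by maximum likelihood, using only optim() from base R, and computes the return level expected to be exceeded once every 100 blocks (100 years for annual maxima). The simulated parameter values and the helper gev_nll() are hypothetical, and the code omits the standard errors, profile likelihoods and diagnostic plots that ismev::gev.fit() and extRemes::fevd() provide.

```r
## Negative log-likelihood of the GEV distribution.
## par = c(mu, log(sigma), xi); working on the log scale keeps sigma positive.
gev_nll <- function(par, x) {
  mu <- par[1]; sigma <- exp(par[2]); xi <- par[3]
  z <- (x - mu) / sigma
  if (abs(xi) < 1e-8) {                  # Gumbel limit as xi -> 0
    return(length(x) * log(sigma) + sum(z) + sum(exp(-z)))
  }
  t <- 1 + xi * z
  if (any(t <= 0)) return(Inf)           # parameters outside the support
  length(x) * log(sigma) + (1 + 1 / xi) * sum(log(t)) + sum(t^(-1 / xi))
}

## Simulate 200 block maxima by inverting the GEV distribution function
set.seed(123)
mu0 <- 10; sigma0 <- 2; xi0 <- 0.1       # hypothetical true parameters
u <- runif(200)
x <- mu0 + sigma0 * ((-log(u))^(-xi0) - 1) / xi0

## Maximum likelihood via Nelder-Mead (optim's default method)
fit <- optim(c(mean(x), log(sd(x)), 0.1), gev_nll, x = x,
             control = list(maxit = 2000))
est <- c(mu = fit$par[1], sigma = exp(fit$par[2]), xi = fit$par[3])

## Return level exceeded on average once every 100 blocks
p <- 1 / 100
rl <- est[["mu"]] + est[["sigma"]] / est[["xi"]] *
  ((-log(1 - p))^(-est[["xi"]]) - 1)
print(round(c(est, return_level = rl), 2))
```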

Phylogenetics and evolution

Packages specifically tailored for the analysis of phylogenetic and evolutionary data include:

UseRs may also be interested in Paradis (2006) Analysis of Phylogenetics and Evolution with R, Springer, New York, a book in the “Use R!” book series from Springer.

Soil science

Several packages are now available that implement R functions for widely-used methods and approaches in pedology.

Hydrology and Oceanography

A growing number of packages are available that implement methods specifically related to the fields of hydrology and oceanography. Also see the Extreme values and Climatology sections for related packages.

Climatology

Several packages related to the field of climatology are available.

Palaeoecology and stratigraphic data

Several packages now provide specialist functionality for the import, analysis, and plotting of palaeoecological data.

Other packages

Several other relevant contributed packages for R are available that do not fit neatly under the headings above.

CRAN packages

Core: ade4, cluster, labdsv, MASS, mgcv, vegan.
Regular: amap, analogue, aod, ape, aqp, BiodiversityR, biogrowth, boral, boussinesq, bReeze, CircStats, circular, cocorresp, Distance, dsm, dyn, dynlm, e1071, earth, ecoCopula, ecodist, EnvStats, equivalence, evd, evdbayes, evir, extRemes, FD, flexmix, forecast, fso, gam, gamair, gjam, gllvm, glmmTMB, Hmsc, ipred, ismev, lme4, maptree, marked, mclust, mda, mefa, metacom, mrds, mvabund, nlme, nsRFA, oce, openair, ouch, party, pastecs, pgirmess, PMCMRplus, popbio, prabclus, pscl, pvclust, qualV, quantreg, quantregGrowth, R2jags, randomForest, Rbeast, Rcapture, rioja, RMark, RMAWGEN, rpart, rtop, seacarb, seas, secr, segmented, sensitivity, simecol, singleRcapture, siplab, sjSDM, soiltexture, spOccupancy, StreamMetabolism, strucchange, surveillance, TMB, tseries, unmarked, untb, VGAM, zoo.
Archived: dse, topmodel.

Related links

Other resources