% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/format-funs.R
\name{format_data}
\alias{format_data}
\title{Format count series}
\usage{
format_data(
  data,
  info = NULL,
  date = "date",
  count = "count",
  location = "location",
  species = "species",
  stat_method = "stat_method",
  lower_ci = "lower_ci",
  upper_ci = "upper_ci",
  sd = NULL,
  var = NULL,
  cv = NULL,
  field_method = NULL,
  pref_field_method = NULL,
  conversion_A2G = NULL,
  rmax = NULL,
  path = ".",
  na_rm = FALSE
)
}
\arguments{
\item{data}{a \code{data.frame} with at least five columns: \code{location},
\code{species}, \code{date}, \code{count}, and \code{stat_method}.

The \code{stat_method} field indicates the method used to estimate counts. It
can contain: \code{T} (total counts), \code{X} (guesstimate), and/or \code{S} (sampling).

If individual counts were estimated by \strong{sampling}, additional column(s)
providing a measure of precision are also required (e.g. \code{lower_ci} and
\code{upper_ci}, or \code{sd}, \code{cv}, \code{var}). Precision metrics can differ
between counts. For instance, some sampling counts can have an \code{sd} value
and others \code{lower_ci} and \code{upper_ci}. In that case three columns are
required (\code{lower_ci}, \code{upper_ci}, and \code{sd}). See above section
\strong{Description} for further information on the computation of the 95\%
confidence interval of estimates.

If the individuals were counted by different methods, an additional field
\code{field_method} is also required. It can contain: \code{G} (ground counts)
and/or \code{A} (aerial counts). See above section \strong{Description} for further
information on the counts conversion.

Other fields can be present either in \code{data} or \code{info} (see below).}

\item{info}{(optional) a \code{data.frame} with species in rows and the following
columns: \code{species} (species name), \code{pref_field_method},
\code{conversion_A2G}, and \code{rmax}. See above section \strong{Description} for
further information on these fields.
Default is \code{NULL} (i.e. this information must be present in \code{data}
if not available in \code{popbayes}).}

\item{date}{a \code{character} string. The column name in \code{data} of the date.
This column must be numeric, possibly with a decimal part.
Default is \code{'date'}.}

\item{count}{a \code{character} string. The column name in \code{data} of the
number of individuals. This column must be numerical.
Default is \code{'count'}.}

\item{location}{a \code{character} string. The column name in \code{data} of the
site. This field is used to distinguish count series from different sites
(if required) and to create a unique series name.
Default is \code{'location'}.}

\item{species}{a \code{character} string. The column name in \code{data} (and
in \code{info} if provided) of the species. This field is used to distinguish
count series for different species (if required) and to create a unique
series name.
Default is \code{'species'}.}

\item{stat_method}{a \code{character} string. The column name in \code{data} of
the method used to estimate individual counts. It can contain \code{'T'}
(total counts), \code{'X'} (guesstimate), and/or \code{'S'} (sampling). If some
counts are coded as \code{'S'}, precision column(s) must also be provided (see
below).
Default is \code{'stat_method'}.}

\item{lower_ci}{(optional) a \code{character} string. The column name in \code{data}
of the lower boundary of the 95\% CI of the estimate (i.e. \code{count}). If
provided, the upper boundary of the 95\% CI (argument \code{upper_ci}) must
also be provided. This argument is only required if some counts have been
estimated by a sampling method. However, the user may prefer to use other precision
measures, e.g. standard deviation (argument \code{sd}), variance (argument
\code{var}), or coefficient of variation (argument \code{cv}).
Default is \code{'lower_ci'}.}

\item{upper_ci}{(optional) a \code{character} string. The column name in \code{data}
of the upper boundary of the 95\% CI of the estimate (i.e. \code{count}). If
provided, the lower boundary of the 95\% CI (argument \code{lower_ci}) must
also be provided.
Default is \code{'upper_ci'}.}

\item{sd}{(optional) a \code{character} string. The column name in \code{data} of
the standard deviation of the estimate.
Default is \code{NULL}.}

\item{var}{(optional) a \code{character} string. The column name in \code{data} of
the variance of the estimate.
Default is \code{NULL}.}

\item{cv}{(optional) a \code{character} string. The column name in \code{data} of
the coefficient of variation of the estimate.
Default is \code{NULL}.}

\item{field_method}{(optional) a \code{character} string. The column name in
\code{data} of the field method used to count individuals. Counts can be ground
counts (coded as \code{'G'}) or aerial counts (coded as \code{'A'}). This argument
is optional if individuals have been counted by the same method. See above
section \strong{Description} for further information on the count conversion.
Default is \code{NULL}.}

\item{pref_field_method}{(optional) a \code{character} string. The column name
in \code{data} of the preferred field method of the species. This argument is
only required if \code{field_method} is not \code{NULL} (i.e. individuals have been
counted by different methods). Alternatively, this value can be passed in
\code{info} (or internally retrieved if the species is listed in the package).
See above section \strong{Description} for further information on the count
conversion.
Default is \code{NULL}.}

\item{conversion_A2G}{(optional) a \code{character} string. The column name
in \code{data} of the count conversion factor of the species. This argument is
only required if \code{field_method} is not \code{NULL} (i.e. individuals have been
counted by different methods). Alternatively this value can be passed in
\code{info} (or internally retrieved if the species is listed in the package).
See above section \strong{Description} for further information on the count
conversion.
Default is \code{NULL}.}

\item{rmax}{(optional) a \code{character} string. The column name in \code{data} of
the species demographic potential (i.e. the relative rate of increase of
the population). This is the maximum change in log population size between
two dates and will be used later by \code{\link[=fit_trend]{fit_trend()}}.
Default is \code{NULL}.}

\item{path}{a \code{character} string. The directory to save formatted data.
This directory must exist and can be an absolute or a relative path.
Default is the current working directory.}

\item{na_rm}{a \code{logical}. If \code{TRUE}, counts with \code{NA} values will be
removed.
Default is \code{FALSE} (an error is raised if \code{NA} values are detected).}
}
\value{
An n-element \code{list} (where \code{n} is the number of count series). The
name of each element of this list is a combination of location and
species. Each element of the list is a \code{list} with the following content:
\itemize{
\item \code{location} a \code{character} string. The name of the series site.
\item \code{species} a \code{character} string. The name of the series species.
\item \code{date} a \code{numeric} vector. The sequence of dates of the
series.
\item \code{n_dates} an \code{integer}. The number of unique dates.
\item \code{stat_methods} a \code{character} vector. The different stat methods
of the series.
\item \code{field_methods} (optional) a \code{character} vector. The different
field methods of the series.
\item \code{pref_field_method} (optional) a \code{character} string. The
preferred field method of the species (\code{'A'} or \code{'G'}).
\item \code{conversion_A2G} (optional) a \code{numeric}. The conversion factor
of the species used to convert counts to its preferred field method.
\item \code{rmax} a \code{numeric}. The maximum population growth rate of the
species.
\item \code{data_original} a \code{data.frame}. Original data of the series
with renamed columns. Some rows may have been deleted
(if \code{na_rm = TRUE}).
\item \code{data_converted} a \code{data.frame}. Data containing computed
boundaries of the 95\% CI (\code{lower_ci_conv} and \code{upper_ci_conv}). If
counts have been obtained by different field methods, it also contains
converted counts (\code{count_conv}) based on the preferred field method and
conversion factor of the species. This \code{data.frame} will be used by the
function \code{\link[=fit_trend]{fit_trend()}} to fit population models.
}

\strong{Note:} Some original series can be discarded if one of these two
conditions is met: 1) the series contains only zero counts, or 2) the
series contains only a few dates (< 4 dates).
}
\description{
This function provides an easy way to get count series ready to be analyzed
by the package \code{popbayes}. It must be used prior to all other functions.

This function formats the count series (passed through the argument
\code{data}) by selecting and renaming columns, checking columns format and
content, and removing missing data (if \code{na_rm = TRUE}). It converts the
original data frame into a list of count series that will be analyzed later
by the function \code{\link[=fit_trend]{fit_trend()}} to estimate population trends.

To be usable for the estimation of population trends, counts must be
accompanied by information on precision. The population trend model requires
a 95\% confidence interval (CI).
If estimates are total counts or guesstimates, this function will construct
boundaries of the 95\% CI by applying the rules set out in
\url{https://frbcesab.github.io/popbayes/articles/popbayes.html}.
If counts were estimated by a sampling method, the user needs to specify a
measure of precision. Precision is preferably provided in the form of a 95\%
CI by means of two fields: \code{lower_ci} and \code{upper_ci}. It may also be given
in the form of a standard deviation (\code{sd}), a variance (\code{var}), or a
coefficient of variation (\code{cv}). If the fields \code{lower_ci} and \code{upper_ci} are
both absent (or \code{NA}), fields \code{sd}, \code{var}, and \code{cv} are examined in this
order. When one is found valid (no missing value), a 95\% CI is derived
assuming a normal distribution.
The field \code{stat_method} must be present in \code{data} to indicate
if counts are \strong{total counts} (\code{'T'}), \strong{sampling} (\code{'S'}), or
\strong{guesstimate} (\code{'X'}).

If a series mixes aerial and ground counts, a field \code{field_method} must
also be present and must contain either \code{'A'} (aerial counts), or \code{'G'}
(ground counts). As all counts must eventually refer to the same field
method for a correct estimation of trend, a conversion will be performed to
homogenize counts. This conversion is based on a \strong{preferred field method}
and a \strong{conversion factor} both specific to a species/category.
The preferred field method specifies the conversion direction. The
conversion factor is the multiplicative factor that must be applied to an
aerial count to get an equivalent ground count (note that if the preferred
field method is \code{'A'}, ground counts will be divided by the conversion
factor to get the equivalent aerial count).

The argument \code{rmax} represents the maximum change in log population size
between two dates (i.e. the relative rate of increase). It will be used
by \code{\link[=fit_trend]{fit_trend()}} but must be provided in this function.

These three parameters, named \code{pref_field_method}, \code{conversion_A2G}, and
\code{rmax} can be present in \code{data} or in a second \code{data.frame}
(passed through the argument \code{info}).
Alternatively, the package \code{popbayes} provides their values for some
African large mammals.

\strong{Note:} If the field \code{field_method} is absent in \code{data}, counts are
assumed to be obtained with one field method.
}
\examples{
## Load Garamba raw dataset ----
file_path <- system.file("extdata", "garamba_survey.csv", 
                         package = "popbayes")
                         
garamba <- read.csv(file = file_path)

## Create temporary folder ----
temp_path <- tempdir()

## Format dataset ----
garamba_formatted <- popbayes::format_data(
  data              = garamba, 
  path              = temp_path,
  field_method      = "field_method",
  pref_field_method = "pref_field_method",
  conversion_A2G    = "conversion_A2G",
  rmax              = "rmax")

## Number of count series ----
length(garamba_formatted)

## Retrieve count series names ----
popbayes::list_series(path = temp_path)

## Print content of the first count series ----
names(garamba_formatted[[1]])

## Print original data ----
garamba_formatted[[1]]$"data_original"
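
## Sketch: deriving a 95\% CI from sd, var, or cv ----
## Illustration only, not the package's internal code. It assumes the usual
## normal approximation (count +/- z * sd), that var is the variance of the
## estimate (sd = sqrt(var)), and that cv is given as a proportion
## (sd = cv * count). All object names and values below are hypothetical.
toy <- data.frame(
  count = c(120, 80, 200),
  sd    = c(15, NA, NA),
  var   = c(NA, 100, NA),
  cv    = c(NA, NA, 0.10)
)

## Examine sd, then var, then cv, and keep the first non-missing value
toy_sd <- ifelse(!is.na(toy$sd), toy$sd,
                 ifelse(!is.na(toy$var), sqrt(toy$var), toy$cv * toy$count))

## Normal quantile for a two-sided 95\% interval (about 1.96)
z <- stats::qnorm(0.975)

toy$lower_ci <- toy$count - z * toy_sd
toy$upper_ci <- toy$count + z * toy_sd
toy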

## Print converted data ----
garamba_formatted[[1]]$"data_converted"
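
## Sketch: converting counts to the preferred field method ----
## Illustration only, with hypothetical names and values. Following the
## Description section, an aerial count is multiplied by conversion_A2G to get
## the equivalent ground count, and a ground count is divided by it to get the
## equivalent aerial count.
toy_counts <- data.frame(
  count        = c(100, 250, 90),
  field_method = c("A", "G", "A")
)
toy_pref   <- "G"    # hypothetical preferred field method
toy_factor <- 2.5    # hypothetical conversion_A2G

## Value each count would take after conversion towards the preferred method
converted <- if (toy_pref == "G") toy_counts$count * toy_factor else
  toy_counts$count / toy_factor

## Counts already obtained with the preferred method are left unchanged
toy_counts$count_conv <- ifelse(toy_counts$field_method == toy_pref,
                                toy_counts$count, converted)
toy_counts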
}
