---
title: "User Manual"
author: "Nils Kehrein"
date: "03 May, 2024"
output:
  rmarkdown::html_vignette:
    toc: true
    toc_depth: 3
  github_document:
    toc: true
  word_document:
    toc: true
vignette: >
  %\VignetteIndexEntry{User Manual}
  %\VignetteEncoding{UTF-8}
  %\VignetteEngine{knitr::rmarkdown}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  fig.path = "../doc/figures/manual-",
  comment = "#>"
)
```

```{r setup, include=FALSE}
library(cvasi)
```

The *cvasi* package is a software library that extends the
programming language [*R*](https://www.r-project.org/) with routines for
ecotoxicological effect modeling. The routines are intended to be used for scientific
purposes as well as in the context of regulatory risk assessments. *cvasi* implements model
equations and default parameters to simulate a set of toxicokinetic-toxicodynamic
(TKTD) models. The models are intended to assess the effects of substance exposure
on effect endpoints, such as growth and reproduction, for a range of standard test
species.

The package offers a standardized framework which can be extended by (almost) any
TKTD model that can be implemented in *R*. Its modeling workflows are designed to be
self-explanatory and as intuitive as possible to make modeling approaches more
accessible. In addition, the framework's routines are optimized to reduce the
computing time required for extensive risk assessments.

# Required skills

The *cvasi* package and its routines can be used by anyone with basic to intermediate
knowledge of the *R* programming language. The routines can be used in a
procedural fashion, commonly referred to as *base R*, in which a function is
called, its result is assigned to a variable, and the variable is then passed on to the next function:

```{r}
library(cvasi)

## Sample base R workflow ##
# Create a scenario object of model GUTS-RED-IT
my_it <- GUTS_RED_IT()
# Set model parameters
my_it <- set_param(my_it, c(kd=1.2, hb=0, alpha=9.2, beta=4.3))
# Print scenario details
my_it
```

A more advanced approach is to use the *tidyverse* syntax (cf. Wickham & Grolemund 2017),
which uses verb-like statements to modify data. This syntax can be used to
chain function calls so that the result of one function is passed on to the next:

```{r}
## Example 'tidyverse' workflow ##
# the pipe operator (%>%) passes results to the next statement
GUTS_RED_IT() %>%
  set_param(c(kd=1.2, hb=0, alpha=9.2, beta=4.3))
```

Both workflow styles achieve the same result, but the *tidyverse* style is terser,
especially if several operations are performed on the same object.
It is the recommended way to interact with the *cvasi* package.

# How to install

<!--The simplest way to install the *cvasi* package is by downloading it from *CRAN*:

```{r eval=FALSE,include=FALSE}
install.packages("cvasi", dependencies=TRUE)
```
-->

The latest development version can be installed from GitHub:

```{r eval=FALSE}
install.packages("remotes", dependencies=TRUE)
remotes::install_github("cvasi-tktd/cvasi", dependencies=TRUE)
```

The package and its source code are available on [GitHub](https://github.com/cvasi-tktd/cvasi/).


# Modeling and assessment workflow

## Overview

The central concept within the *cvasi* package is the *scenario*. It encapsulates
all key properties to run a simulation such as initial state and model parameters.
The scenario can be thought of as the collection of
data and parameters necessary to reproduce a certain study condition or situation.
The set of available scenario properties depends on the model type but may include:

 - Initial state,
 - Model parameters,
 - Assessment parameters,
 - Exposure time-series, and
 - Environmental factor time-series

Scenario properties can be modified using intuitively named *tidyverse*-style verbs: in
general, each verb accepts a scenario as an argument and returns a modified
scenario as the result. For instance, the `set_param()` function will update
a scenario's parameters to new values:

```{r}
library(cvasi)

# Create a new GUTS-RED-IT scenario and set its model parameters
GUTS_RED_IT() %>%
  set_param(c(kd=1.2, hb=0, alpha=9.2, beta=4.3))
```

If a scenario is properly defined, it can be passed to dedicated functions that,
for instance, run a simulation or derive assessment endpoints. These higher-level functions
do not return a scenario object but a table of e.g. simulation results for that
particular scenario. As an example, simulating
the sample scenario `minnow_it` will run a *GUTS-RED-IT* model using
substance-specific parameters:

```{r}
# Example GUTS-RED-IT scenario derived from an acute fish toxicity study
# of the fathead minnow and Chlorpyrifos (Geiger et al. 1988)
minnow_it %>%
  simulate()
```

The table returned by `simulate()` contains the value of each state variable at
the requested output time points. In case of the *GUTS-RED-IT* model, this
comprises the state variables `D` (scaled damage) and `H` (cumulative hazard).
Effect levels, effect profiles, and dose-response graphs
can be derived in a similar manner. Please refer to EFSA PPR Panel (2018) or the
package help for more information on *GUTS-RED* type models:

```{r, eval=FALSE}
# Access the package help on GUTS-RED type models
?"GUTS-RED-models"
```

## Setting up a scenario

A scenario is created by first calling a constructor function that is dedicated to
a particular model. The *cvasi* package supports multiple TKTD models, for instance:

- Reduced General Unified Threshold models of Survival (*GUTS-RED*)
   - `GUTS_RED_SD()`
   - `GUTS_RED_IT()`
- Macrophyte models
   - `Lemna_Schmitt()`, published by Schmitt *et al.* (2013)
   - `Lemna_SETAC()`, published by Klein *et al.* (2021)
   - `Myrio()`, an adaptation of Klein *et al.* (2021) to *Myriophyllum* at Tier 2C
- Dynamic Energy Budget (*DEB*) models
   - `DEB_abj()`, the *abj* model with type M acceleration

The listed functions create objects that will carry all
data and settings to fully describe a scenario. In case of a *GUTS-RED-IT*
scenario, one needs to provide at least the four model parameters and an exposure
time-series to fully define a scenario:

```{r}
# Define an exposure time-series
myexposure <- data.frame(time=c(0, 1, 1.01, 5),
                         conc=c(10, 10, 0, 0))

# Create and parameterize a scenario
GUTS_RED_IT() %>%
  set_param(c(kd=1.2, hb=0, alpha=9.2, beta=4.3)) %>%
  set_exposure(myexposure) -> myscenario
```

The main scenario properties can be examined by printing the scenario to the
console, as in the next example. The console will print the model name,
model parameters (*param*), initial state (*init*), enabled
effect endpoints (*endpt*), output times (*times*), environmental forcings
(*forcs*), and exposure series (*expsr*). Depending on model type, the
scenario may carry additional properties that are not necessarily displayed in the
console output.

```{r}
# Print information about the scenario
myscenario

```

Several `set_*()` verbs are available to modify scenario properties, such as:

 - `set_tag()`, to assign a custom tag to identify the scenario
 - `set_init()`, to set the initial state
 - `set_param()`, to modify model parameters
 - `set_times()`, to define output time points
 - `set_forcings()`, to set environmental factors (i.e. *forcings*)
 - `set_exposure()`, to set an exposure time-series
 - `set_endpoints()`, to enable/disable assessment endpoints
 - `set_transfer()`, to define transfer intervals of *Lemna* fronds
 - `set_moa()`, to select a model's mode of action
 - `set_window()`, to control moving exposure windows

Please note that `set_exposure()` will, by default, use the exposure series' time points
also as output time points. If this behavior is undesired, it can be disabled
by setting the argument `reset_times=FALSE`:

```{r}
# Update the exposure time-series but keep former output time points
myscenario %>%
  set_exposure(no_exposure(), reset_times=FALSE)
```

Note that `times` still has the original four elements although the assigned exposure
series has only a single entry. Output time points can be
explicitly set with `set_times()`. For deriving effect endpoints, the simulated
time frame can be split up into moving exposure windows of a certain length using
`set_window()`. A more complex scenario setup may look like the following workflow:

```{r}
# Selected model parameters
myparam <- c(p_M=3211, v=0.023, k_J=0.63)
# Initial body length L
myinit <- c(L=0.02)
# Constant non-zero exposure
myexposure <- data.frame(time=0, conc=1.72)

DEB_abj() %>%
  set_param(myparam) %>%
  set_init(myinit) %>%
  set_exposure(myexposure) %>%
  set_times(0:10) %>%             # Output times 0,1,2,...,10
  set_mode_of_action(4) %>%       # Mode of action #4 to be activated
  set_window(length=3)            # Using moving exposure windows of length 3 days
```

## Running a simulation

After a scenario is fully set up, it can be passed to `simulate()` to return
model results:

```{r}
# Example scenario of the Lemna TKTD model
metsulfuron %>%
  set_times(0:7) %>%
  simulate()
```

`simulate()` returns the model state for each output time point. In this example,
it returns results for the period from zero to seven with an equidistant time
step length of one.
The temporal resolution of results can be increased by specifying additional
output times:

```{r}
# Same simulation period, but with a smaller step length of 0.1
metsulfuron %>%
  set_times(seq(0, 7, 0.1)) %>%
  simulate() %>%
  tail()
```

The resulting table now contains roughly ten times as many rows (71 instead of 8)
because we decreased the step length by a factor of ten but simulated the same period.
The temporal resolution of a simulation can influence its results, i.e. the numerical
values of the returned variables.
These differences usually originate from small numerical errors introduced by the solver
of the model’s Ordinary Differential Equations. In general, solver settings such as the
step length in time affect the precision of simulation results.
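
The influence of the step length can be illustrated with a minimal base *R* sketch
that is independent of *cvasi*. It integrates a toy decay model with a fixed-step
explicit Euler scheme, which is an assumption made for illustration only; the
solver actually used by the package and its settings are described in the help of
`simulate()`. The names `euler_decay`, `k`, `y0`, and `h` are hypothetical:

```{r}
# Explicit Euler integration of dy/dt = -k * y (toy model, not a cvasi model)
euler_decay <- function(k, y0, t_end, h) {
  n_steps <- round(t_end / h)
  y <- y0
  for (i in seq_len(n_steps)) {
    y <- y + h * (-k * y)   # one Euler step of length h
  }
  y
}

# Analytical solution at t = 7 for k = 0.5 and y0 = 1
exact <- exp(-0.5 * 7)

# Absolute error at t = 7 for three step lengths
step_lengths <- c(1, 0.1, 0.01)
abs_errors <- sapply(step_lengths, function(h) {
  abs(euler_decay(k = 0.5, y0 = 1, t_end = 7, h = h) - exact)
})
data.frame(step_length = step_lengths, abs_error = abs_errors)
```

In this toy setting, the absolute error shrinks with every reduction of the step
length, at the cost of more computation steps.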

There are several ways to increase the precision of simulation results without
modifying a scenario's output times. One option is to decrease
the solver’s maximum step length in time by setting the optional argument `hmax`.
The smaller `hmax`, the more precise the results:

```{r}
# Original simulation period, but with a maximum solver step length of hmax=0.01
metsulfuron %>%
  set_times(0:7) %>%
  simulate(hmax=0.01)
```

Although very effective, reducing `hmax` to small values may lead to excessively
long simulation runtimes. If numerical precision is an issue,
it is advisable to also test other solver settings such as absolute and relative
error tolerances (`atol` and `rtol`) or to switch to another solver method.
Numerical precision and stability are complex and advanced topics. Please
refer to the manual of `simulate()` for additional information:

```{r, eval=FALSE}
?cvasi::simulate
```

## Deriving assessment endpoints

Simulation results are the basis for deriving assessment endpoints such
as effect levels and effect profiles. The *cvasi* package implements routines
to derive these endpoints efficiently and independently of the actual model that is being
assessed. Hence, if a model is available within the package framework, it can be
used to derive assessment endpoints. As an example, the following statement
derives effect levels for a sample *GUTS-RED-IT* scenario:

```{r}
# GUTS-RED-IT scenario of the fathead minnow and chlorpyrifos
minnow_it %>% effect()
```
```{r, include=FALSE}
# make sure that the values in the text are still up to date
testthat::expect_equal(minnow_it %>% effect() %>% dplyr::pull(L), 6.297e-5, tolerance=0.001)
```
The lethality endpoint `L` has a value of `6.3e-5` which means that lethality
has only marginally increased by 0.0063% compared to a control scenario without
exposure to the contaminant.
The columns `L.dat.start` and `L.dat.end` denote the period for which the
effect level was observed. In case of *GUTS-RED* models, this usually refers to the
whole simulation period. For certain assessments it may be necessary to evaluate
all exposure periods of a certain length within the original series separately.
This approach is referred to as a *moving exposure window*:

```{r}
# Setting up a custom scenario with a total simulation period of 14 days and
# an exposure window length of 7 to assess a trapezoidal exposure pattern
americamysis %>%
  set_window(7) %>%
  set_exposure(data.frame(t=c(0,3,4,7,8), c=c(0,0,3,3,0))) %>%
  set_times(0:14) -> mydeb

# Derive maximum effect level of all exposure windows
mydeb %>% effect()
```
```{r, include=FALSE}
# make sure that the values in the text are still up to date
testthat::expect_equal(mydeb %>% effect() %>% dplyr::pull(L), 0.0521, tolerance=0.001)
```

The `americamysis` scenario provides two effect endpoints: structural length (`L`)
and reproduction (`R`). Exposure to the simulated toxicant results in a maximum
decrease of 5.21% in body length compared to the control scenario. The
maximum effect was observed in the exposure window starting at time `2.0` and
ending at `9.0`, cf. columns `L.dat.start` and `L.dat.end`. The toxicant had no effect
on reproduction because the simulated period is too short for reproduction to occur.
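
How a series is split into moving windows can be sketched in base *R*. The
example below enumerates all windows of length 7 within a period of 14 days,
assuming a window start at every full day, and summarizes each window by its
mean concentration. The mean concentration is only a stand-in for illustration:
`effect()` evaluates model effects per window instead, and its exact window
placement may differ. `toy_expo`, `win_tab`, and `mean_conc` are hypothetical names:

```{r}
# Trapezoidal exposure pattern, same values as in the example above
toy_expo <- data.frame(time = c(0, 3, 4, 7, 8), conc = c(0, 0, 3, 3, 0))

# Assumption: windows of length 7 start at every full day of the 14-day period
win_length <- 7
win_start <- seq(0, 14 - win_length, by = 1)
win_tab <- data.frame(start = win_start, end = win_start + win_length)

# Mean concentration within a window: linear interpolation of the series
# combined with the trapezoidal rule on a fine time grid
mean_conc <- function(start, end) {
  t <- seq(start, end, by = 0.01)
  conc <- approx(toy_expo$time, toy_expo$conc, xout = t, rule = 2)$y
  area <- sum(diff(t) * (head(conc, -1) + tail(conc, -1)) / 2)
  area / (end - start)
}
win_tab$mean_conc <- round(mapply(mean_conc, win_tab$start, win_tab$end), 3)
win_tab
```

Windows that cover the complete exposure pulse share the highest mean
concentration, while later windows capture only part of the pulse.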

Scenarios can also be assessed with regard to so-called effect profiles, or *EPx* values for
short. An *EPx* value quantifies the multiplication factor that is necessary to
produce an effect level of x% for that particular scenario. As an example,
an *EP50* of `17.0` would mean that the scenario's exposure series needs to be
multiplied by a factor of 17 to observe a 50% effect. The function
`epx()` can derive effect profiles efficiently with arbitrary precision:

```{r}
# Restrict assessed endpoints to structural length (L)
mydeb %>% 
  set_endpoints("L") %>%
  epx()
```

```{r, include=FALSE}
# make sure that the values in the text are still up to date
testthat::expect_equal(mydeb %>%  set_endpoints("L") %>% epx(ep_only=TRUE) %>% unlist(use.names=FALSE), c(1.162598, 1.711914), ignore_attr=TRUE, tolerance=0.01)
```

In this example, a factor of `1.16` resulted in 10% effect on structural
length and a factor of `1.71` in 50% effect. `epx()` provides means to control the precision
of derived *EPx* values, the range of tested multiplication factors, factor
cutoff thresholds, and debugging capabilities. To examine how a particular
*EPx* value was derived, simply set the argument `verbose=TRUE`:

```{r}
# Examine how the EP20 value is derived
minnow_it %>% epx(level=20, verbose=TRUE)
```

In this example, the algorithm of `epx()` first tests a multiplication factor of
`10` and then continues testing additional factors using a binary search algorithm
until a factor is found that causes 20% effect.
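
The search strategy described above can be sketched in base *R* for a toy
dose-response function. This is a simplified illustration under stated
assumptions, not the implementation used by `epx()`: the function `toy_effect`,
the helper `find_factor`, the bracketing in steps of ten, and all argument names
are hypothetical.

```{r}
# Hypothetical, monotonically increasing effect as a function of the factor
toy_effect <- function(factor) 1 - exp(-0.1 * factor)

find_factor <- function(effect_fun, level, tol = 0.001, start = 10,
                        min_factor = 1e-30, max_factor = 1e30) {
  lower <- start
  upper <- start
  # Widen the bracket in steps of ten until it encloses the target level
  while (effect_fun(lower) > level && lower > min_factor) lower <- lower / 10
  while (effect_fun(upper) < level && upper < max_factor) upper <- upper * 10
  # Target level is not attainable within the allowed range of factors
  if (effect_fun(lower) > level || effect_fun(upper) < level) return(NA_real_)

  # Binary search: halve the bracket until the effect is close enough
  for (i in seq_len(200)) {
    mid <- (lower + upper) / 2
    eff <- effect_fun(mid)
    if (abs(eff - level) <= tol) return(mid)
    if (eff < level) lower <- mid else upper <- mid
  }
  NA_real_
}

ep20 <- find_factor(toy_effect, level = 0.2)
ep20
toy_effect(ep20)   # close to the requested 20% effect
```

A smaller tolerance `tol` yields a more precise factor but requires additional
evaluations of the effect function, which for a real scenario means additional
simulations.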

__Please note__: Depending on model
and chosen parameters, it may be impossible to calculate *EPx* values (for selected
effect levels), because the model or organism cannot reach the required state.
As an example: If the simulated period of a *DEB_abj* scenario is too short,
reproduction cannot occur. Therefore, effects on reproduction will always be
zero independent of the exposure level. Consequently, *EPx* values for the
reproduction endpoint will not be available.





# Data format of model inputs

Input data for the *cvasi* package relies on common *R* data structures and
imposes only minimal requirements to set up scenarios:

1. **Numerical values should be represented by numeric types**.
   Practically, this means that numerical scenario properties
   such as output time points must not be passed as string literals even if
   the string represents a number. Because *R* is a dynamically typed
   language, the user may have to check parameter values and their types when
   importing data.
1. **Time-series data must be represented by a `data.frame` with exactly
   two numerical columns**: The first column for time, the second
   for the series' value. The column names are irrelevant but sensible names
   may help documenting the data. Constant values can be represented by
   time-series with only a single row or entry.
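
Both requirements can be checked with a few lines of base *R*. The helper
`is_valid_series` below is a hypothetical convenience function and not part of
*cvasi*; the sample values are made up for illustration:

```{r}
# Hypothetical helper: TRUE if x is a data.frame with exactly two numeric columns
is_valid_series <- function(x) {
  is.data.frame(x) && ncol(x) == 2 && all(vapply(x, is.numeric, logical(1)))
}

# Data imported as strings does not qualify, even though it looks numeric
raw_series <- data.frame(time = c("0", "1", "2"), conc = c("10", "5", "0"),
                         stringsAsFactors = FALSE)
is_valid_series(raw_series)

# Convert all columns to numeric before passing the series to a scenario
num_series <- data.frame(lapply(raw_series, as.numeric))
is_valid_series(num_series)

# A constant exposure is represented by a single row
const_series <- data.frame(time = 0, conc = 1.72)
is_valid_series(const_series)
```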

Therefore, it should be possible to apply the package's routines to existing
datasets with minimal effort. Please refer to Wickham and Grolemund (2017)
on how to implement plain and consistent processes to import and transform data
using *tidyverse* syntax.

*cvasi* supports loading data from common sources to ease data processing.
Notable supported file formats and data structures are exposure time-series
from the model FOCUS *TOXSWA* and fitted *GUTS-RED* model parameter sets from
[*morse*](https://cran.r-project.org/package=morse).



# How to derive model inputs

In general, all models require parameter values specific to the simulated
species and substance. Some scenario types contain default values for selected model
parameters as recommended by model authors. Scenarios providing default values
include:

- `Lemna_Schmitt()` and derived models are parameterized according to Schmitt
   *et al.* (2013)
- `Lemna_SETAC()`, `Myrio()`, and derived models are parameterized as recommended
   by Klein *et al.* (2021)

In addition, some models require values for environmental factors such as
temperature and nutrient concentrations. These values or time-series are specific
to a certain scenario and must be chosen accordingly.

Parameters of TKTD sub-models are substance-specific and are usually obtained
by fitting model parameters to observed data. The *cvasi* package supports this process with
the function `calibrate()` which uses a frequentist approach for parameter
fitting. However, parameters can also be derived by other means. Generally,
the user is responsible for choosing model parameters and the package does
not impose any restrictions in this regard.

The package function `calibrate()` supports the following operation modes:

- Fitting of a single parameter or multiple parameters simultaneously
- Fit to one or more independent datasets
- Weighting of datasets
- Choosing the error function which is minimized
- Selecting one of multiple optimization algorithms

Parameter fitting builds upon the *R* function `stats::optim()` and it exposes full
control over the fitting process. In this context, a dataset is represented by a
*calibration set* object which consists of a scenario, observed data, and an
optional weighting term.
Please refer to the [Modeling Howto](cvasi-2-howto.html) for a description of the
fitting process using `calibrate()`.
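
The general idea of fitting a parameter by minimizing an error function can be
sketched directly with `stats::optim()`. The example deliberately does not use
`calibrate()`; the toy model, the observed values, and the names `toy_obs`,
`toy_model`, `toy_sse`, and `toy_fit` are hypothetical and serve illustration only:

```{r}
# Hypothetical observations of an exponentially declining quantity
toy_obs <- data.frame(time = 0:4, value = c(10, 6.1, 3.6, 2.3, 1.3))

# Toy model with a single free parameter: decay rate k
toy_model <- function(k, time) 10 * exp(-k * time)

# Error function: sum of squared errors between prediction and observation
toy_sse <- function(k) sum((toy_model(k, toy_obs$time) - toy_obs$value)^2)

# One-dimensional minimization within the bounds [0, 5]
toy_fit <- optim(par = 0.1, fn = toy_sse, method = "Brent", lower = 0, upper = 5)
toy_fit$par     # fitted decay rate
toy_fit$value   # remaining sum of squared errors
```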

As an alternative for *GUTS-RED* type models, model parameters can also be
obtained using the [*morse* package](https://cran.r-project.org/package=morse).
It uses a Bayesian approach for parameter fitting.

# Description of model outputs

The *cvasi* package does not use or define any custom file formats for model outputs
but strictly relies on common *R* data structures. Simulation results and
assessment endpoints are returned in tabular formats which can easily be
serialized to common file formats such as `.csv` or *Microsoft Excel* files
for further processing and data storage.
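
As a minimal sketch, any tabular result can be written to and restored from a
`.csv` file with base *R* functions. The table `toy_result` is a hypothetical
stand-in for a return value of the package's routines, and its values are made up:

```{r}
# Hypothetical result table standing in for simulation output
toy_result <- data.frame(time = 0:2, BM = c(50, 52.1, 54.3))

# Write the table to a temporary .csv file and read it back
out_file <- tempfile(fileext = ".csv")
write.csv(toy_result, out_file, row.names = FALSE)
restored <- read.csv(out_file)

# The restored table equals the original one
all.equal(toy_result, restored)
```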

The following subsections will describe in detail the return values of common
package routines for simulating and assessing scenarios.



## Simulation results

The return value of `simulate()` is a `data.frame` containing the value of each
state variable at each requested output time. The first column represents *time*,
followed by columns for each state variable:

```{r}
## Display selected scenario properties
metsulfuron@times  # Output times
metsulfuron@init   # Initial state

## Simulate the sample scenario
metsulfuron %>% simulate()
```

The model state in the first row is, by definition, equal to the initial state
of the scenario. The simulation result may contain additional columns that
are derived variables or intermediate model variables. In case of the
`metsulfuron` scenario, the internal toxicant concentration (`C_int`) was derived
from `BM`, `M_int`, and model parameters. The number of fronds (`FrondNo`)
was derived from `BM` and model parameters. Additional outputs such as `C_int`
and `FrondNo` are included in results for user convenience and to support model
understanding. Additional outputs can be disabled by setting the optional 
argument `nout=0`:

```{r}
metsulfuron %>%
  simulate(nout=0) %>%
  head(5)
```

Intermediate model variables can be enabled in the same manner:

```{r}
metsulfuron %>%
  simulate(nout=8) %>%
  head(5)
```

## Effect levels

The return value of `effect()` is a `data.frame` containing the effect level of
each selected endpoint. The first column, `scenario`, contains the original
scenario which was the basis of the assessment. It is followed by the effect level and
the exposure window in which the maximum effect was observed:

```{r}
metsulfuron %>%
  set_times(0:7) %>%  # restrict scenario to the period [0,7]
  effect()
```

Effect levels are reported as a fraction relative to a control scenario. Effects
can be converted to percentages by multiplying the value by `100`. For
scenarios with moving exposure windows disabled, the reported window will cover
the entire simulation period. If multiple windows result in identical maximum
effect levels, the first matching window will be reported.
The table's content can be reduced to endpoints only by setting the
argument `ep_only=TRUE`.

`effect()` can also be instructed to return the effect levels of all assessed
exposure windows by setting the argument `max_only=FALSE`:

```{r}
metsulfuron %>%
  set_window(length=7, interval=1) %>%   # enable moving exposure windows
  effect(max_only=FALSE)                 # return effects of all windows
```

In a few cases, `effect()` may report spurious non-zero effect levels that are
very close to zero. These can be avoided either by increasing the numerical
precision of simulations or by setting a marginal effect threshold which instructs
`effect()` to zero any effect below that threshold. A warning will be shown
if a marginal effect threshold greater than 1% is chosen. Applying a threshold
of 1% to the previous example results in zeroing the effect in the last and
second to last exposure window but leaving the remaining results unchanged:

```{r}
metsulfuron %>%
  set_window(length=7, interval=1) %>%         # enable moving exposure windows
  effect(max_only=FALSE, marginal_effect=0.01) # return effects of all windows
```
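
The thresholding itself amounts to a simple element-wise operation, sketched
here in base *R* with made-up effect levels. Whether `effect()` treats values
exactly at the threshold as marginal is not covered by this sketch; the strict
comparison and the names `toy_effects` and `toy_clean` are assumptions:

```{r}
# Hypothetical effect levels of five exposure windows (fractions, not percent)
toy_effects <- c(0.35, 0.12, 0.04, 0.008, 0.0005)

# Assumption: effects strictly below the 1% threshold are set to zero
marginal <- 0.01
toy_clean <- ifelse(toy_effects < marginal, 0, toy_effects)
toy_clean
```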




## Effect profiles

The return value of `epx()` is a `data.frame` containing the effect profile (*EPx*) of
each selected endpoint and effect level. The first column, `scenario`, contains
the original scenario which was the basis of the assessment. It is followed by the
derived effect profiles, i.e. the multiplication factors to achieve x% effect:

```{r}
metsulfuron %>% epx()
```

For the sample `metsulfuron` scenario, a factor of `0.395` would need to be applied
to the exposure series to achieve an effect level of 10% on biomass (*BM*). An
effect level of 50% on biomass is unattainable with the tested factors ranging
from `1e-30` to `1e30`, as indicated by the warning message and the
`NA` value for `BM.EP50`.

The table's content can be reduced to endpoints only by setting the
argument `ep_only=TRUE`. The range of tested factors can be controlled by
setting the arguments `min_factor` and `max_factor`. Multiplication factors
exceeding this range will raise a warning and `NA` will be returned for
that *EPx*. The search for multiplication factors can be cut short once a
certain margin of safety is reached by using the argument `factor_cutoff`:
if the algorithm identifies that an *EPx* greater than or equal to `factor_cutoff`
is required, then `factor_cutoff` is reported as the result. This approach may
be used to save computational resources for long-running models if the
exact *EPx* value above the cutoff is not of interest.

The precision of *EPx* values is controlled by the argument `effect_tolerance`,
which is given as the upper absolute error threshold of effects that
is deemed acceptable. The default value of `0.001` ensures that a derived
*EPx* will result in an effect of x% ± 0.1 percentage points. Decreasing
`effect_tolerance` will result in additional model iterations and longer runtime.

The algorithm's inner workings can be examined by setting the argument `verbose=TRUE`:

```{r}
metsulfuron %>%
  set_endpoints("BM") %>%
  epx(verbose=TRUE)
```

In this example, the binary search finds a `BM.EP10` with acceptable precision in
fewer than ten iterations. For the `BM.EP50`, the algorithm re-uses the
knowledge gained from previously performed calculations but is ultimately unable
to find a suitable multiplication factor within the allowed range of
factors.



# References
- EFSA PPR Panel (EFSA Panel on Plant Protection Products and their Residues),
  Ockleford C., Adriaanse P., Berny P., et al., 2018: *Scientific Opinion on the
  state of the art of Toxicokinetic/Toxicodynamic (TKTD) effect models for
  regulatory risk assessment of pesticides for aquatic organisms*. EFSA Journal 2018;
  16(8):5377, 188 pp. DOI: [10.2903/j.efsa.2018.5377](https://doi.org/10.2903/j.efsa.2018.5377)
- Geiger D.L., Call D.J., and Brooke L.T., 1988: *Acute toxicities of organic
  chemicals to fathead minnows (Pimephales promelas): Volume IV*, pp. 195-197.
  University of Wisconsin-Superior, Center for Lake Superior Environmental Studies.
  ISBN 9780961496838.
- Klein J., Cedergreen N., Heine S., Reichenberger S., Rendal C.,
  Schmitt W., Hommen U., 2021: *Refined description of the Lemna TKTD growth model
  based on Schmitt et al. (2013) - equation system and default parameters*.
  Report of the working group *Lemna* of the SETAC Europe Interest Group Effect
  Modeling. Version 1, uploaded on 22. Sept. 2021.
  https://www.setac.org/group/effect-modeling.html
- Schmitt W., Bruns E., Dollinger M., and Sowig P., 2013: *Mechanistic TK/TD-model
  simulating the effect of growth inhibitors on Lemna populations*. Ecol Model 255,
  pp. 1-10. DOI: [10.1016/j.ecolmodel.2013.01.017](https://doi.org/10.1016/j.ecolmodel.2013.01.017)
- Wickham H. and Grolemund G., 2017: *R for Data Science: Import, Tidy, Transform,
  Visualize, and Model Data* (1st ed.). O'Reilly Media. https://r4ds.had.co.nz