--- title: "User Manual" author: "Nils Kehrein" date: "03 May, 2024" output: rmarkdown::html_vignette: toc: true toc_depth: 3 github_document: toc: true word_document: toc: true vignette: > %\VignetteIndexEntry{User Manual} %\VignetteEncoding{UTF-8} %\VignetteEngine{knitr::rmarkdown} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, fig.path = "../doc/figures/manual-", comment = "#>" ) ``` ```{r setup, include=FALSE} library(cvasi) ``` The *cvasi* package is a software library that extends the features of the programming language [*R*](https://www.r-project.org/) by routines for ecotox effect modeling. The routines are intended to be used for scientific purposes as well as in the context of regulatory risk assessments. *cvasi* implements model equations and default parameters to simulate a set of toxicokinetic-toxicodynamic (TKTD) models. The models are intended to assess the effects of substance exposure on effect endpoints, such as growth and reproduction, for a range of standard test species. The package offers a standardized framework which can be extended by (almost) any TKTD model that can be implemented in *R*. Its modeling workflows are designed to be self-explanatory and as intuitive as possible to make modeling approaches more accessible. In addition, the framework's routines are optimized to reduce the computing time required for extensive risk assessments. # Required skills The *cvasi* package and its routines can be used by anyone with basic to intermediate knowledge of the *R* programming language. The routines can be used in a procedural fashion, commonly referred to as *base R*, in which function statements are executed, assigned to a variable, and then passed on to the next function: ```{r} library(cvasi) ## Sample base R workflow ## # Create a scenario object of model GUTS-RED-IT my_it <- GUTS_RED_IT() # Set model parameters my_it <- set_param(my_it, c(kd=1.2, hb=0, alpha=9.2, beta=4.3)) # Print scenario details my_it ``` A more advanced approach is to use the *tidyr* syntax (cf. Wickham & Grolemund 2017) which uses verb-like statements to modify data. *tidyr* syntax can be used to chain functions calls which pass the result of one function on to the next: ```{r} ## Example 'tidyr' workflow ## # the pipeline (%>%) symbol passes results to the next statement GUTS_RED_IT() %>% set_param(c(kd=1.2, hb=0, alpha=9.2, beta=4.3)) ``` Both workflow styles achieve the same result, but *tidyr* is terser, especially if several operations are performed on the same object. *tidyr* style is the recommended way to interact with the *cvasi* package. # How to install The latest development version can be installed from GitHub: ```{r eval=FALSE} install.packages("remotes", dependencies=TRUE) remotes::install_github("cvasi-tktd/cvasi", dependencies=TRUE) ``` The package and its source code is also available on [GitHub](https://github.com/cvasi-tktd/cvasi/). # Modeling and assessment workflow ## Overview The central concept within the *cvasi* package is the *scenario*. It encapsulates all key properties to run a simulation such as initial state and model parameters. The scenario can be thought of as the collection of data and parameters necessary to reproduce a certain study condition or situation. The set of available scenario properties depends on the model type but may include: - Initial state, - Model parameters, - Assessment parameters, - Exposure time-series, and - Environmental factor time-series Scenario properties can be modified using intuitively named *tidyr* verbs: in general, each verb accepts a scenario as an argument and returns a modified scenario as the result. For instance, the `set_param()` function will update a scenario's parameters to new values: ```{r} library(cvasi) # Create a new GUTS-RED-IT scenario and set its model parameters GUTS_RED_IT() %>% set_param(c(kd=1.2, hb=0, alpha=9.2, beta=4.3)) ``` If a scenario is properly defined, it can be passed to dedicated functions that, for instance, run a simulation or derive assessment endpoints. These higher-level functions do not return a scenario object but a table of e.g. simulation results for that particular scenario. As an example, simulating the sample scenario `minnow_it` will run a *GUTS-RED-IT* model using substance-specific parameters: ```{r} # Example GUTS-RED-IT scenario derived from an acute fish toxicity study # of the fathead minnow and Chlorpyrifos (Geiger et al. 1988) minnow_it %>% simulate() ``` The table returned by `simulate()` contains the value of each state variable at the requested output time points. In case of the *GUTS-RED-IT* model, this comprises the state variables `D` (scaled damage) and `H` (cumulative hazard). Effect levels, effect profiles, and dose-response graphs can be derived in a similar manner. Please refer to EFSA PPR Panel (2018) or the package help for more information on *GUTS-RED* type models: ```{r, eval=FALSE} # Access the package help on GUTS-RED type models ?"GUTS-RED-models" ``` ## Setting up a scenario A scenario is created by first calling a constructor function that is dedicated to a particular model. The *cvasi* package supports multiple TKTD models, for instance: - Reduced General Unified Threshold models of Survival (*GUTS-RED*) - `GUTS_RED_SD()` - `GUTS_RED_IT()` - Macrophyte models - `Lemna_Schmitt()`, published by Schmitt *et al.* (2013) - `Lemna_SETAC()`, published by Klein *et al.* (2021) - `Myrio()`, an adaption of Klein *et al.* (2021) to *Myriophyllum* at Tier 2C - Dynamic Energy Budget (*DEB*) models - `DEB_abj()`, the *abj* model with type M acceleration The listed functions create objects that will carry all data and settings to fully describe a scenario. In case of a *GUTS-RED-IT* scenario, one needs to provide at least the four model parameters and an exposure time-series to fully define a scenario: ```{r} # Define an exposure time-series myexposure <- data.frame(time=c(0, 1, 1.01, 5), conc=c(10, 10, 0, 0)) # Create and parameterize a scenario GUTS_RED_IT() %>% set_param(c(kd=1.2, hb=0, alpha=9.2, beta=4.3)) %>% set_exposure(myexposure) -> myscenario ``` One can examine the main scenario properties by passing the scenario to the console like in the next example. The console will print the model name, model parameters (*param*), initial state (*init*), enabled effect endpoints (*endpt*), output times (*times*), environmental forcings (*forcs*), and exposure series (*expsr*). Depending on model type, the scenario may carry additional properties that are not necessarily displayed in the console output. ```{r} # Print information about the scenario myscenario ``` Several `set_*()` verbs are available to modify scenario properties, such as: - `set_tag()`, to assign a custom tag to identify the scenario - `set_init()`, to set the initial state - `set_param()`, to modify model parameters - `set_times()`, to define output time points - `set_forcings()`, to set environmental factors (i.e. *forcings*) - `set_exposure()`, to set an exposure time-series - `set_endpoints()`, to enable/disable assessment endpoints - `set_transfer()`, to define transfer intervals of *Lemna* fronds - `set_moa()`, to select a model's mode of action - `set_window()`, to control moving exposure windows Please note that `set_exposure()` will, by default, use the exposure series' time points also as output time points. If this behavior is undesired, it can be disabled by setting the argument `reset_times=FALSE`: ```{r} # Update the exposure time-series but keep former output time points myscenario %>% set_exposure(no_exposure(), reset_times=FALSE) ``` Note that `times` still has the original four elements although the assigned exposure series has only a single entry. Output time points can be explicitly set with `set_times()`. For deriving effect endpoints, the simulated time frame can be split up into moving exposure windows of a certain length using `set_window()`. A more complex scenario setup may look like the following workflow: ```{r} # Selected model parameters myparam <- c(p_M=3211, v=0.023, k_J=0.63) # Initial body length L myinit <- c(L=0.02) # Constant non-zero exposure myexposure <- data.frame(time=0, conc=1.72) DEB_abj() %>% set_param(myparam) %>% set_init(myinit) %>% set_exposure(myexposure) %>% set_times(0:10) %>% # Output times 0,1,2,...,10 set_mode_of_action(4) %>% # Method of Action #4 to be activated set_window(length=3) # Using moving exposure windows of length 3 days ``` ## Running a simulation After a scenario is fully set up, it can be passed to `simulate()` to return model results: ```{r} # Example scenario of the Lemna TKTD model metsulfuron %>% set_times(0:7) %>% simulate() ``` `simulate()` returns the model state for each output time point. In this example, it returns results for the period from zero to seven with an equi-distant time step length of one. The temporal resolution of results can be increased by specifying additional output times: ```{r} # Same simulation period, but with a smaller step length of 0.1 metsulfuron %>% set_times(seq(0, 7, 0.1)) %>% simulate() %>% tail() ``` The resulting table now contains ten times as many rows because we decreased the step length by a factor of ten but simulated the same period. The temporal resolution of simulations can have an influence on results, i.e. the numerical values of the returned variables. These differences usually originate from small numerical errors introduced by the solver of the model’s Ordinary Differential Equations. Generally, parameters such as the step-length in time can have influence on the precision of simulation results. There are several ways to increase the precision of simulation results without modifying a scenario's output times. One option is to decrease the solver’s maximum step length in time by setting the optional argument `hmax`. The smaller `hmax`, the more precise the results: ```{r} # Original simulation period, but with a maximum solver step length of hmax=0.01 metsulfuron %>% set_times(0:7) %>% simulate(hmax=0.01) ``` Although very effective, reducing `hmax` to small values may lead to inefficiently long runtimes of simulations. If numerical precision is an issue, it is advisable to also test other solver settings such as absolute and relative error tolerances (`atol` and `rtol`) or to switch to another solver method. Numerical precision and stability is a complex and advanced topic, please refer to the manual of `simulate()` for additional information: ```{r, eval=FALSE} ?cvasi::simulate ``` ## Deriving assessment endpoints Simulation results are the basis for deriving assessment endpoints such as effect levels and effect profiles. The *cvasi* package implements routines to derive these endpoints efficiently independent of the actual model that is being assessed. Hence, if a model is available within the package framework, it can be used to derive assessment endpoints. As an example, the following statement derives effect levels for a sample *GUTS-RED-IT* scenario: ```{r} # GUTS-RED-IT scenario of the fathead minnow and chlorpyrifos minnow_it %>% effect() ``` ```{r, include=FALSE} # make sure that value in text are still up to date testthat::expect_equal(minnow_it %>% effect() %>% dplyr::pull(L), 6.297e-5, tolerance=0.001) ``` The lethality endpoint `L` has a value of `6.3e-5` which means that lethality has only marginally increased by 0.0063% compared to a control scenario without exposure to the contaminant. The columns `L.dat.start` and `L.dat.end` denote the period for which the effect level was observed. In case of *GUTS-RED* models, this usually refers to the whole simulation period. For certain assessments it may be necessary to evaluate all exposure periods of a certain length within the original series separately. This approach is referred to as a *moving exposure window*: ```{r} # Setting up a custom scenario with a total simulation period of 14 days and # an exposure window length of 7 to assess a trapezoidal exposure pattern americamysis %>% set_window(7) %>% set_exposure(data.frame(t=c(0,3,4,7,8), c=c(0,0,3,3,0))) %>% set_times(0:14) -> mydeb # Derive maximum effect level of all exposure windows mydeb %>% effect() ``` ```{r, include=FALSE} # make sure that value in text are still up to date testthat::expect_equal(mydeb %>% effect() %>% dplyr::pull(L), 0.0521, tolerance=0.001) ``` The `americamysis` scenario provides two effect endpoints: structural length (`L`) and reproduction (`R`). Exposure to the simulated toxicant results at maximum in a 5.21% decrease in body length compared to control scenarios. The maximum effect was observed in the exposure window starting at time `2.0` and ending at `9.0`, cf. columns `L.dat.start` and `L.dat.end`. The toxicant had no effect on reproduction due to the simulation being too short for reproduction to occur. Scenarios can also be assessed regarding to so-called effect profiles, or *EPx* values for short. An *EPx* value quantifies the multiplication factor that is necessary to produce an effect level of x% for that particular scenario. As an example, an *EP50* of `17.0` would mean that the scenario's exposure series needs to be multiplied by a factor of 17 to observe a 50% effect. The function `epx()` can derive effect profiles efficiently with arbitrary precision: ```{r} # Restrict assessed endpoints to structural length (L) mydeb %>% set_endpoints("L") %>% epx() ``` ```{r, include=FALSE} # make sure that value in text are still up to date testthat::expect_equal(mydeb %>% set_endpoints("L") %>% epx(ep_only=TRUE) %>% unlist(use.names=FALSE), c(1.162598, 1.711914), ignore_attr=TRUE, tolerance=0.01) ``` In this example, a factor of `1.16` resulted in 10% effect on structural length and a factor of `1.71` in 50% effect. `epx()` provides means to control the precision of derived *EPx* values, the range of tested multiplication factors, factor cutoff thresholds, and debugging capabilities. To examine how a particular *EPx* value was derived, simply set the argument `verbose=TRUE`: ```{r} # Examine how the EP20 value is derived minnow_it %>% epx(level=20, verbose=TRUE) ``` In this example, the algorithm of `epx()` first tests a multiplication factor of `10` and then continues testing additional factors using a binary search algorithm until a factor is found that causes 20% effect. __Please note__: Depending on model and chosen parameters, it may be impossible to calculate *EPx* values (for selected effect levels), because the model or organism cannot reach the required state. As an example: If the simulated period of a *DEB_abj* scenario is too short, reproduction cannot occur. Therefore, effects on reproduction will always be zero independent of the exposure level. Consequently, *EPx* values for the reproduction endpoint will not be available. # Data format of model inputs Input data for the *cvasi* package relies on common *R* data structures and imposes only minimal requirements to set up scenarios: 1. **Data representing numerical values should be represented by numerical types**. Practically, this means that numerical scenario properties such as output time points must not be passed as string literals even if the string may be representing a number. Due to *R* being a dynamically typed language, parameter values and their types may have to be checked on import by the user. 1. **Time-series data must be represented by a `data.frame` with exactly two numerical columns**: The first column for time, the second for the series' value. The column names are irrelevant but sensible names may help documenting the data. Constant values can be represented by time-series with only a single row or entry. Therefore, it should be possible to apply the package's routines on existing datasets with minimal effort. Please refer to Wickham and Grolemund (2017) on how to implement plain and consistent processes to import and transform data using *tidyr* syntax. *cvasi* does support loading data from common sources, such as the exposure model *TOXSWA*, to ease data processing. Notable file formats and data structures that are supported are exposure time-series from the model FOCUS *TOXSWA*, as well as fitted *GUTS-RED* model parameter sets from [*morse*](https://cran.r-project.org/package=morse). # How to derive model inputs In general, all models require parameter values specific to the simulated species and substance. Some scenario types contain default values for selected model parameters as recommended by model authors. Scenarios providing default values include: - `Lemna_Schmitt()` and derived models are parameterized according to Schmitt *et al.* (2013) - `Lemna_SETAC()`, `Myrio()`, and derived models are parameterized as recommended by Klein *et al.* (2021) In addition, some models require values for environmental factors such as temperature and nutrient concentrations. These values or time-series are specific to a certain scenario and must be chosen accordingly. Parameters of TKTD sub-models are substance-specific and are usually obtained by fitting model parameters to observed data. The *cvasi* package supports this process with the function `calibrate()` which uses a frequentist approach for parameter fitting. However, parameters can also be derived by other means. Generally, the user is responsible for choosing model parameters and the package does not impose any restrictions in this regard. The package function `calibrate()` supports the following operation modes: - Fitting of a single parameter or multiple parameters simultaneously - Fit to one or more independent datasets - Weighting of datasets - Choosing the error function which is minimized - Selecting one of multiple optimization algorithms Parameter fitting builds upon the *R* function `stats::optim()` and it exposes full control over the fitting process. In this context, a dataset is represented by a *calibration set* object which consists of a scenario, observed data, and an optional weighting term. Please refer to the [Modeling Howto](cvasi-2-howto.html) for a description of the fitting process using `calibrate()`. As an alternative for *GUTS-RED* type models, model parameters can also be obtained using the [*morse* package](https://cran.r-project.org/package=morse). It uses a Bayesian approach for parameter fitting. # Description of model outputs The *cvasi* package does not use or define any custom file formats for model outputs but strictly relies on common *R* data structures. Simulation results and assessment endpoints are returned in tabular formats which can easily be serialized to common file formats such as `.csv` or *Microsoft Excel* files for further processing and data storage. The following subsections will describe in detail the return values of common package routines for simulating and assessing scenarios. ## Simulation results The return value of `simulate()` is a `data.frame` containing the value of each state variable at each requested output time. The first column represents *time*, followed by columns for each state variable: ```{r} ## Display selected scenario properties metsulfuron@times # Output times metsulfuron@init # Initial state ## Simulate the sample scenario metsulfuron %>% simulate() ``` The model state in the first row is, by definition, equal to the initial state of the scenario. The simulation result may contain additional columns that are derived variables or intermediate model variables. In case of the `metsulfuron` scenario, the internal toxicant concentration (`C_int`) was derived from `BM`, `M_int`, and model parameters. The number of fronds (`FrondNo`) was derived from `BM` and model parameters. Additional outputs such as `C_int` and `FrondNo` are included in results for user convenience and to support model understanding. Additional outputs can be disabled by setting the optional argument `nout=0`: ```{r} metsulfuron %>% simulate(nout=0) %>% head(5) ``` Intermediate model variables can be enabled in the same manner: ```{r} metsulfuron %>% simulate(nout=8) %>% head(5) ``` ## Effect levels The return value of `effect()` is a `data.frame` containing the effect level of each selected endpoint. The first column, `scenario`, contains the original scenario which was the basis of the assessment. Followed by the effect level and the exposure window in which the maximum effect was observed: ```{r} metsulfuron %>% set_times(0:7) %>% # restrict scenario to the period [0,7] effect() ``` Effect levels are reported as a fraction relative to a control scenario. Effects can be converted to percentages by multiplying the value with `100` (%). For scenarios with moving exposure windows disabled, the reported window will cover the entire simulation period. If multiple windows result in identical maximum effect levels, the first matching window will be reported. The table's content can be reduced to endpoints only by setting the argument `ep_only=TRUE`. `effect()` can also be instructed to return the effect levels of all assessed exposure windows by setting the argument `max_only=FALSE`: ```{r} metsulfuron %>% set_window(length=7, interval=1) %>% # enable moving exposure windows effect(max_only=FALSE) # return effects of all windows ``` In few cases, `effect()` may report spurious non-zero effect levels that are very close to zero. These can either be avoided by increasing the numerical precision of simulations or by setting a marginal effect threshold which instructs `effect()` to zero any effect below that threshold. A warning will be shown if a marginal effect threshold greater than 1% is chosen. Applying a threshold of 1% to the previous example results in zeroing the effect in the last and second to last exposure window but leaving the remaining results unchanged: ```{r} metsulfuron %>% set_window(length=7, interval=1) %>% # enable moving exposure windows effect(max_only=FALSE, marginal_effect=0.01) # return effects of all windows ``` ## Effect profiles The return value of `epx()` is a `data.frame` containing the effect profile (*EPx*) of each selected endpoint and effect level. The first column, `scenario`, contains the original scenario which was the basis of the assessment. Followed by the derived effect profiles, i.e. the multiplication factors to achieve x% effect: ```{r} metsulfuron %>% epx() ``` For the sample `metsulfuron` scenario, a factor of `0.395` would need to be applied to the exposure series to achieve an effect level of 10% on biomass (*BM*). An effect level of 50% on biomass is unattainable with the tested factors ranging from `1e-30` to `1e30` which is demonstrated by the warning message and the `NA` value for `BM.EP50`. The table's content can be reduced to endpoints only by setting the argument `ep_only=TRUE`. The range of tested factors can be controlled by setting the arguments `min_factor` and `max_factor`. Multiplication factors exceeding this range will raise a warning and `NA` will be returned for that *EPx*. Finding multiplication factors can be cut short if a certain margin of safety is reached by using the argument `factor_cutoff`: if the algorithm identifies that an *EPx* greater or equal to `factor_cutoff` is required, then `factor_cutoff` is reported as the result. This approach may be used to save computational resources for long running models if the exact *EPx* value above the cutoff is not of interest. The precision of *EPx* values is controlled by the argument `effect_tolerance` and is given as the upper absolute error threshold of effects that is deemed acceptable. The default value of `0.001` ensures that a derived *EPx* will result in an effect of x% ± 0.1. Decreasing the `effect_tolerance` will result in additional model iterations and longer runtime. The algorithm's inner workings can be examined by setting the argument `verbose=TRUE`: ```{r} metsulfuron %>% set_endpoints("BM") %>% epx(verbose=TRUE) ``` In this example, the binary search finds a `BM.EP10` with acceptable precision in less than ten iterations. For the `BM.EP50`, the algorithm re-uses the knowledge gained from previously performed calculations but is ultimately unable to find a suitable multiplication factor within the allowed range of multiplication factors. # References - EFSA PPR Panel (EFSA Panel on Plant Protection Products and their Residues), Ockleford C., Adriaanse P., Berny P., et al., 2018: *Scientific Opinion on the state of the art of Toxicokinetic/Toxicodynamic (TKTD) effect models for regulatory risk assessment of pesticides for aquatic organisms*. EFSA Journal 2018; 16(8):5377, 188 pp. DOI: [10.2903/j.efsa.2018.5377](https://doi.org/10.2903/j.efsa.2018.5377) - Geiger D.L., Call D.J., and Brooke L.T., 1988: *Acute toxicities of organic chemicals to fathead minnows (Pimephales promelas): Volume IV*, pp. 195-197. University of Wisconsin-Superior, Center for Lake Superior Environmental Studies. ISBN 9780961496838. - Klein J., Cedergreen N., Heine S., Reichenberger S., Rendal C., Schmitt W., Hommen U., 2021: *Refined description of the Lemna TKTD growth model based on Schmitt et al. (2013) - equation system and default parameters*. Report of the working group *Lemna* of the SETAC Europe Interest Group Effect Modeling. Version 1, uploaded on 22. Sept. 2021. https://www.setac.org/group/effect-modeling.html - Schmitt W., Bruns E., Dollinger M., and Sowig P., 2013: *Mechanistic TK/TD-model simulating the effect of growth inhibitors on Lemna populations*. Ecol Model 255, pp. 1-10. DOI: [10.1016/j.ecolmodel.2013.01.017](https://doi.org/10.1016/j.ecolmodel.2013.01.017) - Wickham H. and Grolemund G., 2017: *R for Data Science: Import, Tidy, Transform, Visualize, and Model Data* (1st ed.). O'Reilly Media. https://r4ds.had.co.nz