---
title: "Building modules"
author: "Tom August & Tim Lucas"
date: '2015-12-07'
output:
  html_document:
    toc: yes
---

<!--
%\VignetteEngine{knitr::knitr}
%\VignetteIndexEntry{Building modules}
-->



## The "1, 2, 3" of module building

The process of making a module is essentially:

1. Write an R function
2. Run `BuildModule` with the function and metadata
3. *Optional* -- Upload to the zoon modules repository

Each module type is slightly different to write, though the same three basic steps apply. Below we show an example of how to write each of the module types. We also link to pre-existing modules that you can use as templates.

## How to build an occurrence module

The aim of an occurrence module is to return a data.frame of occurrence data which can be used for modelling a species distribution. The example I'm going to show gets data from a fictional survey we have undertaken. The data was saved as a .csv and to share it we have placed it on Figshare.


```r
# Load zoon
library(zoon)

# Start building our function
Lorem_ipsum_UK <- function(){
```

In this case we have not given our function any arguments, as we simply want to return the online dataset. However, you could add arguments here to modify what your function returns (for an example [see the SpOcc module](https://github.com/zoonproject/modules/blob/master/R/SpOcc.R)).


```r
# I'm going to use the package 'RCurl' so first I get that
# using the zoon 'GetPackage' function
GetPackage('RCurl')
```

```
## Loading required package: RCurl
## Loading required package: bitops
```

It is important that you use the `GetPackage` function rather than `library` or `require` as it will also install the package if the user does not already have it installed. 
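The load-or-install behaviour described above can be pictured with a few lines of base R. This is only a sketch of the general pattern: `LoadOrInstall` is a hypothetical name and is not how `GetPackage` is actually implemented in zoon.

```r
# Hypothetical helper illustrating the load-or-install pattern;
# this is NOT the zoon implementation of GetPackage
LoadOrInstall <- function(package){

  # Install from CRAN only if the package is not already available
  if (!requireNamespace(package, quietly = TRUE)){
    install.packages(package)
  }

  # Attach the package so its functions can be called directly
  library(package, character.only = TRUE)

  invisible(package)
}

# 'stats' ships with R, so nothing needs to be installed here
LoadOrInstall('stats')
```

Inside a module you should still call `GetPackage` itself, so that the module behaves in the same way as the other zoon modules.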
  

```r
# Next I retrieve the data from figshare
URL <- "http://files.figshare.com/2519918/Lorem_ipsum_data.csv"
x <- getURL(URL)
out <- read.csv(textConnection(x))
```

Now it is time to think about how we return our data. The output format for occurrence modules is very important: if the format is not correct, your module will not work properly when entered into a workflow. An occurrence module must return a data.frame with the columns longitude, latitude, value, type and fold. The details are given at [the end of this document](#occIO).

Our occurrence data does not have all of these columns so we need to add them. Here is what our data currently look like.


```
##    startDate latitude longitude
## 1 2014-06-25 51.98917 0.8917427
## 2 2014-06-25 51.98917 0.8917427
## 3 2007-08-28 52.21136 0.6602159
## 4       <NA> 51.97564 0.9833449
## 5 1973-01-01 52.34187 0.7142953
## 6 2013-04-12 52.23719 0.7877316
```

So we need to do a little reformatting


```r
# Keep only Lat Long columns
out <- out[, c("latitude", "longitude")]

# Add in the columns we don't have
out$value <- 1 # all our data are presences
out$type <- 'presence'
out$fold <- 1 # we don't add any folds

# Now the data is in the correct format we can return it
return(out)
```

We have now written the R code for our occurrence module. This is what it looks like when you put it all together.


```r
Lorem_ipsum_UK <- function(){
  
  GetPackage('RCurl')
  
  # Get data
  URL <- "http://files.figshare.com/2519918/Lorem_ipsum_data.csv"
  x <- getURL(URL)
  out <- read.csv(textConnection(x))
  out <- out[, c("latitude", "longitude")]
  
  # Add in the columns we don't have
  out$value <- 1 # all our data are presences
  out$type <- 'presence'
  out$fold <- 1 # we won't add any folds
  
  return(out)
}
```
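Before running a full workflow, it can be useful to check the returned data.frame against the required columns without going online. The following is a minimal sketch: `MissingOccurrenceColumns` is a hypothetical helper (it is not part of zoon), and the two records are copied from the rows printed above to stand in for the downloaded file.

```r
# Columns every occurrence module must return
required <- c('longitude', 'latitude', 'value', 'type', 'fold')

# Hypothetical helper (not part of zoon): returns the names of any
# required columns that are missing from an occurrence data.frame
MissingOccurrenceColumns <- function(occ){
  stopifnot(is.data.frame(occ))
  setdiff(required, names(occ))
}

# Two records standing in for the downloaded csv
raw <- data.frame(startDate = c('2014-06-25', '2007-08-28'),
                  latitude = c(51.98917, 52.21136),
                  longitude = c(0.8917427, 0.6602159))

# Before reformatting: value, type and fold are missing
MissingOccurrenceColumns(raw)

# Apply the same reformatting as in the module
out <- raw[, c('latitude', 'longitude')]
out$value <- 1
out$type <- 'presence'
out$fold <- 1

# After reformatting: nothing is missing
MissingOccurrenceColumns(out)
```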

Now that we have our function written we can test it very simply in a workflow like this.


```r
work1 <- workflow(occurrence = Lorem_ipsum_UK,
                   covariate = UKBioclim,
                   process = OneHundredBackground,
                   model = LogisticRegression,
                   output = PrintMap)
```

```
## Loading required package: dismo
```

![plot of chunk unnamed-chunk-8](figure/unnamed-chunk-8-1.png) 

This is a nice way to debug your function and ensure you are getting the results you expect.

Once you are happy that your function is working as you expect it to, you can build your code into a module using the `BuildModule` function in `zoon`. This function adds metadata including the type of module, authors' names, a brief description and documentation for the arguments it accepts (though this one doesn't accept any arguments).


```r
# Let's build our module
BuildModule(Lorem_ipsum_UK,
            type = 'occurrence',
            title = 'A dataset of Lorem ipsum occurrences',
            description = paste('The module retrieves a dataset of',
            'Lorem ipsum records from figshare. This dataset contains',
            'presence-only data and was collected between 1990 and',
            '2000 by members of the Lorem ipsum appreciation society'),
            details = 'This dataset is fake, Lorem ipsum does not exist',
            author = 'A.B. Ceidi',
            email = 'ABCD@anemail.com',
            dataType = 'presence-only')
```

```
## [1] "Lorem_ipsum_UK"
```

This function is fairly self-explanatory; however, it is worth noting the `dataType` field. This must be any of 'presence-only', 'presence/absence', 'abundance' or 'proportion'. This is important so that people using your module in the future will know what it is going to output.
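Because the accepted values are fixed strings, a typo such as 'presence only' is easy to make. A small check like the one below can catch this before calling `BuildModule`. It is only a sketch: `CheckDataType` is a hypothetical helper and is not part of zoon.

```r
# The data types listed above
validTypes <- c('presence-only', 'presence/absence',
                'abundance', 'proportion')

# Hypothetical helper (not part of zoon): stop if any entry is not
# one of the accepted data types
CheckDataType <- function(dataType){
  bad <- setdiff(dataType, validTypes)
  if (length(bad) > 0){
    stop('Unknown dataType: ', paste(bad, collapse = ', '))
  }
  invisible(dataType)
}

# A single type, or a vector of several types, passes silently
CheckDataType('presence-only')
CheckDataType(c('presence-only', 'abundance'))
```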

`BuildModule` has now written an R file in our working directory containing the function and metadata, so that it can be shared with others.


```r
# First we remove the function from our workspace
rm(list = 'Lorem_ipsum_UK')

# This is how you would use a module that a colleague has sent you
LoadModule(module = 'Lorem_ipsum_UK.R')

work2 <- workflow(occurrence = Lorem_ipsum_UK,
                  covariate = UKBioclim,
                  process = OneHundredBackground,
                  model = LogisticRegression,
                  output = PrintMap)
```

Once we're happy with the module, we will hopefully upload it to the zoon repository. The repository is currently under development. Visit [the development pages](https://zoonproject.wordpress.com/) for more information.

## How to write a covariate module

The aim of a covariate module is to provide spatial information that will help to explain the distribution of a species. For example, these data could be climate data, habitat data or topography.

A covariate module, like an occurrence module, does not have to take any arguments but must return a raster layer, brick or stack.

In this example we will create a covariate module that can provide a number of different climate layers for the area covering Australia.


```r
# Our function will take an argument to set the variable
# the user wants returned
AustraliaAir <- function(variables = 'rhum'){
```

When your module has arguments, as here, it is important to include defaults for all arguments. This makes it easier for other users to use your modules and allows your module to be tested effectively when you upload it to the zoon repository.

The first step is to load the R packages that your code is going to need. It is important that you use the `GetPackage` function rather than `library` or `require` as it will also install the package if the user does not already have it installed.

In this example we do not need any external packages as the data we are downloading is a RasterStack object, and zoon already loads the `raster` package to deal with RasterStacks.


```
## class       : RasterStack 
## dimensions  : 18, 20, 360, 7  (nrow, ncol, ncell, nlayers)
## resolution  : 2.3, 2.222222  (x, y)
## extent      : 111, 157, -46, -6  (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0 
## names       :           air,           hgt,          rhum,          shum,         omega,          uwnd,          vwnd 
## min values  :  2.740523e+02,  1.362261e+03,  3.268068e+01,  3.416514e-03, -5.856714e-02, -6.532974e+00, -3.080663e+00 
## max values  :  2.964156e+02,  1.523139e+03,  8.030254e+01,  1.252498e-02,  5.942655e-02,  1.489644e+01,  4.789505e+00
```

To share this we have saved the object as an R data file and [placed it on Figshare](http://figshare.com/articles/NCEP_Australia/1610215) - attributing those that created the data. 

In our function we download this data into R


```r
# Load in the data
URL <- "http://files.figshare.com/2527274/aus_air.rdata"
load(url(URL)) # The object is called 'ras'

# Subset the data according to the variables parameter
ras <- subset(ras, variables)

return(ras)
```

We can test our function works by running it in a workflow with other modules


```r
AustraliaAir <- function(variables = 'rhum'){

  URL <- "http://files.figshare.com/2527274/aus_air.rdata"
  load(url(URL)) # The object is called 'ras'
  ras <- subset(ras, variables)
  return(ras)
  
}

# Select the variables we want
myVariables <- c('air','hgt','rhum','shum','omega','uwnd','vwnd')

work3 <- workflow(occurrence = SpOcc(extent = c(111, 157, -46, -6),
                                     species = 'Varanus varius',
                                     limit = 500),
                  covariate = AustraliaAir(variables = myVariables),
                  process = OneHundredBackground,
                  model = LogisticRegression,
                  output = PrintMap)
```

![plot of chunk unnamed-chunk-14](figure/unnamed-chunk-14-1.png) 

Once we are happy with the function we have written we need to use the `BuildModule` function to convert our function into a module by adding in the necessary metadata


```r
# Build our module
BuildModule(AustraliaAir,
            type = 'covariate',
            title = 'Australia Air data from NCEP',
            description = paste('This module provides access to the',
                                'NCEP air data for Australia provided by',
                                'NCEP and should be attributed to Climatic',
                                'Research Unit, University of East Anglia'),
            details = paste('These data are redistributed under the terms of',
                            'the Open Database License',
                            'http://opendatacommons.org/licenses/odbl/1.0/'),
            author = 'Z.O. Onn',
            email = 'zoon@zoon-zoon.com',
            paras = list(variables = paste('A character vector of air variables',
                         'you wish to return. This can include any number of',
                         "the following: 'air','hgt','rhum','shum','omega',",
                         "'uwnd','vwnd'")))
```

```
## [1] "AustraliaAir"
```

`BuildModule` is fairly self-explanatory, but it is worth noting the `paras` argument. This takes a named list of the parameters the module takes. It should have the following structure: *list(parameterName = 'Parameter description.', anotherParameter = 'Another description.')*

Once `BuildModule` has been run there will be an R file in our working directory that represents our module and can be shared with others. This R script can be used as follows.


```r
# remove the original function from our environment
rm(list = 'AustraliaAir')

# Load the module script
LoadModule('AustraliaAir.R')

work4 <- workflow(occurrence = SpOcc(extent = c(111, 157, -46, -6),
                                     species = 'Varanus varius',
                                     limit = 500),
                  covariate = AustraliaAir,
                  process = OneHundredBackground,
                  model = LogisticRegression,
                  output = PrintMap)
```

Once we're happy with the module, we will hopefully upload it to the zoon repository. The repository is currently under development. Visit [the development pages](https://zoonproject.wordpress.com/) for more information.

## How to write a process module

The aim of a process module is to modify the occurrence data and/or the covariate data prior to modelling. Examples include adding background points, or adding folds for cross-validation.

A process module returns data in exactly the same format that it accepts data. It takes and returns a list of two elements. The first element is a data.frame with the columns value, type, fold, longitude, latitude ([see Occurrence module output](#occIO)), and additional covariate columns. The covariate columns are added internally in the zoon workflow by combining the output of the covariate module. The second element of the list is a RasterBrick, RasterLayer, or RasterStack as output by a covariate module.

In this example we are going to create a process module that cuts down our occurrence data to a user supplied extent.

When writing a module it is useful to have example input to test with. One way to do this is to run a similar workflow and use the outputs of that workflow to test yours. Here is an example:


```r
# We run a very simple workflow so that we can get example input
# for our module
work5 <- workflow(occurrence = UKAnophelesPlumbeus,
                  covariate  = UKAir,
                  process    = NoProcess,
                  model      = LogisticRegression,
                  output     = PrintMap)
```

![plot of chunk unnamed-chunk-17](figure/unnamed-chunk-17-1.png) 

```r
# The output from a process module is in the same format as the 
# input, so we can use the output of NoProcess as the testing
# input for our module. Note that this object should be called
# .data
.data <- work5$process.output[[1]]

str(.data, 2)
```

```
## List of 2
##  $ df :'data.frame':	188 obs. of  6 variables:
##   ..$ longitude: num [1:188] 1.01 -0.16 -2.83 -0.63 -3.53 ...
##   ..$ latitude : num [1:188] 52.4 51.6 53.4 51.6 56 ...
##   ..$ value    : num [1:188] 1 1 1 1 1 1 1 1 1 1 ...
##   ..$ type     : chr [1:188] "presence" "presence" "presence" "presence" ...
##   ..$ fold     : num [1:188] 1 1 1 1 1 1 1 1 1 1 ...
##   ..$ layer    : num [1:188] 271 272 272 272 271 ...
##  $ ras:Formal class 'RasterLayer' [package "raster"] with 12 slots
```

It is important to note that the list object that is passed into a process module is named `.data`, and so when writing our module we need to adhere to this convention.


```r
# Start writing our module
ClipOccurrence <- function(.data, extent = c(-180, 180, -180, 180)){
```

Here we have remembered to give `.data` as an argument as this is a [default for process modules](#proIO). In addition we have supplied an argument for the extent and set the default to the entire globe (i.e. no clipping). It is important that all of your arguments have defaults (even if the default might not be a good idea in practice), as this allows the zoon system to perform automatic testing on your modules when you share them online.


```r
# Write the body of our function
# extract the occurrence data from the .data object
occDF <- .data$df

# Subset by longitude
occSub <- occDF[occDF$longitude >= extent[1] &
                occDF$longitude <= extent[2], ]

# Subset by latitude
occSub <- occSub[occSub$latitude >= extent[3] &
                 occSub$latitude <= extent[4], ]

# assign this data.frame back to the .data object
.data$df <- occSub
```

So our simple process function looks like this:


```r
ClipOccurrence <- function(.data, extent = c(-180, 180, -180, 180)){
  
  # Write the body of our function
  # extract the occurrence data from the .data object
  occDF <- .data$df
  
  occSub <- occDF[occDF$longitude >= extent[1] &
                  occDF$longitude <= extent[2], ]
 
  occSub <- occSub[occSub$latitude >= extent[3] &
                   occSub$latitude <= extent[4], ]
  
  .data$df <- occSub
  
  return(.data)
  
}
```
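Before running a full workflow, the function can also be tried on a small hand-made `.data` list. The sketch below repeats the function so that it runs on its own. The three records and the `layer` values are invented for the test, and the `ras` element is left as `NULL` because `ClipOccurrence` never touches it (in a real workflow it would hold a Raster* object).

```r
ClipOccurrence <- function(.data, extent = c(-180, 180, -180, 180)){
  occDF <- .data$df
  occSub <- occDF[occDF$longitude >= extent[1] &
                  occDF$longitude <= extent[2], ]
  occSub <- occSub[occSub$latitude >= extent[3] &
                   occSub$latitude <= extent[4], ]
  .data$df <- occSub
  return(.data)
}

# Hand-made test input with the same structure as .data
testData <- list(df = data.frame(longitude = c(1.01, -0.16, -3.53),
                                 latitude = c(52.4, 51.6, 56.0),
                                 value = 1,
                                 type = 'presence',
                                 fold = 1,
                                 layer = c(271, 272, 271)),
                 ras = NULL)

# With the default extent nothing is removed
nrow(ClipOccurrence(testData)$df)

# The third record lies outside this extent and is dropped
clipped <- ClipOccurrence(testData, extent = c(-3, 2, 50, 53))
nrow(clipped$df)

# The output keeps the same elements and columns as the input
identical(names(clipped), names(testData))
identical(names(clipped$df), names(testData$df))
```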

Our next step is to test that this function will work in a workflow. Once we have read in our function so that it is available in our working environment we can then include it in a workflow as we would a normal module.


```r
# Run a workflow with our new process
# In this example we first add background points, then clip the data
work6 <- workflow(occurrence = UKAnophelesPlumbeus,
                  covariate  = UKAir,
                  process    = Chain(OneHundredBackground,
                                     ClipOccurrence(extent = c(-3, 2, 50, 53))),
                  model      = LogisticRegression,
                  output     = PrintMap)
```

![plot of chunk unnamed-chunk-21](figure/unnamed-chunk-21-1.png) 

We can see that the data has been clipped to the extent we specified in the map printed by the output module.

The next stage is to turn this function into a module which is shareable. To do this we need to add metadata to our function using the `BuildModule` function


```r
# Build our module
BuildModule(ClipOccurrence,
            type = 'process',
            title = 'Clip occurrence data to extent',
            description = paste('This process module clips the occurrence',
                                'data that is returned from the occurrence',
                                'module to a user defined extent'),
            details = paste('The extent is a rectangular region which denotes the',
                            'area within which observations will be kept.',
                            'All data that falls outside of the extent will',
                            'be removed and will not be used in the',
                            'modelling process'),
            author = 'Z.O. Onn',
            email = 'zoon@zoon-zoon.com',
            paras = list(extent = paste('A numeric vector of length four',
                                        'giving (in this order) the minimum',
                                        'longitude, maximum longitude, minimum',
                                        'latitude, maximum latitude.')),
            dataType = c('presence-only', 'presence/absence', 'abundance',
                         'proportion'))
```

```
## [1] "ClipOccurrence"
```

Much of how to use `BuildModule` is self-explanatory but two parameters are worth mentioning here. The `paras` argument takes a named list of the parameters the module takes. This should follow the following structure; *list(parameterName = 'Parameter description.', anotherParameter = 'Another description.')*, but should not include the defaults (i.e. we do not include `.data`). `dataType` describes the types of occurrence data that this module will work with. Certain modules might only work with presence-only data for example. In our case, our module will work with any type of data and so we list all the data types in the `dataType` field.
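One way to make sure that every user-facing argument has an entry in `paras` is to compare the names of the list with the arguments of the function. The following is a sketch only: `UndocumentedArgs` and `ClipStub` are hypothetical names used for illustration, and this check is not something zoon is known to provide.

```r
# Hypothetical helper (not part of zoon): list the user-facing
# arguments of a function that have no entry in a 'paras' list
UndocumentedArgs <- function(fun, paras = list()){
  args <- names(formals(fun))

  # Arguments starting with '.' are the defaults and are not documented
  userArgs <- args[!grepl('^\\.', args)]

  setdiff(userArgs, names(paras))
}

# A stand-in with the same arguments as the module above
ClipStub <- function(.data, extent = c(-180, 180, -180, 180)) .data

# 'extent' has no description yet
UndocumentedArgs(ClipStub)

# Nothing is left undocumented once 'extent' is described
UndocumentedArgs(ClipStub,
                 paras = list(extent = 'A numeric vector of length four.'))
```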

Once `BuildModule` has been run there will be an R file in our working directory that represents our module and can be shared with others. This R script can be used as follows.


```r
# remove the original function from our environment
rm(list = 'ClipOccurrence')

# Load the module script
LoadModule('ClipOccurrence.R')
```

```
## [1] "ClipOccurrence"
```

```r
work7 <- workflow(occurrence = CWBZimbabwe,
                  covariate = Bioclim(extent = c(31, 34, -22, -18)),
                  process = ClipOccurrence(extent = c(32, 33, -21, -19)),
                  model = LogisticRegression,
                  output = PrintMap)
```

![plot of chunk unnamed-chunk-23](figure/unnamed-chunk-23-1.png) 

Once we're happy with the module, we will hopefully upload it to the zoon repository. The repository is currently under development. Visit [the development pages](https://zoonproject.wordpress.com/) for more information.

## How to write a model module

Here is a simple function that will become our module. It is a model module that uses generalised additive models. We will work through it one element at a time.

First we start our function by declaring all the parameters we need, including all the defaults.


```r
GamGam <- function(.df){
```

Since this is a model module the only default is `.df`. To find out more about defaults see the section [Module IO definitions for module developers](#ModIO).

Next we specify the packages our function needs. These should be specified using the `GetPackage` function in the zoon package. This function will load the package if the user of your module already has it, or will install it from CRAN if they don't. For this reason, make sure your module only uses packages that are on CRAN.


```r
# Specify the packages we need using the function
# GetPackage
zoon:::GetPackage("gam")
```

Next we can add the code that does our modelling, here we create a simple GAM (Generalised Additive Model) using the package [gam](https://cran.r-project.org/web/packages/gam/index.html)


```r
# Create a data.frame of covariate data
covs <- as.data.frame(.df[, 6:ncol(.df)])
names(covs) <- names(.df)[6:ncol(.df)]

# do a bit of copy-pasting to define smooth terms for each covariate
f <- sprintf('.df$value ~ s(%s)',
                    paste(colnames(covs),
                          collapse = ') + s('))

# Run our gam model
m <- gam::gam(formula = formula(f),
              data = covs,
              family = binomial)
```
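To see what the `sprintf` and `paste` step produces, it can be run on its own with a few covariate names. The names `bio1`, `bio2` and `bio3` below are made up for illustration.

```r
# Made-up covariate names
covNames <- c('bio1', 'bio2', 'bio3')

# Same construction as in the module: one smooth term per covariate
f <- sprintf('.df$value ~ s(%s)',
             paste(covNames, collapse = ') + s('))

f
# ".df$value ~ s(bio1) + s(bio2) + s(bio3)"

# The string can be converted to a formula object
class(formula(f))
```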

The final stage of building a model module is to write some code within the function to create a `ZoonModel` object. This is important as it standardises all outputs from model modules and crucially enables zoon to make predictions from them in a predictable and standard way.

We build a `ZoonModel` object by using the function `ZoonModel`. This takes three parameters

- **`model`**: Your model object
- **`code`**: A section of code that will use `model` [your model] and **`newdata`** [a new set of covariate data], to return a vector of predicted values, one for each row of `newdata`
- **`packages`**: A vector of characters naming the packages needed to run `code`


```r
# Create a ZoonModel object to return.
# this includes our model, predict method
# and the packages we need.
ZoonModel(model = m,
          code = {
          
          # create empty vector of predictions
          p <- rep(NA, nrow(newdata))
          
          # omit NAs in new data
          newdata_clean <- na.omit(newdata)
          
          # get NA indices
          na_idx <- attr(newdata_clean, 'na.action')
          
          # if there are no NAs then the index should 
          # include all rows, else it should name the 
          # rows to ignore
          if (is.null(na_idx)){
            idx <- 1:nrow(newdata)
          } else {
            idx <- -na_idx
          }
          
          # Use the predict function in gam to predict
          # our new values
          p[idx] <- gam::predict.gam(model,
                                     newdata_clean,
                                     type = 'response')
          return (p)
        },
        packages = 'gam')
```
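The `NA` handling in the `code` block can be tried outside zoon. The sketch below uses the same indexing logic, but with a `glm` fitted to simulated data standing in for the GAM, so that it runs with base R alone. The data and variable names are invented, and `as.integer` is added only to strip the attributes from the `na.action` index.

```r
# Simulated training data and a simple stand-in model
set.seed(1)
train <- data.frame(x = rnorm(50))
train$y <- rbinom(50, 1, plogis(train$x))
model <- glm(y ~ x, data = train, family = binomial)

# New data with missing covariate values in rows 2 and 4
newdata <- data.frame(x = c(0.5, NA, -1, NA, 2))

# create empty vector of predictions
p <- rep(NA, nrow(newdata))

# omit NAs in new data and get the indices of the dropped rows
newdata_clean <- na.omit(newdata)
na_idx <- attr(newdata_clean, 'na.action')

if (is.null(na_idx)){
  idx <- 1:nrow(newdata)
} else {
  idx <- -as.integer(na_idx)
}

# Predictions are written only to the rows without NAs
p[idx] <- predict(model, newdata_clean, type = 'response')

# One value per row of newdata; rows 2 and 4 stay NA
p
```

Returning one value per row of `newdata`, with `NA` where a prediction is not possible, keeps the predictions aligned with the rows they belong to.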

With all these elements in place we now have our module complete. All together it looks like this.


```r
GamGam <- function(.df){

  # Specify the packages we need using the function
  # GetPackage
  zoon:::GetPackage("gam")
  
  # Create a data.frame of covariate data
  covs <- as.data.frame(.df[, 6:ncol(.df)])
  names(covs) <- names(.df)[6:ncol(.df)]
  
  # do a bit of copy-pasting to define smooth terms for each covariate
  f <- sprintf('.df$value ~ s(%s)',
                      paste(colnames(covs),
                            collapse = ') + s('))
  
  # Run our gam model
  m <- gam::gam(formula = formula(f),
                data = covs,
                family = binomial)
  
  # Create a ZoonModel object to return.
  # this includes our model, predict method
  # and the packages we need.
  ZoonModel(model = m,
            code = {
            
            # create empty vector of predictions
            p <- rep(NA, nrow(newdata))
            
            # omit NAs in new data
            newdata_clean <- na.omit(newdata)
            
            # get their indices
            na_idx <- attr(newdata_clean, 'na.action')
            
            # if there are no NAs then the index should 
            # include all rows, else it should name the 
            # rows to ignore
            if (is.null(na_idx)){
              idx <- 1:nrow(newdata)
            } else {
              idx <- -na_idx
            }
            
            # Use the predict function in gam to predict
            # our new values
            p[idx] <- gam::predict.gam(model,
                                       newdata_clean,
                                       type = 'response')
            return (p)
          },
          packages = 'gam')
  
}
```

We then run `BuildModule` on our function, adding the required metadata. As this module has no parameters other than `.df`, which is not user specified, we don't need to set the `paras` argument, which would normally be used to document arguments. Default arguments, like `.df`, are all signified by starting with a `.` and don't need to be documented as this will be written into the module documentation automatically.


```r
BuildModule(object = GamGam,
            type = 'model',
            title = 'GAM sdm model',
            description = 'This is my mega cool new model.',
            details = paste('This module performs GAMs (Generalised Additive',
                            'Models) using the gam function from the package gam.'),
            author = 'Z. Oon',
            email = 'zoon@zoon.com',
            dataType = c('presence-only', 'presence/absence'))
```

```
## [1] "GamGam"
```

This is now a runnable module.


```r
# remove the function in our workspace else
# this will cause problems
rm(GamGam)

# Load in the module we just built
LoadModule('GamGam.R')
```

```
## [1] "GamGam"
```

```r
# Run a workflow using our module
work8 <- workflow(occurrence = UKAnophelesPlumbeus,
                  covariate = UKAir,
                  process  = OneHundredBackground,
                  model = GamGam,
                  output = PrintMap)
```

![plot of chunk unnamed-chunk-29](figure/unnamed-chunk-29-1.png) 

Once we're happy with the module, we will hopefully upload it to the zoon repository. The repository is currently under development. Visit [the development pages](https://zoonproject.wordpress.com/) for more information.

## How to write an output module

An output module is the last module in a zoon workflow and is an opportunity to summarise the model results, make predictions, or otherwise visualise the data or results. The input to output modules is a combination of the outputs of occurrence, covariate, process and model modules providing many possible output types.

In this example we will create an output module that uses the model output to predict the species occurrence in a new location given by a user-provided raster.

When writing a module it is useful to have example input to test with. One way to do this is to run a similar workflow and use the outputs of that workflow to test yours. Here is an example:


```r
# We run a very simple workflow so that we can get example input
# for our module
work9 <- workflow(occurrence = UKAnophelesPlumbeus,
                  covariate  = UKAir,
                  process    = OneHundredBackground,
                  model      = LogisticRegression,
                  output     = PrintMap)

# The input to an output module is a combination of the output
# from the model module and the covariate module. We can recreate
# it for this work flow like this
.model <- work9$model.output[[1]]
.ras <- work9$covariate.output[[1]]
```

Both `.model` and `.ras` are default arguments for an output module, so it is important that you have them as arguments for your module, even if you don't use them both. It is also important that you stick to the same naming conventions.


```r
# Our output module takes the default parameters and a user-defined
# Raster* object that has the same structure as the raster layer output
# by the covariate module
PredictNewRasterMap <- function(.model, .ras, raster = .ras){
```

It is important to have default values for all user defined parameters so that your module can be tested when you upload it to the zoon website. Here we set our default 'new area' raster to be the same as the raster used to create the model. Clearly this is not how we envisage the module being used in a real application (unless they genuinely wanted to predict back to the same area), however this ensures that this module will always work with its default arguments, no matter what workflow it is placed in. 
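A default such as `raster = .ras` works because R evaluates default arguments lazily, inside the function, where the other arguments are already available. The stand-in below (`ShowDefault` is a hypothetical name, and plain integer vectors are used in place of rasters) shows the pattern in isolation.

```r
# Stand-in with the same argument pattern as the module: 'raster'
# defaults to whatever was supplied as '.ras'
ShowDefault <- function(.model, .ras, raster = .ras){
  identical(raster, .ras)
}

# No 'raster' given, so the default picks up '.ras'
ShowDefault(.model = NULL, .ras = 1:10)
# TRUE

# A user-supplied 'raster' overrides the default
ShowDefault(.model = NULL, .ras = 1:10, raster = 1:5)
# FALSE
```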


```r
# The first step is to extract the covariate values
# from the user provided raster
vals <- data.frame(getValues(raster))
colnames(vals) <- names(raster)
```

Once we have these new values we can predict using the `ZoonPredict` function. This function is very useful as it simplifies the process of making predictions from the output of a model module. See the [InteractiveMap](https://github.com/zoonproject/modules/blob/master/R/InteractiveMap.R) module for an innovative visualisation using predicted values.


```r
# Make predictions to the new values
pred <- ZoonPredict(.model$model,
                    newdata = vals)

# Create a copy of the user's raster...
# (just a single layer)
pred_ras <- raster[[1]]
    
# ... and assign the predicted values to it
pred_ras <- setValues(pred_ras, pred)
```

Once we have the raster of predicted values we can plot it and return the results to the user.


```r
# Plot the predictions as a map
plot(pred_ras)

# Return the raster of predictions
return (pred_ras)
```

Our function now looks like this:


```r
PredictNewRasterMap <- function(.model, .ras, raster = .ras){
  
  # Extract the values from the user provided raster
  vals <- data.frame(getValues(raster))
  colnames(vals) <- names(raster)
  
  # Make predictions to the new values
  pred <- ZoonPredict(.model$model,
                      newdata = vals)
  
  pred_ras <- raster[[1]]
  pred_ras <- setValues(pred_ras, pred)
  
  # Print the predictions as a map
  plot(pred_ras)
  
  return(pred_ras)
}
```

Our next step is to test that this function will work in a workflow. Once we have read in our function so that it is available in our working environment we can then include it in a workflow as we would a normal module.


```r
# Run it with the defaults
work10 <- workflow(occurrence = UKAnophelesPlumbeus,
                   covariate  = UKBioclim,
                   process    = OneHundredBackground,
                   model      = LogisticRegression,
                   output     = PredictNewRasterMap)
```

![plot of chunk unnamed-chunk-36](figure/unnamed-chunk-36-1.png) 

```r
# Now I'm going to run it with a different raster
library(raster)

# Get Bioclim data (using the getData function in the raster package,
# which zoon loads) ...
BioclimData <- getData('worldclim', var = 'bio', res = 5)
BioclimData <- BioclimData[[1:19]]

# ... and crop to Australia
cropped <- crop(BioclimData,
                c(109,155,-46,-7))

# Run it with my new raster
work11 <- workflow(occurrence = UKAnophelesPlumbeus,
                   covariate  = UKBioclim,
                   process    = OneHundredBackground,
                   model      = LogisticRegression,
                   output     = PredictNewRasterMap(raster = cropped))
```

![plot of chunk unnamed-chunk-36](figure/unnamed-chunk-36-2.png) 

```r
# The prediction map should also be returned as a raster
str(work11$report, 2)
```

```
## List of 1
##  $ :Formal class 'RasterLayer' [package "raster"] with 12 slots
```

The next stage is to turn this function into a module which is shareable. To do this we need to add metadata to our function using the `BuildModule` function


```r
# Build our module
BuildModule(PredictNewRasterMap,
            type = 'output',
            title = 'Predict to a new raster and map',
            description = paste('This output module predicts the species',
                                'distribution in a new area given a new',
                                'raster'),
            details = paste('The results are printed as a map and a raster is',
                            'returned with the predicted values. It is important',
                            'that the new raster has the same structure as the',
                            'raster provided by the covariate module.',
                            'It must have the same covariate layers in the',
                            'same order.'),
            author = 'Z.O. On',
            email = 'zoon@zoon-zoon.com',
            paras = list(raster = paste('A RasterBrick, RasterLayer or RasterStack in',
                                        'the same format as the raster provided',
                                        'by the covariate module. Predicted values',
                                        'will be estimated for this raster using',
                                        'the results from the model module')),
            dataType = c('presence-only', 'presence/absence', 'abundance',
                         'proportion'))
```

```
## [1] "PredictNewRasterMap"
```

Most of the `BuildModule` arguments are self-explanatory, but two are worth mentioning here. The `paras` argument takes a named list describing the parameters the module takes, in the following structure: `list(parameterName = 'Parameter description.', anotherParameter = 'Another description.')`. It should not include the default arguments (i.e. we do not include `.model` or `.ras`). `dataType` describes the types of occurrence data that this module will work with; some modules might only work with presence-only data, for example. Our module works with any type of data, so we list all the data types in the `dataType` field.
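One way to confirm that a `paras` list covers exactly the user-facing arguments is to compare it against the function's formal arguments. This is a sketch in base R only; `ExampleOutput`, its `threshold` argument and `exampleParas` are hypothetical and exist purely for illustration.

```r
# Hypothetical module function, for illustration only
ExampleOutput <- function(.model, .ras, raster = NULL, threshold = 0.5) {
  NULL
}

# One description per user-facing argument; the defaults are left out
exampleParas <- list(raster = 'A raster to predict to.',
                     threshold = 'Cut-off applied to the predictions.')

# User-facing arguments are all formal arguments except the defaults
userArgs <- setdiff(names(formals(ExampleOutput)), c('.model', '.ras'))

# TRUE when every user-facing argument is described, and nothing else
setequal(userArgs, names(exampleParas))
```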

Once `BuildModule` has been run there will be an R file in our working directory that represents our module and can be shared with others. This R script can be used as follows.


```r
# remove the original function from our environment
rm(list = 'PredictNewRasterMap')

# Load the module script
LoadModule('PredictNewRasterMap.R')
```

```
## [1] "PredictNewRasterMap"
```

```r
# Now I model a crop pest from Zimbabwe in its home
# range and in Australia by chaining together
# output modules
work12 <- workflow(occurrence = CWBZimbabwe,
                   covariate = Bioclim(extent = c(28, 38, -24, -16)),
                   process = NoProcess,
                   model = RandomForest,
                   output = Chain(PrintMap,
                                  PredictNewRasterMap(raster = cropped)))
```

```
## Loading required package: randomForest
## randomForest 4.6-12
## Type rfNews() to see new features/changes/bug fixes.
```

![plot of chunk unnamed-chunk-38](figure/unnamed-chunk-38-1.png) ![plot of chunk unnamed-chunk-38](figure/unnamed-chunk-38-2.png) 

Once we're happy with the module, we will hopefully upload it to the zoon repository. The repository is currently under development. Visit [the development pages](https://zoonproject.wordpress.com/) for more information.
<a id="ModIO"> </a>

## Module IO definitions for module developers 

The default input arguments and return values of modules are strict. However, any module type can have additional named input arguments, provided they have default values. A lot of the data frames include '+ covariates'. This indicates that the number of covariate columns is flexible.
<a id="occIO"></a>

### Occurrence

*In*: No default inputs

*Out*: `data.frame` with column names:

* `longitude`: The longitude of the observation.
* `latitude`: The latitude of the observation.
* `value`: The response value for the observation when used in a model. This can be 1 or 0 for presence/absence, an integer for abundance (e.g. 1, 3, 67), or a decimal number between 0 and 1 for proportions (e.g. 0.12, 0.5, 0.98).
* `type`: This is linked to `value` and dictates, for each row of the data.frame, the type of value given. This can be one of the following: `'presence'`, `'absence'`, `'background'`, `'abundance'`, `'proportion'`.
* `fold`: Folds are used to test your model. If we have, for example, 3 folds (1, 2, 3) then we can use the [`PerformanceMeasures` output module](https://github.com/zoonproject/modules/blob/master/R/PerformanceMeasures.R) to test the performance of the model. A common method, implemented by `PerformanceMeasures`, is to build the model using all but one fold, and then test the model's ability to predict the fold that was held back.
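As an illustration of this format, the sketch below builds a small presence/absence data.frame with made-up coordinates and checks it against the column names and `type` values listed above. `CheckOccurrence` is a hypothetical helper, not a zoon function, and the random assignment of rows to 3 folds is just one possible choice.

```r
set.seed(1)
n <- 6

# Made-up occurrence records in the required format
occ <- data.frame(longitude = c(-1.2, -0.5, 0.1, -2.3, -1.8, 0.4),
                  latitude  = c(51.5, 52.0, 52.3, 53.1, 50.9, 51.2),
                  value     = c(1, 1, 1, 0, 0, 0),
                  type      = c(rep('presence', 3), rep('absence', 3)),
                  # Randomly assign each row to one of 3 equally sized folds
                  fold      = sample(rep(1:3, length.out = n)),
                  stringsAsFactors = FALSE)

# Hypothetical helper: TRUE if the columns and types match the format
CheckOccurrence <- function(df) {
  required <- c('longitude', 'latitude', 'value', 'type', 'fold')
  allowed <- c('presence', 'absence', 'background',
               'abundance', 'proportion')
  is.data.frame(df) &&
    all(required %in% names(df)) &&
    all(df$type %in% allowed)
}

CheckOccurrence(occ)
```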

### Covariate
*In*: No default inputs

*Out*: `RasterLayer`, `RasterBrick` or `RasterStack` object
<a id="proIO"></a>

### Process 
*In*: list named `.data` with 2 named elements:

* `df`: A data.frame with columns: `value`, `type`, `fold`, `longitude`, `latitude` plus additional named columns giving associated covariate values. See [occurrence module](#occIO) for details on these columns.
* `ras`: A `RasterLayer`, `RasterBrick` or `RasterStack` object of covariate rasters

*Out*: list with 2 elements:

* `df`: A data.frame with columns: `value`, `type`, `fold`, `longitude`, `latitude` plus additional named columns giving associated covariate values
* `ras`: A `RasterLayer`, `RasterBrick` or `RasterStack` object of covariate rasters
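The shape of a process module can be sketched as a function that takes `.data`, changes `df`, and passes `ras` through unchanged. `DropIncomplete` below is a hypothetical module that removes rows with missing covariate values. The column names follow the occurrence format described in this document, and a character string stands in for the raster so that the example runs without any extra packages.

```r
# Hypothetical process module: drop rows with missing covariate values
DropIncomplete <- function(.data) {
  df <- .data$df
  ras <- .data$ras

  # Keep only rows with no missing values in any column
  df <- df[complete.cases(df), , drop = FALSE]

  # Return the same two-element structure that was passed in
  return(list(df = df, ras = ras))
}

# Toy input; 'ras' would normally be a Raster* object
toy <- list(df = data.frame(value = c(1, 0, 1),
                            type = c('presence', 'absence', 'presence'),
                            fold = c(1, 2, 3),
                            longitude = c(-1.2, -0.5, 0.1),
                            latitude = c(51.5, 52.0, 52.3),
                            bio1 = c(9.8, NA, 10.4)),
            ras = 'placeholder for a raster')

out <- DropIncomplete(toy)
nrow(out$df)
```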

### Model
*In*: data.frame from process called **`.df`**

*Out*: A `ZoonModel` object (see the example above)

### Output
*In*:

* A list named **`.model`** with 2 named elements:
    - `model`: A `ZoonModel` object from a model module
    - `data`: A data.frame from a process module with the added column `predictions`
* A `RasterLayer`, `RasterBrick` or `RasterStack` object named **`.ras`**, provided by the covariate module

*Out*: Anything!
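To make the inputs concrete, here is a minimal sketch of an output module that only uses the `data` element of `.model`. `SummarisePredictions` is hypothetical, and the toy `.model` list uses placeholder strings where a workflow would supply a `ZoonModel` object and a raster.

```r
# Hypothetical output module: mean prediction for each type of record
SummarisePredictions <- function(.model, .ras) {
  df <- .model$data
  tapply(df$predictions, df$type, mean)
}

# Toy stand-in for what a workflow would pass to the module
toyModel <- list(model = 'placeholder for a ZoonModel object',
                 data = data.frame(value = c(1, 1, 0, 0),
                                   type = c('presence', 'presence',
                                            'absence', 'absence'),
                                   predictions = c(0.9, 0.7, 0.2, 0.4)))

res <- SummarisePredictions(.model = toyModel,
                            .ras = 'placeholder for a raster')
res
```

Since an output module can return anything, the named vector of means returned here is as valid as a map or a raster.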

## Pictorial description of inputs and outputs
![OccurrenceModule](occurrenceInOut.svg)
![CovariateModule](covariateInOut.svg)
![ProcessModule](processInOut.svg)
![ModelModule](modelInOut.svg)
![OutputModule](outputInOut.svg)




