---
title: "MUFASA Workflow Example"
author: "Holly Steeves"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{MUFASA Workflow Example}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

# Load Package

```{r}
library(QFASA)
library(dplyr)
library(compositions)
```

# Modeling Inputs

Prior to starting make sure that:

* Fatty acid names in all files are the same (contain the exact same 
  numbers/characters and punctuation)
  
* There are no fatty acids in the prey file that do not appear in the 
  predator file and visa versa


### Fatty Acid Set

* This is the list of FAs to be used in the modelling.

* The simplest alternative is to load a .csv file that contains a single 
  column with a header row with the names of thee fatty acids listed below 
  (see example file \textbf{``FAset.csv"}).
  
* A more complicated alternative is to load a .csv file with the full 
  set of FAs and then add code to subset the FAs you wish to use from that 
  set -> this alternative is useful if you are planning to test multiple sets.
  
* Regardless of how you load the FA set, it must be converted to a vector. 

```{r}
data(FAset)
fa.set = as.vector(unlist(FAset))
```

### Matrix of Predator FA Signatures

* The FA signatures in the originating .csv file should be proportions 
  adding to 1.
  
* Each predator signature is a row with the FAs in columns 
  (see example file \textbf{``predatorFAs.csv"}).  The FA signatures are 
  subsetted for the chosen FA set (created above) and renormalized DURING the 
  modelling so there is and/or renormalize prior to 
  loading the .csv file or running p.MUFASA BUT make sure that the same FAs 
  appear in the predator and prey files.
  
* Your predator FA .csv file can contain as much tombstone data columns as 
  you like, you must extract the predator FA signatures as separate input in 
  order to run in p.MUFASA. For example: in the code below, the predator 
  .csv file ("predatorFAs.csv") has 4 tombstone columns 
  (SampleCode, AnimalCode, SampleGroup, Biopsy). Prior to running MUFASA, 
  the tombstone (columns 1-4) and FA data (columns 5 onward) are each 
  extracted from the original data frame. The FA data becomes the 
  predator.matrix (which is passed to p.MUFASA) and the tombstone data is 
  retained so that it can be recombined with the model output later on.
  
```{r}
data(predatorFAs)
tombstone.info = predatorFAs[,1:4]
predator.matrix = predatorFAs[,5:(ncol(predatorFAs))]
npredators = nrow(predator.matrix)
```

### Matrix of Prey FA Signatures

* The FA signatures in the .csv file should be proportions that sum to 1.

* The prey file should contain all of the individual FA signatures of the 
prey and their lipid contents (where appropriate). 

* If you want to only include a subset of prey species, you must extract 
it prior to input (see code below).


```{r}
data(preyFAs)
prey.matrix = preyFAs[,-c(1,3)]

# Selecting 5 prey species to include
spec.red <-c("capelin", "herring", "mackerel", "pilchard", "sandlance")
spec.red <- sort(spec.red)
prey.red <- prey.matrix %>%
  filter(Species %in% spec.red)

```
  

### Prey Lipid Content (Fat Content)

* Mean lipid content by species group is calculated from the full prey file 
using the species group as a summary variable (see code below).

* Note: If no lipid content correction is going to be applied, 
then a vector of 1s of length equal to the number of species groups is used as 
the vector instead. I.e. FC - rep(1,nrow(prey.matrix)).

* If you've decided on a subset of species, you must extract them from the 
mean lipid content vector as well.


```{r}
FC = preyFAs[,c(2,3)] 
FC = FC %>%
  arrange(Species)
FC.vec = tapply(FC$lipid,FC$Species,mean,na.rm=TRUE)
FC.red <- FC.vec[spec.red]
```

### Calibration Coefficients

* Calibration .csv file should contain 2 columns (with headers). The first 
contains the FA names, the second the value of the CC for each FA 
(see example file "CC.csv").

* Important: The FAs in the CC.csv file must be exactly the same 
as the FAs in the original predator.csv file and they must be in the EXACT SAME 
ORDER.


```{r}
data(CC)
cal.vec = CC[,2]
cal.m = replicate(npredators, cal.vec)
rownames(cal.m) <- CC$FA
```

# Running MUFASA

```{r eval=FALSE}
M <- p.MUFASA(predator.matrix, prey.red, cal.m, FC.red, fa.set)
```


### p.MUFASA Output

The MUFASA output is a list with 3 components:

* Diet Estimates

* nll

* Var_Epsilon


#### Diet Estimates

This is a matrix of the diet estimate for each predator (by rows, in the same 
order as the input file) by the species groups (by column, in the same order 
as the prey.red file). The estimates are expressed as a proportion 
(they will sum to 1).

````{r eval=FALSE}
Diet_Estimates <- M$Diet_Estimates
````

#### nll

This is a vector of the negative log likelihood values at each iteration of the 
optimizer. 

````{r eval=FALSE}
nll <- M$nll
````

#### Var_Epsilon

This is the optimized diagonal values of the variance-covariance matrix of the 
errors. See reference in help file for details.
```{r eval=FALSE}
VarEps <- M$Var_Epsilon
```