--- title: "MUFASA Workflow Example" author: "Holly Steeves" date: "`r Sys.Date()`" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{MUFASA Workflow Example} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- # Load Package ```{r} library(QFASA) library(dplyr) library(compositions) ``` # Modeling Inputs Prior to starting make sure that: * Fatty acid names in all files are the same (contain the exact same numbers/characters and punctuation) * There are no fatty acids in the prey file that do not appear in the predator file and visa versa ### Fatty Acid Set * This is the list of FAs to be used in the modelling. * The simplest alternative is to load a .csv file that contains a single column with a header row with the names of thee fatty acids listed below (see example file \textbf{``FAset.csv"}). * A more complicated alternative is to load a .csv file with the full set of FAs and then add code to subset the FAs you wish to use from that set -> this alternative is useful if you are planning to test multiple sets. * Regardless of how you load the FA set, it must be converted to a vector. ```{r} data(FAset) fa.set = as.vector(unlist(FAset)) ``` ### Matrix of Predator FA Signatures * The FA signatures in the originating .csv file should be proportions adding to 1. * Each predator signature is a row with the FAs in columns (see example file \textbf{``predatorFAs.csv"}). The FA signatures are subsetted for the chosen FA set (created above) and renormalized DURING the modelling so there is and/or renormalize prior to loading the .csv file or running p.MUFASA BUT make sure that the same FAs appear in the predator and prey files. * Your predator FA .csv file can contain as much tombstone data columns as you like, you must extract the predator FA signatures as separate input in order to run in p.MUFASA. For example: in the code below, the predator .csv file ("predatorFAs.csv") has 4 tombstone columns (SampleCode, AnimalCode, SampleGroup, Biopsy). Prior to running MUFASA, the tombstone (columns 1-4) and FA data (columns 5 onward) are each extracted from the original data frame. The FA data becomes the predator.matrix (which is passed to p.MUFASA) and the tombstone data is retained so that it can be recombined with the model output later on. ```{r} data(predatorFAs) tombstone.info = predatorFAs[,1:4] predator.matrix = predatorFAs[,5:(ncol(predatorFAs))] npredators = nrow(predator.matrix) ``` ### Matrix of Prey FA Signatures * The FA signatures in the .csv file should be proportions that sum to 1. * The prey file should contain all of the individual FA signatures of the prey and their lipid contents (where appropriate). * If you want to only include a subset of prey species, you must extract it prior to input (see code below). ```{r} data(preyFAs) prey.matrix = preyFAs[,-c(1,3)] # Selecting 5 prey species to include spec.red <-c("capelin", "herring", "mackerel", "pilchard", "sandlance") spec.red <- sort(spec.red) prey.red <- prey.matrix %>% filter(Species %in% spec.red) ``` ### Prey Lipid Content (Fat Content) * Mean lipid content by species group is calculated from the full prey file using the species group as a summary variable (see code below). * Note: If no lipid content correction is going to be applied, then a vector of 1s of length equal to the number of species groups is used as the vector instead. I.e. FC - rep(1,nrow(prey.matrix)). * If you've decided on a subset of species, you must extract them from the mean lipid content vector as well. ```{r} FC = preyFAs[,c(2,3)] FC = FC %>% arrange(Species) FC.vec = tapply(FC$lipid,FC$Species,mean,na.rm=TRUE) FC.red <- FC.vec[spec.red] ``` ### Calibration Coefficients * Calibration .csv file should contain 2 columns (with headers). The first contains the FA names, the second the value of the CC for each FA (see example file "CC.csv"). * Important: The FAs in the CC.csv file must be exactly the same as the FAs in the original predator.csv file and they must be in the EXACT SAME ORDER. ```{r} data(CC) cal.vec = CC[,2] cal.m = replicate(npredators, cal.vec) rownames(cal.m) <- CC$FA ``` # Running MUFASA ```{r eval=FALSE} M <- p.MUFASA(predator.matrix, prey.red, cal.m, FC.red, fa.set) ``` ### p.MUFASA Output The MUFASA output is a list with 3 components: * Diet Estimates * nll * Var_Epsilon #### Diet Estimates This is a matrix of the diet estimate for each predator (by rows, in the same order as the input file) by the species groups (by column, in the same order as the prey.red file). The estimates are expressed as a proportion (they will sum to 1). ````{r eval=FALSE} Diet_Estimates <- M$Diet_Estimates ```` #### nll This is a vector of the negative log likelihood values at each iteration of the optimizer. ````{r eval=FALSE} nll <- M$nll ```` #### Var_Epsilon This is the optimized diagonal values of the variance-covariance matrix of the errors. See reference in help file for details. ```{r eval=FALSE} VarEps <- M$Var_Epsilon ```