--- title: "Revse Ecology Analysis on Microbiome" date: "`r Sys.Date()`" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Revse Ecology Analysis on Microbiome} %\VignetteEngine{knitr::rmarkdown} \usepackage[utf8]{inputenc} --- ```{r setup, include=FALSE} library(knitr) library(RevEcoR) opts_chunk$set(fig.width=8, fig.height=5) set.seed(60823316) ``` ## RevEcoR -- Reverse ecology linking organism(s) and their environments ###### Yang Cao , Fei Li ###### BeiJing Institute of Radiation -- Biotechnology Laboratory ### Table of Contens * [Background](#background) * [Introduction](#intro) * [Installation](#install) * [Downloding the metabolic data](#metadata) * [Reconstruction of organism metabolic network](#reconstruct) * [Identify seed set of metabolic network](#seedset) * [Predict species interactions](#cooperation) * [Comparing predicted interactions and co-occurrences](#microbiome) * [sessionInfo](#session) * [references](#ref) ### Background Ecology is the scientific With the rapid and inexpensive next-generation sequencing technologies The structure of complex biological systems reflects not only their function but also into the habitates in which they envoled and are adapted to. Reverse Ecology- an emerging new frontier in Evolutionary Sysmtems biology at the interface of computational biology, genomics and environmental science, which uses population genomics to study ecology with no a priori assumptions about the organism(s) under consideration. This term was suggested by [Li *et al.*](#reversecology) during a conference on ecological genomics in Christchurch, New Zealand. It facilitates the translation of high through genomic data into large scale ecological data, and utilizes system-based method to obtain novel insights of pooly characterized microorganisms and relationships between microorgasnims or their environments in a superorganism.Traditional approach, however, can only applied to a small scale and for ralatively well-studied systems. ### Introduction This manual is a brief introduction to structure, funcitons and usage of **RevEcoR** package. The **RevEcoR** package implements the reverse ecology framework. It can be used for reconstruction of metabolic environmnets using a cross-species analysis, idenditifying the set of compounds and predicting the species interactions on a large scale with a graph-based algorithm mentioned in [Borenstein *et al.*](#seedset). The reverse ecology framework which takes advances of system biology and genomic metabolic modeling, aims to predict the ecological traits of poorly studied microorganisms, their interactions with other microorganisms, and the ecology of microbial communities from system-level analysis of complex biological networks. Two softwares ([NetSeed](http://elbo.gs.washington.edu/software_netseed.html) and [NetCooperate](http://elbo.gs.washington.edu/software_netcooperate.html)), have developed by **Borenstein's** lab for studying the ecology of microorganisms. Howevever, neither of them supports metabolic network reconstruction of species, and both are limited in small scale analysis (two species). The **RevEcoR** package provides an interface to microbiome reverse ecology analysis on a large scale. The main features of this package consists of several steps: 1. Downding metabolic networks data from KEGG database: all species metabolic network data could be downloaded from the KEGG PATHWAY datase with the KEGG REST API (application programming interface). 2. Reconstruction metabolic networks of all species: a directed graph whose nodes represent compounds and whose edges represent reactions linking substrates to products. 3. Identify seed set of a specific organism: each metabolic network was decomposed into its strongly conneted components (SCC) using [Kasaraju's algorithm](#Kasaraju). The SCC forms a directed graph whose nodes are the components and whose edges are the orginal edges in the graph that connect nodes in two different components. Then, detectting the seed set with this SCCs. 4. Host-microbe and microbe-microbe cooperation analysis ### Installation **RevEcoR** is free available on CRAN. You can install the latest released version from CRAN as following: ```{r,eval=FALSE} install.packages("RevEcoR") ``` or the latest development version from github. To install packages from GitHub, you first need install the **devtools** package on your system with `install.packages("devtools")`. Note that devtools sometimes needs some extra non-R software on your system -- more specifically, an Rtools download for Windows or Xcode for OS X. There's more information about devtools [here](https://github.com/hadley/devtools). ```{r,eval=FALSE} if (!require(devtools) install.packages("devtools") devtools::install_github("yiluheihei/RevEcoR") ``` After installation, you can load **RevEcoR** into current workspace by typing or pasting the following codes: ```{r eval=TRUE} library(RevEcoR) ``` ### Downloding the metabolic data of a specific organism Both the Kyoto Encyclopedia of Genes and Genomes (KEGG) and Integrated Microbial Genomes database (IMG) [7] collect complete high-quality genome sequences and metagenome sequences with a comprehensive set of publicly available bacterial, archaeal, eukaryotic, and phage genomes, as well as engineered, environmental and host associated metagenome samples. All the sequences and functional annotation profile can be download directly. The KEGG database provides a REST-style API compared with IMG database. This package provides function `getOrgMetabolicData` to download the specific organism metabolic data from KEGG database, and return a list where each element consists of three elements: reaction, substrate and product. Data in IMG, we can only download it manually. In addition, users can annotate their private genomic data with KO terms on IMG systems or in their local machine to obtain the annotation profile. ```{r eval=FALSE} ## download sample metabolic data from remote KEGG database buc <- getOrgMetabolicData("buc") data(kegg_buc) head(buc) ``` ### Reconstruction of organism metabolic network The crucial premise of Reverse Ecology is that genomic information can be converted into ecological information. Metabolism linking miroorgasnim with the biochemical environmnet acrossing compound exchanges: import of exogenous compounds and impact the composition of its environment via secretion of other compounds. Thus, metabolic network is the reflection of interation between organisms and their environment. Graph-based representation of metabolic reactions where nodes represent compounds and edges represent reactions is a common tool in analyzing and studing metabolic networks. A directed edge from node a to b indicates that compound a is a substrate in some reaction that produces compound b. Once the metabolic data is obtained, `reconstructGsMN` could be used to reconstruction the metabolic network of a specific organism. ```{r eval=TRUE, htmlcap="Figure 1 Reconstruction metabolic network of *Buchnera aphidicola APS*", fig.lp="Figure 1", fig.width=8, fig.height=8} ## species in KEGG buc.net <- reconstructGsMN(kegg_buc, RefData = NULL) igraph::print.igraph(buc.net) igraph::plot.igraph(buc.net, vertex.label=NA, vertex.size=5, edge.arrow.size=0.1) ## ko annotation profile species detected in a human microbiome in IMG (not in KEGG) annodir <- system.file("extdata/koanno.tab",package = "RevEcoR") metabolic.data <- read.delim(annodir,stringsAsFactors=FALSE) ##load the reference metabolic data data(RefDbcache) g2 <- reconstructGsMN(metabolic.data, RefData = RefDbcache) ``` ### Identify seed set of metabolic network As the interactions with the environment was reflected in the metabolic networks, these networks could be used not only to infer metabolic function but alse to obtain insights into the growth environments in which the species evolved. Apparantly, organisms can survive in a wide range of enviromnets and may activate only a subset of the pathways in the network of each environment, using a different set of exogenously acquired compounds (termed seed set). The seed set of the network is defined as the minimal set of compounds in the network that allows the synthesis of all other compouds, and can serve as a good proxy for its environment and can be conceived as the essential and effective biochemical environment. `getSeedSets` was used to detect the seed set of a metabolic network which returns a seedset-class object. Futhermore, some methods was supported for seedset-class, e.g length, nonseeds. For more details on seedset-class, see `?seedset-class`.It can help us to get the compound that organisms are exogenously acquired compouds from the environment, representing the organism's nutritional profile. This algorithm is based on a fast method Kasaraju algorithm for SCC decomposition which is implementaed in `Kasaraju`. For more details, see `?getSeedSets` and `?KasarajuSCC`. ```{r eval=TRUE, htmlcap="Figure 2The node colored with red represents the species' seed set",fig.lp="Figure 2", fig.width=8, fig.height=8} ## seed set prediction seed.set <- getSeedSets(buc.net, 0.2) show(seed.set) head(seed.set@seeds) ## The node colored with red represents the species' seed set nodes <- igraph::V(buc.net)$name seeds <- unlist(seed.set@seeds) seed.index <- match(seeds,nodes) node.color <- rep("SkyBlue2",length(nodes)) node.color[seed.index] <- "red" igraph::plot.igraph(buc.net, vertex.label=NA, vertex.size=5, edge.arrow.size=0.1, vertex.color = node.color) ``` ### Predict species interactions The topology of metabolic networks can provide insight not only into the metabolic process that accur within each species, but also into interactions between different species. Here we provides `calculateCoopreationIndex` using three cooperation index: [competition index, coplementarity index](#rule) measure the microbe-microbe co-occurrence pattern. More details, see `?calculateCoopreationIndex`. ```{r} # ptr metabolic network data(kegg_ptr) ##ptr.net <- reconstructGsMN(getOrgMetabolicData("ptr")) ptr.net <- reconstructGsMN(kegg_ptr) # cooperation analysis between buc and ptr cooperation.index <- calculateCooperationIndex(buc.net,ptr.net) cooperation.index ``` Blow we applied **RevEcoR** to predict interactions among several human oral microbiota species whose co-occurrence patterns have previously been described. Our seven sample oral species metabolic dataset was downloaded from the IMG server; it consists of the following [Genomes Online Database (GOD)](http://www.genomesonline.org) IMG identification numbers: Gc0016386, Gi07614, Gi00264, Gc00809, Gc00643, Gi03876, and Gi07289. The interactions among the seven species is calculated as follwing: The metabolic networks of seven species was reconstructed first: ```{r, eval = FALSE, echo=TRUE} ##metabolic network reconstruction of these seven species net <- lapply(anno.species, reconstructGsMN) ``` Then `caculateCooperationIndex` is used to calculated the species interactions among the seven species. ```{r, eval=FALSE, echo=TRUE} ## caculate interactions among vious species interactions <- calculateCooperationIndex(net, p = TRUE) ## competition index $competition.index Aa Ao Fn Pg Sg So Va Aa 1.0000000 0.4736842 0.3157895 0.2280702 0.4210526 0.4385965 0.2456140 Ao 0.4736842 1.0000000 0.3684211 0.3333333 0.4736842 0.4736842 0.2456140 Fn 0.5000000 0.5833333 1.0000000 0.4166667 0.5833333 0.5555556 0.4166667 Pg 0.4193548 0.6129032 0.4838710 1.0000000 0.6129032 0.5161290 0.3870968 Sg 0.5454545 0.6136364 0.4772727 0.4318182 1.0000000 0.9090909 0.3863636 So 0.5813953 0.6046512 0.4651163 0.3720930 0.9302326 1.0000000 0.3953488 Va 0.4827586 0.4827586 0.5172414 0.4137931 0.5862069 0.5862069 1.0000000 ## p value of competition index $competition.index.p Aa Ao Fn Pg Sg So Va Aa 0.000 0.001 0.001 0.001 0.001 0.001 0.001 Ao 0.001 0.000 0.001 0.001 0.001 0.001 0.001 Fn 0.001 0.001 0.000 0.001 0.001 0.001 0.001 Pg 0.001 0.001 0.001 0.000 0.001 0.001 0.001 Sg 0.001 0.001 0.001 0.001 0.000 0.001 0.001 So 0.001 0.001 0.001 0.001 0.001 0.000 0.001 Va 0.001 0.001 0.001 0.001 0.001 0.001 0.000 ## complementarity index $complementarity.index Aa Ao Fn Pg Sg So Va Aa 0.0000000 0.1052632 0.1228070 0.07017544 0.0877193 0.08771930 0.1228070 Ao 0.1403509 0.0000000 0.1403509 0.07017544 0.1228070 0.12280702 0.1403509 Fn 0.1944444 0.1666667 0.0000000 0.16666667 0.1111111 0.11111111 0.1388889 Pg 0.2258065 0.2258065 0.1612903 0.00000000 0.1612903 0.19354839 0.2258065 Sg 0.2272727 0.1818182 0.1590909 0.09090909 0.0000000 0.04545455 0.1590909 So 0.1860465 0.1395349 0.1860465 0.09302326 0.0000000 0.00000000 0.1395349 Va 0.2068966 0.1724138 0.1379310 0.17241379 0.1379310 0.13793103 0.0000000 ## p value of complementarity index $complementarity.index.p Aa Ao Fn Pg Sg So Va Aa 0.000 0.001 0.001 0.001 0.001 0.001 0.001 Ao 0.001 0.000 0.001 0.001 0.001 0.001 0.001 Fn 0.001 0.001 0.000 0.001 0.001 0.001 0.001 Pg 0.001 0.001 0.001 0.000 0.001 0.001 0.001 Sg 0.001 0.001 0.001 0.001 0.000 0.001 0.001 So 0.001 0.001 0.001 0.001 0.001 0.000 0.001 Va 0.001 0.001 0.001 0.001 0.001 0.001 0.000 ``` We found that Streptococcus oralis (So) and Streptococcus gordonii (Sg) had the lowest complementarity index (0.04 and 0.00, respectively) and the highest competition index (0.91 and 0.93, respectively) among all pairs. Appanrently, the p value of all interactions are biologcial/statistical significance. This indicates that these two species are antagonistic, which is consistent with previous findings. ### Comparing predicted interactions and co-occurrences To further evaluate the predicted interactions of RevEcoR on a large scale, we integrated functions mentioned above to investigate species interactions in the gut microbiome. We focused on a list of 116 prevalent gut species, whose genome sequence is available in IMG database and sequence coverage is more than 1% in at least one metagenomic sample of [124 individuals](#gut). Genome annotation profiles of this 116 species was collected from IMG database and was used to calculated the interactions (competition and complementarity index) for all pairs of species. #### 1. Predicting species interactions in gut microbiome For each species, we download the list of genes mapped to the Kyoto Encyclopedia of Genes and Genomes orthologous groups (KOs) was downloaded with a in-house R script. This data, which was used to reconstruct the metabolic network of each species, have been saved as *gut_microbiome.rda* in subdirectory *data* of RevEcoR. We can load it as the following code: ```{r} data(gut_microbiome) ## summary(gut_microbiome) ``` Then, it was used to reconstuct the metabolic network, predict the seed set, and finally predict the pairs of interactions between different species: ```{r, eval = FALSE, echo = TRUE} gut.nets <- lapply(gut_microbiome,reconstructGsMN) seed.sets <- lapply(gut.nets,getSeedSets) ## Since calculation is on large scale, species interactions prediction may take several hours gut.interactions <- calculateCooperationIndex(gut.nets) competition.index <- gut.interactions$competition.index complementarity.index <- gut.interactions$complementarity.index ``` #### 2. Obtaining co-occurrence scores in gut microbiome Specially, it will help us to predict whether species compete with one another tend to co-occur or to exclude by comparing the species interactions and co-occurrences. We obtained co-occurrence scores directly from [*Levy R's* research](#rule), and saved it as *occurence.tab* in subdirectory *inst/extdata* of RevEcoR . Co-occurrence score was calculated based on species abundances across all samples and measured by the Jaccard similarity index. Load the co-occurrence data: ```{r, eval = TRUE, echo = TRUE} occurrence.score <- read.delim(system.file("extdata/occurrence.tab", package = "RevEcoR"),stringsAsFactors = FALSE, quote = "") ``` #### 3. Comparing the interactions and co-occurrences The Jaccard index measures similarity between finite sample sets, and is defined as the size of the intersection divided by the size of the union of the sample sets. Thus, co-occurrence scores are generally symmetric whereas competition index or complementarity index are not according to their definitions. A symmetric version of two interaction index was gnerated by replacing each element of the interaction indices with the mean of each value and that transpose value. ```{r, eval=FALSE,echo=TRUE} competition.index <- (competition.index + t(competition.index))/2 complementarity.index <- (complementarity.index + t(complementarity.index))/2 ``` Subsequently, the Spearman correlation between the co-occurrence scores and the two interaction indices was calculated. A permutation-based Mantel test, which is commonly used in ecology, was used to determin the significance of this correlation. ```{r, eval=FALSE,echo=TRUE} ## upper triangles, which is used to calculate the correlation competition.upper <- competition.index[upper.tri(competition.index)] occurrence.upper <- occurrence.score[upper.tri(occurrence.score)] complementarity.upper <- complementarity.index[upper.tri(complementarity.index)] ## calculate the spearman correlation betwwen co-occurrence scores and two ## interactions indices competition.cor <- cor(competition.upper,occurrence.upper,method="spearman") complementarity.cor <- cor(complementarity.upper,occurrence.upper,method="spearman") ## permutation-based mantel test. Random permutation the co-occurance score ## 10000 times, P value is the fraction of correlations as high as or higher ## than the original if (require(magrittr)){ null.stat <- replicate(10000, sample(1:116) %>% occurrence.score[.,.] %>% .[upper.tri(.)] ) competition.null <- cor(competition.upper,null.stat) complementarity.null <- cor(complementarity.upper,null.stat) length(which(competition.null >= competition.cor)) ## 0 p.competition < 0.00001 length(which(complementarity.null <= complementarity.cor)) ## 0 p.complementarity< 0.00001 } ``` We found that competition index is significant positively correlated with co-occurrence (cor = 0.261, P < 10-4, Mantel correlation test), whereas the complementarity index is significant negatively correlated with co-occurrence (cor = -0.259, P < 10-4, Mantel correlation test). This suggests that competition is liked to be the key factor to promote the assembly of gut microorganisms ### sessionInfo The version number of R and packages loaded for generating the vignette were: ```{r, eval=TRUE} sessionInfo() ``` ### references * Li Y F, Costello J C, Holloway A K, et al. "Reverse ecology" and the power of population genomics[J]. Evolution, 2008, 62(12): 2984-2994. * Borenstein E, Kupiec M, Feldman M W, et al. Large-scale reconstruction and phylogenetic analysis of metabolic environments[J]. Proceedings of the National Academy of Sciences, 2008, 105(38): 14482-14487. * Tarjan R. Depth-first search and linear graph algorithms[J]. SIAM journal on computing, 1972, 1(2): 146-160. * Levy R, Borenstein E. Metabolic modeling of species interaction in the human microbiome elucidates community-level assembly rules[J]. Proceedings of the National Academy of Sciences, 2013, 110(31): 12804-12809. * Borenstein E, Feldman M W. Topological signatures of species interactions in metabolic networks[J]. Journal of Computational Biology, 2009, 16(2): 191-200. * Qin J, Li R, Raes J, et al. A human gut microbial gene catalogue established by metagenomic sequencing[J]. Nature, 2010, 464(7285): 59-65.