--- title: "Illustration on a simulated multipartite network" author: "Sophie Donnet, Pierre Barbillon" date: "`r Sys.Date()`" output: rmarkdown::html_vignette: toc: true toc_depth: 4 bibliography: references.bib link-citations: yes vignette: > %\VignetteEngine{knitr::rmarkdown} %\VignetteIndexEntry{Illustration on a simulated multipartite network} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", # fig.path = "man/figures/README-", out.width = "100%" ) ``` We present the performances of GREMLINS on a simulated multipartite network. GREMLINS includes a function `rMBM` to simulate multipartite networks. Mathematical details can be found in @multipartite. ## Simulation of a complex multipartite network. We use the function `rMBM` provided in the package to simulate a multipartite network involving $2$ functional groups (namely A and B) of respective sizes $$n_A = 60, \quad, n_B = 50.$$ A and B are divided respectively into $3$ and $2$ blocks. The sizes of the blocks are generated randomly. For reproductibility, we fix the random seed to an arbitrarily chosen value. ```{r param FG} namesFG <- c('A','B') v_NQ <- c(60,50) #size of each FG list_pi = list(c(0.16 ,0.40 ,0.44),c(0.3,0.7)) #proportion of each block in each FG list_pi[[1]] ``` We assume that we observe $3$ interactions matrices - A-B : continuous weighted interactions - B-B : binary interactions - A-A : counting directed interactions ```{r param 1 net} E <- rbind(c(1,2),c(2,2),c(1,1)) typeInter <- c( "inc","diradj", "adj") v_distrib <- c('ZIgaussian','bernoulli','poisson') ``` Note that the distributions may be `Bernoulli`, `Poisson`, `Gaussian` or `Laplace` (with null mean). For the Gaussian distribution, a mean and a variance must be given. We generate randomly the emission parameters $\theta$. ```{r param net} list_theta <- list() list_theta[[1]] <- list() list_theta[[1]]$mean <- matrix(c(6.1, 8.9, 6.6, 9.8, 2.6, 1.0), 3, 2) list_theta[[1]]$var <- matrix(c(1.6, 1.6, 1.8, 1.7 ,2.3, 1.5),3, 2) list_theta[[1]]$p0 <- matrix(c(0.4, 0.1, 0.6, 0.5 , 0.2, 0),3, 2) list_theta[[2]] <- matrix(c(0.7,1.0, 0.4, 0.6),2, 2) m3 <- matrix(c(2.5, 2.6 ,2.2 ,2.2, 2.7 ,3.0 ,3.6, 3.5, 3.3),3,3 ) list_theta[[3]] <- (m3 + t(m3))/2# for symetrisation ``` We are now ready to simulate the data ```{r simul 2, eval = TRUE, echo = TRUE} library(GREMLINS) dataSim <- rMBM(v_NQ,E , typeInter, v_distrib, list_pi, list_theta, namesFG = namesFG, seed = 4,keepClassif = TRUE) list_Net <- dataSim$list_Net length(list_Net) names(list_Net[[1]]) list_Net[[1]]$typeInter list_Net[[1]]$rowFG list_Net[[1]]$colFG ``` ## Inference with model selection The model selection and the estimation are performed with the function `multipartiteBM`. ```{r MBM simul, echo = TRUE, eval = TRUE} res_MBMsimu <- multipartiteBM(list_Net, v_distrib = v_distrib, namesFG = c('A','B'), v_Kinit = c(2,2), nbCores = 2, initBM = FALSE, keep = FALSE) ``` We can now get the estimated parameters. ```{r estim param, eval=TRUE} res_MBMsimu$fittedModel[[1]]$paramEstim$list_theta$AB$mean ``` `extractClustersMBM` produces the clusters in each functional group. ```{r extract cluster, eval=TRUE} Cl <- extractClustersMBM(res_MBMsimu) ``` ## Inference without model selection One may also want to estimate the parameters for given numbers of clusters. The function `multipartiteBMFixedModel` is designed for this task. ```{r MBM fixed, echo = TRUE, eval = TRUE} res_MBMsimu_fixed <- multipartiteBMFixedModel(list_Net, v_distrib = v_distrib, nbCores = 2,namesFG = namesFG, v_K = c(3,2)) res_MBMsimu_fixed$fittedModel[[1]]$paramEstim$v_K extractClustersMBM(res_MBMsimu_fixed)$A ``` ## Missing data GREMLINS is also able to handle missing data. In the following experiment, we artificially set missing data in the previously simulated matrices. ```{r sim NA} ############# NA data at random in any matrix epsilon = 10/100 list_Net_NA <- list_Net for (m in 1:nrow(E)){ U <- sample(c(1,0),v_NQ[E[m,1]]*v_NQ[E[m,2]],replace=TRUE,prob = c(epsilon, 1-epsilon)) matNA <- matrix(U,v_NQ[E[m,1]],v_NQ[E[m,2]]) list_Net_NA[[m]]$mat[matNA== 1] = NA if (list_Net_NA[[m]]$typeInter == 'adj') { M <- list_Net_NA[[m]]$mat diag(M) <- NA M[lower.tri(M)] = t(M)[lower.tri(M)] list_Net_NA[[m]]$mat <- M } } ``` ```{r MBM simul NA, echo = TRUE, eval = TRUE} res_MBMsimuNA <- multipartiteBM(list_Net_NA, v_distrib = v_distrib, namesFG = c('A','B'), v_Kinit = c(2,2), nbCores = 2, keep = FALSE) ``` We then have a function to predict the missing edges (probability if binary or intensity if weighted) ```{r MBM predict NA, echo = TRUE, eval = TRUE} pred <- predictMBM(res_MBMsimuNA) ``` ## References