---
title: "MCMC scheme for posterior inference of Gaussian DAG models: the `learn_DAG()` function"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{MCMC scheme for posterior inference of Gaussian DAG models: the `learn_DAG()` function}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
oldpar <- par(no.readonly = TRUE)
oldoptions <- options()
```

```{css, echo = FALSE}
.math.inline {
  font-size: 11px;
}
```

```{r setup}
library(BCDAG)
```

This is the second of a series of three vignettes for the R package `BCDAG`. In this vignette we focus on the function `learn_DAG()`, which implements a Markov Chain Monte Carlo (MCMC) algorithm to sample from the joint posterior of DAG structures and DAG parameters under the Gaussian assumption.

### Model description

The underlying Bayesian Gaussian DAG-model can be summarized as follows:
\begin{eqnarray}
X_1, \dots, X_q \,|\,\boldsymbol L, \boldsymbol D, \mathcal{D} &\sim& \mathcal{N}_q\left(\boldsymbol 0, (\boldsymbol{L}\boldsymbol{D}^{-1}\boldsymbol{L}^\top)^{-1}\right)\\
(\boldsymbol L, \boldsymbol D)\,|\,\mathcal{D} &\sim& \text{DAG-Wishart}(\boldsymbol{a}_{c}^{\mathcal{D}}, \boldsymbol U) \\
p(\mathcal{D}) &\propto& w^{|\mathcal{S}_\mathcal{D}|}(1-w)^{\frac{q(q-1)}{2} - {|\mathcal{S}_\mathcal{D}|}}
\end{eqnarray}

In particular, $\mathcal{D}=(V,E)$ denotes a DAG structure with set of nodes $V=\{1,\dots,q\}$ and set of edges $E\subseteq V \times V$.

Moreover, $(\boldsymbol L, \boldsymbol D)$ are model parameters providing the decomposition of the precision (inverse-covariance) matrix $\boldsymbol{\Omega} = \boldsymbol{L}\boldsymbol{D}^{-1}\boldsymbol{L}^\top$; specifically, $\boldsymbol{L}$ is a $(q, q)$ matrix of coefficients such that, for each $(u, v)$-element $\boldsymbol{L}_{uv}$ with $u \ne v$, $\boldsymbol{L}_{uv} \ne 0$ if and only if $(u, v) \in E$, while $\boldsymbol{L}_{uu} = 1$ for each $u = 1,\dots, q$; in addition, $\boldsymbol{D}$ is a $(q, q)$ diagonal matrix with $(u, u)$-element $\boldsymbol{D}_{uu}$. This decomposition follows from the equivalent Structural Equation Model (SEM) representation of a Gaussian DAG-model; see also Castelletti \& Mascaro (2021).

Conditionally on $\mathcal{D}$, a prior on $(\boldsymbol{L}, \boldsymbol{D})$ is assigned through a *compatible* DAG-Wishart distribution with rate hyperparameter $\boldsymbol{U}$, a $(q,q)$ symmetric positive definite (s.p.d.) matrix, and shape hyperparameter $\boldsymbol{a}^{\mathcal{D}}_{c}$, a $(q,1)$ vector; see also Cao et al. (2019) and Peluso \& Consonni (2020).

Finally, a prior on DAG $\mathcal{D}$ is assigned through a Binomial distribution on the number of edges in the graph; in $p(\mathcal{D})$, $w \in (0,1)$ is the prior probability of edge inclusion, while $|\mathcal{S}_{\mathcal{D}}|$ denotes the number of edges in $\mathcal{D}$; see again Castelletti \& Mascaro (2021) for further details.

The target of the MCMC scheme is therefore the joint posterior of $(\boldsymbol{L},\boldsymbol{D},\mathcal{D})$,
\begin{equation}
p(\boldsymbol L, \boldsymbol D, \mathcal{D}\,|\, \boldsymbol X) \propto p(\boldsymbol{X}\,|\,\boldsymbol L, \boldsymbol D, \mathcal{D})\,p(\boldsymbol{L},\boldsymbol{D}\,|\,\mathcal{D})\,p(\mathcal{D}),
\end{equation}
where $\boldsymbol{X}$ denotes the $(n,q)$ data matrix obtained through $n$ i.i.d. draws from the Gaussian DAG-model and $p(\boldsymbol{X}\,|\,\boldsymbol L, \boldsymbol D, \mathcal{D})$ is the likelihood function. See also Castelletti \& Mascaro (2022) for full details.
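To make the prior $p(\mathcal{D})$ concrete, the following minimal sketch evaluates its logarithm (up to the normalizing constant) from a 0-1 adjacency matrix; `log_DAG_prior()` is a hypothetical helper written for illustration only, not a function exported by `BCDAG`:

```{r}
# Hypothetical helper (not part of BCDAG): log of the un-normalized
# prior p(D), given a 0-1 adjacency matrix A and a prior probability
# of edge inclusion w
log_DAG_prior <- function(A, w) {
  q <- ncol(A)
  n_edges <- sum(A)  # |S_D|: the number of edges in the DAG
  n_edges * log(w) + (q * (q - 1) / 2 - n_edges) * log(1 - w)
}
```

Since each edge multiplies the prior by $w/(1-w)$, any $w < 0.5$ favors sparser DAGs, which is how $w$ encodes a preference for sparsity.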
### Generating data

We first randomly generate a DAG $\mathcal{D}$, the DAG parameters $(\boldsymbol{L},\boldsymbol{D})$ and then $n=1000$ i.i.d. observations from the resulting Gaussian DAG-model as follows:

```{r}
set.seed(1)
q <- 8
w <- 0.2
DAG <- rDAG(q, w)
a <- q
U <- diag(1, q)
outDL <- rDAGWishart(n = 1, DAG, a, U)
L <- outDL$L; D <- outDL$D
Omega <- L %*% solve(D) %*% t(L)
Sigma <- solve(Omega)
n <- 1000
X <- mvtnorm::rmvnorm(n = n, sigma = Sigma)
```

See also our [vignette about data generation from Gaussian DAG-models](#).

## `learn_DAG()`

The function `learn_DAG()` implements an MCMC algorithm to sample from the joint posterior of DAGs and DAG parameters. It is based on a Partial Analytic Structure (PAS) algorithm (Godsill, 2001) which, at each iteration:

1. Updates the DAG through a Metropolis-Hastings (MH) step where, given the current DAG, a new (direct successor) DAG is drawn from a suitable proposal distribution and accepted with probability given by the MH acceptance ratio (see also the section [A note on `fast = TRUE`]);
2. Samples the DAG parameters from their posterior distribution, conditionally on the DAG updated at step 1.; see also Castelletti \& Consonni (2021) for more details.

We run the algorithm as follows:

```{r echo = FALSE, include = FALSE}
out <- learn_DAG(S = 5000, burn = 1000, data = X, a, U, w,
                 fast = FALSE, save.memory = FALSE, collapse = FALSE)
```

```{r eval = FALSE}
out <- learn_DAG(S = 5000, burn = 1000, data = X, a, U, w,
                 fast = FALSE, save.memory = FALSE, collapse = FALSE)
```

### Input

The inputs of `learn_DAG()` correspond to three different sets of arguments:

* `S`, `burn` and `data` are standard inputs required by any MCMC algorithm. In particular, `S` defines the desired length of the chain, which is obtained by discarding the first `burn` draws (the total number of sampled draws is therefore `S + burn`); `data` is the $(n,q)$ data matrix $\boldsymbol{X}$;
* `a`, `U` and `w` are the hyperparameters of the priors on DAGs (`w`) and DAG parameters (`a`, `U`); see the model description above. The same hyperparameters appear in the functions `rDAG()` and `rDAGWishart()`, which were introduced in our vignette about data generation from Gaussian DAG-models;
* `fast`, `save.memory` and `collapse` are Boolean arguments which allow the user to: implement an approximate proposal distribution within the MCMC scheme (`fast = TRUE`); change the array structure of the stored MCMC output into strings in order to save memory (`save.memory = TRUE`); implement an MCMC for DAG structure learning only, without sampling from the posterior of the parameters (`collapse = TRUE`). See also [A note on `fast = TRUE`] and Castelletti \& Mascaro (2022) for full details.

### Output

The output of `learn_DAG()` is an object of class `bcdag`:

```{r}
class(out)
```

`bcdag` objects include the output of the MCMC algorithm together with a collection of meta-data recording the input arguments of `learn_DAG()`; these are stored in the attributes of the object:

```{r}
str(out)
```

The attribute `type` describes the structure of the output of `learn_DAG()`, which depends on the choice of the arguments `save.memory` and `collapse`.
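For the run above, we can query this attribute directly; its four possible values are illustrated case by case below:

```{r}
# type of the bcdag object obtained with save.memory = FALSE, collapse = FALSE
attributes(out)$type
```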
\vspace{0.2cm}

When both are set equal to `FALSE`, as in the previous example, the output of `learn_DAG()` is a *complete* `bcdag` object, collecting three $(q,q,S)$ arrays with the DAG structures (in the form of $q \times q$ adjacency matrices) and the DAG parameters $\boldsymbol{L}$ and $\boldsymbol{D}$ (both as $q \times q$ matrices) sampled by the MCMC:

```{r}
out$Graphs[,,1]
round(out$L[,,1], 2)
round(out$D[,,1], 2)
```

\vspace{0.2cm}

When `collapse = TRUE` and `save.memory = FALSE`, the output of `learn_DAG()` is a *collapsed* `bcdag` object, consisting of a $(q,q,S)$ array with the adjacency matrices of the DAGs sampled by the MCMC:

```{r echo = FALSE, include = FALSE}
collapsed_out <- learn_DAG(S = 5000, burn = 1000, data = X, a, U, w,
                           fast = FALSE, save.memory = FALSE, collapse = TRUE)
```

```{r eval = FALSE}
collapsed_out <- learn_DAG(S = 5000, burn = 1000, data = X, a, U, w,
                           fast = FALSE, save.memory = FALSE, collapse = TRUE)
```

```{r}
names(collapsed_out)
class(collapsed_out)
attributes(collapsed_out)$type
collapsed_out$Graphs[,,1]
```

\vspace{0.2cm}

When `save.memory = TRUE` and `collapse = FALSE`, the output is a *compressed* `bcdag` object, collecting samples from the joint posterior on DAGs and DAG parameters in the form of vectors of strings:

```{r echo = FALSE, include = FALSE}
compressed_out <- learn_DAG(S = 5000, burn = 1000, data = X, a, U, w,
                            fast = FALSE, save.memory = TRUE, collapse = FALSE)
```

```{r eval = FALSE}
compressed_out <- learn_DAG(S = 5000, burn = 1000, data = X, a, U, w,
                            fast = FALSE, save.memory = TRUE, collapse = FALSE)
```

```{r}
names(compressed_out)
class(compressed_out)
attributes(compressed_out)$type
```

In this case, we can access the MCMC draws as:

```{r}
compressed_out$Graphs[1]
compressed_out$L[1]
compressed_out$D[1]
```

In addition, the internal function `bd_decode` can be used to recover the matrix representation of the previous objects:

```{r}
BCDAG:::bd_decode(compressed_out$Graphs[1])
round(BCDAG:::bd_decode(compressed_out$L[1]), 2)
round(BCDAG:::bd_decode(compressed_out$D[1]), 2)
```

\vspace{0.2cm}

Finally, if `save.memory = TRUE` and `collapse = TRUE`, the output of `learn_DAG()` is a *compressed and collapsed* `bcdag` object, collecting only the sampled DAGs represented as a vector of strings:

```{r echo = FALSE, include = FALSE}
comprcoll_out <- learn_DAG(S = 5000, burn = 1000, data = X, a, U, w,
                           fast = FALSE, save.memory = TRUE, collapse = TRUE)
```

```{r eval = FALSE}
comprcoll_out <- learn_DAG(S = 5000, burn = 1000, data = X, a, U, w,
                           fast = FALSE, save.memory = TRUE, collapse = TRUE)
```

```{r}
names(comprcoll_out)
class(comprcoll_out)
attributes(comprcoll_out)$type
BCDAG:::bd_decode(comprcoll_out$Graphs[1])
```
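As a quick check of the effect of `save.memory = TRUE`, we can compare the memory footprint of the complete and compressed objects using base R's `object.size()` (the exact figures will vary from run to run):

```{r}
# String storage (compressed) is typically much lighter than array storage
object.size(out)
object.size(compressed_out)
```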
## A note on `fast = TRUE`

Step 1. of the MCMC scheme implemented by `learn_DAG()` updates DAG $\mathcal{D}$ by randomly drawing a new candidate DAG $\mathcal{D}'$ from a proposal distribution and then accepting it with probability given by the Metropolis-Hastings (MH) acceptance ratio; see also Castelletti \& Mascaro (2021). For a given DAG $\mathcal{D}$, the proposal distribution $q(\mathcal{D}'\,|\,\mathcal{D})$ is built over the set $\mathcal{O}_{\mathcal{D}}$ of *all* direct successor DAGs that can be reached from $\mathcal{D}$ by inserting, deleting or reversing a single edge of $\mathcal{D}$. A DAG $\mathcal{D}'$ is then proposed uniformly from the set $\mathcal{O}_{\mathcal{D}}$, so that $q(\mathcal{D}'\,|\,\mathcal{D})=1/|\mathcal{O}_{\mathcal{D}}|$. Moreover, the MH acceptance ratio requires evaluating the ratio of proposals $q(\mathcal{D}'\,|\,\mathcal{D})/q(\mathcal{D}\,|\,\mathcal{D}') = |\mathcal{O}_{\mathcal{D}'}|/|\mathcal{O}_{\mathcal{D}}|$, and accordingly the construction of both $\mathcal{O}_{\mathcal{D}}$ and $\mathcal{O}_{\mathcal{D}'}$.

If `fast = FALSE`, the proposal ratio is computed exactly; this requires the enumeration of $\mathcal{O}_{\mathcal{D}}$ and $\mathcal{O}_{\mathcal{D}'}$, which may become computationally expensive, especially when $q$ is large. However, the ratio approaches $1$ as the number of variables $q$ increases: the option `fast = TRUE` implements this approximation (the proposal ratio is simply set equal to $1$), which therefore avoids the construction of $\mathcal{O}_{\mathcal{D}}$ and $\mathcal{O}_{\mathcal{D}'}$.

A comparison between `fast = FALSE` and `fast = TRUE` in the execution of `learn_DAG()` produces the following results in terms of computational time:

```{r results='hide'}
# No approximation
time_nofast <- system.time(
  out_nofast <- learn_DAG(S = 5000, burn = 1000, data = X, a, U, w,
                          fast = FALSE, save.memory = FALSE, collapse = FALSE)
)
# Approximation
time_fast <- system.time(
  out_fast <- learn_DAG(S = 5000, burn = 1000, data = X, a, U, w,
                        fast = TRUE, save.memory = FALSE, collapse = FALSE)
)
```

```{r}
time_nofast
time_fast
```

Finally, the corresponding estimated posterior probabilities of edge inclusion are the following:

```{r}
round(get_edgeprobs(out_nofast), 2)
round(get_edgeprobs(out_fast), 2)
```

### References

* Ben-David E, Li T, Massam H, Rajaratnam B (2015). "High dimensional Bayesian inference for Gaussian directed acyclic graph models." *arXiv pre-print*.
* Cao X, Khare K, Ghosh M (2019). "Posterior graph selection and estimation consistency for high-dimensional Bayesian DAG models." *The Annals of Statistics*, 47(1), 319–348.
* Castelletti F, Consonni G (2021). "Bayesian causal inference in probit graphical models." *Bayesian Analysis*, In press.
* Castelletti F, Mascaro A (2021). "Structural learning and estimation of joint causal effects among network-dependent variables." *Statistical Methods & Applications*, 30, 1289–1314.
* Castelletti F, Mascaro A (2022). "BCDAG: An R package for Bayesian structural and causal learning of Gaussian DAGs." *arXiv pre-print*.
* Godsill SJ (2001). "On the relationship between Markov chain Monte Carlo methods for model uncertainty." *Journal of Computational and Graphical Statistics*, 10(2), 230–248.
* Peluso S, Consonni G (2020). "Compatible priors for model selection of high-dimensional Gaussian DAGs." *Electronic Journal of Statistics*, 14(2), 4110–4132.

```{r, include = FALSE}
par(oldpar)
options(oldoptions)
```