AIDS-Progression

Here we will analyze the data in Section 4.2 of Chen, Ibrahim, and Yiannoutsos (1999) using a logistic regression model. The analysis involves two AIDS clinical trials, ACTG036 and ACTG019, whose data can be loaded by calling data(actg036) and data(actg019), respectively (see below).

The historical data will come from the ACTG019 study (Volberding et al. (1990)), a double-blind placebo-controlled clinical trial comparing zidovudine (AZT) with placebo in people with CD4 cell counts below 500. The sample size of complete observations for this study was \(n_0 = 822\). The response variable (\(y_0\)) for these data is binary, with 1 indicating death, development of AIDS, or AIDS-related complex (ARC), and 0 otherwise. We will use the following covariates: cd4 (CD4 cell count), age, treatment, and race. To facilitate computation, we will center and scale the continuous covariates (age and cd4) based on the current data. In general, we recommend this centering procedure in order to keep the coefficients on roughly the same scale and thus avoid difficult posterior geometries for Stan's dynamic Hamiltonian Monte Carlo (dHMC) sampler to explore.

We will use the methods in hdbayes to construct informative priors from the ACTG019 data, treated as historical data, in order to analyze the data from the ACTG036 study (Merigan et al. (1991)), for which we will use the same four covariates. The ACTG036 study had a sample size of \(n = 183\).

Let’s take a look at the data and take the opportunity to center and scale the continuous covariates (cd4 and age):

library(hdbayes)
library(posterior)
library(dplyr)
library(parallel)
library(ggplot2)
library(tibble)
age_stats <- with(actg036,
                  c('mean' = mean(age), 'sd' = sd(age)))
cd4_stats <- with(actg036,
                  c('mean' = mean(cd4), 'sd' = sd(cd4)))
actg036$age <- ( actg036$age - age_stats['mean'] ) / age_stats['sd']
actg019$age <- ( actg019$age - age_stats['mean'] ) / age_stats['sd']
actg036$cd4 <- ( actg036$cd4 - cd4_stats['mean'] ) / cd4_stats['sd']
actg019$cd4 <- ( actg019$cd4 - cd4_stats['mean'] ) / cd4_stats['sd']

To establish a baseline, we will estimate the parameters via maximum likelihood, using glm() from stats:

formula <- outcome ~  age + race + treatment + cd4
p       <- length(attr(terms(formula), "term.labels")) # number of predictors
family  <- binomial('logit')
# frequentist analysis of current and historical data separately using GLM
fit.mle.cur  <- glm(formula, family, actg036)
fit.mle.hist <- glm(formula, family, actg019)

summary(fit.mle.hist)
#> 
#> Call:
#> glm(formula = formula, family = family, data = actg019)
#> 
#> Coefficients:
#>             Estimate Std. Error z value Pr(>|z|)    
#> (Intercept)  -4.0106     1.0276  -3.903 9.51e-05 ***
#> age           0.4958     0.1921   2.581  0.00984 ** 
#> race          1.5368     1.0244   1.500  0.13355    
#> treatment    -0.7412     0.3043  -2.436  0.01485 *  
#> cd4          -0.7109     0.1615  -4.402 1.07e-05 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> (Dispersion parameter for binomial family taken to be 1)
#> 
#>     Null deviance: 398.43  on 821  degrees of freedom
#> Residual deviance: 360.54  on 817  degrees of freedom
#> AIC: 370.54
#> 
#> Number of Fisher Scoring iterations: 7
summary(fit.mle.cur)
#> 
#> Call:
#> glm(formula = formula, family = family, data = actg036)
#> 
#> Coefficients:
#>             Estimate Std. Error z value Pr(>|z|)    
#> (Intercept) -3.99251    1.30107  -3.069 0.002150 ** 
#> age          0.17270    0.33265   0.519 0.603648    
#> race         0.08954    1.18186   0.076 0.939607    
#> treatment   -0.09600    0.71886  -0.134 0.893761    
#> cd4         -1.80039    0.49731  -3.620 0.000294 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> (Dispersion parameter for binomial family taken to be 1)
#> 
#>     Null deviance: 83.180  on 182  degrees of freedom
#> Residual deviance: 61.527  on 178  degrees of freedom
#> AIC: 71.527
#> 
#> Number of Fisher Scoring iterations: 7

suppressMessages(confint(fit.mle.hist))
#>                   2.5 %     97.5 %
#> (Intercept) -6.90140344 -2.4402829
#> age          0.11373107  0.8703403
#> race        -0.02626701  4.4242649
#> treatment   -1.35741690 -0.1576862
#> cd4         -1.03097071 -0.3957758
suppressMessages(confint(fit.mle.cur))
#>                  2.5 %     97.5 %
#> (Intercept) -7.2372472 -1.8398296
#> age         -0.5130633  0.8126854
#> race        -1.9320180  3.1410322
#> treatment   -1.5747131  1.3168469
#> cd4         -2.9132023 -0.9259674

# data list for hdbayes: current data (actg036) first, then historical data (actg019)
the.data <- list(actg036, actg019)

The confidence intervals above reveal considerable discrepancy between the coefficient estimates from the two studies. In particular, the treatment effect in the current study is estimated with substantial uncertainty, as evidenced by its wide 95% confidence interval. We would therefore like to incorporate information from the ACTG019 trial into a prior distribution for the regression coefficients. We will use the plethora of methods available in hdbayes to construct informative priors from the available historical data.

Bayesian hierarchical model (BHM)

Our first approach will be to fit a Bayesian hierarchical model (BHM), where we model the coefficients for historical and current data jointly through a hierarchical (multilevel) structure: the coefficients for each study are drawn from the same multivariate normal distribution. The hyperparameters of this distribution are also assigned hyperpriors.

In summary, we employ the model: \[ \begin{align*} y_i | x_i, \beta &\sim \text{Bernoulli}\left( \text{logit}^{-1}(x_i'\beta) \right), \ \ i = 1, \ldots, n \\ y_{0i} | x_{0i}, \beta_{0} &\sim \text{Bernoulli}\left( \text{logit}^{-1}(x_{0i}'\beta_{0}) \right), \ \ i = 1, \ldots, n_{0} \\ \beta, \beta_{0} &\sim N_p(\mu, \Sigma), \text{ where }\Sigma = \text{diag}(\sigma_1^2, \ldots, \sigma_p^2) \\ \mu &\sim N_p(\mu_0, \Sigma_{0}), \text{ where } \Sigma_{0} \text{ is a diagonal matrix} \\ \sigma_j &\sim N^{+}(\nu_{0,j}, \psi_{0,j}^2),\: j = 1, \ldots, p \end{align*} \]

where \(\beta\) is the vector of regression coefficients for the current data set, \(\beta_{0}\) is the vector of regression coefficients for the historical data set, and \(\mu\) is the common prior mean of \(\beta\) and \(\beta_{0}\), which is treated as random with a normal hyperprior having mean \(\mu_0\) and diagonal covariance matrix \(\Sigma_{0}\). \(\Sigma\) is also treated as random and assumed to be diagonal, with half-normal hyperpriors elicited on the \(\sigma_j\)'s.

We set \(\mu_0 = \textbf{0}_p\), \(\Sigma_{0} = 10^2 I_p\), \(\nu_{0,j} = 0\), and \(\psi_{0,j} = 0.5\) for \(j = 1, \ldots, p\), matching the meta.mean.mean, meta.mean.sd, meta.sd.mean, and meta.sd.sd arguments passed to glm.bhm() below.

To fit this model, let us first set up the computational specs, which will also be used in the subsequent analyses.

ncores        <- 4 # maximum number of MCMC chains to run in parallel
nchains       <- 4 # number of Markov chains
iter_warmup   <- 1000 # warmup per chain for MCMC sampling
iter_sampling <- 2500 # number of samples post warmup per chain

base.pars     <- c("(Intercept)", "age", "race", "treatment", "cd4")

# function to pull out the posterior summaries in a convenient form
get_summaries <- function(fit, pars.interest, digits = 2) {
  out <- suppressWarnings(subset(posterior::summarise_draws(fit,
                                           .num_args = list(
                                             sigfig = digits,
                                             notation = "dec"
                                           )),
                variable %in% pars.interest))
  return(out)
}

Now, a simple call to glm.bhm() will fit the model:

time.bhm <- system.time(
  if (instantiate::stan_cmdstan_exists()) {
    fit.bhm <- glm.bhm(
      formula, family, the.data,
      meta.mean.mean = 0, meta.mean.sd = 10,
      meta.sd.mean = 0, meta.sd.sd = 0.5,
      iter_warmup = iter_warmup, iter_sampling = iter_sampling,
      chains = nchains, parallel_chains = ncores,
      refresh = 0
    )
  }
)
#> Running MCMC with 4 parallel chains...
#> 
#> Chain 4 finished in 13.7 seconds.
#> Chain 3 finished in 15.3 seconds.
#> Chain 2 finished in 15.5 seconds.
#> Chain 1 finished in 16.3 seconds.
#> 
#> All 4 chains finished successfully.
#> Mean chain execution time: 15.2 seconds.
#> Total execution time: 16.4 seconds.
get_summaries(fit.bhm, pars.interest = base.pars)
#> # A tibble: 5 × 10
#>   variable     mean median    sd   mad    q5    q95  rhat ess_bulk ess_tail
#>   <chr>       <dbl>  <dbl> <dbl> <dbl> <dbl>  <dbl> <dbl>    <dbl>    <dbl>
#> 1 (Intercept) -4.1   -4.0   0.89  0.85 -5.7  -2.8     1.0    7271.    4958.
#> 2 age          0.26   0.29  0.26  0.25 -0.19  0.66    1.0    9787.    8676.
#> 3 race         0.99   0.93  0.90  0.85 -0.37  2.5     1.0    6876.    4937.
#> 4 treatment   -0.64  -0.64  0.43  0.40 -1.3   0.069   1.0   10792.    8835.
#> 5 cd4         -1.3   -1.2   0.37  0.36 -1.9  -0.72    1.0    6574.    7825.

As we can see, all MCMC (Stan) diagnostics (Rhat, ESS, divergences, etc.) look good, so we can continue.
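
One quick programmatic check of the summary-based diagnostics can be done from the full summarise_draws() output (a minimal sketch; divergence counts would require the underlying CmdStan fit object, which is not used here):

# sketch: scan the summary-based diagnostics for the BHM fit
diag.bhm <- posterior::summarise_draws(fit.bhm)
max(diag.bhm$rhat, na.rm = TRUE)      # should be close to 1
min(diag.bhm$ess_bulk, na.rm = TRUE)  # should be comfortably large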

Commensurate prior

Next, we consider the commensurate prior of Hobbs, Sargent, and Carlin (2012). This model has the following structure: \[ \begin{align*} y_i | x_i, \beta &\sim \text{Bernoulli}\left( \text{logit}^{-1}(x_i'\beta) \right) \\ y_{0i} | x_{0i}, \beta_{0} &\sim \text{Bernoulli}\left( \text{logit}^{-1}(x_{0i}'\beta_{0}) \right) \\ \beta_{0} &\sim N_p(\mu_0, \Sigma_{0}) , \text{ where }\Sigma_{0} \text{ is a diagonal matrix} \\ \beta_j &\sim N_1\left( \beta_{0j}, \tau_j^{-1} \right), j = 1, \ldots, p \\ \tau_j &\overset{\text{i.i.d.}}{\sim} p_{\text{spike}} \cdot N^{+}(\mu_{\text{spike}}, \sigma_{\text{spike}}^2) + (1 - p_{\text{spike}}) \cdot N^{+}(\mu_{\text{slab}}, \sigma_{\text{slab}}^2),\: j = 1, \ldots, p \end{align*} \] where \(\beta = (\beta_1, \ldots, \beta_p)'\), \(\beta_{0} = (\beta_{01}, \ldots, \beta_{0p})'\). The commensurability parameters (i.e., \(\tau_j\)’s) are treated as random with a spike-and-slab prior, which is specified as a mixture of two half-normal priors. We use the following default values in hdbayes:

These defaults are \(p_{\text{spike}} = 0.1\) for the spike probability, \(N^{+}(200, 0.1^2)\) for the spike component, and \(N^{+}(0, 5^2)\) for the slab component; we pass them explicitly below. Again, a simple call to the appropriate function will fit the model:

time.commensurate <- system.time(
  if (instantiate::stan_cmdstan_exists()) {
    fit.commensurate <- glm.commensurate(
      formula = formula, family = family, data.list = the.data,
      p.spike = 0.1, spike.mean = 200, spike.sd = 0.1,
      slab.mean = 0, slab.sd = 5,
      iter_warmup = iter_warmup, iter_sampling = iter_sampling,
      chains = nchains, parallel_chains = ncores,
      refresh = 0
    )
  }
)
#> Running MCMC with 4 parallel chains...
#> 
#> Chain 1 finished in 8.3 seconds.
#> Chain 2 finished in 8.6 seconds.
#> Chain 3 finished in 8.7 seconds.
#> Chain 4 finished in 8.9 seconds.
#> 
#> All 4 chains finished successfully.
#> Mean chain execution time: 8.6 seconds.
#> Total execution time: 9.2 seconds.
get_summaries(fit.commensurate, pars.interest = base.pars)
#> # A tibble: 5 × 10
#>   variable     mean median    sd   mad    q5   q95  rhat ess_bulk ess_tail
#>   <chr>       <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>    <dbl>    <dbl>
#> 1 (Intercept) -4.2   -4.1   0.88  0.86 -5.7  -2.8    1.0    3377.    3871.
#> 2 age          0.22   0.22  0.28  0.27 -0.24  0.66   1.0   11391.    7098.
#> 3 race         0.96   0.91  0.89  0.85 -0.37  2.5    1.0    3347.    3793.
#> 4 treatment   -0.60  -0.61  0.47  0.45 -1.4   0.18   1.0    8826.    7554.
#> 5 cd4         -1.3   -1.3   0.35  0.34 -1.9  -0.80   1.0    9255.    6867.

And again the diagnostics are fine, so we may proceed with our exploration of informative priors.

Robust Meta-Analytic Predictive (RMAP) Prior

The RMAP prior is a generalization of the Bayesian hierarchical model (BHM) and takes the form \[ \begin{align*} y_i | x_i, \beta &\sim \text{Bernoulli}\left( \text{logit}^{-1}(x_i'\beta) \right) \\ y_{0i} | x_{0i}, \beta_{0} &\sim \text{Bernoulli}\left( \text{logit}^{-1}(x_{0i}'\beta_{0}) \right) \\ \beta_{0} &\sim N_p(\mu, \Sigma), \text{ where }\Sigma = \text{diag}(\sigma_1^2, \ldots, \sigma_p^2) \\ \beta &\sim w \times N_p(\mu, \Sigma) + (1 - w) N_p(\mu_v, \Sigma_v) \\ \mu &\sim N_p(\mu_0, \Sigma_{0}), \text{ where }\Sigma_{0} \text{ is a diagonal matrix} \\ \sigma_j &\sim N^{+}(\nu_{0,j}, \psi_{0,j}^2),\: j = 1, \ldots, p \end{align*} \] where \(w \in (0,1)\) controls the level of borrowing from the historical data. Note that when \(w = 1\), the robust MAP prior effectively becomes the BHM. The hyperprior defaults are the same as in the BHM, except that \(w\) defaults to 0.1; we leave the vague/non-informative component \(N_p(\mu_v, \Sigma_v)\) at its package default.

We will explore how different values of \(w\) (0.1, 0.5, and 0.9) affect the posterior results. The RMAP prior is fit by calling glm.rmap():

rmap.t1 <- system.time(
  if (instantiate::stan_cmdstan_exists()) {
    res.rmap.a <- glm.rmap(
      formula = formula, family = family, data.list = the.data,
      w = 0.1,
      iter_warmup = iter_warmup, iter_sampling = iter_sampling,
      chains = nchains, parallel_chains = ncores,
      refresh = 0
    )
  }
)
#> Running MCMC with 4 parallel chains...
#> 
#> Chain 2 finished in 18.3 seconds.
#> Chain 1 finished in 19.0 seconds.
#> Chain 3 finished in 19.2 seconds.
#> Chain 4 finished in 19.5 seconds.
#> 
#> All 4 chains finished successfully.
#> Mean chain execution time: 19.0 seconds.
#> Total execution time: 19.7 seconds.
#> 
#> Iteration: 1
#> Iteration: 2
#> Iteration: 3
#> Iteration: 4
#> Iteration: 5
#> Running MCMC with 4 parallel chains...
#> 
#> Chain 2 finished in 18.9 seconds.
#> Chain 4 finished in 21.0 seconds.
#> Chain 1 finished in 21.9 seconds.
#> Chain 3 finished in 23.6 seconds.
#> 
#> All 4 chains finished successfully.
#> Mean chain execution time: 21.4 seconds.
#> Total execution time: 23.8 seconds.
#> 
#> Iteration: 1
#> Iteration: 2
#> Iteration: 3
#> Iteration: 4
#> Iteration: 5
#> Iteration: 6
#> Iteration: 7
#> Iteration: 8
#> Running MCMC with 4 parallel chains...
#> 
#> Chain 1 finished in 1.3 seconds.
#> Chain 2 finished in 1.4 seconds.
#> Chain 3 finished in 1.3 seconds.
#> Chain 4 finished in 1.3 seconds.
#> 
#> All 4 chains finished successfully.
#> Mean chain execution time: 1.3 seconds.
#> Total execution time: 1.5 seconds.
#> 
#> Iteration: 1
#> Iteration: 2
#> Iteration: 3
#> Iteration: 4
#> Iteration: 5
fit.rmap.a <- res.rmap.a[["post.samples"]]

rmap.t2 <- system.time(
  if (instantiate::stan_cmdstan_exists()) {
    res.rmap.b <- glm.rmap(
      formula = formula, family = family, data.list = the.data,
      w = 0.5,
      iter_warmup = iter_warmup, iter_sampling = iter_sampling,
      chains = nchains, parallel_chains = ncores,
      refresh = 0
    )
  }
)
#> Running MCMC with 4 parallel chains...
#> 
#> Chain 2 finished in 16.9 seconds.
#> Chain 1 finished in 17.8 seconds.
#> Chain 3 finished in 17.9 seconds.
#> Chain 4 finished in 18.4 seconds.
#> 
#> All 4 chains finished successfully.
#> Mean chain execution time: 17.7 seconds.
#> Total execution time: 18.6 seconds.
#> 
#> Iteration: 1
#> Iteration: 2
#> Iteration: 3
#> Iteration: 4
#> Iteration: 5
#> Iteration: 6
#> Iteration: 7
#> Running MCMC with 4 parallel chains...
#> 
#> Chain 4 finished in 19.8 seconds.
#> Chain 1 finished in 20.5 seconds.
#> Chain 3 finished in 21.7 seconds.
#> Chain 2 finished in 21.9 seconds.
#> 
#> All 4 chains finished successfully.
#> Mean chain execution time: 21.0 seconds.
#> Total execution time: 21.9 seconds.
#> 
#> Iteration: 1
#> Iteration: 2
#> Iteration: 3
#> Iteration: 4
#> Iteration: 5
#> Iteration: 6
#> Iteration: 7
#> Iteration: 8
#> Running MCMC with 4 parallel chains...
#> 
#> Chain 1 finished in 1.0 seconds.
#> Chain 3 finished in 1.0 seconds.
#> Chain 2 finished in 1.2 seconds.
#> Chain 4 finished in 1.2 seconds.
#> 
#> All 4 chains finished successfully.
#> Mean chain execution time: 1.1 seconds.
#> Total execution time: 1.4 seconds.
#> 
#> Iteration: 1
#> Iteration: 2
#> Iteration: 3
#> Iteration: 4
#> Iteration: 5
fit.rmap.b <- res.rmap.b[["post.samples"]]

rmap.t3 <- system.time(
  if (instantiate::stan_cmdstan_exists()) {
    res.rmap.c <- glm.rmap(
      formula = formula, family = family, data.list = the.data,
      w = 0.9,
      iter_warmup = iter_warmup, iter_sampling = iter_sampling,
      chains = nchains, parallel_chains = ncores,
      refresh = 0
    )
  }
)
#> Running MCMC with 4 parallel chains...
#> 
#> Chain 3 finished in 17.5 seconds.
#> Chain 4 finished in 18.2 seconds.
#> Chain 2 finished in 21.4 seconds.
#> Chain 1 finished in 26.0 seconds.
#> 
#> All 4 chains finished successfully.
#> Mean chain execution time: 20.8 seconds.
#> Total execution time: 26.1 seconds.
#> 
#> Iteration: 1
#> Iteration: 2
#> Iteration: 3
#> Iteration: 4
#> Iteration: 5
#> Iteration: 6
#> Iteration: 7
#> Running MCMC with 4 parallel chains...
#> 
#> Chain 2 finished in 20.1 seconds.
#> Chain 1 finished in 21.9 seconds.
#> Chain 4 finished in 22.2 seconds.
#> Chain 3 finished in 22.5 seconds.
#> 
#> All 4 chains finished successfully.
#> Mean chain execution time: 21.7 seconds.
#> Total execution time: 22.7 seconds.
#> 
#> Iteration: 1
#> Iteration: 2
#> Iteration: 3
#> Iteration: 4
#> Iteration: 5
#> Iteration: 6
#> Iteration: 7
#> Iteration: 8
#> Running MCMC with 4 parallel chains...
#> 
#> Chain 1 finished in 1.0 seconds.
#> Chain 2 finished in 1.1 seconds.
#> Chain 3 finished in 1.1 seconds.
#> Chain 4 finished in 1.0 seconds.
#> 
#> All 4 chains finished successfully.
#> Mean chain execution time: 1.0 seconds.
#> Total execution time: 1.2 seconds.
#> 
#> Iteration: 1
#> Iteration: 2
#> Iteration: 3
#> Iteration: 4
#> Iteration: 5
fit.rmap.c <- res.rmap.c[["post.samples"]]

time.rmap <- rmap.t1 # report timing for the w = 0.1 fit

get_summaries(fit.rmap.a, pars.interest = base.pars)
#> # A tibble: 5 × 10
#>   variable     mean median    sd   mad    q5   q95  rhat ess_bulk ess_tail
#>   <chr>       <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>    <dbl>    <dbl>
#> 1 (Intercept) -4.3   -4.2   1.0   0.96 -6.1  -2.8    1.0    4058.    3810.
#> 2 age          0.24   0.26  0.29  0.28 -0.26  0.68   1.0    5184.    5183.
#> 3 race         0.87   0.82  0.97  0.92 -0.62  2.6    1.0    4249.    3580.
#> 4 treatment   -0.55  -0.56  0.52  0.48 -1.4   0.33   1.0    5202.    4451.
#> 5 cd4         -1.5   -1.4   0.45  0.45 -2.3  -0.79   1.0    3757.    4276.
get_summaries(fit.rmap.b, pars.interest = base.pars)
#> # A tibble: 5 × 10
#>   variable     mean median    sd   mad    q5   q95  rhat ess_bulk ess_tail
#>   <chr>       <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>    <dbl>    <dbl>
#> 1 (Intercept) -4.3   -4.2   0.99  0.95 -6.1  -2.9    1.0    3884.    2840.
#> 2 age          0.24   0.26  0.29  0.29 -0.26  0.67   1.0    4807.    4752.
#> 3 race         0.85   0.80  0.97  0.90 -0.65  2.5    1.0    4064.    4067.
#> 4 treatment   -0.55  -0.56  0.52  0.48 -1.4   0.33   1.0    4801.    4710.
#> 5 cd4         -1.5   -1.5   0.44  0.47 -2.2  -0.82   1.0    3641.    3831.
get_summaries(fit.rmap.c, pars.interest = base.pars)
#> # A tibble: 5 × 10
#>   variable     mean median    sd   mad    q5   q95  rhat ess_bulk ess_tail
#>   <chr>       <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>    <dbl>    <dbl>
#> 1 (Intercept) -4.3   -4.2   0.98  0.95 -6.0  -2.8    1.0    3548.    4025.
#> 2 age          0.24   0.26  0.29  0.27 -0.25  0.68   1.0    5221.    4451.
#> 3 race         0.82   0.78  0.97  0.94 -0.71  2.5    1.0    3467.    3782.
#> 4 treatment   -0.55  -0.56  0.52  0.47 -1.4   0.33   1.0    4852.    4536.
#> 5 cd4         -1.5   -1.5   0.44  0.44 -2.3  -0.82   1.0    4394.    3811.

The diagnostics look good, so we will proceed.

Power priors

In this section we will explore the power prior (PP, Ibrahim and Chen (2000)) and its variations: the normalized power prior (NPP, Duan, Smith, and Ye (2006), Neuenschwander, Branson, and Spiegelhalter (2009)) and the normalized asymptotic power prior (NAPP, Ibrahim et al. (2015)). The Power Prior takes the form \[ \begin{align*} &y_i | x_i, \beta \sim \text{Bernoulli}\left( \text{logit}^{-1}(x_i'\beta) \right) , \ \ i = 1, \ldots, n\\ &\pi_{\text{PP}}(\beta | D_0, a_{0}) \propto L(\beta | D_{0})^{a_{0}} \pi_0(\beta) \end{align*} \]

where \(L(\beta | D)\) is the likelihood of the GLM based on data set \(D\), \(a_{0} \in (0,1)\) is a fixed hyperparameter controlling the effective sample size contributed by the historical data set \(D_{0}\), and \(\pi_0(\beta)\) is an “initial prior” for \(\beta\).

The default in hdbayes is a (non-informative) normal initial prior on \(\beta\): \[ \beta \sim N_p(\textbf{0}_p, 100 \times I_p) \]

The question that then arises is how to choose \(a_0\). While no definitive answer can be given without knowledge of the specific data and model under analysis, we find it reasonable to set \(a_0 = a^\star = \frac{1}{2}\frac{n}{n_0}\). This way, when the historical and current data sets are of the same size, the borrowing factor is \(a_0 = 1/2\), reflecting no particular desire to either borrow or not borrow information. Let us do just that:

n0 <- nrow(actg019)
n <- nrow(actg036)
a0.star <- (n/n0) * 1/2

and then proceed to call glm.pp() twice (once for \(a_0 = 1/2\) and once for \(a_0 = a^\star\)):

time.pp <- system.time(
  if (instantiate::stan_cmdstan_exists()) {
    fit.pp  <- glm.pp(
      formula = formula, family = family, data.list = the.data,
      a0 = 0.5,
      iter_warmup = iter_warmup, iter_sampling = iter_sampling,
      chains = nchains, parallel_chains = ncores,
      refresh = 0
    )
  }
)
#> Running MCMC with 4 parallel chains...
#> 
#> Chain 3 finished in 6.3 seconds.
#> Chain 1 finished in 6.6 seconds.
#> Chain 2 finished in 6.5 seconds.
#> Chain 4 finished in 6.9 seconds.
#> 
#> All 4 chains finished successfully.
#> Mean chain execution time: 6.6 seconds.
#> Total execution time: 7.0 seconds.

time.pp.star <- system.time(
  if (instantiate::stan_cmdstan_exists()) {
    fit.pp.star  <- glm.pp(
      formula = formula, family = family, data.list = the.data,
      a0 = a0.star,
      iter_warmup = iter_warmup, iter_sampling = iter_sampling,
      chains = nchains, parallel_chains = ncores,
      refresh = 0
    )
  }
)
#> Running MCMC with 4 parallel chains...
#> 
#> Chain 3 finished in 5.5 seconds.
#> Chain 4 finished in 5.7 seconds.
#> Chain 2 finished in 5.9 seconds.
#> Chain 1 finished in 6.3 seconds.
#> 
#> All 4 chains finished successfully.
#> Mean chain execution time: 5.9 seconds.
#> Total execution time: 6.4 seconds.

get_summaries(fit.pp, pars.interest = base.pars)
#> # A tibble: 5 × 10
#>   variable     mean median    sd   mad     q5    q95  rhat ess_bulk ess_tail
#>   <chr>       <dbl>  <dbl> <dbl> <dbl>  <dbl>  <dbl> <dbl>    <dbl>    <dbl>
#> 1 (Intercept) -3.8   -3.6   0.99  0.91 -5.5   -2.4     1.0    4242.    3407.
#> 2 age          0.36   0.36  0.21  0.21  0.023  0.70    1.0    9387.    6432.
#> 3 race         1.1    0.98  0.99  0.91 -0.27   2.9     1.0    4339.    3444.
#> 4 treatment   -0.67  -0.67  0.37  0.36 -1.3   -0.074   1.0    8539.    5840.
#> 5 cd4         -0.92  -0.92  0.19  0.19 -1.2   -0.60    1.0    9323.    6400.
get_summaries(fit.pp.star, pars.interest = base.pars)
#> # A tibble: 5 × 10
#>   variable     mean median    sd   mad    q5   q95  rhat ess_bulk ess_tail
#>   <chr>       <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>    <dbl>    <dbl>
#> 1 (Intercept) -3.9   -3.8   1.3   1.2  -6.2  -2.1    1.0    4028.    3111.
#> 2 age          0.25   0.25  0.27  0.27 -0.20  0.69   1.0    8153.    5807.
#> 3 race         0.76   0.60  1.3   1.2  -1.0   3.0    1.0    4130.    3288.
#> 4 treatment   -0.49  -0.47  0.57  0.57 -1.4   0.42   1.0    8001.    6056.
#> 5 cd4         -1.3   -1.3   0.33  0.32 -1.8  -0.77   1.0    7719.    5997.

from which we can see that (i) all of the diagnostics are fine and (ii) the choice of \(a_0\) does seem to matter (look at the estimates for the treatment coefficient).

Normalized power prior (NPP)

Now we are prepared to start our normalized power prior analysis. The NPP treats the hyperparameter \(a_0\) as random, allowing the data to inform how much historical information is borrowed. For most models, this requires estimating the normalizing constant \(Z(a_0) = \int L(\beta | D_0)^{a_0} \pi_0(\beta) d\beta\).

The NPP may be summarized as \[ \begin{align*} &y_i | x_i, \beta \sim \text{Bernoulli}\left( \text{logit}^{-1}(x_i'\beta) \right) , \ \ i = 1, \ldots, n\\ &\pi_{\text{NPP}}(\beta| D_0, a_0) = \frac{ L(\beta | D_{0})^{a_{0}} }{Z(a_{0})} \pi_0(\beta)\\ &\pi(a_{0}) \propto a_0^{\alpha_0 - 1} (1 - a_0)^{\gamma_0 - 1} \end{align*} \]

The default shape hyperparameters in hdbayes are \(\alpha_0 = 1\) and \(\gamma_0 = 1\), in which case the prior on \(a_0\) is a \(U(0,1)\) prior.

We might thus decide to let \(a_0\) vary and be estimated from the data, which requires placing a prior on \(a_0\). We will now construct a Beta prior for \(a_0\) centered on \(a^\star = \frac{1}{2}\frac{n}{n_0}\) with coefficient of variation (cv) equal to \(1\):

# function for eliciting hyperparameters of a Beta distribution from mean and variance or
# mean and coefficient of variation (cv)
# m0: the desired mean, a scalar between 0 and 1.
# v0: the desired variance (optional); if provided, it takes precedence over cv.
# cv: the desired coefficient of variation, a scalar larger than 0.
elicit_beta_mean_cv <- function(m0, v0 = NULL, cv = 1) {
  if (!is.null(v0)) {
    a <- -(m0 * v0 + m0 ^ 3 - m0 ^ 2) / v0
    b <- ((m0 - 1) * v0 + m0 ^ 3 - 2 * m0 ^ 2 + m0) / v0
  } else{
    a <- -(m0 * (cv * m0) ^ 2 + m0 ^ 3 - m0 ^ 2) / (cv * m0) ^ 2
    b <- ((m0 - 1) * (cv * m0) ^ 2 + m0 ^ 3 - 2 * m0 ^ 2 + m0) / (cv * m0) ^
      2
  }
  if (a < 0 || b < 0) {
    warning("Warning: at least one of the obtained parameters is not valid")
  }
  return(list(a = a, b = b))
}

# construct Beta prior with cv = 1
beta.pars <- elicit_beta_mean_cv(m0 = a0.star, cv = 1)
curve(dbeta(x, shape1 = beta.pars$a, shape2 = beta.pars$b),
      lwd = 3,
      main = "Prior on a0",
      ylab = "Density",
      xlab = expression(a[0]))
abline(v = a0.star, lwd = 2, lty = 3)
legend(x = "topright",
       legend = c(expression(pi(a[0])),
                  expression(a^'*')),
       lwd = 2, lty = c(1, 3),
       bty = 'n')

(Figure: the informative Beta prior on \(a_0\); the dotted vertical line marks \(a^\star\).)

To conduct an NPP analysis, we first need to estimate the normalizing constant \(Z(a_0)\) over a grid of values of \(a_0\). In hdbayes, one function (glm.npp.lognc()) estimates the normalizing constant across a grid of \(a_0\) values, and another (glm.npp()) uses those estimates to obtain posterior samples under the normalized power prior.

a0       <- seq(0, 1, length.out = 21)
histdata <- the.data[[2]]

if( requireNamespace("parallel") ){
  if (instantiate::stan_cmdstan_exists()) {
    # wrapper to obtain log normalizing constant in parallel package
    logncfun <- function(a0, ...){
      hdbayes::glm.npp.lognc(
        formula = formula, family = family, histdata = histdata, a0 = a0, ...
      )
    }
    cl <- parallel::makeCluster(10)
    parallel::clusterSetRNGStream(cl, 123)
    parallel::clusterExport(cl, varlist = c('formula', 'family', 'histdata'))
    # call created function
    time.npp.1 <- system.time(
      a0.lognc <- parLapply(
        cl = cl, X = a0, fun = logncfun, iter_warmup = iter_warmup,
        iter_sampling = iter_sampling, chains = nchains, refresh = 0
      )
    )
    parallel::stopCluster(cl)
  }
  a0.lognc <- data.frame( do.call(rbind, a0.lognc) )
}

We will now fit the NPP with both a uniform (Beta(1, 1)) prior and the informative prior on \(a_0\) devised above, using the grid of \(a_0\) values and estimated \(Z(a_0)\) we just created:

time.npp.2 <- system.time(
  if (instantiate::stan_cmdstan_exists()) {
    fit.npp_unif <- glm.npp(
      formula = formula, family = family, data.list = the.data,
      a0.lognc = a0.lognc$a0,
      lognc = matrix(a0.lognc$lognc, ncol = 1),
      a0.shape1 = 1, a0.shape2 = 1, # uniform prior on a0
      iter_warmup = iter_warmup, iter_sampling = iter_sampling,
      chains = nchains, parallel_chains = ncores,
      refresh = 0
    )
  }
)
#> Running MCMC with 4 parallel chains...
#> 
#> Chain 3 finished in 7.5 seconds.
#> Chain 4 finished in 7.8 seconds.
#> Chain 1 finished in 7.9 seconds.
#> Chain 2 finished in 8.2 seconds.
#> 
#> All 4 chains finished successfully.
#> Mean chain execution time: 7.8 seconds.
#> Total execution time: 8.4 seconds.

time.npp.2.star <- system.time(
  if (instantiate::stan_cmdstan_exists()) {
    fit.npp.star <- glm.npp(
      formula = formula, family = family, data.list = the.data,
      a0.lognc = a0.lognc$a0,
      lognc = matrix(a0.lognc$lognc, ncol = 1),
      a0.shape1 = beta.pars$a, a0.shape2 = beta.pars$b, # beta prior on a0
      iter_warmup = iter_warmup, iter_sampling = iter_sampling,
      chains = nchains, parallel_chains = ncores,
      refresh = 0
    )
  }
)
#> Running MCMC with 4 parallel chains...
#> 
#> Chain 1 finished in 6.5 seconds.
#> Chain 4 finished in 7.0 seconds.
#> Chain 3 finished in 7.3 seconds.
#> Chain 2 finished in 7.5 seconds.
#> 
#> All 4 chains finished successfully.
#> Mean chain execution time: 7.1 seconds.
#> Total execution time: 7.6 seconds.

get_summaries(fit.npp_unif, pars.interest = base.pars)
#> # A tibble: 5 × 10
#>   variable     mean median    sd   mad      q5     q95  rhat ess_bulk ess_tail
#>   <chr>       <dbl>  <dbl> <dbl> <dbl>   <dbl>   <dbl> <dbl>    <dbl>    <dbl>
#> 1 (Intercept) -3.8   -3.7   1.0   0.91 -5.6    -2.4      1.0    4006.    3164.
#> 2 age          0.36   0.37  0.21  0.20  0.0034  0.68     1.0    6680.    5132.
#> 3 race         1.1    1.0   1.0   0.93 -0.37    2.9      1.0    3916.    2967.
#> 4 treatment   -0.65  -0.66  0.40  0.37 -1.3    -0.0033   1.0    7435.    5903.
#> 5 cd4         -0.95  -0.92  0.26  0.21 -1.4    -0.61     1.0    5290.    3159.
get_summaries(fit.npp.star, pars.interest = base.pars)
#> # A tibble: 5 × 10
#>   variable     mean median    sd   mad    q5   q95  rhat ess_bulk ess_tail
#>   <chr>       <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>    <dbl>    <dbl>
#> 1 (Intercept) -3.8   -3.7   1.2   1.1  -6.0  -2.2    1.0    4189.    4018.
#> 2 age          0.28   0.29  0.26  0.25 -0.17  0.69   1.0    7976.    5443.
#> 3 race         0.78   0.68  1.2   1.1  -0.92  2.8    1.0    4286.    4094.
#> 4 treatment   -0.52  -0.53  0.54  0.53 -1.4   0.38   1.0    6708.    6109.
#> 5 cd4         -1.2   -1.2   0.35  0.34 -1.8  -0.70   1.0    5503.    5271.

With no problems detected by the diagnostics, we move on.

Normalized asymptotic power prior (NAPP)

The normalized asymptotic power prior (NAPP) uses a large-sample argument to formulate a normal approximation to the power prior, i.e., the prior is given by \[ \beta | a_0 \sim N(\hat{\beta}_0, a_0^{-1} [I_n(\hat{\beta}_0)]^{-1}), \] where \(\hat{\beta}_0\) is the maximum likelihood estimate (MLE) of \(\beta\) based on the historical data and \(I_n(\hat{\beta}_0)\) is the observed information matrix (negative Hessian of the log-likelihood) evaluated at the MLE. As in the NPP, \(a_0\) is assigned a Beta prior.

In this case, the normalizing constant is available in closed form, so no separate estimation step is needed before sampling.
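
For intuition, the ingredients of this approximation can be assembled directly from the historical GLM fit. The sketch below is purely illustrative (glm.napp() handles this internally) and uses a hypothetical value of \(a_0\):

# illustrative sketch of the NAPP prior ingredients, built from the historical GLM fit
beta0.hat <- coef(fit.mle.hist)      # historical MLE of beta
cov0.hat  <- vcov(fit.mle.hist)      # inverse information matrix, [I_n(beta0.hat)]^{-1}
a0.illus  <- 0.5                     # hypothetical value of a0, for illustration only
prior.cov <- cov0.hat / a0.illus     # covariance of beta | a0 under the NAPP
round(sqrt(diag(prior.cov)), 3)      # conditional prior standard deviations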

The following code analyzes the data set using the NAPP:

time.napp <- system.time(
  if (instantiate::stan_cmdstan_exists()) {
    fit.napp_unif <- glm.napp(
      formula = formula, family = family, data.list = the.data,
      a0.shape1 = 1, a0.shape2 = 1,
      iter_warmup = iter_warmup, iter_sampling = iter_sampling,
      chains = nchains, parallel_chains = ncores,
      refresh = 0
    )
  }
)
#> Running MCMC with 4 parallel chains...
#> 
#> Chain 1 finished in 1.8 seconds.
#> Chain 2 finished in 1.8 seconds.
#> Chain 3 finished in 1.8 seconds.
#> Chain 4 finished in 1.9 seconds.
#> 
#> All 4 chains finished successfully.
#> Mean chain execution time: 1.8 seconds.
#> Total execution time: 2.1 seconds.

time.napp.star <- system.time(
  if (instantiate::stan_cmdstan_exists()) {
    fit.napp.star <- glm.napp(
      formula = formula, family = family, data.list = the.data,
      a0.shape1 = beta.pars$a, a0.shape2 = beta.pars$b,
      iter_warmup = iter_warmup, iter_sampling = iter_sampling,
      chains = nchains, parallel_chains = ncores,
      refresh = 0
    )
  }
)
#> Running MCMC with 4 parallel chains...
#> 
#> Chain 2 finished in 1.7 seconds.
#> Chain 1 finished in 1.8 seconds.
#> Chain 4 finished in 1.7 seconds.
#> Chain 3 finished in 1.8 seconds.
#> 
#> All 4 chains finished successfully.
#> Mean chain execution time: 1.7 seconds.
#> Total execution time: 2.0 seconds.

get_summaries(fit.napp_unif, pars.interest = base.pars)
#> # A tibble: 5 × 10
#>   variable     mean median    sd   mad     q5    q95  rhat ess_bulk ess_tail
#>   <chr>       <dbl>  <dbl> <dbl> <dbl>  <dbl>  <dbl> <dbl>    <dbl>    <dbl>
#> 1 (Intercept) -3.5   -3.5   0.97  0.96 -5.2   -2.1     1.0    4830.    4921.
#> 2 age          0.38   0.38  0.19  0.19  0.055  0.69    1.0    7018.    5990.
#> 3 race         0.90   0.85  0.98  0.96 -0.62   2.6     1.0    4938.    5322.
#> 4 treatment   -0.65  -0.66  0.36  0.35 -1.2   -0.055   1.0    8249.    6265.
#> 5 cd4         -0.90  -0.88  0.21  0.19 -1.3   -0.59    1.0    6689.    4517.
get_summaries(fit.napp.star, pars.interest = base.pars)
#> # A tibble: 5 × 10
#>   variable     mean median    sd   mad    q5   q95  rhat ess_bulk ess_tail
#>   <chr>       <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>    <dbl>    <dbl>
#> 1 (Intercept) -3.7   -3.5   1.2   1.1  -5.8  -1.9    1.0    4737.    4306.
#> 2 age          0.30   0.31  0.25  0.25 -0.11  0.70   1.0    7560.    5463.
#> 3 race         0.75   0.65  1.2   1.1  -1.0   2.9    1.0    5017.    4595.
#> 4 treatment   -0.53  -0.54  0.50  0.49 -1.3   0.30   1.0    7402.    5395.
#> 5 cd4         -1.1   -1.1   0.31  0.29 -1.7  -0.65   1.0    5949.    4849.

Latent exchangeability prior (LEAP)

The LEAP assumes that the historical data are generated from a finite mixture model consisting of \(K \ge 2\) components, with the current data generated from the first component of this mixture. The posterior under the LEAP may be expressed hierarchically as \[ \begin{align*} y_i | x_i, \beta &\sim \text{Bernoulli}\left( \text{logit}^{-1}(x_i'\beta) \right) , \ \ i = 1, \ldots, n\\ y_{0i} | x_{0i}, \beta, \beta_{0k}, \gamma &\sim \gamma_1 \text{Bernoulli}\left( \text{logit}^{-1}(x_{0i}'\beta) \right) + \sum_{k=2}^K \gamma_k \text{Bernoulli}\left( \text{logit}^{-1}(x_{0i}'\beta_{0k}) \right) , \ \ i = 1, \ldots, n_0 \\ \beta, \beta_{0k} &\overset{\text{i.i.d.}}{\sim} N(\mu_0, \Sigma_{0}), \:k = 2, \ldots, K, \text{ where }\Sigma_{0} = \text{diag}(\sigma_{01}^2, \ldots, \sigma_{0p}^2) \\ \gamma &\sim \text{Dirichlet}(\alpha_{0}) \end{align*} \] where \(\gamma = (\gamma_1, \ldots, \gamma_K)'\) is a vector of mixing probabilities, \(\alpha_{0} = (\alpha_1, \ldots, \alpha_K)'\) is a vector of concentration parameters, and \(\mu_0\) and \(\Sigma_{0}\) are respectively the common prior mean and covariance matrix for the \(K\) sets of regression coefficients. Below we use \(K = 2\) components with a uniform Dirichlet prior on the mixing probabilities, i.e., \(\alpha_{0} = (1, 1)'\).

The LEAP can be fit as follows:

time.leap <- system.time(
  if (instantiate::stan_cmdstan_exists()) {
    fit.leap <- glm.leap(
      formula = formula, family = family, data.list = the.data,
      K = 2, prob.conc = rep(1, 2),
      iter_warmup = iter_warmup, iter_sampling = iter_sampling,
      chains = nchains, parallel_chains = ncores,
      refresh = 0
    )
  }
)
#> Running MCMC with 4 parallel chains...
#> 
#> Chain 2 finished in 44.9 seconds.
#> Chain 4 finished in 51.0 seconds.
#> Chain 3 finished in 53.6 seconds.
#> Chain 1 finished in 57.2 seconds.
#> 
#> All 4 chains finished successfully.
#> Mean chain execution time: 51.7 seconds.
#> Total execution time: 57.3 seconds.
get_summaries(fit.leap, pars.interest = base.pars)
#> # A tibble: 5 × 10
#>   variable     mean median    sd   mad    q5   q95  rhat ess_bulk ess_tail
#>   <chr>       <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>    <dbl>    <dbl>
#> 1 (Intercept) -4.3   -4.2   1.1   0.96 -6.4  -2.9    1.0    1473.     562.
#> 2 age          0.33   0.34  0.26  0.26 -0.13  0.74   1.0    1094.    2566.
#> 3 race         1.1    0.93  1.1   0.91 -0.38  3.0    1.0    1510.    1684.
#> 4 treatment   -0.53  -0.57  0.53  0.52 -1.3   0.38   1.0     582.     580.
#> 5 cd4         -1.3   -1.2   0.40  0.37 -2.0  -0.72   1.0    1408.    2987.

Comparison of methods

After all this work, we can now finally compare the point estimates (MLE / posterior mean) and uncertainties (standard error / posterior standard deviation) from all of the methods considered here.

fit.list <- list('BHM' = fit.bhm,
                 'Commensurate' = fit.commensurate,
                 'RMAP_w=0.1' = fit.rmap.a,
                 'RMAP_w=0.5' = fit.rmap.b,
                 'RMAP_w=0.9' = fit.rmap.c,
                 'NAPP_unif' = fit.napp_unif,
                 'NAPP' = fit.napp.star,
                 'NPP_unif' = fit.npp_unif,
                 'NPP' = fit.npp.star,
                 'PP_half' = fit.pp,
                 'PP' = fit.pp.star,
                 'LEAP' = fit.leap)
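
Below is a minimal sketch of one way to assemble the posterior means and standard deviations from fit.list alongside the MLE; the code used to produce the figures that follow may differ:

# sketch: posterior mean/SD per method, plus the current-data MLE and its standard error
post.tab <- do.call(rbind, lapply(names(fit.list), function(nm) {
  s <- posterior::summarise_draws(fit.list[[nm]])
  s <- subset(s, variable %in% base.pars)
  data.frame(method = nm, variable = s$variable, estimate = s$mean, sd = s$sd)
}))
mle.tab <- data.frame(method   = "MLE (current)",
                      variable = names(coef(fit.mle.cur)),
                      estimate = unname(coef(fit.mle.cur)),
                      sd       = unname(sqrt(diag(vcov(fit.mle.cur)))))
summary.tab <- rbind(post.tab, mle.tab)
head(summary.tab)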

Looking across the summaries above, there is considerable variation in the estimates across methods. To aid our understanding, we will now visualize the posterior distributions for each method:
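
A minimal sketch of how such a figure could be produced with ggplot2 follows (shown here only for the treatment coefficient; the code that generated the figure below is not included in this excerpt):

# sketch: posterior densities of the treatment coefficient for a subset of methods
plot.methods <- c("BHM", "Commensurate", "RMAP_w=0.1", "NAPP", "NPP", "PP", "LEAP")
trt.draws <- do.call(rbind, lapply(plot.methods, function(nm) {
  d <- posterior::as_draws_df(fit.list[[nm]])
  data.frame(method = nm, treatment = d$treatment)
}))
ggplot(trt.draws, aes(x = treatment, colour = method)) +
  geom_density() +
  geom_vline(xintercept = coef(fit.mle.hist)["treatment"], linetype = "dotted") +
  labs(x = "treatment coefficient", y = "posterior density") +
  theme_bw()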

(Figure: posterior densities of the regression coefficients under each method.)

where we have omitted the NPP and NAPP with uniform priors, as well as the power prior with \(a_0 = 1/2\), for clarity. The first observation is that the BHM and commensurate prior shrink the treatment effect the most towards the historical MLE, while the opposite behavior occurs for the age coefficient. Let's now take a look at the coefficients under the RMAP prior with different values of \(w\) (\(w = 0.1\), \(w = 0.5\), and \(w = 0.9\)).

(Figure: posterior summaries of the coefficients under the RMAP prior with \(w = 0.1\), \(w = 0.5\), and \(w = 0.9\).)

We can now plot the posterior summaries for the coefficients under the NPP and NAPP with uniform and informative priors on \(a_0\) in order to visualize the effect these different prior choices have on the estimated coefficients.

(Figure: posterior summaries of the coefficients under the NPP and NAPP with uniform and informative priors on \(a_0\).)

Finally, we will take a look at the posterior distribution of \(a_0\) for the NPP and NAPP models under uniform and informative priors:

(Figure: posterior distributions of \(a_0\) under the NPP and NAPP with uniform and informative priors.)

where the vertical dotted line marks \(a^\star = \frac{1}{2}\frac{n}{n_0}\) and the dashed curve depicts the informative Beta prior. As we can see, the prior makes a substantial difference with regard to the posterior for \(a_0\). This is not unexpected, since \(a_0\) is a hierarchical parameter and learning it from the data is not a simple task.
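
As a sketch, the \(a_0\) draws from the NPP fit with the informative prior can be extracted and compared against that prior; the name of the \(a_0\) parameter is looked up from the draws object rather than assumed:

# sketch: posterior of a0 under the NPP (informative prior) versus its prior
a0.name  <- grep("a0", posterior::variables(fit.npp.star), value = TRUE)[1]
a0.draws <- posterior::extract_variable(fit.npp.star, a0.name)
hist(a0.draws, breaks = 30, freq = FALSE, xlab = expression(a[0]),
     main = "NPP posterior of a0 (informative prior)")
curve(dbeta(x, beta.pars$a, beta.pars$b), add = TRUE, lwd = 2, lty = 2) # informative Beta prior
abline(v = a0.star, lwd = 2, lty = 3)                                   # a-star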

Finally, here is a comparison of running times:

time.bhm
#>    user  system elapsed 
#>  60.404   1.113  17.341
time.commensurate
#>    user  system elapsed 
#>  34.538   0.832   9.993
time.rmap
#>    user  system elapsed 
#> 171.930   4.008  55.632
time.pp.star
#>    user  system elapsed 
#>  23.416   0.730   7.152
time.npp.1 + time.npp.2
#>    user  system elapsed 
#>  31.426   1.161 158.077
time.npp.1 + time.npp.2.star
#>    user  system elapsed 
#>  28.103   1.189 157.335
time.napp
#>    user  system elapsed 
#>   7.605   0.594   2.865
time.napp.star
#>    user  system elapsed 
#>   7.297   0.611   2.843
time.leap
#>    user  system elapsed 
#> 202.877   2.525  58.117

References

Chen, Ming-Hui, Joseph G Ibrahim, and Constantin Yiannoutsos. 1999. “Prior Elicitation, Variable Selection and Bayesian Computation for Logistic Regression Models.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 61 (1): 223–42.

Duan, Yuyan, Eric P Smith, and Keying Ye. 2006. “Using Power Priors to Improve the Binomial Test of Water Quality.” Journal of Agricultural, Biological, and Environmental Statistics 11 (2): 151–68.

Hobbs, Brian P, Daniel J Sargent, and Bradley P Carlin. 2012. “Commensurate Priors for Incorporating Historical Information in Clinical Trials Using General and Generalized Linear Models.” Bayesian Analysis (Online) 7 (3): 639.

Ibrahim, Joseph G, and Ming-Hui Chen. 2000. “Power Prior Distributions for Regression Models.” Statistical Science, 46–60.

Ibrahim, Joseph G, Ming-Hui Chen, Yeongjin Gwon, and Fang Chen. 2015. “The Power Prior: Theory and Applications.” Statistics in Medicine 34 (28): 3724–49.

Merigan, Thomas C, David A Amato, James Balsley, Maureen Power, William A Price, Sharon Benoit, Angela Perez-Michael, Alan Brownstein, Amy Simon Kramer, and Doreen Brettler. 1991. “Placebo-Controlled Trial to Evaluate Zidovudine in Treatment of Human Immunodeficiency Virus Infection in Asymptomatic Patients with Hemophilia. NHF-ACTG 036 Study Group.”

Neuenschwander, Beat, Michael Branson, and David J Spiegelhalter. 2009. “A Note on the Power Prior.” Statistics in Medicine 28 (28): 3562–66.

Volberding, Paul A, Stephen W Lagakos, Matthew A Koch, Carla Pettinelli, Maureen W Myers, David K Booth, Henry H Balfour Jr, et al. 1990. “Zidovudine in Asymptomatic Human Immunodeficiency Virus Infection: A Controlled Trial in Persons with Fewer Than 500 CD4-Positive Cells Per Cubic Millimeter.” New England Journal of Medicine 322 (14): 941–49.