An introduction to bootstrap p-values for regression models using the boot.pval package

library(boot.pval)

An introduction to bootstrap p-values for regression models using the boot.pval package

Background

p-values can be computed by inverting the corresponding confidence intervals, as described in Section 14.2 of Thulin (2024) and Section 3.12 in Hall (1992). This package contains functions for computing bootstrap p-values in this way. The approach relies on the fact that:

Summaries for regression models

Summary tables with confidence intervals and p-values for the coefficients of regression models can be obtained using the boot_summary (most models) and censboot_summary (models with censored response variables) functions. Currently, the following models are supported:

A number of examples are available in Chapters 8 and 9 of Modern Statistics with R.

Here are some simple examples with a linear regression model for the mtcars data:

# Bootstrap summary of a linear model for mtcars:
model <- lm(mpg ~ hp + vs, data = mtcars)
boot_summary(model)
#>                Estimate Lower.bound Upper.bound p.value
#> (Intercept) 26.96300111 21.33457536 32.74542669  <0.001
#> hp          -0.05453412 -0.08314687 -0.02580722  <0.001
#> vs           2.57622314 -1.35314501  6.37510000   0.188

# Use 9999 bootstrap replicates and adjust p-values for
# multiplicity using Holm's method:
boot_summary(model, R = 9999, adjust.method = "holm")
#>                Estimate Lower.bound Upper.bound p.value Adjusted p-value
#> (Intercept) 26.96300111 21.37809728  32.7889221  <1e-04           0.0003
#> hp          -0.05453412 -0.08335973  -0.0253564   5e-04           0.0010
#> vs           2.57622314 -1.37357379   6.4663444  0.2023           0.2023

# Export results to a gt table:
boot_summary(model, R = 9999) |>
  summary_to_gt()
Estimate 95 % CI p-value
(Intercept) 26.963 (21.324, 32.589) <1e-04
hp −0.055 (−0.082, −0.026) 4e-04
vs 2.576 (−1.278, 6.435) 0.1956
# Export results to a Word document:
library(flextable)
boot_summary(model, R = 9999) |>
  summary_to_flextable() |> 
  save_as_docx(path = "my_table.docx")

And a toy example for a generalised linear mixed model (using a small number of bootstrap repetitions):

library(lme4)
model <- glmer(TICKS ~ YEAR + (1|LOCATION),
           data = grouseticks, family = poisson)
boot_summary(model, R = 99)

Speeding up computations

For complex models, speed can be greatly improved by using parallelisation. This is set using the parallel (available options are "multicore" and "snow"). The number of CPUs to use is set using ncpus.

model <- glmer(TICKS ~ YEAR + (1|LOCATION),
           data = grouseticks, family = poisson)
boot_summary(model, R = 999, parallel = "multicore", ncpus = 10)

Survival models

Survival regression models should be fitted using the argument model = TRUE. A summary table can then be obtained using censboot_summary. By default, the table contains exponentiated coefficients (i.e. hazard ratios, in the case of a Cox PH model).

library(survival)
# Weibull AFT model:
model <- survreg(formula = Surv(time, status) ~ age + sex, data = lung,
                dist = "weibull", model = TRUE)
# Table with exponentiated coefficients:
censboot_summary(model)
#> Using exponentiated coefficients.
#>                Estimate Lower.bound Upper.bound p.value
#> (Intercept) 531.0483429 212.9814729 1385.668622  <0.001
#> age           0.9878178   0.9730581    1.001211   0.081
#> sex           1.4653368   1.1830736    1.892506  <0.001

# Cox PH model:
model <- coxph(formula = Surv(time, status) ~ age + sex, data = lung,
               model = TRUE)
# Table with hazard ratios:
censboot_summary(model)
#> Using exponentiated coefficients.
#>     Estimate Lower.bound Upper.bound p.value
#> age 1.017191   0.9991815   1.0384897   0.062
#> sex 0.598566   0.4253779   0.8060072  <0.001
# Table with original coefficients:
censboot_summary(model, coef = "raw")
#> Using raw coefficients.
#>        Estimate  Lower.bound Upper.bound p.value
#> age  0.01704533 -0.002337735  0.03574706   0.081
#> sex -0.51321852 -0.850332996 -0.18333554   0.007

Other hypothesis tests

Bootstrap p-values for hypothesis tests based on boot objects can be obtained using the boot.pval function. The following examples are extensions of those given in the documentation for boot::boot:

# Hypothesis test for the city data
# H0: ratio = 1.4
library(boot)
ratio <- function(d, w) sum(d$x * w)/sum(d$u * w)
city.boot <- boot(city, ratio, R = 999, stype = "w", sim = "ordinary")
boot.pval(city.boot, theta_null = 1.4)
#> [1] 0.4544545

# Studentized test for the two sample difference of means problem
# using the final two series of the gravity data.
diff.means <- function(d, f)
{
  n <- nrow(d)
  gp1 <- 1:table(as.numeric(d$series))[1]
  m1 <- sum(d[gp1,1] * f[gp1])/sum(f[gp1])
  m2 <- sum(d[-gp1,1] * f[-gp1])/sum(f[-gp1])
  ss1 <- sum(d[gp1,1]^2 * f[gp1]) - (m1 *  m1 * sum(f[gp1]))
  ss2 <- sum(d[-gp1,1]^2 * f[-gp1]) - (m2 *  m2 * sum(f[-gp1]))
  c(m1 - m2, (ss1 + ss2)/(sum(f) - 2))
}
grav1 <- gravity[as.numeric(gravity[,2]) >= 7, ]
grav1.boot <- boot(grav1, diff.means, R = 999, stype = "f",
                   strata = grav1[ ,2])
boot.pval(grav1.boot, type = "stud", theta_null = 0)
#> [1] 0.05005005