Variance-Based Sensitivity Analysis for Weighting Estimators

Application to the NCDS Example: Does Education Attainment Increase Wages?

1. Data Overview

The National Child Development Survey (NCDS) follows a group of individuals born in the United Kingdom back in 1958 ¹: This longitudinal study collects respondents’ information on education, family, socioeconomic status and health. Among over 17,000 individuals, we took a subset of 3,642 men who were working in 1991 and had complete data on their education and wages, following the approach of Battistin and Sianesi (2011). Missing values were then imputed using MICE (van Buuren & Groothuis-Oudshoorn, 2010). We excluded variables related to education level and wage to avoid data leakage in this study.

We are interested in whether attainment of academic qualification leads to higher hourly wage. The outcome variable wage is log of hourly pay in British pounds. The treatment variable Dany is a binary indicator for education attainment, i.e. whether the respondents has attained any academic qualification. 2399 people have qualification no lower than O-level, while the left 1243 have no formal education credential. A variety of covariates to control are considered as follows:
white (whether is identified as white people), scht (type of school at age 16), qmab & qmab2 (math scores at ages 7 & 11), qvab & qvab2 (reading scores at ages 7 & 11), sib_u (number of siblings), agepa & agema (age of father and mother in 1974), maemp (mother’s job status in 1974), paed_u & maed_u (years of schooling for father and mother). In terms of potential unobserved confoundedness, differences in neighborhood culture, economic background and mental health can be considered.

Using str() helps to give a glimpse on the dataset.

library(WeightIt)
library(dplyr)
library(ggplot2)
library(jointVIP)
library(cobalt)
library(vbm)
data(NCDS)
str(ncds)
#> 'data.frame':    3642 obs. of  14 variables:
#>  $ white : int  1 1 1 1 1 1 1 1 1 1 ...
#>  $ wage  : num  2.57 2.04 1.72 2.2 2.48 ...
#>  $ Dany  : int  1 1 0 1 1 0 0 1 1 1 ...
#>  $ maemp : int  0 0 0 0 1 1 0 1 0 1 ...
#>  $ scht  : int  2 1 1 3 1 2 1 1 1 3 ...
#>  $ qmab  : int  2 5 4 5 3 1 4 5 5 2 ...
#>  $ qmab2 : int  2 5 4 4 3 1 1 4 4 5 ...
#>  $ qvab  : int  1 5 4 4 2 2 2 4 3 4 ...
#>  $ qvab2 : int  2 5 5 5 3 2 1 3 1 5 ...
#>  $ paed_u: int  9 0 0 10 9 10 0 11 10 10 ...
#>  $ maed_u: int  9 0 0 10 9 9 0 11 9 10 ...
#>  $ agepa : int  60 56 57 40 57 43 43 46 43 47 ...
#>  $ agema : int  59 56 53 41 45 42 38 45 43 40 ...
#>  $ sib_u : int  3 0 0 1 1 1 1 1 0 3 ...

2. Balancing covariates by weighting

The outcome variable can be influenced by treatment variable, covariates, and other unobserved confounders. Given the lack of randomization in treatment assignment, we examined the covariate imbalance and adjusted it by weighting to have credible causal inference conclusion.

WeightIt package offers a handy approach to estimate propensity score in weighting:

weightlist_att <- weightit(
  formula = Dany ~ white + maemp + scht +
    qmab + qmab2 +
    qvab + qvab2 +
    paed_u + maed_u +
    agepa + agema +
    sib_u,
  data = ncds,
  method = "ebal",
  estimand = "ATT",
  stabilize = FALSE,
  include.obj = TRUE
)

summary(weightlist_att)
#>                   Summary of weights
#> 
#> - Weight ranges:
#> 
#>           Min                                  Max
#> treated 1.     ||                            1.   
#> control 0.019 |---------------------------| 27.489
#> 
#> - Units with the 5 most extreme weights by group:
#>                                            
#>               1      2      3      4      5
#>  treated      1      1      1      1      1
#>            1135    503    185    129     55
#>  control 15.566 15.887 16.575 21.105 27.489
#> 
#> - Weight statistics:
#> 
#>         Coef of Var   MAD Entropy # Zeros
#> treated       0.000 0.000   0.000       0
#> control       1.974 1.036   0.906       0
#> 
#> - Effective Sample Sizes:
#> 
#>            Control Treated
#> Unweighted 1243.      2399
#> Weighted    253.97    2399
love.plot(weightlist_att, stars="std")

The following love plot shows the standardized mean differences between treated and control groups in each observed covariate. Red and green dots respectively indicate pre-weighting and post-weighting standardized mean differences. Before adjustment, early academic performances—such as reading and math scores at age 7 and 11—shows substantial imbalance. Entropy balancing effectively eliminates the mean differences as adjusted dots cluster around 0. This conclusion is also supported by the joint variable importance plot generated by jointVIP.

jointVIP focuses on identification of variables that should be prioritized for adjustment based on both treatment and outcome, while traditional love plot only considers treatment imbalance. The two dimensions of variable’s confoundedness possibility by jointVIP gives a wider picture in variable importance measurement. Treatment imbalance and outcome association can be estimated by standardized mean difference by treatment and correlation between outcome and covariates. Bias is mathematically proportional to their product. In the context of variable selection, estimation on whole dataset will cause data leakage, thus an individual pilot dataset should be included for covariate balance evaluation. In this case, a separate subsample from the survey population is used in jointVIP. We will go back to the full dataset in the following analysis.

set.seed(123)
pilot_prop = 0.1
controls <- which(ncds$Dany == 0)
pilot_indices <- sample(controls, length(controls) * pilot_prop)

pilot_df <- ncds[pilot_indices, ]
analysis_data <- ncds[-pilot_indices, ]
analysis_data1 <- ncds[-pilot_indices, ]
post_df <- weightit(
  formula = Dany ~ white + maemp + scht +
    qmab + qmab2 +
    qvab + qvab2 +
    paed_u + maed_u +
    agepa + agema +
    sib_u,
  data = analysis_data,
  method = "ebal",
  estimand = "ATT",
  stabilize = FALSE,
  include.obj = TRUE
)
new_jointVIP <- create_jointVIP(
  treatment = "Dany",           
  outcome = "wage",             
  covariates = c("white", "maemp", "scht", "qmab", "qmab2", "qvab",
                 "qvab2", "paed_u", "maed_u", "agepa", "agema", "sib_u"), 
  pilot_df = pilot_df,
  analysis_df = analysis_data    
)
plot(create_post_jointVIP(new_jointVIP,
                                      post_analysis_df = analysis_data),
     plot_title = "Pre-weighting jointVIP: Educational Attainment Analysis")

plot(create_post_jointVIP(new_jointVIP,
                                      post_analysis_df = analysis_data,
                                      wts = post_df$weights),
     plot_title = "Post-weighting jointVIP: Educational Attainment Analysis")

Similarly, jointVIP shows that reading score at age 11 has the strongest association with both treatment and outcome, followed by reading score at 7 and math score at 11. Number of siblings along with these three is most likely to be an influential confounder in this study. After entropy balancing, treatment imbalance is largely reduced, while outcome correlation remains unchanged.

Given this successful weighting, we move on to estimate average treatment effect of the treated units and sensitivity analysis with respect to unobserved confounders.

3. vbm package—ATT and standard error estimation

run_sensitivity_analysis() is capable of both ATT estimation and sensitivity analysis.

# Run the sensitivity analysis
vbm_results <- run_sensitivity_analysis(
  weightlist = weightlist_att,
  Y = "wage",
  treatment = "Dany",
  data = ncds,
  approach = "vbm",
  n_bootstrap = 1000,
  seed = 331,
  n_cores = 1,
  benchmarking = TRUE
)

vbm_results$ipw_estimate
#> treatment 
#> 0.2260783
vbm_results$ipw_se
#> [1] 0.02732327
vbm_results$difference_in_means
#> treatment 
#> 0.3254757
vbm_results$difference_in_means_se
#> [1] 0.01352582

The unadjusted ATT is 0.33 with standard error of 0.01, while the adjusted is 0.23 with standard error of 0.03. After covariate adjustment and accounting of the log scale, respondents with academic qualification have approximately 1.26 times higher wage than those without. The causal effect is 95% significant regardless of covariate balancing.

4. vbm package–sensitivity analysis

Variance-based Sensitivity Model

The variance-based sensitivity model (Huang & Pimentel, 2024) quantifies sensitivity to unobserved confounding by constraining the residual variance in the inverse propensity weights not explained by observed covariates. Instead of bounding the worst-case individual error as in traditional marginal sensitivity models, this approach bounds the distributional differences between the ideal weights (which adjust for both observed and unobserved confounders) and the observable weights (which adjust only for observed covariates). The key parameter is an \(R^2\) value, ranging from 0 to 1, that represents the proportion of variation in the true weights unexplained by the estimated weights. For a given \(R^2\), the maximum possible bias from omitting a confounder has a closed-form expression involving the correlation between weights and outcomes, the outcome variance, and the weight variance. This formulation moves away from worst-case bounds, yielding more informative and stable sensitivity intervals while maintaining a standardized, interpretable sensitivity parameter that can be benchmarked against observed covariates.

head(vbm_results$point_bounds)
#>     R2    ATT_low  ATT_high
#> 1 0.00 0.22607828 0.2260783
#> 2 0.01 0.15375973 0.2983968
#> 3 0.02 0.12328392 0.3288726
#> 4 0.03 0.09953412 0.3526224
#> 5 0.04 0.07919860 0.3729580
#> 6 0.05 0.06099977 0.3911568
head(vbm_results$benchmark_parameters)
#>   variable          bias R2_benchmark rho_benchmark
#> 1    white  9.683647e-05 0.0025764368   0.002622954
#> 2    maemp  5.006839e-04 0.0007849679   0.024591711
#> 3     scht -1.112915e-02 0.1003143359  -0.045882530
#> 4     qmab -4.970010e-03 0.0128957701  -0.059860009
#> 5    qmab2 -1.364044e-02 0.1024507301  -0.055580385
#> 6     qvab -4.235385e-03 0.0658596166  -0.021958920
plot_sensitivity_analysis(
  vbm_results,
  parameter_level = seq(0, 0.15, by = 0.01),
  benchmarking = TRUE,
  benchmark_variable = c("scht", "qmab2", "agema", "qvab2", "white", "qmab", "qvab"),
  variable_name = c("Selective School", "Math Score at 11", "Mother's Age",
                     "Reading Score at 11", "White", 
                    "Math Score at 7", "Reading Score at 7"),
  header = "Effect of Education Attainment on Hourly Wages by VBM"
)

In NCDS study, the above figure displays the variation of point estimate bounds and confidence interval of ATT with respect to different unobserved confounders under variance-based sensitivity model. The solid bars denote the point estimate bounds for a specified value. The lighter intervals represent the 95% confidence intervals. Benchmarked parameters for selected observed covariates (selective school, math score at 11, mother’s age, reading score at 11, white, math score at 7, reading score at 7) are provided in light brown. The critical \(R^2_*\) is plotted in dashed line.

We estimated \(R^2_*=0.05\), which implies that if an omitted confounder explains 5% or more of the variation in the true weights, our estimated effect of education attainment on log of hourly wage is no longer significantly different from the expected distribution under the null. To assess the plausibility of an omitted confounder resulting in an \(R^2\) value of 0.05, we extended formal benchmarking approaches to calibrate possible \(R^2\) values against the strength of observed covariates. The benchmarking results indicate that math and score at 11 might have strong impact on causal effect if neglected under variance-based model. The results echo the finding from jointVIP in Section 2. Comparing the critical and the benchmarking values, we can conclude that if the unobserved confounder accounts for the variance of weights at the lower level of math score at age of 7, the average treatment effect for the treated may be significant; otherwise, the causal conclusion will be altered. Notice that this dataset has a sensitive causal effect due to misclassification of treatment. See more details at Misclassified Treatment Status and Treatment Effects: an Application to Returns to Education in the United Kingdom (Battistin.E & Sianesi.B, 2011).

Marginal Sensitivity Model

The marginal sensitivity model (Zhao at el., 2019) bounds the individual-level multiplicative error in propensity weights using a parameter \(\Lambda \geq 1\), such that \(\Lambda^{-1} \leq w^*(X,U)/w(X) \leq \Lambda\) for all observed covariates \(X\) and unobserved confounders \(U\). For a fixed \(\Lambda\), researchers compute worst-case bias bounds and confidence intervals, then increase \(\Lambda\) until the interval contains zero. The resulting \(\Lambda^*\) value indicates the confounding strength needed to overturn the estimated treated effect. While widely used due to its simplicity, the MSM’s worst-case constraint can yield overly wide or misleadingly narrow intervals depending on the underlying distribution of unobserved confounders and the degree of outcome overlap between treatment groups.

msm_results <- run_sensitivity_analysis(
  weightlist = weightlist_att,
  Y = "wage",
  treatment = "Dany",
  data = ncds,
  approach = "msm",
  n_bootstrap = 1000,
  seed = 331,
  n_cores = 1,
  benchmarking = TRUE
)
msm_results$lambda_star
#> [1] 2
head(msm_results$point_bounds)
#>   lambda     ATT_low  ATT_high
#> 1   1.00  0.22607828 0.2260783
#> 2   1.25  0.16005009 0.2913671
#> 3   1.50  0.10663907 0.3443859
#> 4   1.75  0.06279558 0.3893368
#> 5   2.00  0.02590018 0.4279904
#> 6   2.25 -0.00742418 0.4616414
msm_results$benchmark_parameters
#>    variable lambda
#> 1     white    1.8
#> 2     maemp    1.1
#> 3      scht    3.5
#> 4      qmab    1.4
#> 5     qmab2    3.4
#> 6      qvab    2.4
#> 7     qvab2    2.9
#> 8    paed_u    1.4
#> 9    maed_u    1.3
#> 10    agepa    2.0
#> 11    agema    2.6
#> 12    sib_u    4.8
plot_sensitivity_analysis(
  msm_results,
  parameter_level = seq(1, 5.5, by = 0.25),
  benchmarking = TRUE,
  benchmark_variable = c("maemp", "qvab", "scht", "sib_u", "qvab2", "qmab2", "qmab"),
  variable_name = c("Mother's Employment", "Reading Score at 7", "Selective School", 
                    "No. of Siblings", "Reading Score at 11", "Math Score at 11", 
                    "Math Score at 7"),
  header = "Effect of Education Attainment on Hourly Wages by MSM"
)

This figure displays the variation of point estimate bounds and confidence interval of unconfounded ATT with respect to different \(\lambda\). The solid bars denote the point estimate bounds for a specified \(\lambda\) value. The lighter intervals represent the 95% confidence intervals. Benchmarked parameters selected for similar observed covariates are provided in light brown. The critical \(\Lambda*\) is plotted in dashed line.

Estimated critical \(\Lambda*\) is 2. If the largest possible error that can arise from omitting a confounder is larger than 2, the estimated average treatment effect of the treated units will lose its significance. According to the benchmarking \(\Lambda\) values, math and reading score at 7 are the most important covariates, which is partially revealed by jointVIP. Comparing the critical \(\Lambda*\) and benchmarked values, the average treatment effect for the treated is not robust if there exists some unconfoundedness as strong as twice that of mother’s employment or equally important as reading score at 11 in influencing the variance of weights.

Comparison among VBM and VBM with More Conservative Bounds

Variance-based model gives more informative bounds in terms of rare unobserved covariates, while marginal sensitivity is historically widely used for its good interpretability. In typical variance-based model, there is one single parameter accounting for the proportion of residual variation in true weights due to the omission of unobserved confounding where the assumption takes the correlation between the outcome and the imbalance in unobserved confounders as 1. However, if researchers have some intuition on the omitted confounding and outcome, they can have a tighter bounds by specifying this correlation in variance-based model, and obtain a tighter bounds on point estimates and confidence intervals.

# Run the sensitivity analysis
vbm_w_cor_results <- run_sensitivity_analysis(
  weightlist = weightlist_att,
  Y = "wage",
  treatment = "Dany",
  data = ncds,
  approach = "vbm_w_cor",
  R2_seq = seq(0, 0.8, by = 0.01),
  cor_eps_seq = rep(0.8, 81),
  n_bootstrap = 1000,
  n_cores = 1,
  seed = 331
)

# Critical lambda*
vbm_w_cor_results$Rstar
#> [1] 0.07
# Point bounds by vbm with less conservative bounds
head(vbm_w_cor_results$point_bounds)
#>     R2 rho   ATT_low  ATT_high
#> 1 0.00 0.8 0.2260783 0.2260783
#> 2 0.01 0.8 0.1859928 0.2661637
#> 3 0.02 0.8 0.1691004 0.2830562
#> 4 0.03 0.8 0.1559361 0.2962205
#> 5 0.04 0.8 0.1446643 0.3074922
#> 6 0.05 0.8 0.1345769 0.3175797
# Point bounds by vbm with worst case bounds
head(vbm_results$point_bounds)
#>     R2    ATT_low  ATT_high
#> 1 0.00 0.22607828 0.2260783
#> 2 0.01 0.15375973 0.2983968
#> 3 0.02 0.12328392 0.3288726
#> 4 0.03 0.09953412 0.3526224
#> 5 0.04 0.07919860 0.3729580
#> 6 0.05 0.06099977 0.3911568
# Point bounds by msm with worst case bounds
head(msm_results$point_bounds)
#>   lambda     ATT_low  ATT_high
#> 1   1.00  0.22607828 0.2260783
#> 2   1.25  0.16005009 0.2913671
#> 3   1.50  0.10663907 0.3443859
#> 4   1.75  0.06279558 0.3893368
#> 5   2.00  0.02590018 0.4279904
#> 6   2.25 -0.00742418 0.4616414
# Confidence intervals by vbm with less conservative bounds
head(vbm_w_cor_results$confidence_intervals)
#>     R2 rho     ci_low   ci_high
#> 1 0.00 0.8 0.16640460 0.2875252
#> 2 0.01 0.8 0.10658844 0.3469118
#> 3 0.02 0.8 0.08170304 0.3737700
#> 4 0.03 0.8 0.06081503 0.3936648
#> 5 0.04 0.8 0.04331116 0.4107221
#> 6 0.05 0.8 0.02640352 0.4262472
# Confidence intervals by vbm with worst case bounds
head(vbm_results$confidence_intervals)
#>     R2       ci_low   ci_high
#> 1 0.00  0.166404603 0.2875252
#> 2 0.01  0.092935562 0.3623717
#> 3 0.02  0.060688592 0.3948949
#> 4 0.03  0.034658128 0.4196833
#> 5 0.04  0.013557837 0.4426278
#> 6 0.05 -0.005090294 0.4631943
# Confidence intervals by msm with worst case bounds
head(msm_results$confidence_intervals)
#>   lambda       ci_low   ci_high
#> 1   1.00  0.166404603 0.2875252
#> 2   1.25  0.100726540 0.3534666
#> 3   1.50  0.047401891 0.4080163
#> 4   1.75  0.003236563 0.4546177
#> 5   2.00 -0.035025822 0.4962254
#> 6   2.25 -0.069441545 0.5348832

When assuming the correlation between the outcome and the imbalance in unobserved confounders is bounded by 0.8, the critical \(R^2_*\) is 0.07 and tighter point bounds and confidence intervals can be obtained. This method gives a more realistic sensitivity analysis if researchers have further information on potentially unobserved confounding.

Welcome to ClientVPS Mirrors

Variance-Based Sensitivity Analysis for Weighting Estimators

Jiayao Gan, Melody Huang, Samuel D. Pimentel, Andy A. Shen

2026-06-25

Introduction

Package installation