Truncated Bayesian Model-Averaged T-Test

Henrik R. Godmann & František Bartoš

2024-11-09

Introduction

This vignette accompanies our recent manuscript “Truncating the Likelihood Allows Outlier Exclusion Without Overestimating the Evidence in the Bayes Factor t-Test” (godmann2024TruncLikelihood?) and shows how to use the RoBTT R package to estimate a truncated Bayesian model-averaged independent samples \(t\)-test (TrBTT). TrBTT adapts the \(t\)-test to the researcher’s outlier handling and thus mitigates the unwanted side effects of outlier exclusion on inference. For a general introduction to the RoBTT package, see the Introduction to RoBTT vignette.

Background

Outliers can bias analysis results. However, the widely applied approach of simply excluding extreme observations without changing the analysis is not appropriate either, as it often inflates the evidence. This vignette introduces a truncated version of the Bayesian model-averaged independent samples \(t\)-test and demonstrates an alternative way of handling outliers in a Bayesian hypothesis testing framework. TrBTT combines Bayesian model-averaging with a truncated likelihood. As such, TrBTT offers a robust solution for conducting independent samples \(t\)-tests that are less susceptible to the influence of outliers.
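The effect of exclusion on the variance can be illustrated with a short base R sketch. It is an illustration under simple assumptions (standard normal data, a symmetric cut-off at \(\pm 2\)), not code from the manuscript; the helper `truncated_sd()` is a hypothetical name introduced here.

```r
# Analytic SD of a standard normal truncated to [-k, k]
# (hypothetical helper, for illustration only)
truncated_sd <- function(k) {
  mass <- pnorm(k) - pnorm(-k)          # probability mass kept after exclusion
  sqrt(1 - 2 * k * dnorm(k) / mass)     # variance of the symmetric truncated normal
}

set.seed(1)
x      <- rnorm(1e5)                    # data from a standard normal
x_kept <- x[abs(x) <= 2]                # exclude observations beyond +/- 2

sd_full   <- sd(x)                      # close to the true SD of 1
sd_kept   <- sd(x_kept)                 # noticeably smaller after exclusion
sd_theory <- truncated_sd(2)            # analytic counterpart of sd_kept
```

Because the standard deviation of the retained data is too small, a standard \(t\)-test applied to the filtered data works with an understated noise level, which is one way to see how evidence can become inflated.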

TrBTT truncates the likelihood at the same cut-offs that were applied to the data. As such, it corrects the variance estimates that would otherwise be biased by outlier exclusion. It simultaneously model-averages across \(4\) different models:

  1. model assuming no effect, and equal variances across groups,
  2. model assuming no effect, and unequal variances across groups,
  3. model assuming presence of the effect, and equal variances across groups,
  4. and model assuming presence of the effect, and unequal variances across groups.

For all models, the likelihood is adjusted according to the specified truncation values. Inferences are based on a weighted average across the models, with weights determined by each model’s predictive performance.
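For intuition, a truncated normal likelihood renormalizes the normal density by the probability mass between the cut-offs \(a\) and \(b\), i.e., \(f(x \mid \mu, \sigma, a, b) = \phi\left(\frac{x - \mu}{\sigma}\right) / \left(\sigma \left[\Phi\left(\frac{b - \mu}{\sigma}\right) - \Phi\left(\frac{a - \mu}{\sigma}\right)\right]\right)\) for \(a \le x \le b\). The sketch below writes this textbook formula in base R. It is not RoBTT’s internal code, and the function name `trunc_norm_loglik()` is an assumption made for illustration.

```r
# Log-likelihood of observations under a normal distribution truncated to
# [lower, upper] (hypothetical helper; textbook formula, not RoBTT internals)
trunc_norm_loglik <- function(x, mu, sigma, lower, upper) {
  stopifnot(all(x >= lower & x <= upper))   # data must respect the cut-offs
  # log of the probability mass that remains between the cut-offs
  log_mass <- log(pnorm(upper, mu, sigma) - pnorm(lower, mu, sigma))
  # ordinary normal log-density, renormalized once per observation
  sum(dnorm(x, mu, sigma, log = TRUE)) - length(x) * log_mass
}

# Usage: the same observations evaluated with and without truncation
obs            <- c(-1.2, 0.3, 0.8, 1.5)
ll_truncated   <- trunc_norm_loglik(obs, mu = 0, sigma = 1, lower = -2, upper = 2)
ll_untruncated <- trunc_norm_loglik(obs, mu = 0, sigma = 1, lower = -Inf, upper = Inf)
```

With infinite cut-offs the correction term vanishes and the ordinary normal log-likelihood is recovered, so the truncated version can be read as a generalization of the standard model.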

Application

Installing and Loading RoBTT

First, we ensure that the RoBTT package is installed and loaded into the R session:

# Install RoBTT from CRAN
# install.packages("RoBTT")

# Load the RoBTT package
library(RoBTT)

Example Data Generation

We generate some example data to demonstrate the functionality of the test:

set.seed(42)
x1 <- rnorm(100, 0, 1)
x2 <- rnorm(100, 0, 1)

Model-Averaged Truncated Bayesian Independent Samples \(t\)-Test

1. Manual Outlier Exclusion Based on Specific Cutoffs.

We first demonstrate how to manually exclude outliers using specific cut-offs and then apply the same truncation to the likelihood function. Cut-offs can be specified for each group separately, as would be the case, for instance, with the box plot method for identifying outliers. Alternatively, a single cut-off range can be applied to both groups, for instance when all response times faster than \(200\) ms or slower than \(1000\) ms are excluded in both groups.

First, we apply the box plot method for excluding outliers and specify the cut-off range for each group:

# Identify outliers using boxplot statistics for each group
stats1 <- boxplot.stats(x1)
lower_whisker1 <- stats1$stats[1]
upper_whisker1 <- stats1$stats[5]


stats2 <- boxplot.stats(x2)
lower_whisker2 <- stats2$stats[1]
upper_whisker2 <- stats2$stats[5]

# Exclude outliers based on identified whiskers
x1_filtered <- x1[x1 >= lower_whisker1 & x1 <= upper_whisker1]
x2_filtered <- x2[x2 >= lower_whisker2 & x2 <= upper_whisker2]

# Define whiskers for truncated likelihood application
whisker1 <- c(lower_whisker1, upper_whisker1)
whisker2 <- c(lower_whisker2, upper_whisker2)
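As a sanity check on the box plot rule, the following self-contained sketch (using its own simulated vector `x`, an assumption for illustration) confirms that the observations falling outside the whiskers are exactly those that `boxplot.stats()` reports in its `out` element, so the whisker ends are valid truncation bounds for the retained data.

```r
set.seed(42)
x <- rnorm(100)

stats    <- boxplot.stats(x)            # default coef = 1.5 (1.5 x IQR rule)
whiskers <- stats$stats[c(1, 5)]        # lower and upper whisker ends

# Observations outside / inside the whisker range
outside <- x[x < whiskers[1] | x > whiskers[2]]
kept    <- x[x >= whiskers[1] & x <= whiskers[2]]

# The flagged outliers and the excluded observations coincide
same_points <- setequal(outside, stats$out)
n_excluded  <- length(x) - length(kept)
```

Note that the whiskers are the most extreme retained observations, so all remaining data lie on or inside the bounds passed to the truncation argument.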

We can then fit the truncated RoBTT:

# Fit the RoBTT model with truncation using the filtered data
fit1_trunc <- RoBTT(
  x1 = x1_filtered, x2 = x2_filtered,
  truncation = list(x1 = whisker1, x2 = whisker2),
  seed = 1, parallel = FALSE)

We can summarize the fitted model using the summary() function.

summary(fit1_trunc, group_estimates = TRUE)
#> Call:
#> RoBTT(x1 = x1_filtered, x2 = x2_filtered, truncation = list(x1 = whisker1, 
#>     x2 = whisker2), parallel = FALSE, seed = 1)
#> 
#> Robust Bayesian t-test
#> Components summary:
#>               Models Prior prob. Post. prob. Inclusion BF
#> Effect           2/4       0.500       0.319        0.468
#> Heterogeneity    2/4       0.500       0.171        0.207
#> 
#> Model-averaged estimates:
#>         Mean Median  0.025 0.975
#> delta -0.070  0.000 -0.442 0.008
#> rho    0.498  0.500  0.406 0.574
#> 
#> Model-averaged group parameter estimates:
#>            Mean Median  0.025 0.975
#> mu[1]     0.041  0.034 -0.151 0.278
#> mu[2]    -0.031 -0.022 -0.290 0.169
#> sigma[1]  1.055  1.047  0.906 1.258
#> sigma[2]  1.052  1.043  0.887 1.270

The printed output is structured into three sections. First, the Components summary table contains the inclusion Bayes factors for the presence of an effect and of heterogeneity, computed across all specified models. Second, the Model-averaged estimates table contains the model-averaged posterior mean, median, and 95% central credible interval for the effect size delta (Cohen’s d) and the variance allocation parameter rho. Third, the Model-averaged group parameter estimates table (generated by setting the group_estimates = TRUE argument) summarizes the model-averaged mean and standard deviation estimates of each group.
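The inclusion Bayes factor is the ratio of posterior to prior inclusion odds. The base R sketch below recomputes it from the rounded probabilities printed in the Components summary table; `inclusion_bf()` is a hypothetical helper name, and the results match the printed values only up to rounding of the inputs.

```r
# Inclusion Bayes factor = posterior inclusion odds / prior inclusion odds
# (hypothetical helper for illustration)
inclusion_bf <- function(prior_prob, post_prob) {
  (post_prob / (1 - post_prob)) / (prior_prob / (1 - prior_prob))
}

# Rounded values copied from the Components summary table above
bf_effect        <- inclusion_bf(prior_prob = 0.5, post_prob = 0.319)
bf_heterogeneity <- inclusion_bf(prior_prob = 0.5, post_prob = 0.171)
```

Both values are below 1, meaning the data shift belief away from an effect and away from unequal variances, in line with how the example data were generated.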

We can also summarize information about the specified models by setting the type = "models" argument in the summary() function.

summary(fit1_trunc, group_estimates = TRUE, type = "models")
#> Call:
#> RoBTT(x1 = x1_filtered, x2 = x2_filtered, truncation = list(x1 = whisker1, 
#>     x2 = whisker2), parallel = FALSE, seed = 1)
#> 
#> Robust Bayesian t-test
#> Models overview:
#>  Model     Distribution   Prior delta    Prior rho Prior prob. log(marglik)
#>      1 truncated normal        Spike(0) Spike(0.5)       0.250      -261.28
#>      2 truncated normal        Spike(0) Beta(1, 1)       0.250      -262.86
#>      3 truncated normal Cauchy(0, 0.71) Spike(0.5)       0.250      -262.04
#>      4 truncated normal Cauchy(0, 0.71) Beta(1, 1)       0.250      -263.62
#>  Post. prob. Inclusion BF
#>        0.564        3.884
#>        0.117        0.397
#>        0.264        1.078
#>        0.055        0.173

This output contains a table summarizing the specifics of each model: the type of likelihood distribution, the prior distribution on the effect parameter delta, the prior distribution on the rho parameter, the prior model probabilities, the log marginal likelihoods, the posterior model probabilities, and the inclusion Bayes factors.
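The posterior model probabilities follow from Bayes’ rule applied to the prior model probabilities and the marginal likelihoods. The sketch below recomputes them in base R from the rounded values printed in the Models overview table; the variable names are assumptions for illustration, and the results agree with the printed column only up to rounding.

```r
# Values copied from the Models overview table above (rounded)
log_marglik <- c(-261.28, -262.86, -262.04, -263.62)
prior_prob  <- rep(0.25, 4)

# Subtract the maximum before exponentiating for numerical stability
weights   <- prior_prob * exp(log_marglik - max(log_marglik))
post_prob <- weights / sum(weights)

# Posterior inclusion probability of the effect: models 3 and 4
post_effect <- sum(post_prob[3:4])
# Posterior inclusion probability of heterogeneity: models 2 and 4
post_heterogeneity <- sum(post_prob[c(2, 4)])
```

Summing the posterior probabilities of the models that contain a component gives the posterior inclusion probabilities reported in the Components summary table.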

Second, we can specify a single cut-off range that is applied to both groups:

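cut_off <- c(-2, 2)

# Exclude observations outside the cut-offs
# (stored under new names so that the raw data in x1 and x2 stay intact)
x1_cut <- x1[x1 >= cut_off[1] & x1 <= cut_off[2]]
x2_cut <- x2[x2 >= cut_off[1] & x2 <= cut_off[2]]

# Fit RoBTT with a truncated likelihood using the same cut-offs for both groups
fit2_trunc <- RoBTT(
  x1 = x1_cut, x2 = x2_cut,
  truncation = list(x = cut_off),
  seed = 1, parallel = FALSE)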

The results can again be obtained using the summary() function (see above).

2. Applying Direct Truncation Based on Standard Deviations

The RoBTT package also allows specifying truncation directly based on standard deviations, simplifying the process of outlier handling. The function proceeds by excluding extreme observations and truncating the likelihood accordingly. Note that the analyst should not exclude outliers manually and then specify sigma truncation, as the data would be truncated twice.
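The double-truncation problem can be illustrated without fitting any model. The sketch below assumes, purely for illustration, that a sigma rule sets the cut-offs at the sample mean plus or minus `k` sample standard deviations; how RoBTT converts sigma into cut-offs internally is not shown in this vignette, and the helpers `sd_cutoffs()` and `keep_within()` are hypothetical.

```r
# Cut-offs at mean +/- k sample SDs (assumed rule, for illustration only)
sd_cutoffs  <- function(x, k) mean(x) + c(-1, 1) * k * sd(x)
# Keep observations inside the given bounds
keep_within <- function(x, bounds) x[x >= bounds[1] & x <= bounds[2]]

set.seed(42)
x <- rnorm(1000)

once  <- keep_within(x, sd_cutoffs(x, 2))        # intended exclusion
twice <- keep_within(once, sd_cutoffs(once, 2))  # rule re-applied to filtered data

n_lost_second_pass <- length(once) - length(twice)
```

Because the standard deviation of the already filtered data is smaller, the second pass uses tighter cut-offs and removes additional, non-outlying observations. Passing the raw data together with the sigma specification avoids this.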

As before, the same standard deviation value sigma can be applied to both groups, or a different value can be specified for each group.

First, we specify a single sigma cut-off for both groups:

# Fit the model with direct truncation at 2.5 standard deviations in both groups
fit3_trunc <- RoBTT(
  x1 = x1, x2 = x2,
  truncation = list(sigma = 2.5),
  seed = 1, parallel = FALSE)

Second, we specify a different sigma cut-off for each group:

# Fit the model with direct truncation at 2 (group 1) and 2.5 (group 2) standard deviations
fit4_trunc <- RoBTT(
  x1 = x1, x2 = x2,
  truncation = list(sigma1 = 2, sigma2 = 2.5),
  seed = 1, parallel = FALSE)

Just like before, the results can be obtained using the summary() function.

Conclusions

This vignette demonstrated outlier handling with the truncated Bayesian model-averaged \(t\)-test implemented in the RoBTT R package. For the methodological background, see (godmann2024TruncLikelihood?).

References