funbootband: Simultaneous Prediction and Confidence Bands for Time Series Data

Daniel Koska

2025-10-14

Overview

funbootband computes simultaneous prediction and confidence bands for dense time series data on a common grid. It supports both i.i.d. and clustered (hierarchical) designs and uses a fast ‘Rcpp’ backend.

Curves are preprocessed via finite Fourier series (k.coef harmonics) and bootstrapped to generate empirical distributions, from which band limits are obtained as quantiles.

The main user function is:

band(data, type = c("prediction","confidence"),
     alpha = 0.05, iid = TRUE, id = NULL,
     B = 1000L, k.coef = 50L)

This document gives a quick tour of funbootband including examples with simulated data for both i.i.d. and hierarchical scenarios. The functional bootstrap method implemented in the package is an implementation of the algorithm described in Lenhoff et al. (1999) and further developed for hierarchical design in Koska et al. (2023). The vignette contains additional advice for choosing function arguments in band().

The vignette was written in R Markdown, using ‘knitr’.

Quick start (i.i.d.)

When curves are independent and identically distributed, the function argument iid in band() should be set to TRUE.

For this example, consider a set of simulated smooth curves from a Gaussian process on a common grid (it is generally assumed that all curves are of equal length and reasonably smooth).

library(funbootband)

set.seed(1)
T <- 200
n <- 10
x <- seq(0, 1, length.out = T)

# Simulate smooth Gaussian-process-like curves of equal length
mu  <- 10 * sin(2 * pi * x)
ell <- 0.12; sig <- 3
Kmat <- outer(x, x, function(s, t) sig^2 * exp(-(s - t)^2 / (2 * ell^2)))
ev <- eigen(Kmat + 1e-8 * diag(T), symmetric = TRUE)
Z  <- matrix(rnorm(T * n), T, n)
Y  <- mu + ev$vectors %*% (sqrt(pmax(ev$values, 0)) * Z)
Y  <- Y + matrix(rnorm(T * n, sd = 0.2), T, n) # observation noise

Simultaneous prediction and confidence bands are then computed by setting the type argument to either prediction or confidence:

# Fit prediction and confidence bands
fit_pred <- band(Y, type = "prediction", alpha = 0.11, iid = TRUE, B = B_demo, k.coef = k.coef_demo)
fit_conf <- band(Y, type = "confidence", alpha = 0.11, iid = TRUE, B = B_demo, k.coef = k.coef_demo)

When plotting the bands alongside the original curves, we see that the shaded region is calibrated to contain entire curves with probability \(1-\alpha\) (simultaneous coverage).

x_idx <- seq_len(fit_pred$meta$T)
ylim  <- range(c(Y, fit_pred$lower, fit_pred$upper), finite = TRUE)

plot(x_idx, fit_pred$mean, type = "n", ylim = ylim,
     xlab = "Index (Time)", ylab = "Amplitude",
     main = "Simultaneous bands (i.i.d.)")

matlines(x_idx, Y, col = "gray70", lty = 1, lwd = 1)
polygon(c(x_idx, rev(x_idx)), c(fit_pred$lower, rev(fit_pred$upper)),
        col = grDevices::adjustcolor("steelblue", alpha.f = 0.25), border = NA)
polygon(c(x_idx, rev(x_idx)), c(fit_conf$lower, rev(fit_conf$upper)),
        col = grDevices::adjustcolor("gray40", alpha.f = 0.3), border = NA)
lines(x_idx, fit_pred$mean, col = "black", lwd = 1)
Calculated prediction (blue) and confidence (gray) bands.
Calculated prediction (blue) and confidence (gray) bands.

Clustered (hierarchical) curves

When the i.i.d. assumption is violated, iid needs to be set to FALSE. band() will then automatically detect the cluster structure from the column names. Optionally, an integer/factor vector of length ncol(data) giving a cluster id for each curve can be used.

The clustered case is illustrated using a design, where each group (cluster) follows its own mean pattern with smooth within-cluster variation.

library(funbootband)

set.seed(2)
T <- 200
m <- c(5, 5)
x <- seq(0, 1, length.out = T)

# Cluster-specific means
mu <- list(
  function(z) 8 * sin(2 * pi * z),
  function(z) 8 * cos(2 * pi * z)
)

# Generate curves with smooth within-cluster variation
Bm <- cbind(sin(2 * pi * x), cos(2 * pi * x))
gen_curve <- function(k) {
  sc <- rnorm(ncol(Bm), sd = c(2.0, 1.5))
  mu[[k]](x) + as.vector(Bm %*% sc)
}

Ylist <- lapply(seq_along(m), function(k) {
  sapply(seq_len(m[k]), function(i) gen_curve(k) + rnorm(T, sd = 0.6))
})
Y <- do.call(cbind, Ylist)
colnames(Y) <- unlist(mapply(
  function(k, mk) paste0("C", k, "_", seq_len(mk)),
  seq_along(m), m
))


# Fit prediction and confidence bands
fit_pred <- band(Y, type = "prediction", alpha = 0.11, iid = FALSE, B = B_demo, k.coef = k.coef_demo)
fit_conf <- band(Y, type = "confidence", alpha = 0.11, iid = FALSE, B = B_demo, k.coef = k.coef_demo)

Note: When iid = FALSE, the bootstrap uses a two-stage scheme: sample clusters (with replacement), then choose curves within cluster (without replacement until exhausted, then with replacement), which respects within-cluster dependence.

x_idx <- seq_len(fit_pred$meta$T)
ylim   <- range(c(Y, fit_pred$lower, fit_pred$upper), finite = TRUE)

plot(x_idx, fit_pred$mean, type = "n", ylim = ylim,
     xlab = "Index (Time)", ylab = "Amplitude",
     main = "Simultaneous bands (clustered)")

matlines(x_idx, Y, col = "gray70", lty = 1, lwd = 1)
polygon(c(x_idx, rev(x_idx)), c(fit_pred$lower, rev(fit_pred$upper)),
        col = grDevices::adjustcolor("steelblue", alpha.f = 0.25), border = NA)
polygon(c(x_idx, rev(x_idx)), c(fit_conf$lower, rev(fit_conf$upper)),
        col = grDevices::adjustcolor("gray40", alpha.f = 0.3), border = NA)
lines(x_idx, fit_pred$mean, col = "black", lwd = 1)

Choosing k.coef (Fourier harmonics)

k.coef controls the number of sine/cosine harmonics (plus intercept) used to represent each curve before bootstrapping.

Heuristic for selecting k.coef

The appropriate value of k.coef depends on both the grid length T and the smoothness of the curves:

As a rule of thumb, increase k.coef only as far as necessary. Very high values mainly fit high-frequency noise, increase runtime, and may lead to numerical instability near the Nyquist limit.

One simple way to inspect the effect of k.coef is to evaluate the reconstruction error (e.g., mean squared error, MSE) for different choices of k.coef:

# MSE vs k.coef for the i.i.d. example (uses Y from above)

fourier_basis <- function(T, K) {
  t <- seq_len(T)
  if (K == 0L) return(cbind(1))
  cbind(
    1,
    sapply(1:K, function(k) cos(2*pi*k*t/T)),
    sapply(1:K, function(k) sin(2*pi*k*t/T))
  )
}

reconstruct <- function(B, Y) {
  coef <- qr.coef(qr(B), Y) # solve for all curves at once
  B %*% coef
}

mse <- function(A, B) mean((A - B)^2)

Ks <- c(10L, 20L, 30L, 40L, 50L, 60L, 70L, 80L, 90L, 99L)
T  <- nrow(Y)

tab <- do.call(rbind, lapply(Ks, function(K){
  B <- fourier_basis(T, K)
  Yhat <- reconstruct(B, Y)
  data.frame(k.coef = K,
             mse = mse(Y, Yhat),
             pve = 100 * (1 - sum((Y - Yhat)^2) / sum((Y - mean(Y))^2)))
}))
row.names(tab) <- NULL
print(tab)

# Plot
op <- par(mar = c(4,4,2,1))
plot(tab$k.coef, tab$mse, type = "b", xlab = "k.coef", ylab = "MSE",
     main = "Fourier reconstruction error vs. k.coef")

Recommendation

If your curves are smooth and runtime is not a concern, moderately large k.coef values (e.g., 20–50 for dense grids) are often a good choice. For noisy data or when computation time matters, examine how the reconstruction error (or resulting band limits) changes with k.coef and select the smallest value beyond which improvements become negligible. Recomputing the bands for a few candidate values can also help confirm that results have converged.

Tuning B (bootstrap replicates)

The argument B controls the number of bootstrap replications.

API reference

band(data, type = c("prediction","confidence"),
     alpha = 0.05,
     iid   = TRUE,
     id    = NULL,
     B     = 1000L,
     k.coef = 50L)

Return value. A list with numeric vectors lower, mean, upper (length T) and meta (settings, n, T, etc.).

References

Lenhoff, M. W., Santner, T. J., Otis, J. C., Peterson, M. G., Williams, B. J., & Backus, S. I. (1999). Bootstrap prediction and confidence bands: a superior statistical method for analysis of gait data. Gait & Posture, 9(1), 10–17. doi:10.1016/S0966-6362(98)00043-5

Koska, D., Oriwol, D., & Maiwald, C. (2023). Comparison of statistical models for characterizing continuous differences between two biomechanical measurement systems. Journal of Biomechanics, 149, 111506. doi:10.1016/j.jbiomech.2023.111506

Session info

sessionInfo()
#> R version 4.5.1 (2025-06-13)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.2 LTS
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.12.0 
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0  LAPACK version 3.12.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> time zone: Europe/Berlin
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] funbootband_0.2.0
#> 
#> loaded via a namespace (and not attached):
#>  [1] digest_0.6.37     R6_2.6.1          fastmap_1.2.0     xfun_0.53        
#>  [5] cachem_1.1.0      knitr_1.50        htmltools_0.5.8.1 rmarkdown_2.29   
#>  [9] lifecycle_1.0.4   cli_3.6.5         sass_0.4.10       jquerylib_0.1.4  
#> [13] compiler_4.5.1    rstudioapi_0.17.1 tools_4.5.1       evaluate_1.0.5   
#> [17] bslib_0.9.0       Rcpp_1.1.0        yaml_2.3.10       rlang_1.1.6      
#> [21] jsonlite_2.0.0