funbootband: Simultaneous Prediction and Confidence Bands for Time Series Data

Overview

funbootband computes simultaneous prediction and confidence bands for dense time series data on a common grid. It supports both i.i.d. and clustered (hierarchical) designs and uses a fast ‘Rcpp’ backend.

Curves are preprocessed via finite Fourier series (k.coef harmonics) and bootstrapped to generate empirical distributions, from which band limits are obtained as quantiles.

Prediction bands: cover future individual curves with probability \(1-\alpha\).
Confidence bands: cover the mean function with probability \(1-\alpha\).

The main user function is:

band(data, type = c("prediction","confidence"),
     alpha = 0.05, iid = TRUE, id = NULL,
     B = 1000L, k.coef = 50L)

This document gives a quick tour of funbootband including examples with simulated data for both i.i.d. and hierarchical scenarios. The functional bootstrap method implemented in the package is an implementation of the algorithm described in Lenhoff et al. (1999) and further developed for hierarchical design in Koska et al. (2023). The vignette contains additional advice for choosing function arguments in band().

The vignette was written in R Markdown, using ‘knitr’.

Quick start (i.i.d.)

When curves are independent and identically distributed, the function argument iid in band() should be set to TRUE.

For this example, consider a set of simulated smooth curves from a Gaussian process on a common grid (it is generally assumed that all curves are of equal length and reasonably smooth).

library(funbootband)

set.seed(1)
T <- 200
n <- 10
x <- seq(0, 1, length.out = T)

# Simulate smooth Gaussian-process-like curves of equal length
mu  <- 10 * sin(2 * pi * x)
ell <- 0.12; sig <- 3
Kmat <- outer(x, x, function(s, t) sig^2 * exp(-(s - t)^2 / (2 * ell^2)))
ev <- eigen(Kmat + 1e-8 * diag(T), symmetric = TRUE)
Z  <- matrix(rnorm(T * n), T, n)
Y  <- mu + ev$vectors %*% (sqrt(pmax(ev$values, 0)) * Z)
Y  <- Y + matrix(rnorm(T * n, sd = 0.2), T, n) # observation noise

Simultaneous prediction and confidence bands are then computed by setting the type argument to either prediction or confidence:

# Fit prediction and confidence bands
fit_pred <- band(Y, type = "prediction", alpha = 0.11, iid = TRUE, B = B_demo, k.coef = k.coef_demo)
fit_conf <- band(Y, type = "confidence", alpha = 0.11, iid = TRUE, B = B_demo, k.coef = k.coef_demo)

When plotting the bands alongside the original curves, we see that the shaded region is calibrated to contain entire curves with probability \(1-\alpha\) (simultaneous coverage).

x_idx <- seq_len(fit_pred$meta$T)
ylim  <- range(c(Y, fit_pred$lower, fit_pred$upper), finite = TRUE)

plot(x_idx, fit_pred$mean, type = "n", ylim = ylim,
     xlab = "Index (Time)", ylab = "Amplitude",
     main = "Simultaneous bands (i.i.d.)")

matlines(x_idx, Y, col = "gray70", lty = 1, lwd = 1)
polygon(c(x_idx, rev(x_idx)), c(fit_pred$lower, rev(fit_pred$upper)),
        col = grDevices::adjustcolor("steelblue", alpha.f = 0.25), border = NA)
polygon(c(x_idx, rev(x_idx)), c(fit_conf$lower, rev(fit_conf$upper)),
        col = grDevices::adjustcolor("gray40", alpha.f = 0.3), border = NA)
lines(x_idx, fit_pred$mean, col = "black", lwd = 1)

Calculated prediction (blue) and confidence (gray) bands.

Clustered (hierarchical) curves

When the i.i.d. assumption is violated, iid needs to be set to FALSE. band() will then automatically detect the cluster structure from the column names. Optionally, an integer/factor vector of length ncol(data) giving a cluster id for each curve can be used.

The clustered case is illustrated using a design, where each group (cluster) follows its own mean pattern with smooth within-cluster variation.

library(funbootband)

set.seed(2)
T <- 200
m <- c(5, 5)
x <- seq(0, 1, length.out = T)

# Cluster-specific means
mu <- list(
  function(z) 8 * sin(2 * pi * z),
  function(z) 8 * cos(2 * pi * z)
)

# Generate curves with smooth within-cluster variation
Bm <- cbind(sin(2 * pi * x), cos(2 * pi * x))
gen_curve <- function(k) {
  sc <- rnorm(ncol(Bm), sd = c(2.0, 1.5))
  mu[[k]](x) + as.vector(Bm %*% sc)
}

Ylist <- lapply(seq_along(m), function(k) {
  sapply(seq_len(m[k]), function(i) gen_curve(k) + rnorm(T, sd = 0.6))
})
Y <- do.call(cbind, Ylist)
colnames(Y) <- unlist(mapply(
  function(k, mk) paste0("C", k, "_", seq_len(mk)),
  seq_along(m), m
))


# Fit prediction and confidence bands
fit_pred <- band(Y, type = "prediction", alpha = 0.11, iid = FALSE, B = B_demo, k.coef = k.coef_demo)
fit_conf <- band(Y, type = "confidence", alpha = 0.11, iid = FALSE, B = B_demo, k.coef = k.coef_demo)

Note: When iid = FALSE, the bootstrap uses a two-stage scheme: sample clusters (with replacement), then choose curves within cluster (without replacement until exhausted, then with replacement), which respects within-cluster dependence.

x_idx <- seq_len(fit_pred$meta$T)
ylim   <- range(c(Y, fit_pred$lower, fit_pred$upper), finite = TRUE)

plot(x_idx, fit_pred$mean, type = "n", ylim = ylim,
     xlab = "Index (Time)", ylab = "Amplitude",
     main = "Simultaneous bands (clustered)")

matlines(x_idx, Y, col = "gray70", lty = 1, lwd = 1)
polygon(c(x_idx, rev(x_idx)), c(fit_pred$lower, rev(fit_pred$upper)),
        col = grDevices::adjustcolor("steelblue", alpha.f = 0.25), border = NA)
polygon(c(x_idx, rev(x_idx)), c(fit_conf$lower, rev(fit_conf$upper)),
        col = grDevices::adjustcolor("gray40", alpha.f = 0.3), border = NA)
lines(x_idx, fit_pred$mean, col = "black", lwd = 1)

Choosing `k.coef` (Fourier harmonics)

k.coef controls the number of sine/cosine harmonics (plus intercept) used to represent each curve before bootstrapping.

The code automatically clamps k.coef to the maximum meaningful value on a length-T periodic grid. If a larger value is supplied, it is reduced accordingly.
In practice, choose the smallest k.coef that adequately reconstructs your curves.

Heuristic for selecting k.coef

The appropriate value of k.coef depends on both the grid length T and the smoothness of the curves:

Longer curves (larger T) can support higher k.coef values.
Smooth, slowly varying curves typically require only a small number of harmonics.
Highly oscillatory or noisy curves may need more harmonics to capture relevant detail.

As a rule of thumb, increase k.coef only as far as necessary. Very high values mainly fit high-frequency noise, increase runtime, and may lead to numerical instability near the Nyquist limit.

One simple way to inspect the effect of k.coef is to evaluate the reconstruction error (e.g., mean squared error, MSE) for different choices of k.coef:

# MSE vs k.coef for the i.i.d. example (uses Y from above)

fourier_basis <- function(T, K) {
  t <- seq_len(T)
  if (K == 0L) return(cbind(1))
  cbind(
    1,
    sapply(1:K, function(k) cos(2*pi*k*t/T)),
    sapply(1:K, function(k) sin(2*pi*k*t/T))
  )
}

reconstruct <- function(B, Y) {
  coef <- qr.coef(qr(B), Y) # solve for all curves at once
  B %*% coef
}

mse <- function(A, B) mean((A - B)^2)

Ks <- c(10L, 20L, 30L, 40L, 50L, 60L, 70L, 80L, 90L, 99L)
T  <- nrow(Y)

tab <- do.call(rbind, lapply(Ks, function(K){
  B <- fourier_basis(T, K)
  Yhat <- reconstruct(B, Y)
  data.frame(k.coef = K,
             mse = mse(Y, Yhat),
             pve = 100 * (1 - sum((Y - Yhat)^2) / sum((Y - mean(Y))^2)))
}))
row.names(tab) <- NULL
print(tab)

# Plot
op <- par(mar = c(4,4,2,1))
plot(tab$k.coef, tab$mse, type = "b", xlab = "k.coef", ylab = "MSE",
     main = "Fourier reconstruction error vs. k.coef")

Recommendation

If your curves are smooth and runtime is not a concern, moderately large k.coef values (e.g., 20–50 for dense grids) are often a good choice. For noisy data or when computation time matters, examine how the reconstruction error (or resulting band limits) changes with k.coef and select the smallest value beyond which improvements become negligible. Recomputing the bands for a few candidate values can also help confirm that results have converged.

Tuning `B` (bootstrap replicates)

The argument B controls the number of bootstrap replications.

Increasing B generally yields tighter and more stable bands, as quantile estimates become more precise.
Beyond a certain threshold, however, band limits typically stabilize and no longer change meaningfully.
For exploratory analyses, smaller values of B can be used to reduce runtime. For final inference, rerun with a sufficiently large B to ensure convergence of the band limits.

API reference

band(data, type = c("prediction","confidence"),
     alpha = 0.05,
     iid   = TRUE,
     id    = NULL,
     B     = 1000L,
     k.coef = 50L)

data: numeric matrix T × n (rows: time points; cols: curves). A numeric data.frame is accepted.
type: "prediction" (future curve coverage) or "confidence" (mean).
alpha: e.g., 0.05 gives 95% bands.
iid / id: set iid = FALSE and supply id (length n) for clusters, or allow inference from column-name prefixes.
B: bootstrap reps.
k.coef: Fourier harmonics (default 50; clamped to a grid-specific max).

Return value. A list with numeric vectors lower, mean, upper (length T) and meta (settings, n, T, etc.).

References

Lenhoff, M. W., Santner, T. J., Otis, J. C., Peterson, M. G., Williams, B. J., & Backus, S. I. (1999). Bootstrap prediction and confidence bands: a superior statistical method for analysis of gait data. Gait & Posture, 9(1), 10–17. doi:10.1016/S0966-6362(98)00043-5

Koska, D., Oriwol, D., & Maiwald, C. (2023). Comparison of statistical models for characterizing continuous differences between two biomechanical measurement systems. Journal of Biomechanics, 149, 111506. doi:10.1016/j.jbiomech.2023.111506

Session info

sessionInfo()
#> R version 4.5.1 (2025-06-13)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.2 LTS
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.12.0 
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0  LAPACK version 3.12.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> time zone: Europe/Berlin
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] funbootband_0.2.0
#> 
#> loaded via a namespace (and not attached):
#>  [1] digest_0.6.37     R6_2.6.1          fastmap_1.2.0     xfun_0.53        
#>  [5] cachem_1.1.0      knitr_1.50        htmltools_0.5.8.1 rmarkdown_2.29   
#>  [9] lifecycle_1.0.4   cli_3.6.5         sass_0.4.10       jquerylib_0.1.4  
#> [13] compiler_4.5.1    rstudioapi_0.17.1 tools_4.5.1       evaluate_1.0.5   
#> [17] bslib_0.9.0       Rcpp_1.1.0        yaml_2.3.10       rlang_1.1.6      
#> [21] jsonlite_2.0.0

funbootband: Simultaneous Prediction and Confidence Bands for Time Series Data

Daniel Koska

2025-10-14

Overview

Quick start (i.i.d.)

Clustered (hierarchical) curves

Choosing `k.coef` (Fourier harmonics)

Tuning `B` (bootstrap replicates)

API reference

References

Session info

funbootband: Simultaneous Prediction and Confidence Bands for Time Series Data

Daniel Koska

2025-10-14

Overview

Quick start (i.i.d.)

Clustered (hierarchical) curves

Choosing k.coef (Fourier harmonics)

Tuning B (bootstrap replicates)

API reference

References

Session info

Choosing `k.coef` (Fourier harmonics)

Tuning `B` (bootstrap replicates)