funbootband
computes simultaneous
prediction and confidence bands for dense time series data on a common
grid. It supports both i.i.d. and clustered (hierarchical) designs and
uses a fast ‘Rcpp’ backend.
Curves are preprocessed via finite Fourier series
(k.coef
harmonics) and bootstrapped to
generate empirical distributions, from which band limits are obtained as
quantiles.
The main user function is:
band(data, type = c("prediction","confidence"),
alpha = 0.05, iid = TRUE, id = NULL,
B = 1000L, k.coef = 50L)
This document gives a quick tour of funbootband
including examples with simulated data for both i.i.d. and hierarchical
scenarios. The functional bootstrap method implemented in the package is
an implementation of the algorithm described in Lenhoff et al. (1999)
and further developed for hierarchical design in Koska et al. (2023).
The vignette contains additional advice for choosing function arguments
in band()
.
The vignette was written in R Markdown, using ‘knitr’.
When curves are independent and identically distributed, the function
argument iid
in band()
should be set to
TRUE
.
For this example, consider a set of simulated smooth curves from a Gaussian process on a common grid (it is generally assumed that all curves are of equal length and reasonably smooth).
library(funbootband)
set.seed(1)
T <- 200
n <- 10
x <- seq(0, 1, length.out = T)
# Simulate smooth Gaussian-process-like curves of equal length
mu <- 10 * sin(2 * pi * x)
ell <- 0.12; sig <- 3
Kmat <- outer(x, x, function(s, t) sig^2 * exp(-(s - t)^2 / (2 * ell^2)))
ev <- eigen(Kmat + 1e-8 * diag(T), symmetric = TRUE)
Z <- matrix(rnorm(T * n), T, n)
Y <- mu + ev$vectors %*% (sqrt(pmax(ev$values, 0)) * Z)
Y <- Y + matrix(rnorm(T * n, sd = 0.2), T, n) # observation noise
Simultaneous prediction and confidence bands are then computed by
setting the type
argument to either prediction
or confidence
:
# Fit prediction and confidence bands
fit_pred <- band(Y, type = "prediction", alpha = 0.11, iid = TRUE, B = B_demo, k.coef = k.coef_demo)
fit_conf <- band(Y, type = "confidence", alpha = 0.11, iid = TRUE, B = B_demo, k.coef = k.coef_demo)
When plotting the bands alongside the original curves, we see that the shaded region is calibrated to contain entire curves with probability \(1-\alpha\) (simultaneous coverage).
x_idx <- seq_len(fit_pred$meta$T)
ylim <- range(c(Y, fit_pred$lower, fit_pred$upper), finite = TRUE)
plot(x_idx, fit_pred$mean, type = "n", ylim = ylim,
xlab = "Index (Time)", ylab = "Amplitude",
main = "Simultaneous bands (i.i.d.)")
matlines(x_idx, Y, col = "gray70", lty = 1, lwd = 1)
polygon(c(x_idx, rev(x_idx)), c(fit_pred$lower, rev(fit_pred$upper)),
col = grDevices::adjustcolor("steelblue", alpha.f = 0.25), border = NA)
polygon(c(x_idx, rev(x_idx)), c(fit_conf$lower, rev(fit_conf$upper)),
col = grDevices::adjustcolor("gray40", alpha.f = 0.3), border = NA)
lines(x_idx, fit_pred$mean, col = "black", lwd = 1)
When the i.i.d. assumption is violated, iid
needs to be
set to FALSE
. band()
will then automatically
detect the cluster structure from the column names. Optionally, an
integer/factor vector of length ncol(data)
giving a cluster
id for each curve can be used.
The clustered case is illustrated using a design, where each group (cluster) follows its own mean pattern with smooth within-cluster variation.
library(funbootband)
set.seed(2)
T <- 200
m <- c(5, 5)
x <- seq(0, 1, length.out = T)
# Cluster-specific means
mu <- list(
function(z) 8 * sin(2 * pi * z),
function(z) 8 * cos(2 * pi * z)
)
# Generate curves with smooth within-cluster variation
Bm <- cbind(sin(2 * pi * x), cos(2 * pi * x))
gen_curve <- function(k) {
sc <- rnorm(ncol(Bm), sd = c(2.0, 1.5))
mu[[k]](x) + as.vector(Bm %*% sc)
}
Ylist <- lapply(seq_along(m), function(k) {
sapply(seq_len(m[k]), function(i) gen_curve(k) + rnorm(T, sd = 0.6))
})
Y <- do.call(cbind, Ylist)
colnames(Y) <- unlist(mapply(
function(k, mk) paste0("C", k, "_", seq_len(mk)),
seq_along(m), m
))
# Fit prediction and confidence bands
fit_pred <- band(Y, type = "prediction", alpha = 0.11, iid = FALSE, B = B_demo, k.coef = k.coef_demo)
fit_conf <- band(Y, type = "confidence", alpha = 0.11, iid = FALSE, B = B_demo, k.coef = k.coef_demo)
Note: When iid = FALSE
, the bootstrap
uses a two-stage scheme: sample clusters (with
replacement), then choose curves within cluster (without replacement
until exhausted, then with replacement), which respects within-cluster
dependence.
x_idx <- seq_len(fit_pred$meta$T)
ylim <- range(c(Y, fit_pred$lower, fit_pred$upper), finite = TRUE)
plot(x_idx, fit_pred$mean, type = "n", ylim = ylim,
xlab = "Index (Time)", ylab = "Amplitude",
main = "Simultaneous bands (clustered)")
matlines(x_idx, Y, col = "gray70", lty = 1, lwd = 1)
polygon(c(x_idx, rev(x_idx)), c(fit_pred$lower, rev(fit_pred$upper)),
col = grDevices::adjustcolor("steelblue", alpha.f = 0.25), border = NA)
polygon(c(x_idx, rev(x_idx)), c(fit_conf$lower, rev(fit_conf$upper)),
col = grDevices::adjustcolor("gray40", alpha.f = 0.3), border = NA)
lines(x_idx, fit_pred$mean, col = "black", lwd = 1)
k.coef
(Fourier harmonics)k.coef
controls the number of sine/cosine harmonics
(plus intercept) used to represent each curve before bootstrapping.
k.coef
to the maximum meaningful value on a length-T
periodic
grid. If a larger value is supplied, it is reduced accordingly.k.coef
that
adequately reconstructs your curves.Heuristic for selecting k.coef
The appropriate value of k.coef
depends on both the
grid length T
and the
smoothness of the curves:
T
) can support higher
k.coef
values.As a rule of thumb, increase k.coef
only as far as
necessary. Very high values mainly fit high-frequency noise, increase
runtime, and may lead to numerical instability near the Nyquist
limit.
One simple way to inspect the effect of k.coef
is to
evaluate the reconstruction error (e.g., mean squared
error, MSE) for different choices of k.coef
:
# MSE vs k.coef for the i.i.d. example (uses Y from above)
fourier_basis <- function(T, K) {
t <- seq_len(T)
if (K == 0L) return(cbind(1))
cbind(
1,
sapply(1:K, function(k) cos(2*pi*k*t/T)),
sapply(1:K, function(k) sin(2*pi*k*t/T))
)
}
reconstruct <- function(B, Y) {
coef <- qr.coef(qr(B), Y) # solve for all curves at once
B %*% coef
}
mse <- function(A, B) mean((A - B)^2)
Ks <- c(10L, 20L, 30L, 40L, 50L, 60L, 70L, 80L, 90L, 99L)
T <- nrow(Y)
tab <- do.call(rbind, lapply(Ks, function(K){
B <- fourier_basis(T, K)
Yhat <- reconstruct(B, Y)
data.frame(k.coef = K,
mse = mse(Y, Yhat),
pve = 100 * (1 - sum((Y - Yhat)^2) / sum((Y - mean(Y))^2)))
}))
row.names(tab) <- NULL
print(tab)
# Plot
op <- par(mar = c(4,4,2,1))
plot(tab$k.coef, tab$mse, type = "b", xlab = "k.coef", ylab = "MSE",
main = "Fourier reconstruction error vs. k.coef")
Recommendation
If your curves are smooth and runtime is not a concern, moderately
large k.coef
values (e.g., 20–50 for dense grids) are often
a good choice. For noisy data or when computation time matters, examine
how the reconstruction error (or resulting band limits) changes with
k.coef and select the smallest value beyond which improvements become
negligible. Recomputing the bands for a few candidate values can also
help confirm that results have converged.
B
(bootstrap replicates)The argument B controls the number of bootstrap replications.
B
generally yields tighter and more stable
bands, as quantile estimates become more precise.B
can be
used to reduce runtime. For final inference, rerun with a sufficiently
large B to ensure convergence of the band limits.band(data, type = c("prediction","confidence"),
alpha = 0.05,
iid = TRUE,
id = NULL,
B = 1000L,
k.coef = 50L)
T × n
(rows: time
points; cols: curves). A numeric data.frame
is
accepted."prediction"
(future curve
coverage) or "confidence"
(mean).0.05
gives 95%
bands.iid = FALSE
and supply id
(length
n
) for clusters, or allow inference from column-name
prefixes.50
;
clamped to a grid-specific max).Return value. A list with numeric vectors
lower
, mean
, upper
(length
T
) and meta
(settings, n
,
T
, etc.).
Lenhoff, M. W., Santner, T. J., Otis, J. C., Peterson, M. G., Williams, B. J., & Backus, S. I. (1999). Bootstrap prediction and confidence bands: a superior statistical method for analysis of gait data. Gait & Posture, 9(1), 10–17. doi:10.1016/S0966-6362(98)00043-5
Koska, D., Oriwol, D., & Maiwald, C. (2023). Comparison of statistical models for characterizing continuous differences between two biomechanical measurement systems. Journal of Biomechanics, 149, 111506. doi:10.1016/j.jbiomech.2023.111506
sessionInfo()
#> R version 4.5.1 (2025-06-13)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.2 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.12.0
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0 LAPACK version 3.12.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: Europe/Berlin
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] funbootband_0.2.0
#>
#> loaded via a namespace (and not attached):
#> [1] digest_0.6.37 R6_2.6.1 fastmap_1.2.0 xfun_0.53
#> [5] cachem_1.1.0 knitr_1.50 htmltools_0.5.8.1 rmarkdown_2.29
#> [9] lifecycle_1.0.4 cli_3.6.5 sass_0.4.10 jquerylib_0.1.4
#> [13] compiler_4.5.1 rstudioapi_0.17.1 tools_4.5.1 evaluate_1.0.5
#> [17] bslib_0.9.0 Rcpp_1.1.0 yaml_2.3.10 rlang_1.1.6
#> [21] jsonlite_2.0.0