missoNet


Multi-task regression and network estimation with missing responses, with no imputation required!

missoNet jointly estimates regression coefficients and the response network (precision matrix) from multi-response data in which some responses are missing, whether missing completely at random (MCAR), at random (MAR), or not at random (MNAR). Estimation is based on unbiased estimating equations with separate L1 penalties on the coefficients and on the precision matrix, enabling robust multi-trait analysis with incomplete outcomes.
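The three mechanisms differ in what drives a value to go missing. A minimal base-R sketch on one simulated response; the 15% rate and the logistic coefficients are arbitrary illustration choices and are not taken from missoNet or its `generateData()` function.

```r
# Toy illustration of MCAR, MAR and MNAR on a single response (illustrative values only)
set.seed(1)
n <- 1000
x <- rnorm(n)                 # fully observed predictor
y <- 0.8 * x + rnorm(n)       # response that will lose some values

# MCAR: every entry has the same 15% chance of being missing
mask.mcar <- runif(n) < 0.15

# MAR: missingness depends only on the observed predictor x
mask.mar <- runif(n) < plogis(-2 + 1.5 * x)

# MNAR: missingness depends on the (unobserved) response value itself
mask.mnar <- runif(n) < plogis(-2 + 1.5 * y)

y.mcar <- replace(y, mask.mcar, NA)
y.mar  <- replace(y, mask.mar, NA)
y.mnar <- replace(y, mask.mnar, NA)

# Under MAR and MNAR, large y values go missing more often, so the naive
# observed mean is pulled downward; under MCAR it stays close to the full mean
means <- c(full = mean(y),
           mcar = mean(y.mcar, na.rm = TRUE),
           mar  = mean(y.mar,  na.rm = TRUE),
           mnar = mean(y.mnar, na.rm = TRUE))
round(means, 2)
```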


Why missoNet?

If you only have a single response, classical lasso/elastic net (e.g., glmnet) is simpler and likely faster.
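With several responses, dropping every row that has any missing value (complete-case analysis) discards data quickly. A short calculation, assuming each response entry is missing independently with probability `rho` (MCAR); the values `n = 300`, `q = 10`, `rho = 0.15` are illustrative.

```r
# How much data would complete-case (listwise) deletion keep?
# Assumes each response entry is missing independently with probability rho (MCAR).
n <- 300; q <- 10; rho <- 0.15

# Expected share of rows with all q responses observed
p.complete <- (1 - rho)^q
p.complete          # roughly 0.197
n * p.complete      # roughly 59 of 300 rows

# Monte Carlo check with a random MCAR mask
set.seed(42)
mask <- matrix(runif(n * q) < rho, n, q)   # TRUE = missing
mean(rowSums(mask) == 0)                   # empirical share of complete rows
```

Only about a fifth of the rows would survive listwise deletion, even though 85% of the individual response values are observed. For a real response matrix `Y`, `mean(complete.cases(Y))` gives the realized share.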


Installation

CRAN (stable)

install.packages("missoNet")

GitHub (development)

# install.packages("devtools")
devtools::install_github("yixiao-zeng/missoNet", build_vignettes = TRUE)

Quick start

library(missoNet)

# Example data with ~15% missing responses (MCAR)
sim <- generateData(n = 300, p = 50, q = 10, rho = 0.15, missing.type = "MCAR")

# Fit along two lambda paths; choose via BIC (no CV)
fit <- missoNet(X = sim$X, Y = sim$Z, GoF = "BIC")

# Extract estimates at the selected solution
Beta  <- fit$est.min$Beta   # p x q regression coefficients
Theta <- fit$est.min$Theta  # q x q precision (conditional network)

# Visualize selection path
plot(fit, type = "scatter")
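The estimated precision matrix encodes the conditional network: under a Gaussian model, an off-diagonal zero in Theta means the two responses are conditionally independent given the other responses (and the predictors), and nonzero entries can be rescaled to partial correlations. A sketch using small made-up toy matrices in place of `fit$est.min$Theta` and `fit$est.min$Beta`.

```r
# Read a sparse network and coefficient support from estimated matrices.
# Toy values stand in for fit$est.min$Theta (q x q) and fit$est.min$Beta (p x q).
Theta <- matrix(c( 2.0, -0.6,  0.0,
                  -0.6,  1.5,  0.4,
                   0.0,  0.4,  1.2), 3, 3)
Beta  <- matrix(c(0.0, 1.1,  0.0,     # response 1
                  0.0, 0.0, -0.7),    # response 2
                3, 2)

# Partial correlation: -Theta[j, k] / sqrt(Theta[j, j] * Theta[k, k])
pcor <- -cov2cor(Theta)
diag(pcor) <- 1

# Edges of the conditional network = nonzero off-diagonal entries (upper triangle)
edges <- which(upper.tri(Theta) & Theta != 0, arr.ind = TRUE)
edges

# Nonzero regression coefficients as (predictor, response) pairs
beta.support <- which(Beta != 0, arr.ind = TRUE)
beta.support
```

Because L1-penalized estimates contain exact zeros, the `!= 0` comparisons recover the selected edges and coefficients directly.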

Cross-validation & prediction

# 5-fold CV over (lambda.beta, lambda.theta)
cvfit <- cv.missoNet(X = sim$X, Y = sim$Z, kfold = 5)

# Inspect CV heatmap and selected models (min and 1-SE variants)
plot(cvfit, type = "heatmap")

# Predict responses (training X used here for illustration; pass new predictors via newx)
Y_hat <- predict(cvfit, newx = sim$X, s = "lambda.min")
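Predictions are complete matrices, but the responses you compare them against may still contain NAs. One way to score them without imputation is to average squared errors over observed entries only. `masked_mse()` is a hypothetical helper, not part of missoNet.

```r
# Hypothetical helper: mean squared error over observed entries only
masked_mse <- function(Y.hat, Y) {
  stopifnot(identical(dim(Y.hat), dim(Y)))
  sq.err <- (Y.hat - Y)^2            # NA wherever Y is missing
  list(overall      = mean(sq.err, na.rm = TRUE),
       per.response = colMeans(sq.err, na.rm = TRUE))
}

# Toy example: 4 samples, 2 responses, one missing entry
Y     <- matrix(c(1, 2, NA, 4,  0, 1, 1, 0), 4, 2)
Y.hat <- matrix(c(1, 1,  3, 4,  0, 0, 1, 2), 4, 2)
res <- masked_mse(Y.hat, Y)
res$overall        # 6 / 7, about 0.857
res$per.response   # 0.333 (response 1) and 1.25 (response 2)
```

Assuming `predict()` returns an n x q numeric matrix, `masked_mse(Y_hat, sim$Z)` would give the in-sample error for the example above.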

Tip: Try s = "lambda.1se.beta" or "lambda.1se.theta" for more conservative sparsity when available.
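The 1-SE variants take their name from the one-standard-error rule commonly used with cross-validated penalty paths (glmnet's `lambda.1se` works this way). A generic sketch on a single lambda path with made-up CV numbers; how `cv.missoNet()` applies the rule along the beta and theta paths is not shown here and may differ in detail.

```r
# One-standard-error rule on a 1-D lambda path (hypothetical CV results)
lambda  <- c(1.00, 0.50, 0.25, 0.12, 0.06)   # decreasing: sparser -> denser
cv.mean <- c(2.10, 1.60, 1.32, 1.25, 1.27)
cv.se   <- c(0.10, 0.09, 0.08, 0.08, 0.09)

i.min  <- which.min(cv.mean)
thresh <- cv.mean[i.min] + cv.se[i.min]      # minimum error plus one SE
i.1se  <- min(which(cv.mean <= thresh))      # first (largest) lambda under threshold

lambda[i.min]   # 0.12
lambda[i.1se]   # 0.25
```

The 1-SE choice trades a small, statistically indistinguishable increase in CV error for a sparser model.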


Parallel processing

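library(parallel)

# Leave one core free; detectCores() may return NA on some platforms
n.cores <- max(1, detectCores() - 1, na.rm = TRUE)
cl <- makeCluster(n.cores)
cvfit <- tryCatch(
  cv.missoNet(X = sim$X, Y = sim$Z, kfold = 5,
              parallel = TRUE, cl = cl),
  finally = stopCluster(cl)  # shut down workers even if fitting fails
)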

Advanced usage

Custom penalty factors

# Penalize predictors believed a priori to be important less (factor 0.1 instead of 1)
p <- ncol(sim$X); q <- ncol(sim$Z)
beta.pen.factor <- matrix(1, p, q)
beta.pen.factor[c(1, 2), ] <- 0.1

fit <- missoNet(X = sim$X, Y = sim$Z,
                beta.pen.factor = beta.pen.factor)
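A penalty factor typically rescales the L1 penalty on each coefficient, so a factor of 0.1 makes that coefficient ten times cheaper to keep nonzero. The sketch below assumes `beta.pen.factor` multiplies `lambda.beta` entrywise, similar in spirit to `penalty.factor` in glmnet; `weighted_l1()` is a hypothetical helper that only evaluates the penalty term and is not missoNet's internal code.

```r
# Weighted L1 penalty: lambda * sum_jk w_jk * |B_jk| (assumed form, for illustration)
weighted_l1 <- function(B, lambda, pen.factor) {
  stopifnot(identical(dim(B), dim(pen.factor)))
  lambda * sum(pen.factor * abs(B))
}

B  <- matrix(c(0.5, -0.2, 0.0, 0.3), 2, 2)   # 2 predictors x 2 responses
pf <- matrix(1, 2, 2)
pf[1, ] <- 0.1                               # predictor 1 penalized 10x less
weighted_l1(B, lambda = 0.2, pen.factor = pf)   # 0.2 * 0.55 = 0.11
```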

Adaptive search (faster for large problems)

fit <- missoNet(X = sim$X, Y = sim$Z,
                adaptive.search   = TRUE,
                n.lambda.beta     = 50,
                n.lambda.theta    = 50)
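With `n.lambda.beta = 50` and `n.lambda.theta = 50`, an exhaustive search would fit the model at all 50 x 50 = 2,500 penalty pairs, which is the cost an adaptive search aims to cut (presumably by visiting only part of the grid; the exact strategy is not described here). A sketch of the usual log-spaced path construction for lasso-type fits; `lambda_path()`, `lambda.max`, and `ratio` are hypothetical, and missoNet's own grid defaults may differ.

```r
# Log-spaced penalty path from lambda.max down to lambda.max * ratio
lambda_path <- function(lambda.max, n.lambda, ratio = 1e-3) {
  exp(seq(log(lambda.max), log(lambda.max * ratio), length.out = n.lambda))
}

lb <- lambda_path(1.0, 50)   # candidate lambda.beta values
lt <- lambda_path(0.5, 50)   # candidate lambda.theta values

# An exhaustive search evaluates every (lambda.beta, lambda.theta) pair
grid <- expand.grid(lambda.beta = lb, lambda.theta = lt)
nrow(grid)   # 2500
```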

Documentation

vignette("missoNet-introduction")
vignette("missoNet-cross-validation")
vignette("missoNet-case-study")

If vignettes are not available from CRAN binaries on your platform, install from source using the GitHub command above with build_vignettes = TRUE.


Performance notes

Actual performance depends on the sparsity of the true coefficients and network, the signal-to-noise ratio, and the missingness mechanism.


When to use (and not)

Great for

Not ideal for

- Single-response regression (use glmnet or similar)
- Extremely sparse information (e.g., >50% missing responses across most traits)
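A quick check before fitting shows whether some responses fall into the "extremely sparse" regime above. `missing_report()` is a hypothetical helper; the 0.5 cut-off simply mirrors the >50% guideline.

```r
# Hypothetical pre-fit check: fraction of missing values per response column
missing_report <- function(Y, max.frac = 0.5) {
  frac <- colMeans(is.na(Y))                      # share of NAs in each response
  list(frac.missing  = frac,
       too.sparse    = which(frac > max.frac),    # column indices over the cut-off
       share.flagged = mean(frac > max.frac))     # high values mean "most traits"
}

Y <- matrix(c( 1, NA,  3, 4,     # response 1
              NA, NA, NA, 2,     # response 2
               5,  6, NA, 8),    # response 3
            nrow = 4)
report <- missing_report(Y)
report$frac.missing   # 0.25 0.75 0.25
report$too.sparse     # 2
```

For the quick-start data, `missing_report(sim$Z)` applies the same check.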


Citation

If you use missoNet in your research, please cite:

@article{zeng2025missonet,
  title   = {Multivariate regression with missing response data for modelling regional DNA methylation QTLs},
  author  = {Zeng, Yixiao and Alam, Shomoita and Bernatsky, Sasha and Hudson, Marie and Colmegna, In{\'e}s and Stephens, David A and Greenwood, Celia MT and Yang, Archer Y},
  journal = {arXiv preprint arXiv:2507.05990},
  year    = {2025},
  url     = {https://arxiv.org/abs/2507.05990}
}

Contributing

Contributions and issues are welcome! Please open a discussion or pull request on the GitHub repository.


License

GPL-2. See the LICENSE file.