fastml: Guarded Resampling Workflows for Safe and Automated Machine Learning in R



fastml is an R package for training, evaluating, and comparing machine learning models with a guarded resampling workflow. Rather than introducing new learning algorithms, fastml focuses on reducing leakage risk by keeping preprocessing, model fitting, and evaluation aligned within supported resampling paths.

In fastml, fast refers to the rapid construction of statistically valid workflows, not to computational shortcuts. By eliminating entire classes of user-induced errors — most notably preprocessing leakage — fastml allows practitioners to obtain reliable performance estimates with minimal configuration.

Core Principles

- Preprocessing, model fitting, and evaluation stay aligned within supported resampling paths, which removes preprocessing leakage by construction.
- No new learning algorithms: fastml wraps established engines behind a single, consistent interface.
- Correctness-oriented defaults and minimal configuration take precedence over maximum flexibility.

Installation

From CRAN

install.packages("fastml")

From GitHub (development version)

# install.packages("devtools")
devtools::install_github("selcukorkmaz/fastml")

Optional Dependencies

fastml uses a lightweight core. Feature-specific packages are installed only when needed:

# Explainability (SHAP, LIME, ALE, ICE, etc.)
install.packages(c("DALEX", "iml", "lime", "iBreakDown", "pdp"))

# Additional model engines
install.packages(c("ranger", "glmnet", "kernlab", "kknn", "lightgbm", "C50"))

# Discriminant analysis and bagged trees
install.packages(c("discrim", "baguette"))

# Survival analysis extensions
install.packages(c("censored", "flexsurv", "rstpm2", "survRM2", "aorsf"))

# Fairness and interactive dashboards
install.packages(c("fairmodels", "modelStudio"))

Quick Start

library(fastml)
library(dplyr)

data(iris)

iris_binary <- iris %>%
  filter(Species != "setosa") %>%
  mutate(Species = factor(Species))

# Train and evaluate models in one call
fit <- fastml(
  data = iris_binary,
  label = "Species",
  algorithms = c("rand_forest", "logistic_reg")
)

# Inspect results
summary(fit)

# Visualize
plot(fit, type = "bar")
plot(fit, type = "roc")
plot(fit, type = "calibration")

# Predict on new data
predict(fit, newdata = iris_binary[1:5, ])

Supported Tasks

fastml auto-detects the task type based on the target variable:

Task           | Detection                                      | Metrics
Classification | Factor or character target (binary/multiclass) | Accuracy, ROC AUC, Sensitivity, Specificity, Precision, F1, Kappa, Logloss
Regression     | Numeric target                                 | RMSE, R-squared, MAE
Survival       | Two-column label (time + status)               | C-index, Integrated Brier Score, time-dependent Brier, RMST
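The detection rule above can be sketched in base R. This is an illustration of the documented behavior, not fastml's internal code; `detect_task` is a hypothetical helper:

```r
# Hypothetical sketch of the documented task-detection rule;
# fastml's actual implementation may differ.
detect_task <- function(data, label) {
  if (length(label) == 2) return("survival")  # two-column label: time + status
  y <- data[[label]]
  if (is.factor(y) || is.character(y)) return("classification")
  if (is.numeric(y)) return("regression")
  stop("Unsupported target type: ", class(y)[1])
}

detect_task(iris, "Species")  # "classification"
detect_task(mtcars, "mpg")    # "regression"
```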

Supported Algorithms

Classification

Algorithm Default Engine Alternative Engines
logistic_reg glm glmnet, brulee, stan, keras, LiblineaR
multinom_reg nnet glmnet, brulee, keras
rand_forest ranger randomForest, h2o, partykit
xgboost xgboost
lightgbm lightgbm
decision_tree rpart C5.0, partykit
bag_tree rpart
svm_rbf kernlab
svm_linear kernlab LiblineaR
nearest_neighbor kknn
naive_Bayes klaR naivebayes, h2o
mlp nnet brulee, keras, h2o
discrim_linear MASS
discrim_quad sparsediscrim
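To select an alternative engine from the table, pass `algorithm_engines` (the same argument shown under Advanced Options below). A sketch, reusing the `iris_binary` data from the Quick Start:

```r
library(fastml)

# Override the default engines with alternatives listed in the table:
# partykit for random forests, glmnet for logistic regression.
fit <- fastml(
  data = iris_binary,
  label = "Species",
  algorithms = c("rand_forest", "logistic_reg"),
  algorithm_engines = list(rand_forest = "partykit", logistic_reg = "glmnet")
)
```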

Regression

Algorithm Default Engine Alternative Engines
linear_reg lm
ridge_reg glmnet
lasso_reg glmnet
elastic_net glmnet
rand_forest ranger randomForest, h2o
xgboost xgboost
lightgbm lightgbm
decision_tree rpart
svm_rbf kernlab
svm_linear kernlab
nearest_neighbor kknn
mlp nnet brulee, keras, h2o
pls mixOmics
bayes_glm stan

Survival

Algorithm Default Engine
cox_ph survival
penalized_cox glmnet
stratified_cox survival
time_varying_cox survival
survreg survival
rand_forest aorsf / ranger
xgboost / xgboost_aft xgboost
parametric_surv flexsurv
piecewise_exp flexsurv
royston_parmar rstpm2

Resampling Methods

fit <- fastml(
  data = df,
  label = "target",
  resampling_method = "cv",  # default
  cv_folds = 10
)

Method Description
cv K-fold cross-validation (default)
repeatedcv Repeated cross-validation
boot Bootstrap resampling
grouped_cv Grouped cross-validation (keeps groups intact)
blocked_cv Blocked/time-series CV (respects temporal ordering)
rolling_origin Rolling window resampling
nested_cv Nested cross-validation (unbiased tuning)
validation_split Simple train/validation split
none No resampling (single holdout)
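Any method in the table is selected through `resampling_method`. For example, a sketch of nested cross-validation, which tunes hyperparameters in an inner loop while the outer loop estimates performance (argument names as documented in this README):

```r
library(fastml)

# Nested CV: the outer loop estimates performance while the inner loop
# tunes, giving an unbiased estimate of the tuned model's error.
fit_nested <- fastml(
  data = df,
  label = "target",
  resampling_method = "nested_cv",
  tuning_strategy = "bayes"
)
```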

Hyperparameter Tuning

fit <- fastml(
  data = iris_binary,
  label = "Species",
  algorithms = c("rand_forest", "xgboost"),
  tuning_strategy = "bayes",
  tuning_complexity = "balanced",
  tuning_iterations = 25
)

Strategy Description
grid Grid search (default)
bayes Bayesian optimization
none No tuning, use defaults

Tuning complexity presets control search breadth:

Preset Grid Levels Use Case
quick 2 Prototyping, debugging
balanced 3 Most production use (default)
thorough 5 Final model selection, publications
exhaustive 7 Research, competitions
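The presets trade search breadth for runtime. For a full factorial grid the number of candidates grows as levels raised to the number of tuned parameters, so with two tuned hyperparameters the presets differ sharply in cost (an illustration of the arithmetic, not fastml internals):

```r
# Candidate count for a full factorial grid: levels ^ n_params.
grid_size <- function(levels, n_params) levels ^ n_params

grid_size(2, 2)  # quick:      4 candidates
grid_size(3, 2)  # balanced:   9
grid_size(5, 2)  # thorough:  25
grid_size(7, 2)  # exhaustive: 49
```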

Preprocessing

fastml isolates preprocessing inside the resampling loop to prevent leakage:

fit <- fastml(
  data = df,
  label = "target",
  impute_method = "knnImpute",   # medianImpute, bagImpute, remove, error
  scale = c("center", "scale"),
  balance = "upsample"           # downsample, or none (default)
)

Transformations applied per fold: imputation, dummy encoding, centering, scaling, zero-variance removal, novel/unknown level handling.
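The point of per-fold preprocessing is that statistics such as means and standard deviations are estimated on the analysis portion of each fold only, then applied to the held-out rows. A base-R sketch of the idea (not fastml's code):

```r
set.seed(42)
x     <- rnorm(100)
folds <- sample(rep(1:5, length.out = length(x)))

for (k in 1:5) {
  train_x <- x[folds != k]
  # Scaling parameters come from the training portion only...
  mu <- mean(train_x)
  s  <- sd(train_x)
  # ...and are then applied to the held-out fold, so no information
  # from the held-out rows leaks into the preprocessing step.
  held_out_scaled <- (x[folds == k] - mu) / s
}
```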

Explainability

Model explainability is provided through fastexplain() with 10 methods:

# DALEX-based: variable importance, SHAP values, partial dependence
fastexplain(fit, method = "dalex")

# LIME local explanations
fastexplain(fit, method = "lime", observation = df[1, ])

# Accumulated Local Effects
fastexplain(fit, method = "ale", features = "Sepal.Length")

# Individual Conditional Expectation curves
fastexplain(fit, method = "ice", features = "Sepal.Length")

# Surrogate decision tree
fastexplain(fit, method = "surrogate")

# Feature interaction strength
fastexplain(fit, method = "interaction")

# iBreakDown contributions
fastexplain(fit, method = "breakdown", observation = df[1, ])

# Counterfactual explanations
fastexplain(fit, method = "counterfactual", observation = df[1, ])

# Interactive modelStudio dashboard
fastexplain(fit, method = "studio")

# Fairness diagnostics
fastexplain(fit, method = "fairness", protected = df$gender)

Feature Importance Stability

Analyze how feature importance varies across cross-validation folds:

fit <- fastml(data = df, label = "target", store_fold_models = TRUE)
stability <- explain_stability(fit)
print(stability)
plot(stability)

Exploratory Data Analysis

fastexplore() provides read-only diagnostics prior to model training:

fastexplore(iris, label = "Species")

Generates: summary statistics, distribution plots (histograms, boxplots, bar charts), correlation heatmaps, Q-Q plots, missingness analysis, and more — without invoking any modeling.

Visualization

# Performance comparison
plot(fit, type = "bar")

# ROC curves (classification)
plot(fit, type = "roc")

# Calibration plot (classification)
plot(fit, type = "calibration")

# Residual diagnostics (regression)
plot(fit, type = "residual")

# All plots at once
plot(fit, type = "all")

Parallel Processing

fit <- fastml(
  data = df,
  label = "target",
  algorithms = c("rand_forest", "xgboost", "svm_rbf"),
  n_cores = 4
)

Model Persistence

# Save
save.fastml(fit, path = "my_model.rds")

# Load
fit <- load_model("my_model.rds")

Survival Analysis

library(survival)
data(lung)

fit_surv <- fastml(
  data = lung,
  label = c("time", "status"),
  algorithms = c("cox_ph", "rand_forest"),
  eval_times = c(180, 365, 730)
)

summary(fit_surv)

# Predict survival probabilities
predict_survival(fit_surv, newdata = lung[1:5, ], eval_times = c(365, 730))

# Predict risk scores
predict_risk(fit_surv, newdata = lung[1:5, ])

Advanced Options

fit <- fastml(
  data = df,
  label = "target",

  # Algorithms and engines
  algorithms = c("rand_forest", "xgboost"),
  algorithm_engines = list(rand_forest = "ranger", xgboost = "xgboost"),
  engine_params = list(xgboost = list(nthread = 4)),

  # Resampling
  resampling_method = "cv",
  cv_folds = 10,
  store_fold_models = TRUE,

  # Tuning
  tuning_strategy = "bayes",
  tuning_complexity = "thorough",
  tuning_iterations = 30,

  # Classification-specific
  class_threshold = "auto",          # auto-tune threshold
  multiclass_auc = "macro_weighted", # prevalence-weighted AUC

  # Bootstrap confidence intervals
  bootstrap_ci = TRUE,
  bootstrap_samples = 500,

  # Reproducibility
  seed = 42
)
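As a worked illustration of the `multiclass_auc = "macro_weighted"` option: assuming (as in the yardstick estimator of the same name) it averages one-vs-rest per-class AUCs weighted by class prevalence, the computation reduces to a weighted sum. Class names and values below are hypothetical:

```r
# Hypothetical per-class one-vs-rest AUCs and class prevalences.
aucs <- c(classA = 0.99, classB = 0.93, classC = 0.95)
prev <- c(60, 25, 15) / 100

# Prevalence-weighted AUC: common classes dominate the average.
sum(aucs * prev)  # 0.969
```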

Scope

fastml is intended for users who require reliable performance estimation under cross-validation.

It prioritizes correctness-oriented defaults and workflow clarity over maximum flexibility.

License

MIT License.
