Package: shapley
Type: Package
Title: Weighted Mean SHAP for Feature Selection in ML Grid and Ensemble
Version: 0.3
Authors@R: 
    person("E. F. Haghish",
           role = c("aut", "cre", "cph"),
           email = "haghish@hotmail.com")
Depends: R (>= 3.5.0)
Description: This R package introduces Weighted Mean SHapley Additive exPlanations (WMSHAP),
    an innovative method for calculating SHAP values for a grid of fine-tuned base-learner machine
    learning models as well as stacked ensembles, a method not previously available due to the
    common reliance on single best-performing models. By integrating the weighted mean 
    SHAP values from individual base-learners comprising the ensemble or individual 
    base-learners in a tuning grid search, the package weights SHAP contributions 
    according to each model's performance, assessed by one of several metrics: R squared
    (for both regression and classification models) or, for binary classifiers,
    the area under the precision-recall curve (AUCPR), the area under the curve (AUC),
    or the F2 measure.
    It further extends this framework to implement weighted confidence intervals for 
    weighted mean SHAP values, offering a more comprehensive and robust feature importance 
    evaluation over a grid of machine learning models, instead of solely computing SHAP 
    values for the best model. This methodology is particularly beneficial for addressing 
    the severe class imbalance (class rarity) problem by providing a transparent, 
    generalized measure of feature importance that mitigates the risk of reporting 
    SHAP values for an overfitted or biased model and maintains robustness under severe 
    class imbalance, where there is no universal criterion for identifying the absolute
    best model. Furthermore, the package implements hypothesis testing to ascertain the 
    statistical significance of SHAP values for individual features, as well as 
    comparative significance testing of SHAP contributions between features. Additionally, 
    it tackles a critical gap in the feature selection literature by presenting criteria for
    the automatic feature selection of the most important features across a grid of models 
    or stacked ensembles, eliminating the need for arbitrary determination of the number 
    of top features to be extracted. This utility is invaluable for researchers analyzing 
    feature significance, particularly within severely imbalanced outcomes where 
    conventional methods fall short. Moreover, the package is expected to report democratic
    feature importance across a grid of models, resulting in a more comprehensive and 
    generalizable feature selection. The package further implements a novel method for 
    visualizing SHAP values at both the subject level and the feature level, as well as a plot
    for feature selection based on the weighted mean SHAP ratios.
License: MIT + file LICENSE
Encoding: UTF-8
Imports: ggplot2 (>= 3.4.2), h2o (>= 3.34.0.0), curl (>= 4.3.0), waffle
        (>= 1.0.2)
RoxygenNote: 7.3.1
URL: https://github.com/haghish/shapley,
        https://www.sv.uio.no/psi/english/people/academic/haghish/
BugReports: https://github.com/haghish/shapley/issues
NeedsCompilation: no
Packaged: 2024-05-29 11:50:22 UTC; U-Shaped-Valley
Author: E. F. Haghish [aut, cre, cph]
Maintainer: E. F. Haghish <haghish@hotmail.com>
Repository: CRAN
Date/Publication: 2024-05-30 07:00:20 UTC
