Welcome to ClientVPS Mirrors

Help for package rSRD

Package {rSRD}


Type: Package
Title: Sum of Ranking Differences Statistical Test
Version: 0.2.0
Maintainer: Jochen Staudacher <jochen.staudacher@hs-kempten.de>
Description: We provide an implementation for Sum of Ranking Differences (SRD), a novel statistical test introduced by Héberger (2010) <doi:10.1016/j.trac.2009.09.009>. The test allows the comparison of different solutions through a reference by first performing a rank transformation on the input, then calculating and comparing the distances between the solutions and the reference - the latter is measured in the L1 norm. The reference can be an external benchmark (e.g. an established gold standard) or can be aggregated from the data. The calculated distances, called SRD scores, are validated in two ways, see Héberger and Kollár-Hunek (2011) <doi:10.1002/cem.1320>. A randomization test (also called permutation test) compares the SRD scores of the solutions to the SRD scores of randomly generated rankings. The second validation option is cross-validation that checks whether the rankings generated from the solutions come from the same distribution or not. For a detailed analysis about the cross-validation process see Sziklai, Baranyi and Héberger (2021) <doi:10.48550/arXiv.2105.11939>. The package offers a wide array of features related to SRD including the computation of the SRD scores, validation options, input preprocessing and plotting tools.
License: GPL-3
Encoding: UTF-8
LinkingTo: Rcpp
Depends: R (≥ 4.1.0)
Imports: Rcpp (≥ 1.0.13), dplyr, janitor, tibble, ggplot2, methods, rlang, ggrepel, gplots, stats
Suggests: knitr, rmarkdown, testthat (≥ 3.0.0)
VignetteBuilder: knitr
Config/testthat/edition: 3
SystemRequirements: C++17, Rtools (>= 4.2) for Windows
NeedsCompilation: yes
RoxygenNote: 7.3.3
Author: Jochen Staudacher ORCID iD [aut, cph, cre], Balázs R. Sziklai ORCID iD [aut, cph], Linus Olsson [aut, cph], Dennis Horn [ctb], Alexander Pothmann [ctb], Ali Tugay Sen [ctb], Attila Gere ORCID iD [ctb], Károly Hébeger ORCID iD [ctb]
Packaged: 2026-06-30 10:44:10 UTC; Jochen
Repository: CRAN
Date/Publication: 2026-06-30 12:20:09 UTC

rSRD: Sum of Ranking Differences Statistical Test

Description

We provide an implementation for Sum of Ranking Differences (SRD), a novel statistical test introduced by Héberger (2010) doi:10.1016/j.trac.2009.09.009. The test allows the comparison of different solutions through a reference by first performing a rank transformation on the input, then calculating and comparing the distances between the solutions and the reference - the latter is measured in the L1 norm. The reference can be an external benchmark (e.g. an established gold standard) or can be aggregated from the data. The calculated distances, called SRD scores, are validated in two ways, see Héberger and Kollár-Hunek (2011) doi:10.1002/cem.1320. A randomization test (also called permutation test) compares the SRD scores of the solutions to the SRD scores of randomly generated rankings. The second validation option is cross-validation that checks whether the rankings generated from the solutions come from the same distribution or not. For a detailed analysis about the cross-validation process see Sziklai, Baranyi and Héberger (2021) doi:10.48550/arXiv.2105.11939. The package offers a wide array of features related to SRD including the computation of the SRD scores, validation options, input preprocessing and plotting tools.

Author(s)

Maintainer: Jochen Staudacher jochen.staudacher@hs-kempten.de (ORCID) [copyright holder]

Authors:

Other contributors:


calculateCrossValidation

Description

R interface to test whether the rankings induced by the columns come from the same distribution. If the number of folds and the test method are not specified, the default is the 8-fold Wilcoxon test combined with cross-validation. If the number of rows is less than 8, leave-one-out cross-validation is applied. Columns are ordered based on the SRD values of the different folds, then each consecutive column-pairs are tested. Test statistics for Alpaydin test follows F distribution with df1=2k, df2=k degrees of freedom. Dietterich test statistics follow t-distribution with k degrees of freedom (two-tailed). Wilcoxon test statistics is calculated as the absolute value of the difference of the sum of the positive ranks (W+) and sum of the negative ranks (W-). The distribution for this test statistics can be derived from the Wilcoxon signed rank distribution. For more information about the cross-validation process see Sziklai, Baranyi and Héberger (2021).

Usage

calculateCrossValidation(
  data_matrix,
  method = "Wilcoxon",
  number_of_folds = 8,
  precision = 5,
  output_to_file = TRUE,
  seed = NULL
)

Arguments

data_matrix

A data frame. All columns must be numeric and free of NA values. The last column is treated as the reference.

method

A string specifying the method. The methods "Wilcoxon", "Alpaydin" and "Dietterich" are available.

number_of_folds

The number of folds used in the cross validation. Ranges between 5 to 10.

precision

The precision used for the the ranking matrix transformation.

output_to_file

Boolean flag to enable file output.

seed

An optional integer seed for the random number generator in the C++ shuffling routines, enabling reproducible results. When NULL (the default), the random number generator is seeded from the system entropy pool, preserving the original stochastic behaviour.

Value

A List containing

Author(s)

Balázs R. Sziklai sziklai.balazs@krtk.hu, Linus Olsson linusmeol@gmail.com, Jochen Staudacher jochen.staudacher@hs-kempten.de

References

Sziklai, Balázs R., Máté Baranyi, and Károly Héberger (2021). "Testing Cross-Validation Variants in Ranking Environments", arXiv preprint arXiv:2105.11939 (2021).

Examples

df <- data.frame(
Sol_1=c(7, 6, 5, 4, 3, 2, 1),
Sol_2=c(1, 2, 3, 4, 5, 7, 6),
Sol_3=c(1, 2, 3, 4, 7, 5, 6),
Ref=c(1, 2, 3, 4, 5, 6, 7))

calculateCrossValidation(df, output_to_file = FALSE)
calculateCrossValidation(df, output_to_file = FALSE, seed = 42)

calculateSRDDistribution

Description

R interface to calculate the SRD distribution that corresponds to the data. The simulation draws 1,000,000 random rankings; for small n the resulting thresholds (XX1, XX19) can vary slightly between runs. Use the seed parameter for fully reproducible results.

Usage

calculateSRDDistribution(
  data_matrix,
  option = "f",
  tie_probability = 0,
  output_to_file = FALSE,
  seed = NULL
)

Arguments

data_matrix

A data frame. All columns must be numeric and free of NA values. The last column is treated as the reference.

option

A character to specify how ties are generated in the simulation. The following options are available:

  • 'n': No ties for solution vectors; reference is fixed.

  • 'r': No ties; both solution and reference are random.

  • 't': Ties with a fixed user-supplied probability for both solution vectors and reference.

  • 'p': Ties with a fixed user-supplied probability for solution vectors; reference is fixed.

  • 'd': Tie distribution reflects the tie frequencies in the solution vectors; reference is fixed.

  • Default (recommended): Tie distribution reflects tie frequencies in the reference; reference is fixed.

tie_probability

The probability with which ties can occur.

output_to_file

Boolean flag to enable file output. Default FALSE.

seed

An optional integer seed for the random number generator in the C++ simulation, enabling reproducible results. When NULL (the default), the random number generator is seeded from the system entropy pool, preserving the original stochastic behaviour.

Value

A list containing the SRD distribution and related descriptive statistics. xx1 indicates the 5 percent significance threshold. SRD values between xx1 and xx19 are indistinguishable from random rankings; a value above xx19 indicates reverse ordering (5 percent significance).

Author(s)

Balázs R. Sziklai sziklai.balazs@krtk.hu

Linus Olsson linusmeol@gmail.com

Jochen Staudacher jochen.staudacher@hs-kempten.de

Examples


df <- data.frame(
A=c(32, 52, 44, 44, 47),
B=c(73, 75, 65, 76, 70),
C=c(60, 59, 57, 55, 60),
D=c(35, 24, 44, 83, 47),
E=c(41, 52, 46, 50, 65))

calculateSRDDistribution(df, option = 'p', tie_probability = 0.5)

# Reproducible run:
calculateSRDDistribution(df, seed = 42)


calculateSRDDistribution_impl

Description

Internal C++ implementation for calculateSRDDistribution. Do not call directly; use calculateSRDDistribution() instead.

Usage

calculateSRDDistribution_impl(
  data_matrix,
  option = "f",
  tie_probability = 0,
  output_to_file = FALSE,
  seed = -1L
)

calculateSRDValues

Description

R interface to calculate SRD values. To test the results' significance run calculateSRDDistribution(). For more information about SRD scores and their validation see Héberger and Kollár-Hunek (2011).

Usage

calculateSRDValues(data_matrix, output_to_file = FALSE)

Arguments

data_matrix

A data frame. All columns must be numeric and free of NA values. The last column is treated as the reference.

output_to_file

Boolean flag to enable file output. Default FALSE.

Value

A numeric vector containing the normalised SRD values (each in [0, 1]) for every non-reference column.

Author(s)

Balázs R. Sziklai sziklai.balazs@krtk.hu

Linus Olsson linusmeol@gmail.com

Jochen Staudacher jochen.staudacher@hs-kempten.de

References

Héberger K., Kollár-Hunek K. (2011) "Sum of ranking differences for method discrimination and its validation: comparison of ranks with random numbers", Journal of Chemometrics, 25(4), pp. 151-158.

Examples

df <- data.frame(
A=c(32, 52, 44, 44, 47),
B=c(73, 75, 65, 76, 70),
C=c(60, 59, 57, 55, 60),
D=c(35, 24, 44, 83, 47),
E=c(41, 52, 46, 50, 65))

calculateSRDValues(df)

calculateSRDValues_impl

Description

Internal C++ implementation for calculateSRDValues. Do not call directly; use calculateSRDValues() instead.

Usage

calculateSRDValues_impl(data_matrix, output_to_file = FALSE)

plotCrossValidation

Description

Plots data generated by the calculateCrossValidation function as a boxplot. Includes max and min as whiskers as well as the average (marked by a crossed circle), median (marked by a horizontal bold line) and the 1st and 3rd quartile of the values. Box geometry (min, Q1, median, Q3, max) is taken directly from the precomputed 'boxplot_values', the single source of truth also reported to the user elsewhere, rather than being re-derived from the fold-wise values. This avoids a second, independent quantile calculation that could drift from the official figures – and, in particular, guarantees that identical reported statistics (whether within one solution, e.g. Q1 equal to the median, or across different solutions sharing the same value) are always drawn at exactly the same height.

Usage

plotCrossValidation(cv_results)

Arguments

cv_results

The List of results returned by the calculateCrossValidation function.

Value

None.

Author(s)

Linus Olsson linusmeol@gmail.com

Alexander Pothmann

Jochen Staudacher jochen.staudacher@hs-kempten.de

Examples

df <- data.frame(
Sol_1=c(7, 6, 5, 4, 3, 2, 1),
Sol_2=c(1, 2, 3, 4, 5, 7, 6),
Sol_3=c(1, 2, 3, 4, 7, 5, 6),
Ref=c(1, 2, 3, 4, 5, 6, 7))

cv_results <- rSRD::calculateCrossValidation(df, output_to_file = FALSE)
rSRD::plotCrossValidation(cv_results)

plotHeatmapSRD

Description

Heatmap is generated based on the pairwise distance - measured in SRD - of the columns. Each column is set as reference once, then SRD values are calculated for the other columns.

Usage

plotHeatmapSRD(df, output_to_file = FALSE, color = utilsColorPalette)

Arguments

df

A DataFrame.

output_to_file

Logical. If true, the distance matrix will be saved to the hard drive.

color

Vector of colors used for the image. Defaults to colors utilsColorPalette.

Value

Returns a heatmap and the corresponding distance matrix.

Author(s)

Attila Gere gereattilaphd@gmail.com, Linus Olsson linusmeol@gmail.com, Jochen Staudacher jochen.staudacher@hs-kempten.de

Examples

srdInput <- data.frame(
A=c(32, 52, 44, 44, 47),
B=c(73, 75, 65, 76, 70),
C=c(60, 59, 57, 55, 60),
D=c(35, 24, 44, 83, 47),
E=c(41, 52, 46, 50, 65))

plotHeatmapSRD(srdInput)

mycolors<- c("#e3f2fd", "#bbdefb", "#90caf9","#64b5f6","#42a5f5",
             "#2196f3","#1e88e5","#1976d2","#1565c0","#0d47a1")
plotHeatmapSRD(srdInput, color=mycolors)


plotPermTest

Description

Plots the permutation test for the given data frame by using the simulation data created by the calculateSRDDistribution() function. Vertical lines mark the positions of the solution columns on the simulated SRD distribution. Dashed lines indicate the 5 percent significance thresholds (XX1, XX19) and the median (Med).

Usage

plotPermTest(df, simulationData, densityToDistr = FALSE)

Arguments

df

A DataFrame.

simulationData

The output of the calculateSRDDistribution() function.

densityToDistr

Logical. If FALSE (default) the y-axis shows relative frequency (probability density). If TRUE it shows cumulative relative frequency (CDF).

Value

None.

Author(s)

Linus Olsson linusmeol@gmail.com

Examples


df <- data.frame(
A=c(32, 52, 44, 44, 47),
B=c(73, 75, 65, 76, 70),
C=c(60, 59, 57, 55, 60),
D=c(35, 24, 44, 83, 47),
E=c(41, 52, 46, 50, 65))

simulationData <- rSRD::calculateSRDDistribution(df)
plotPermTest(df, simulationData)


utilsCalculateDistance

Description

Calculates the distance of two rankings in $L_1$ norm and inserts the result after the first.

Usage

utilsCalculateDistance(df, nameCol, refCol)

Arguments

df

A DataFrame.

nameCol

The current Column of the iteration.

refCol

The reference Column of the dataFrame.

Value

Returns a new df that has a Distance Column based on the nameCol.

Author(s)

Ali Tugay Sen, Jochen Staudacher jochen.staudacher@hs-kempten.de

Examples

SRDInput <- data.frame(
A=c(32, 52, 44, 44, 47),
B=c(73, 75, 65, 76, 70),
C=c(60, 59, 57, 55, 60),
D=c(35, 24, 44, 83, 47),
E=c(41, 52, 46, 50, 65))
nameCol <- "A"
refCol <- "B"
rSRD::utilsCalculateDistance(SRDInput,nameCol,refCol)

utilsCalculateRank

Description

Calculates the ranking of a given column.

Usage

utilsCalculateRank(df, nameCol)

Arguments

df

A DataFrame.

nameCol

The name of the column to be ranked. Note that this parameter needs to be specified as there is no default value.

Value

Returns a new df that has an additional column with the rankings of the column specified by nameCol.

Author(s)

Jochen Staudacher jochen.staudacher@hs-kempten.de

Examples

SRDInput <- data.frame(
A=c(32, 52, 44, 44, 47),
B=c(73, 75, 65, 76, 70),
C=c(60, 59, 57, 55, 60),
D=c(35, 24, 44, 83, 47),
E=c(41, 52, 46, 50, 65))
columnName <- "A"
rSRD::utilsCalculateRank(SRDInput,columnName)

utilsColorPalette

Description

Unique color palette for heatmaps.

Usage

utilsColorPalette

Format

An object of class character of length 250.

Author(s)

Attila Gere gereattilaphd@uni-mate.hu

Balázs R. Sziklai sziklai.balazs@krtk.hu,

Jochen Staudacher jochen.staudacher@hs-kempten.de

Examples

barplot(rep(1,250), col = utilsColorPalette)

utilsCreateReference

Description

Adds a new reference column based on the input DataFrame df and the given method. This function iterates over the rows and applies the given method to define the value of the reference. Available options are: max, min, median, mean and mixed. This column is appended to the DataFrame. When "mixed" is specified the function will consider the refVector for creating the reference column.

Usage

utilsCreateReference(df, method = "max", refVector = c())

Arguments

df

A DataFrame.

method

A string value specifying the reference creating method. Available options: max, min, median, mean and mixed.

refVector

A vector of strings that specifies a method for each row. Vector size should be equal to the number of rows in the DataFrame df.

Value

Returns a new DataFrame appended with the reference column created by the method.

Author(s)

Ali Tugay Sen, Linus Olsson linusmeol@gmail.com

Examples

SRDInput <- data.frame(
A=c(32, 52, 44, 44, 47),
B=c(73, 75, 65, 76, 70),
C=c(60, 59, 57, 55, 60),
D=c(35, 24, 44, 83, 47),
E=c(41, 52, 46, 50, 65))
proc_data <- rSRD::utilsPreprocessDF(SRDInput)
ref <- c("min","max","min","max","mean")
rSRD::utilsCreateReference(proc_data, method = "mixed", ref)

utilsDetailedSRD

Description

Detailed calculation of the SRD values including the computation of the ranking transformation. Unless there is a column specified with referenceCol the last column will always taken as the reference.

Usage

utilsDetailedSRD(
  df,
  referenceCol,
  createRefCol = function() {
 }
)

Arguments

df

A DataFrame.

referenceCol

Optional. A string that contains a column of df which will be used as the reference column.

createRefCol

Optional. Can be max, min, median, mean. Creates a new Column based on the existing df and attaches it to df as the reference Column.

Value

Returns a new DataFrame that shows the detailed SRD computation (ranking transformation and distance calculation). A newly added row contains the SRD values (displayed without normalization).

Author(s)

Ali Tugay Sen, Jochen Staudacher jochen.staudacher@hs-kempten.de

Examples

SRDInput <- data.frame(
A=c(32, 52, 44, 44, 47),
B=c(73, 75, 65, 76, 70),
C=c(60, 59, 57, 55, 60),
D=c(35, 24, 44, 83, 47),
E=c(41, 52, 46, 50, 65))
rSRD::utilsDetailedSRD(SRDInput)

utilsDetailedSRDNoChars

Description

Detailed calculation of the SRD values including the computation of the ranking transformation. Unless there is a column specified with referenceCol the last column will always taken as the reference. This variant differs from utilsDetailedSRD in that non-numeric columns will not be converted to chars, i.e. the data types of non-numeric columns will be preserved in the output.

Usage

utilsDetailedSRDNoChars(
  df,
  referenceCol,
  createRefCol = function() {
 }
)

Arguments

df

A DataFrame.

referenceCol

Optional. A string that contains a column of df which will be used as the reference column.

createRefCol

Optional. Can be max, min, median, mean. Creates a new Column based on the existing df and attaches it to df as the reference Column.

Value

Returns a new DataFrame that shows the detailed SRD computation (ranking transformation and distance calculation). A newly added row contains the SRD values (displayed without normalization).

Author(s)

Ali Tugay Sen, Jochen Staudacher jochen.staudacher@hs-kempten.de

Examples

SRDInput <- data.frame(
A=c(32, 52, 44, 44, 47),
B=c(73, 75, 65, 76, 70),
C=c(60, 59, 57, 55, 60),
D=c(35, 24, 44, 83, 47),
E=c(41, 52, 46, 50, 65))
rSRD::utilsDetailedSRDNoChars(SRDInput)

utilsMaxSRD

Description

Calculates the maximum distance between two rankings of size n. This function is used to normalize SRD values.

Usage

utilsMaxSRD(rowsCount)

Arguments

rowsCount

The number of rows in the SRD calculation.

Value

The maximum achievable SRD value.

Author(s)

Dennis Horn

Examples

maxSRD <- rSRD::utilsMaxSRD(5)

utilsPreprocessDF

Description

This function preprocesses the DataFrame depending on the method.

Usage

utilsPreprocessDF(df, method = "range_scale")

Arguments

df

A DataFrame.

method

A string that should contain "scale_to_unit", "standardize", "range_scale" or "scale_to_max".

Value

Returns a new df that has been preprocessed based on the method chosen.

Author(s)

Ali Tugay Sen, Dennis Horn dennishorn@hotmail.de, Linus Olsson linusmeol@gmail.com

Examples

SRDInput <- data.frame(
A=c(32, 52, 44, 44, 47),
B=c(73, 75, 65, 76, 70),
C=c(60, 59, 57, 55, 60),
D=c(35, 24, 44, 83, 47),
E=c(41, 52, 46, 50, 65))
method <- "standardize"
utilsPreprocessDF(SRDInput,method)

utilsRankingMatrix

Description

R interface to perform the rank transformation on the columns of the input data frame. Ties are resolved by fractional ranking.

Usage

utilsRankingMatrix(data_matrix)

Arguments

data_matrix

A data frame. All columns must be numeric and free of NA values.

Value

A data frame containing the ranking matrix.

Author(s)

Balázs R. Sziklai sziklai.balazs@krtk.hu, Linus Olsson linusmeol@gmail.com

Examples

df <- data.frame(
A=c(32, 52, 44, 44, 47),
B=c(73, 75, 65, 76, 70),
C=c(60, 59, 57, 55, 60),
D=c(35, 24, 44, 83, 47),
E=c(41, 52, 46, 50, 65))

utilsRankingMatrix(df)

utilsRankingMatrix_impl

Description

Internal C++ implementation for utilsRankingMatrix. Do not call directly; use utilsRankingMatrix() instead.

Usage

utilsRankingMatrix_impl(data_matrix)

utilsTieProbability

Description

Calculates the tie probability for a given vector. The tie probability is defined as the number of consecutive tied component-pairs in the sorted vector divided by the size of the vector minus 1.

Usage

utilsTieProbability(x)

Arguments

x

A vector.

Value

Returns the tie probability as a numeric value.

Author(s)

Ali Tugay Sen, Linus Olsson linusmeol@gmail.com

Examples

x <-c(1,2,4,4,5,5,6)
rSRD::utilsTieProbability(x)

validateSRDInput

Description

Validate a data frame before it is passed to SRD functions

Checks that df is a numeric data frame with at least 2 columns and 2 rows, that it contains no constant columns, and — unless allow_na = TRUE — that it contains no NA values. Called internally by every public function that forwards data to C++, so that the C++ layer never receives invalid input (which can cause infinite loops in the ranking and simulation code).

Usage

validateSRDInput(df, allow_na = FALSE, arg_name = "data_matrix")

Arguments

df

The object to validate.

allow_na

Logical. If TRUE the NA check is skipped. Default FALSE.

arg_name

Character string used in error messages to identify the argument. Default "data_matrix".

Value

invisible(TRUE) on success; stops with an informative message on failure.

Author(s)

Jochen Staudacher jochen.staudacher@hs-kempten.de

Need a high-speed mirror for your open-source project?
Contact our mirror admin team at info@clientvps.com.

This archive is provided as a free public service to the community.
Proudly supported by infrastructure from VPSPulse , RxServers , BuyNumber , UnitVPS , OffshoreName and secure payment technology by ArionPay.