Introduction

In this vignette, we will briefly describe and motivate how we constructed the test statistics used by the function m_test and how it derives a test decision.

Asymptotic distribution of the M-test statistics

For a more detailed description of the asymptotic behaviour of M-estimators, we refer to Maronna et al. (2019, p. 36ff.), which is the main reference for the following motivation.

We consider two independent samples $X_1, \ldots, X_m$ and $Y_1, \ldots, Y_n$ of i.i.d. random variables which are symmetrically distributed with variances $\sigma_X^2$ and $\sigma_Y^2$.

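For M-estimators $\hat{\mu}_X$ and $\hat{\mu}_Y$ with a $\psi$-function $\psi$, it can be shown under these conditions that

$$\sqrt{m}\,(\hat{\mu}_X - \mu_X) \overset{\text{asympt.}}{\sim} \mathcal{N}(0, \sigma_X^2 \nu_X) \quad \text{and} \quad \sqrt{n}\,(\hat{\mu}_Y - \mu_Y) \overset{\text{asympt.}}{\sim} \mathcal{N}(0, \sigma_Y^2 \nu_Y),$$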

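where $\mu_X \in \mathbb{R}$ and $\mu_Y \in \mathbb{R}$ are the values for which

$$E\left(\psi\left(\frac{X - \mu_X}{\sigma_X}\right)\right) = 0 \quad \text{and} \quad E\left(\psi\left(\frac{Y - \mu_Y}{\sigma_Y}\right)\right) = 0,$$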

and

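$$\nu_X = \frac{E\left(\psi\left(\frac{X - \mu_X}{\sigma_X}\right)^2\right)}{\left(E\left(\psi'\left(\frac{X - \mu_X}{\sigma_X}\right)\right)\right)^2} \quad \text{and} \quad \nu_Y = \frac{E\left(\psi\left(\frac{Y - \mu_Y}{\sigma_Y}\right)^2\right)}{\left(E\left(\psi'\left(\frac{Y - \mu_Y}{\sigma_Y}\right)\right)\right)^2}.$$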

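From this, it follows that

$$\hat{\mu}_X \overset{\text{asympt.}}{\sim} \mathcal{N}\left(\mu_X, \frac{\sigma_X^2 \nu_X}{m}\right) \quad \text{and} \quad \hat{\mu}_Y \overset{\text{asympt.}}{\sim} \mathcal{N}\left(\mu_Y, \frac{\sigma_Y^2 \nu_Y}{n}\right),$$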

implying

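$$\frac{\hat{\mu}_X - \hat{\mu}_Y - (\mu_X - \mu_Y)}{\sqrt{\dfrac{n \sigma_X^2 \nu_X + m \sigma_Y^2 \nu_Y}{m n}}} \overset{\text{asympt.}}{\sim} \mathcal{N}(0, 1).$$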
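The implication rests on the independence of the two samples: the difference of two independent, approximately normal estimators is again approximately normal, and its variance is the sum of the two variances. As a short sketch of this step, writing $\operatorname{Var}$ informally for the variance of the approximating normal distribution:

$$\operatorname{Var}(\hat{\mu}_X - \hat{\mu}_Y) \approx \frac{\sigma_X^2 \nu_X}{m} + \frac{\sigma_Y^2 \nu_Y}{n} = \frac{n \sigma_X^2 \nu_X + m \sigma_Y^2 \nu_Y}{m n}.$$

Centring the difference at $\mu_X - \mu_Y$ and dividing by the square root of this variance gives the standardized quantity.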

In order to use this statistic as a test statistic for our M-tests, we need to estimate $\sigma_X$, $\sigma_Y$, $\nu_X$, and $\nu_Y$. We use the $\tau$-scale estimator (Maronna and Zamar, 2002) to estimate $\sigma_X^2$ and $\sigma_Y^2$ robustly by $\hat{\sigma}_X^2$ and $\hat{\sigma}_Y^2$, and we estimate $\nu_X$ and $\nu_Y$ by

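$$\hat{\nu}_X = \frac{\frac{1}{m} \sum_{i=1}^{m} \psi\left(\frac{X_i - \hat{\mu}_X}{\hat{\sigma}_X}\right)^2}{\left(\frac{1}{m} \sum_{i=1}^{m} \psi'\left(\frac{X_i - \hat{\mu}_X}{\hat{\sigma}_X}\right)\right)^2} \quad \text{and} \quad \hat{\nu}_Y = \frac{\frac{1}{n} \sum_{j=1}^{n} \psi\left(\frac{Y_j - \hat{\mu}_Y}{\hat{\sigma}_Y}\right)^2}{\left(\frac{1}{n} \sum_{j=1}^{n} \psi'\left(\frac{Y_j - \hat{\mu}_Y}{\hat{\sigma}_Y}\right)\right)^2}.$$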

Under the previous considerations, the test statistic of the M-tests we implemented in the package is given by

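$$\frac{\hat{\mu}_X - \hat{\mu}_Y - \Delta}{\sqrt{\dfrac{n \hat{\sigma}_X^2 \hat{\nu}_X + m \hat{\sigma}_Y^2 \hat{\nu}_Y}{m n}}} \overset{\text{asympt.}}{\sim} \mathcal{N}(0, 1),$$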

where $\Delta = \mu_X - \mu_Y$ is the location difference between both distributions.

The M-tests are implemented in the function m_test. More details on the usage of the function can be found in the vignette Getting started with robnptests. Inside m_test, we use the function scaleTau2 from the R package robustbase (Maechler et al., 2022) to compute the τ-scale estimates for the samples.
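To make the construction above concrete, the following R sketch computes the test statistic for the Huber $\psi$-function. It is not the code used inside m_test: the helper names (psi_huber, dpsi_huber, robust_scale, m_location, m_stat), the tuning constant k = 1.345, and the iteratively reweighted mean used to obtain the location estimate are assumptions made for illustration. The sketch calls scaleTau2 when robustbase is installed and otherwise falls back to the MAD as a stand-in scale.

```r
# Huber psi-function and its derivative; k = 1.345 is an assumed tuning constant.
psi_huber  <- function(u, k = 1.345) pmax(-k, pmin(k, u))
dpsi_huber <- function(u, k = 1.345) as.numeric(abs(u) <= k)

# Scale estimate: tau-scale from robustbase if available, otherwise the MAD
# (fallback chosen only to keep the sketch runnable in base R).
robust_scale <- function(x) {
  if (requireNamespace("robustbase", quietly = TRUE)) {
    robustbase::scaleTau2(x)
  } else {
    stats::mad(x)
  }
}

# Location M-estimate with the scale s held fixed, computed by
# iteratively reweighted means starting from the median.
m_location <- function(x, s, k = 1.345, tol = 1e-8, max_it = 100) {
  mu <- stats::median(x)
  for (i in seq_len(max_it)) {
    u <- (x - mu) / s
    w <- ifelse(u == 0, 1, psi_huber(u, k) / u)  # weights psi(u) / u
    mu_new <- sum(w * x) / sum(w)
    done <- abs(mu_new - mu) < tol * s
    mu <- mu_new
    if (done) break
  }
  mu
}

# Standardized difference of the two location estimates.
m_stat <- function(x, y, delta = 0, k = 1.345) {
  m <- length(x)
  n <- length(y)
  s_x <- robust_scale(x)
  s_y <- robust_scale(y)
  mu_x <- m_location(x, s_x, k)
  mu_y <- m_location(y, s_y, k)
  u_x <- (x - mu_x) / s_x
  u_y <- (y - mu_y) / s_y
  # Plug-in estimates of nu_X and nu_Y
  nu_x <- mean(psi_huber(u_x, k)^2) / mean(dpsi_huber(u_x, k))^2
  nu_y <- mean(psi_huber(u_y, k)^2) / mean(dpsi_huber(u_y, k))^2
  se <- sqrt((n * s_x^2 * nu_x + m * s_y^2 * nu_y) / (m * n))
  (mu_x - mu_y - delta) / se
}

set.seed(1)
x <- rnorm(40)
y <- rnorm(50, mean = 0.5)
z <- m_stat(x, y)
p_two_sided <- 2 * stats::pnorm(-abs(z))  # p-value from the N(0,1) approximation
```

The two-sided p-value compares the statistic with the standard normal distribution, which is the asymptotic decision rule described above. For actual analyses, m_test should be used rather than this sketch.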

Simulation results

The following figure shows the simulated test sizes from a small simulation study with 1000 replications, in which we applied the M-tests with different $\psi$-functions to samples from the $\mathcal{N}(0,1)$-distribution, the $t_2$-distribution, and the $\chi_3^2$-distribution at the significance level $\alpha = 0.05$.

Under the $\mathcal{N}(0,1)$- and the $t_2$-distribution, we make similar observations: for equal sample sizes $m = n \geq 30$, the simulated test size is quite close to the specified value of $\alpha$. When $m \neq n$, it seems to be important that both sample sizes are rather large and do not deviate too much from each other. Otherwise, the tests may become very anti-conservative. In general, the three test statistics lead to similar results for the considered sample sizes.

Under the $\chi_3^2$-distribution, all tests are anti-conservative. While there seems to be some improvement when the sample sizes become larger, the estimated sizes are still rather far away from 0.05. A reason might be that the asymptotic variance we use is only a good approximation for symmetric distributions (Maronna et al., 2019, p. 37).

Based on these results, we discourage using the tests for asymmetric distributions. For symmetric distributions, the asymptotic test should only be used for large samples. In all other cases, the randomization or permutation test might be preferable.
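A size simulation of the kind described in this section can be sketched in a few lines of R. The code below is an illustration under stated assumptions, not the script behind the reported results: it uses only the Huber $\psi$-function with an assumed tuning constant k = 1.345, a self-written location estimate, and hypothetical helper names (robust_scale, m_fit, rejects, sim_size). Its estimated sizes therefore need not match those in the figure.

```r
# Huber psi-function and its derivative; k = 1.345 is an assumed tuning constant.
psi_huber  <- function(u, k = 1.345) pmax(-k, pmin(k, u))
dpsi_huber <- function(u, k = 1.345) as.numeric(abs(u) <= k)

# Tau-scale from robustbase if available, otherwise the MAD as a stand-in.
robust_scale <- function(x) {
  if (requireNamespace("robustbase", quietly = TRUE)) {
    robustbase::scaleTau2(x)
  } else {
    stats::mad(x)
  }
}

# Location M-estimate and its estimated asymptotic variance for one sample.
m_fit <- function(x, k = 1.345, tol = 1e-8, max_it = 100) {
  s  <- robust_scale(x)
  mu <- stats::median(x)
  for (i in seq_len(max_it)) {
    u <- (x - mu) / s
    w <- ifelse(u == 0, 1, psi_huber(u, k) / u)
    mu_new <- sum(w * x) / sum(w)
    done <- abs(mu_new - mu) < tol * s
    mu <- mu_new
    if (done) break
  }
  u  <- (x - mu) / s
  nu <- mean(psi_huber(u, k)^2) / mean(dpsi_huber(u, k))^2
  list(mu = mu, var = s^2 * nu / length(x))
}

# Two-sided asymptotic test decision at level alpha for H0: Delta = 0.
rejects <- function(x, y, alpha = 0.05) {
  fx <- m_fit(x)
  fy <- m_fit(y)
  z <- (fx$mu - fy$mu) / sqrt(fx$var + fy$var)
  abs(z) > stats::qnorm(1 - alpha / 2)
}

# Share of rejections when both samples come from the same distribution.
sim_size <- function(rdist, m, n, n_rep = 1000, alpha = 0.05) {
  mean(replicate(n_rep, rejects(rdist(m), rdist(n), alpha)))
}

set.seed(2024)
size_norm <- sim_size(rnorm, m = 30, n = 30)
size_t2   <- sim_size(function(k) rt(k, df = 2), m = 30, n = 30)
size_chi3 <- sim_size(function(k) rchisq(k, df = 3), m = 30, n = 30)
round(c(normal = size_norm, t2 = size_t2, chisq3 = size_chi3), 3)
```

Because both samples are drawn from the same distribution, the null hypothesis $\Delta = 0$ holds in every replication, so the rejection share estimates the test size. Unequal sample sizes can be explored by changing m and n in the calls to sim_size.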

Session Info

library(robnptests)

sessionInfo()
#> R version 4.2.2 Patched (2022-11-10 r83330)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Linux Mint 19.1
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
#> 
#> locale:
#>  [1] LC_CTYPE=de_DE.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=de_DE.UTF-8        LC_COLLATE=C              
#>  [5] LC_MONETARY=de_DE.UTF-8    LC_MESSAGES=de_DE.UTF-8   
#>  [7] LC_PAPER=de_DE.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C       
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] robnptests_1.1.0
#> 
#> loaded via a namespace (and not attached):
#>  [1] codetools_0.2-19 digest_0.6.29    rbibutils_2.2.8  R6_2.5.1        
#>  [5] jsonlite_1.8.0   magrittr_2.0.3   evaluate_0.15    highr_0.9       
#>  [9] Rdpack_2.4       stringi_1.7.6    rlang_1.0.4      cli_3.3.0       
#> [13] rstudioapi_0.13  jquerylib_0.1.4  bslib_0.3.1      rmarkdown_2.19  
#> [17] tools_4.2.2      stringr_1.4.0    xfun_0.31        yaml_2.3.5      
#> [21] fastmap_1.1.0    compiler_4.2.2   htmltools_0.5.2  knitr_1.39      
#> [25] sass_0.4.1

References

Maechler, M., Rousseeuw, P., Croux, C., Todorov, V., Ruckstuhl, A., Salibián-Barrera, M., Verbeke, T., Koller, M., Conceicao, E.L.T., di Palma, M.A., 2022. robustbase: Basic robust statistics.
Maronna, R.A., Martin, R.D., Yohai, V.J., Salibián-Barrera, M., 2019. Robust Statistics: Theory and Methods (with R), 2nd ed., Wiley Series in Probability and Statistics. Wiley, Hoboken, NJ. https://doi.org/10.1002/9781119214656
Maronna, R.A., Zamar, R.H., 2002. Robust estimates of location and dispersion of high-dimensional datasets. Technometrics 44, 307–317. https://doi.org/10.1198/004017002188618509