---
title: "Extensions"
output:
rmarkdown::html_vignette:
toc: true
vignette: >
%\VignetteIndexEntry{extensions}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
tidy = TRUE,
tidy.opts=list(arrow=TRUE,width.cutoff = 50),
eval=T
)
```
We present some extensions to the package. These aim to demonstrate additional functionality of the package which has not yet been subject to extensive testing.
```{r setup, echo = F}
library(mlpwr)
set.seed(1)
```
Since the Monte Carlo simulations required to generate the results can be time-consuming, we have precomputed them and included them as a data file in the package. This enables a quick overview of the methods without having to rerun the simulations. Below, we load the precomputed results using the `data()` function. We also provide the code used to generate these results later in this vignette.
```{r}
data(extensions_results)
```
# Beyond Power Analysis
Here, we demonstrate two other application scenarios, sensitivity analysis and compromise analysis.
## Sensitivity Analysis
Sensitivity analysis involves fixing the sample size, desired power, and alpha level, and determining the effect size that can be detected with the desired power. We demonstrate this functionality using a t-test.
The fixed parameters are the sample size N, the alpha level, and the goal power. We want to determine the corresponding effect size:
```{r}
N = 100
alpha = .01
goal_power = .95
```
We define our simulation function:
```{r}
simfun_sensitivity = function(esize) {
# Generate a data set
dat <- rnorm(n = N, mean = esize)
# Test the hypothesis
res <- t.test(dat)
res$p.value < alpha
}
```
We can find the corresponding effect size using:
```{r}
# Effect Size Finding
# res1 <- find.design(simfun = simfun_sensitivity,
# boundaries = c(0,1),integer=FALSE, power = goal_power,surrogate = "gpr",evaluations=8000)
res1 = extensions_results[[1]]
summary(res1)
```
This leads us to an effect size of about 0.43, as measured by Cohen's d. To confirm the correctness of this result, we check it analytically using the pwr package.
```{r}
library(pwr)
esize_correct = pwr.t.test(n=N,sig.level=alpha,power=goal_power,type="one.sample")$d
esize_correct
```
The analytical result is very close to the result obtained with mlpwr. With a small simulation study, we further confirm that the analytical result is correct.
```{r}
a = replicate(10000,simfun_sensitivity(esize_correct))
mean(a)
```
## Compromise Analysis
In compromise analysis, the sample size and effect size are fixed in the DGF, and the goal is to find a decision criterion that optimizes the desired ratio of alpha and beta errors. Again, we demonstrate this functionality using a t-test scenario.
The fixed parameters are the sample size N, the effect size (given by Cohen's d) and the desired ratio of the alpha and beta errors:
```{r}
N = 100
esize = .3
desired_ratio = 1
```
We now define our simulation function. Its argument is a given threshold for the test statistic. This function then generates two data sets - one under the null, one under the alternative hypothesis - and evaluates whether the use of this threshold leads to a Type I error (for the data set generated under the null hypothesis) or a Type II error (for the data set generated under the alternative hypothesis).
```{r}
simfun_compromise = function(crit) {
# Generate a data set
dat <- rnorm(n = N, mean = 0)
# Test the hypothesis
res <- t.test(dat)
a = res$statistic>crit
# Generate a data set
dat <- rnorm(n = N, mean = esize)
# Test the hypothesis
res <- t.test(dat)
b = res$statistic
as.data.frame() |>
gather() # format
# Test the hypothesis
res <- aov(value ~ key, data = dat) # perform ANOVA
summary(res)[[1]][1, 5] < 0.01 # extract significance
}
```
We assume that there are different costs for different clusters. We assume that the costs per cluster is cheaper if there is only a small number of clusters. Adding additional clusters gets increasingly expensive (for example, because assessing a large number of groups is more costly).
Specifically, we assume that the first five clusters cost 10, the next five cost 15, the next five cost 20, and so on.
```{r}
prices = c(rep(10,5),rep(15,5),rep(20,5),rep(25,5),rep(30,5),rep(35,5),rep(40,5))
costfun = function(n, n.groups) {
5 * n + n.groups * sum(prices[1:n.groups])
}
```
We can now find a good design using:
```{r}
# res3 <- find.design(
# simfun = simfun_anova,
# costfun = costfun,
# boundaries = list(n = c(10, 150), n.groups = c(5, 30)),
# power = .95
# )
res3 = extensions_results[[3]]
summary(res3)
```
The proposed solution contains 10 groups and 135 participants per group.
# 3-Dimensional Designs
As an example for a study design with more than 2 dimensions, we consider a multilevel design with the three dimensions:
* Number of schools
* Number of students per school
* Number of observations per student
```{r}
library(lme4)
library(lmerTest)
```
```{r}
simfun_3d <- function(n.per.school,n.schools, n.obs) {
# generate data
school = rep(1:n.schools,each=n.per.school*n.obs)
student = rep(1:(n.schools*n.per.school),each=n.obs)
pred = factor(rep(c("old","new"),n.per.school*n.schools*n.obs,each=n.obs),levels=c("old","new"))
dat = data.frame(school = school, student = student, pred = pred)
params <- list(theta = c(.5,0,.5,.5), beta = c(0,1),sigma = 1.5)
names(params$theta) = c("school.(Intercept)","school.prednew.(Intercept)","school.prednew","student.(Intercept)")
names(params$beta) = c("(Intercept)","prednew")
dat$y <- simulate.formula(~pred + (1 + pred | school) + (1 | student), newdata = dat, newparams = params)[[1]]
# test hypothesis
mod <- lmer(y ~ pred + (1 + pred | school) + (1 | student), data = dat)
pvalue <- summary(mod)[["coefficients"]][2,"Pr(>|t|)"]
pvalue < .01
}
costfun_3d <- function(n.per.school, n.schools,n.obs) {
100 * n.per.school + 200 * n.schools + .1 * n.obs * n.per.school * n.schools
}
```
```{r}
# res4 = find.design(simfun = simfun_3d, costfun = costfun_3d, boundaries = list(n.per.school = c(5, 25), n.schools = c(10, 30), n.obs = c(3,10)), power = .95)
res4 = extensions_results[[4]]
summary(res4)
```
In this case, going with the minimum number of observations per student seems to be the ideal choice.