This vignette describes the use of the functions
SA_step1
SA_step2
and SA_step3
.
The goal of these functions are to conduct a sensitivity analysis with
phantom variables. Phantom variables are variables that you did not
observe, but that you can specify in a structural equation model by
specifying the variance and covariances of the phantom variable with
other variables. Creating a single phantom variable is fairly
straightforward in the R package lavaan
. There are
scenarios where a researcher may wish to see how the results from an
analysis might differ if they had observed additional variables. For
instance, a researcher may want to assess whether a certain regression
coefficient would be statistically significant if they had controlled
for another variable in their analysis. Because the researcher does not
know how this phantom variable covaries with the variables they did
observe, it would be useful to try different values for the covariances
in a sensitivity analysis. The phantSEM
package is built
for researchers to test different values for the covariances between
their phantom variables and observed variables.
In this vignette, a full example is illustrated using the functions
of phantSEM
.
The example that we will use to illustrate the use of
phantSEM
is one in which we have fit a cross-sectional
mediation model and would like to see if our estimates of the mediated
effect would hold if we had collected an additional wave of data. To
start, we will load the packages we need.
library(phantSEM)
library(lavaan)
#> Warning: package 'lavaan' was built under R version 4.3.1
library(tidyr)
#> Warning: package 'tidyr' was built under R version 4.3.1
library(tidyverse)
#> Warning: package 'tidyverse' was built under R version 4.3.1
#> Warning: package 'ggplot2' was built under R version 4.3.1
#> Warning: package 'tibble' was built under R version 4.3.1
#> Warning: package 'readr' was built under R version 4.3.1
#> Warning: package 'purrr' was built under R version 4.3.1
#> Warning: package 'dplyr' was built under R version 4.3.1
#> Warning: package 'stringr' was built under R version 4.3.1
#> Warning: package 'forcats' was built under R version 4.3.1
#> Warning: package 'lubridate' was built under R version 4.3.1
The example we use is based on a memory study described by MacKinnon et al., (2018). Participants in the study were given a list of words and were randomly assigned to either make mental images of the words, or repeat the words. The individuals were then asked to recall as many words from the list as they could. Therefore, the predictor X was the condition (1=imagery, 0=repetition), the mediator M was the extent to which they used mental imagery and the outcome Y was the number of words they recalled.
We first define a 3 x 3 cross-sectional covariance matrix
memory_CrossSectional <- matrix(c(
0.2509804, 0.9511983, 0.4257081,
0.9511983, 8.8661765, 2.6609477,
0.4257081, 2.6609477, 10.8592048
), nrow = 3, byrow = T)
colnames(memory_CrossSectional) <- c("X", "M2", "Y2")
Next, we fit a single-mediator model to the covariance matrix. The model is defined using lavaan syntax.
Observed_Model <- "
M2 ~ X
Y2 ~ M2+X
"
fit_obs <- sem(model = Observed_Model, sample.cov = memory_CrossSectional, sample.nobs = 138)
We now define the model with phantom variables for M and Y using lavaan syntax. It is very important to add labels to the parameters that will be of interest for the sensitivity analysis. For this example, the a-path, b-path and c’-path may be of interest.
we now have the necessary parts for the first step of the sensitivity analysis.
SA_step1
The purpose of this function is to determine which variables are the phantom variables and provide the covariance parameters involving the phantom variables so that the user can enter them into the next step.
Step1 <- SA_step1(
lavoutput = fit_obs,
mod_obs = Observed_Model,
mod_phant = Phantom_Model
)
#> Here are the phantom covariance matrix parameters (copy and paste and add values/names for step2):
#>
#>
#> phantom_assignment <-("CovM1M2"= ,
#> "CovM1X"= ,
#> "CovM1Y2"= ,
#> "CovY1M1"= ,
#> "CovY1M2"= ,
#> "CovY1X"= ,
#> "CovY1Y2"= ,
#> "VarM1"= ,
#> "VarY1"= )
#>
#> Choose the names of the phantom covariances that you want to fix to single values and put in a vector. These will be used for the fixed_names argument in the SA_step2 function. The phantom covariance parameters that you want to vary should be put in a list and used as the test_names argument.
#> Here are the observed covariance matrix parameters:
#> CovXM2,CovXY2,CovY2M2
#> Choose which values you want to use for your fixed parameters and put their names in a vector (fixed_values). Make sure the order is the same for both vectors.
This function prints the names of the phantom covariance parameters which will be used in the next function. Those parameters are: “CovM1M2”,“CovM1X”,“CovM1Y2”,“CovY1M1”,“CovY1M2”,“CovY1X”,“CovY1Y2”,“VarM1”,“VarY1”. The function also provides the names of the observed covariance parameters for the user’s reference.
The second step of the sensitivity analysis is to create covariance matrices that the phantom model will be fit to. To do this, the user must provide information about how the phantom variables covary with the observed variables. For this example, the user may wish to test different values for “CovM1M2”, “CovY1Y2”, “CovY1M2” and “CovM1Y2” and fix the other phantom covariances to single values. The easiest way to create the arguments for SA_step2 is to copy and paste the list of parameter names from SA_step1 and remove the names of the parameters they want to vary.
Now that the phantom parameters that will be fixed to single values have been put into a vector, the user must define a vector of the same length that provides the values that these parameters will be fixed at. For this example, random assignment would suggest that CovM1X and CovY1X would be zero and the variances for the phantom variables can be set at 1. That leaves CovY1M1, which can be fixed to be equal to CovY2M2
The next arguments that the user has to define are the parameters that will be varied. This is done by creating a list of the names of the parameters. If we wanted to vary all four of these parameters, we could write this argument as follows:
However, suppose that we would like to allow CovM1M2 and CovY1Y2 to be equal and CovY1M2 and CovM1Y2 to be equal. we can choose that option by writing the test_names list as follows:
The next argument tells the function a sequence of values to use for
the parameters. The user could provide a vector of values, or a sequence
using seq()
.
Now we are ready to use SA_step2
# Step2 <- SA_step2(fixed_names=fixed_names,
# fixed_values=fixed_values,
# test_names=test_names,
# test_values=test_values,
# step1=Step1)
phantom_assignment <- list(
"CovM1X" = 0,
"CovY1M1" = "CovY2M2",
"CovY1X" = 0,
"VarM1" = 1,
"VarY1" = 1,
"CovM1M2" = seq(0, .6, .1),
"CovY1Y2" = "CovM1M2",
"CovY1M2" = seq(-.6, .6, .1),
"CovM1Y2" = "CovY1M2"
)
Step2 <- SA_step2(
phantom_assignment = phantom_assignment,
step1 = Step1
)
SA_step2
creates a covariance matrix for each
combination of the parameter test values provided. The function returns
a list of five objects that are then passed to SA_step3
,
they are the lavaan syntax for the phantom variable model (“mod_phant”),
a vector of variances (“var_phant”), a data frame of the combinations of
phantom covariances (“combos”), a list of correlation matrices
(“correlation_matrix_list”) and a list of covariance matrices
(“covariance_matrix_list”).
names(Step2)
#> [1] "mod_phant" "var_phant"
#> [3] "combos" "correlation_matrix_list"
#> [5] "covariance_matrix_list"
The third function is SA_step3
, which is where the
sensitivity analysis is run. This function can take a great amount of
time depending on how many different phantom parameter combinations the
user has entered. For this function, there are only two arguments. The
first is for the output of SA_step2
and the second is the
sample size from the observed data.
The function returns a list of four objects. The first object is a list of the lavaan output from each model, the second object is the list of covariance matrices used for each model, the third object is a list of correlation matrices, and the fourth object is the data frame of phantom covariance combinations.
At this point, the sensitivity analysis is complete, but the results
are not in the most user-friendly form. The package contains a function
to help to organize the results so that they can be examined or plotted.
This function is called ghost_par_ests
and it is used to
pull out parameter estimatesfor a particular parameter (which must be
named) from each phantom covariance combination. It has three arguments:
step3 is the results from SA_step3
, parameter_label is the
label used in the lavaan code to identify the parameter of interest, and
remove_NA
is a logical object indicating whether the
results returned by the function should include rows in which the model
did not produce estimates.
Let’s see what the b-estimates would be for our example. The output from this function is a data frame that has the values of the phantom covariances that were varied as well as the lavaan output for the chosen parameter.
b_ests <- ghost_par_ests(
step3 = Step3,
parameter_label = "b",
remove_NA = FALSE
)
a_ests <- ghost_par_ests(
step3 = Step3,
parameter_label = "a",
remove_NA = FALSE
)
b_ests$aest <- a_ests$est
b_ests$abest <- b_ests$est * b_ests$aest
b_ests <- cbind(Step2[["combos"]], b_ests)
b_ests$`CovM1M2,CovY1Y2` <- round(b_ests$`CovM1M2,CovY1Y2`, 2)
b_ests$`CovY1M2,CovM1Y2` <- round(b_ests$`CovY1M2,CovM1Y2`, 2)
head(b_ests)
#> CovM1M2,CovY1Y2 CovY1M2,CovM1Y2 Var.3 rep lhs op rhs label
#> reptemp 0.0 -0.6 -0.3 1 Y2 ~ M2 b
#> reptemp.1 0.1 -0.6 -0.3 2 <NA> <NA> <NA> <NA>
#> reptemp.2 0.2 -0.6 -0.3 3 <NA> <NA> <NA> <NA>
#> reptemp.3 0.3 -0.6 -0.3 4 <NA> <NA> <NA> <NA>
#> reptemp.4 0.4 -0.6 -0.3 5 <NA> <NA> <NA> <NA>
#> reptemp.5 0.5 -0.6 -0.3 6 <NA> <NA> <NA> <NA>
#> est se z pvalue ci.lower ci.upper aest
#> reptemp 1.146209 0.1187104 9.655506 0 0.9135409 1.378877 3.789931
#> reptemp.1 NA NA NA NA NA NA NA
#> reptemp.2 NA NA NA NA NA NA NA
#> reptemp.3 NA NA NA NA NA NA NA
#> reptemp.4 NA NA NA NA NA NA NA
#> reptemp.5 NA NA NA NA NA NA NA
#> abest
#> reptemp 4.344053
#> reptemp.1 NA
#> reptemp.2 NA
#> reptemp.3 NA
#> reptemp.4 NA
#> reptemp.5 NA
We can then plot the results using ggplot2
(plotting
function to be developed later).
library(ggplot2)
ggplot(b_ests, aes(x = `CovM1M2,CovY1Y2`, y = est)) +
geom_point(size = 2) +
scale_shape_manual() + # scale_fill_manual(values = c("black", "lightgray")) +
scale_color_manual(values = c("grey", "black")) +
facet_grid(~ b_ests$`CovY1M2,CovM1Y2`) +
theme_bw() +
scale_fill_grey() +
theme(
axis.title.y = element_text(size = 18),
axis.text.x = element_text(size = 12),
axis.title.x = element_text(size = 20),
axis.text = element_text(size = 15),
strip.text.y = element_text(size = 15),
strip.text.x = element_text(size = 20),
legend.text = element_text(size = 15),
legend.title = element_text(size = 18),
legend.position = "bottom",
panel.spacing.x = unit(2, "mm")
)
Another option is to use the lookup table that is part of the
package. The function SA_lookup
can be used to do this.
Note that for this version, the SA_lookup function only provides
standardized estimates as it uses the correlation matrix only.
cov2cor(memory_CrossSectional)
#> X M2 Y2
#> [1,] 1.0000000 0.6376509 0.2578654
#> [2,] 0.6376509 1.0000000 0.2711872
#> [3,] 0.2578654 0.2711872 1.0000000
lookup_results <- SA_lookup(CorXM = .64, CorXY = .26, CorMY = .27)
lookup_results_plot <- lookup_results %>% filter(round(corr, 1) == .3 & (stabr > -.1 & stabr < .7))
ggplot(lookup_results_plot, aes(x = stabr, y = est_ab)) +
geom_point(size = 2) +
scale_shape_manual() + # scale_fill_manual(values = c("black", "lightgray")) +
scale_color_manual(values = c("grey", "black")) +
facet_grid(~ lookup_results_plot$clr) +
theme_bw() +
scale_fill_grey() +
theme(
axis.title.y = element_text(size = 18),
axis.text.x = element_text(size = 12),
axis.title.x = element_text(size = 20),
axis.text = element_text(size = 15),
strip.text.y = element_text(size = 15),
strip.text.x = element_text(size = 20),
legend.text = element_text(size = 15),
legend.title = element_text(size = 18),
legend.position = "bottom",
panel.spacing.x = unit(2, "mm")
)
Comparing this result to the earlier result, the pattern of the results is similar, but the scale of the y-axis is different. This is because these results are standardized.