Title: | The Directed Prediction Index for Causal Inference from Observational Data |
Version: | 2025.10 |
Date: | 2025-10-15 |
Maintainer: | Han Wu Shuang Bao <baohws@foxmail.com> |
Description: | The Directed Prediction Index ('DPI') is a quasi-causal inference (causal discovery) method for observational data designed to quantify the relative endogeneity (relative dependence) of outcome (Y) versus predictor (X) variables in regression models. By comparing the proportion of variance explained (R-squared) between the Y-as-outcome model and the X-as-outcome model while controlling for a sufficient number of possible confounders, it can suggest a plausible (admissible) direction of influence from a more exogenous variable (X) to a more endogenous variable (Y). Methodological details are provided at https://psychbruce.github.io/DPI/. This package also provides functions for data simulation and network analysis (correlation, partial correlation, and Bayesian networks). |
License: | GPL-3 |
Encoding: | UTF-8 |
URL: | https://psychbruce.github.io/DPI/ |
BugReports: | https://github.com/psychbruce/DPI/issues |
Depends: | R (≥ 4.0.0) |
Imports: | glue, crayon, cli, ggplot2, cowplot, qgraph, bnlearn, MASS |
Suggests: | bruceR, aplot, bayestestR |
RoxygenNote: | 7.3.3 |
NeedsCompilation: | no |
Packaged: | 2025-10-16 02:17:30 UTC; Bruce |
Author: | Han Wu Shuang Bao |
Repository: | CRAN |
Date/Publication: | 2025-10-16 02:40:02 UTC |
DPI: The Directed Prediction Index for Causal Inference from Observational Data
Author(s)
Maintainer: Han Wu Shuang Bao <baohws@foxmail.com>
See Also
Useful links:
https://psychbruce.github.io/DPI/
Report bugs at https://github.com/psychbruce/DPI/issues
Directed acyclic graphs (DAGs) via Bayesian networks (BNs).
Description
Directed acyclic graphs (DAGs) via Bayesian networks (BNs). It uses bnlearn::boot.strength() to estimate the strength of each edge as its empirical frequency over a set of networks learned from bootstrap samples. It computes (1) the probability of each edge (modulo its direction) and (2) the probability of each direction of an edge, conditional on the edge being present in the graph (in either direction). Stability thresholds are usually set to 0.85 for strength (i.e., an edge appears in more than 85% of the BNs learned from bootstrap samples) and 0.50 for direction (i.e., a direction appears in more than 50% of the BNs learned from bootstrap samples) (Briganti et al., 2023). Finally, for each chosen algorithm, it returns the stable Bayesian network as the final DAG.
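The two stability thresholds can be read as a row filter on a table of bootstrap edge statistics. The sketch below is only an illustration: it uses a small hand-made data frame with made-up values, shaped like the output of bnlearn::boot.strength() (columns from, to, strength, direction). The helper name stable_edges is hypothetical and not part of this package, and the strict ">" comparison follows the "more than" wording above; how the package treats values exactly at the threshold is not stated here.

```r
# Hypothetical bootstrap summary: one row per ordered pair of nodes.
# All values are made up for illustration.
boot_tab = data.frame(
  from      = c("A", "B", "A", "C", "B", "C"),
  to        = c("B", "A", "C", "A", "C", "B"),
  strength  = c(0.97, 0.97, 0.60, 0.60, 0.90, 0.90),
  direction = c(0.80, 0.20, 0.55, 0.45, 0.50, 0.50)
)

# Keep an arc only if the edge is stable AND this direction is stable.
stable_edges = function(tab, strength = 0.85, direction = 0.50) {
  keep = tab$strength > strength & tab$direction > direction
  tab[keep, , drop = FALSE]
}

dag_edges = stable_edges(boot_tab)
dag_edges
```

In this toy table the A-C edge fails the strength threshold, and the B-C edge passes it but has no direction above 0.50, so neither enters the final DAG.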
Usage
BNs_dag(
data,
algorithm = c("pc.stable", "hc", "rsmax2"),
algorithm.args = list(),
n.boot = 1000,
seed = NULL,
strength = 0.85,
direction = 0.5,
node.text.size = 1.2,
edge.width.max = 1.5,
edge.label.mrg = 0.01,
file = NULL,
width = 6,
height = 4,
dpi = 500,
verbose = TRUE,
...
)
Arguments
data |
Data. |
algorithm |
Structure learning algorithms for building Bayesian networks (BNs). Should be function name(s) from the bnlearn package. Defaults to the most common algorithms: "pc.stable", "hc", and "rsmax2". |
algorithm.args |
An optional list of extra arguments passed to the algorithm. |
n.boot |
Number of bootstrap samples (for learning a more "stable" network structure). Defaults to 1000. |
seed |
Random seed for replicable results. Defaults to NULL. |
strength |
Stability threshold of edge strength: the minimum proportion (probability) of BNs (among the n.boot bootstrap samples) in which an edge appears. Defaults to 0.85. |
direction |
Stability threshold of edge direction: the minimum proportion (probability) of BNs (among the n.boot bootstrap samples) in which a direction of an edge appears. Defaults to 0.5. |
node.text.size |
Scalar on the font size of node (variable) labels. Defaults to 1.2. |
edge.width.max |
Maximum value of edge strength to scale all edge widths. Defaults to 1.5. |
edge.label.mrg |
Margin of the background box around the edge label. Defaults to 0.01. |
file |
File name of saved plot ( |
width, height |
Width and height (in inches) of saved plot. Defaults to 6 and 4. |
dpi |
Dots per inch (figure resolution). Defaults to 500. |
verbose |
Print information about BN algorithm and number of bootstrap samples when running the analysis. Defaults to TRUE. |
... |
Arguments passed on to |
Value
Return a list (class bns.dag) of Bayesian network results and qgraph object.
References
Briganti, G., Scutari, M., & McNally, R. J. (2023). A tutorial on Bayesian networks for psychopathology researchers. Psychological Methods, 28(4), 947–961. doi:10.1037/met0000479
Burger, J., Isvoranu, A.-M., Lunansky, G., Haslbeck, J. M. B., Epskamp, S., Hoekstra, R. H. A., Fried, E. I., Borsboom, D., & Blanken, T. F. (2023). Reporting standards for psychological network analyses in cross-sectional data. Psychological Methods, 28(4), 806–824. doi:10.1037/met0000471
Scutari, M., & Denis, J.-B. (2021). Bayesian networks: With examples in R (2nd ed.). Chapman and Hall/CRC. doi:10.1201/9780429347436
Examples
bn = BNs_dag(airquality, seed=1)
bn
# bn$pc.stable
# bn$hc
# bn$rsmax2
## All DAG objects can be directly plotted
## or saved with print(..., file="xxx.png")
# bn$pc.stable$DAG.edge
# bn$pc.stable$DAG.strength
# bn$pc.stable$DAG.direction
# bn$pc.stable$DAG
# ...
## Not run:
library(ggplot2)  # for theme(), element_text(), and ggsave()

print(bn, file="airquality.png")
# will save three plots with auto-modified file names:
# - "airquality_BNs.DAG.01_pc.stable.png"
# - "airquality_BNs.DAG.02_hc.png"
# - "airquality_BNs.DAG.03_rsmax2.png"

# arrange multiple plots using aplot::plot_list()
# install.packages("aplot")
c1 = cor_net(airquality, "cor")
c2 = cor_net(airquality, "pcor")
bn = BNs_dag(airquality, seed=1)
mytheme = theme(plot.title=element_text(hjust=0.5))
p = aplot::plot_list(
  plot(c1),
  plot(c2),
  plot(bn$pc.stable$DAG) + mytheme,
  plot(bn$hc$DAG) + mytheme,
  plot(bn$rsmax2$DAG) + mytheme,
  design="111222
          334455",
  tag_levels="A"
)  # return a patchwork object
ggsave(p, filename="p.png", width=12, height=8, dpi=500)
ggsave(p, filename="p.pdf", width=12, height=8)
## End(Not run)
The Directed Prediction Index (DPI).
Description
The Directed Prediction Index (DPI) is a quasi-causal inference method for cross-sectional data designed to quantify the relative endogeneity (relative dependence) of outcome (Y) vs. predictor (X) variables in regression models. By comparing the proportion of variance explained (R-squared) between the Y-as-outcome model and the X-as-outcome model while controlling for a sufficient number of possible confounders, it can suggest a plausible (admissible) direction of influence from a more exogenous variable (X) to a more endogenous variable (Y). Methodological details are provided at https://psychbruce.github.io/DPI/.
Usage
DPI(
model,
x,
y,
data = NULL,
k.cov = 1,
n.sim = 1000,
alpha = 0.05,
bonf = FALSE,
pseudoBF = FALSE,
seed = NULL,
progress,
file = NULL,
width = 6,
height = 4,
dpi = 500
)
Arguments
model |
Model object ( |
x |
Independent (predictor) variable. |
y |
Dependent (outcome) variable. |
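data |
[Optional] Raw data to use instead of a fitted model (see Examples). Defaults to NULL. |
k.cov |
Number of random covariates (simulating potential omitted variables) added to each simulation sample. Defaults to 1. |
n.sim |
Number of simulation samples. Defaults to 1000. |
alpha |
Significance level for computing the Significance score of the DPI from the p value of the X-Y relationship (see Value). Defaults to 0.05. |
bonf |
Bonferroni correction to control for false positive rates. Defaults to FALSE. |
pseudoBF |
Use normalized pseudo Bayes Factors, plogis(log(pseudo.BF.xy)), instead of the sigmoid p value as the Significance score (see Value). Defaults to FALSE. |
seed |
Random seed for replicable results. Defaults to NULL. |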
progress |
Show progress bar. Defaults to |
file |
File name of saved plot ( |
width, height |
Width and height (in inches) of saved plot. Defaults to 6 and 4. |
dpi |
Dots per inch (figure resolution). Defaults to 500. |
Value
Return a data.frame of simulation results:
- DPI = Direction * Significance
  - = (R2.Y - R2.X) * (1 - tanh(p.beta.xy/alpha/2)) if pseudoBF=FALSE (default, suggested): more conservative estimates
  - = (R2.Y - R2.X) * plogis(log(pseudo.BF.xy)) if pseudoBF=TRUE: less conservative for insignificant X-Y relationship
- delta.R2: R2.Y - R2.X
- R2.Y: R^2 of regression model predicting Y using X and all other covariates
- R2.X: R^2 of regression model predicting X using Y and all other covariates
- t.beta.xy: t value for coefficient of X predicting Y (always equal to t value for coefficient of Y predicting X) when controlling for all other covariates
- p.beta.xy: p value for coefficient of X predicting Y (always equal to p value for coefficient of Y predicting X) when controlling for all other covariates
- df.beta.xy: residual degrees of freedom (df) of t.beta.xy
- r.partial.xy: partial correlation (always with the same t value as t.beta.xy) between X and Y when controlling for all other covariates
- sigmoid.p.xy: sigmoid p value as 1 - tanh(p.beta.xy/alpha/2)
- pseudo.BF.xy: pseudo Bayes Factors (BF_{10}) computed from p value p.beta.xy and sample size nobs(model), see p_to_bf()
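To make the columns above concrete, here is a minimal base-R sketch that computes the same quantities once, on the observed data only. It is an illustration under stated assumptions, not the package's implementation: no random covariates are added (so k.cov and n.sim play no role), rows with missing values are dropped up front, and the object names simply mirror the column names for readability.

```r
# Complete cases only, so both regressions use the same rows.
d = na.omit(airquality)
alpha = 0.05

# Y-as-outcome and X-as-outcome models with the same covariate set.
fit_y = lm(Ozone ~ Solar.R + Wind + Temp + Month + Day, data = d)
fit_x = lm(Solar.R ~ Ozone + Wind + Temp + Month + Day, data = d)

R2.Y = summary(fit_y)$r.squared
R2.X = summary(fit_x)$r.squared

# t and p for X predicting Y; the same t arises for Y predicting X.
coef_y = summary(fit_y)$coefficients["Solar.R", ]
coef_x = summary(fit_x)$coefficients["Ozone", ]
t.beta.xy = coef_y[["t value"]]
p.beta.xy = coef_y[["Pr(>|t|)"]]
df.beta.xy = df.residual(fit_y)

# Partial correlation recovered from t and its residual df.
r.partial.xy = t.beta.xy / sqrt(t.beta.xy^2 + df.beta.xy)

# Direction * Significance, following the formula in Value.
delta.R2 = R2.Y - R2.X
sigmoid.p.xy = 1 - tanh(p.beta.xy / alpha / 2)
DPI.single = delta.R2 * sigmoid.p.xy
```

Because sigmoid.p.xy lies between 0 and 1, the resulting value can never exceed delta.R2 in absolute size; the Significance term only shrinks the Direction term toward zero as the p value grows.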
Examples
# input a fitted model
model = lm(Ozone ~ ., data=airquality)
DPI(model, x="Solar.R", y="Ozone", seed=1) # DPI > 0
DPI(model, x="Wind", y="Ozone", seed=1) # DPI > 0
DPI(model, x="Solar.R", y="Wind", seed=1) # unrelated
# or input raw data, test with more random covs
DPI(data=airquality, x="Solar.R", y="Ozone",
k.cov=10, seed=1)
DPI(data=airquality, x="Wind", y="Ozone",
k.cov=10, seed=1)
DPI(data=airquality, x="Solar.R", y="Wind",
k.cov=10, seed=1)
# or use pseudo Bayes Factors for the significance score
# (less conservative for insignificant X-Y relationship)
DPI(data=airquality, x="Solar.R", y="Ozone", k.cov=10,
pseudoBF=TRUE, seed=1) # DPI > 0 (true positive)
DPI(data=airquality, x="Wind", y="Ozone", k.cov=10,
pseudoBF=TRUE, seed=1) # DPI > 0 (true positive)
DPI(data=airquality, x="Solar.R", y="Wind", k.cov=10,
pseudoBF=TRUE, seed=1) # DPI > 0 (false positive!)
DPI curve analysis across multiple random covariates.
Description
DPI curve analysis across multiple random covariates.
Usage
DPI_curve(
model,
x,
y,
data = NULL,
k.covs = 1:10,
n.sim = 1000,
alpha = 0.05,
bonf = FALSE,
pseudoBF = FALSE,
seed = NULL,
progress,
file = NULL,
width = 6,
height = 4,
dpi = 500
)
Arguments
model |
Model object ( |
x |
Independent (predictor) variable. |
y |
Dependent (outcome) variable. |
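data |
[Optional] Raw data to use instead of a fitted model. Defaults to NULL. |
k.covs |
An integer vector of the numbers of random covariates (simulating potential omitted variables) added to each simulation sample. Defaults to 1:10. |
n.sim |
Number of simulation samples. Defaults to 1000. |
alpha |
Significance level for computing the Significance score of the DPI from the p value of the X-Y relationship (see DPI()). Defaults to 0.05. |
bonf |
Bonferroni correction to control for false positive rates. Defaults to FALSE. |
pseudoBF |
Use normalized pseudo Bayes Factors instead of the sigmoid p value as the Significance score (see DPI()). Defaults to FALSE. |
seed |
Random seed for replicable results. Defaults to NULL. |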
progress |
Show progress bar. Defaults to |
file |
File name of saved plot ( |
width, height |
Width and height (in inches) of saved plot. Defaults to 6 and 4. |
dpi |
Dots per inch (figure resolution). Defaults to 500. |
Value
Return a data.frame of DPI curve results.
Examples
model = lm(Ozone ~ ., data=airquality)
DPIs = DPI_curve(model, x="Solar.R", y="Ozone", seed=1)
plot(DPIs) # ggplot object
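The idea behind a DPI curve, recomputing the index while more and more random covariates are added, can be sketched in base R as follows. This is a rough illustration and not the package's algorithm: the helper dpi_once is hypothetical, the random covariates are assumed to be independent standard-normal noise, only 50 replicates are used per point, and averaging over replicates is our own choice of summary.

```r
# Hypothetical helper: DPI for one sample after adding k random covariates.
dpi_once = function(d, x, y, k, alpha = 0.05) {
  # Assumption: covariates are independent standard-normal noise.
  noise = as.data.frame(matrix(rnorm(nrow(d) * k), ncol = k))
  names(noise) = paste0("cov", seq_len(k))
  dk = cbind(d, noise)
  # Y-as-outcome and X-as-outcome models on the augmented data.
  fit_y = lm(reformulate(setdiff(names(dk), y), response = y), data = dk)
  fit_x = lm(reformulate(setdiff(names(dk), x), response = x), data = dk)
  p = summary(fit_y)$coefficients[x, "Pr(>|t|)"]
  (summary(fit_y)$r.squared - summary(fit_x)$r.squared) *
    (1 - tanh(p / alpha / 2))
}

set.seed(1)
d = na.omit(airquality)
k.covs = 1:5
curve = sapply(k.covs, function(k) {
  mean(replicate(50, dpi_once(d, "Solar.R", "Ozone", k)))
})
data.frame(k.cov = k.covs, DPI = curve)
```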
Directed acyclic graphs (DAGs) via DPI exploratory analysis (causal discovery) for all significant partial rs.
Description
Directed acyclic graphs (DAGs) via DPI exploratory analysis (causal discovery) for all significant partial rs.
Usage
DPI_dag(
data,
k.covs = 1,
n.sim = 1000,
alpha = 0.05,
bonf = FALSE,
pseudoBF = FALSE,
seed = NULL,
progress,
file = NULL,
width = 6,
height = 4,
dpi = 500
)
Arguments
data |
A dataset with at least 3 variables. |
k.covs |
An integer vector (e.g., |
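n.sim |
Number of simulation samples. Defaults to 1000. |
alpha |
Significance level for computing the Significance score of the DPI from the p value of the X-Y relationship (see DPI()). Defaults to 0.05. |
bonf |
Bonferroni correction to control for false positive rates. Defaults to FALSE. |
pseudoBF |
Use normalized pseudo Bayes Factors instead of the sigmoid p value as the Significance score (see DPI()). Defaults to FALSE. |
seed |
Random seed for replicable results. Defaults to NULL. |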
progress |
Show progress bar. Defaults to |
file |
File name of saved plot ( |
width, height |
Width and height (in inches) of saved plot. Defaults to 6 and 4. |
dpi |
Dots per inch (figure resolution). Defaults to 500. |
Value
Return a data.frame (class dpi.dag) of DPI exploration results.
Examples
# partial correlation networks (undirected)
cor_net(airquality, "pcor")
# directed acyclic graphs
dpi.dag = DPI_dag(airquality, k.covs=c(1,3,5), seed=1)
print(dpi.dag, k=1) # DAG with DPI(k=1)
print(dpi.dag, k=3) # DAG with DPI(k=3)
print(dpi.dag, k=5) # DAG with DPI(k=5)
# modify ggplot attributes
library(ggplot2)  # for labs(), ggplot(), and the geoms below
gg = plot(dpi.dag, k=5, show.label=FALSE)
gg + labs(title="DAG with DPI(k=5)")
# visualize DPIs of multiple paths
ggplot(dpi.dag$DPI, aes(x=k.cov, y=DPI)) +
  geom_ribbon(aes(ymin=Sim.LLCI, ymax=Sim.ULCI, fill=path),
              alpha=0.1) +
  geom_line(aes(color=path), linewidth=0.7) +
  geom_point(aes(color=path)) +
  geom_hline(yintercept=0, color="red", linetype="dashed") +
  scale_y_continuous(limits=c(NA, 0.5)) +
  labs(color="Directed Prediction",
       fill="Directed Prediction") +
  theme_classic()
[S3 methods] for DPI() and DPI_curve().
Description
- summary(dpi): Summarize DPI results. Return a list (class summary.dpi) of summarized results and raw DPI data.frame.
- print(summary.dpi): Print DPI summary.
- plot(dpi): Plot DPI results. Return a ggplot object.
- print(dpi): Print DPI summary and plot.
- plot(dpi.curve): Plot DPI curve analysis results. Return a ggplot object.
Usage
## S3 method for class 'dpi'
summary(object, ...)
## S3 method for class 'summary.dpi'
print(x, digits = 3, ...)
## S3 method for class 'dpi'
plot(x, file = NULL, width = 6, height = 4, dpi = 500, ...)
## S3 method for class 'dpi'
print(x, digits = 3, ...)
## S3 method for class 'dpi.curve'
plot(x, file = NULL, width = 6, height = 4, dpi = 500, ...)
Arguments
object |
Object (class |
... |
Other arguments (currently not used). |
x |
Object (class |
digits |
Number of decimal places. Defaults to 3. |
file |
File name of saved plot ( |
width, height |
Width and height (in inches) of saved plot. Defaults to 6 and 4. |
dpi |
Dots per inch (figure resolution). Defaults to 500. |
[S3 methods] for cor_net(), BNs_dag(), and DPI_dag().
Description
Transform qgraph into ggplot
- plot(cor.net)
- plot(bns.dag)
- plot(dpi.dag)
Plot network results
- print(cor.net)
- print(bns.dag)
- print(dpi.dag)
Usage
## S3 method for class 'cor.net'
plot(x, scale = 1.2, ...)
## S3 method for class 'cor.net'
print(x, scale = 1.2, file = NULL, width = 6, height = 4, dpi = 500, ...)
## S3 method for class 'bns.dag'
plot(x, algorithm, scale = 1.2, ...)
## S3 method for class 'bns.dag'
print(
x,
algorithm = names(x),
scale = 1.2,
file = NULL,
width = 6,
height = 4,
dpi = 500,
...
)
## S3 method for class 'dpi.dag'
plot(
x,
k = min(x$DPI$k.cov),
show.label = TRUE,
digits.dpi = 2,
color.dpi.insig = "#EEEEEEEE",
scale = 1.2,
...
)
## S3 method for class 'dpi.dag'
print(
x,
k = min(x$DPI$k.cov),
show.label = TRUE,
digits.dpi = 2,
color.dpi.insig = "#EEEEEEEE",
scale = 1.2,
file = NULL,
width = 6,
height = 4,
dpi = 500,
...
)
Arguments
x |
Object (class |
scale |
Scale the |
... |
Other arguments (currently not used). |
file |
File name of saved plot ( |
width, height |
Width and height (in inches) of saved plot. Defaults to 6 and 4. |
dpi |
Dots per inch (figure resolution). Defaults to 500. |
algorithm |
[For |
k |
[For |
show.label |
[For |
digits.dpi |
[For |
color.dpi.insig |
[For |
Value
Return a ggplot object that can be further modified and used in ggplot2::ggsave() and cowplot::plot_grid().
Produce a symmetric correlation matrix from values.
Description
Produce a symmetric correlation matrix from values.
Usage
cor_matrix(...)
Arguments
... |
Correlation values to transform into the symmetric correlation matrix (by row). |
Value
Return a symmetric correlation matrix.
Examples
cor_matrix(
1.0, 0.7, 0.3,
0.7, 1.0, 0.5,
0.3, 0.5, 1.0
)
cor_matrix(
1.0, NA, NA,
0.7, 1.0, NA,
0.3, 0.5, 1.0
)
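The second example, where only the lower triangle is given and the rest is NA, suggests how such a helper might work. The following base-R sketch shows one way to do it; sym_cor_matrix is a hypothetical name, and nothing here is taken from the actual cor_matrix() source.

```r
# Hypothetical helper: build a symmetric matrix from values given by row.
sym_cor_matrix = function(...) {
  v = c(...)
  k = sqrt(length(v))
  stopifnot(k == round(k))          # need a square number of values
  m = matrix(v, nrow = k, byrow = TRUE)
  # Wherever a cell is NA, copy the value from its mirrored position.
  fill = is.na(m)
  m[fill] = t(m)[fill]
  m
}

m = sym_cor_matrix(
  1.0, NA,  NA,
  0.7, 1.0, NA,
  0.3, 0.5, 1.0
)
isSymmetric(m)
```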
Correlation and partial correlation networks.
Description
Correlation and partial correlation networks (also called Gaussian graphical models, GGMs).
Usage
cor_net(
data,
index = c("cor", "pcor"),
show.label = TRUE,
show.insig = FALSE,
show.cutoff = FALSE,
faded = FALSE,
node.text.size = 1.2,
node.group = NULL,
node.color = NULL,
edge.color.pos = "#0571B0",
edge.color.neg = "#CA0020",
edge.color.non = "#EEEEEEEE",
edge.width.min = "sig",
edge.width.max = NULL,
edge.label.mrg = 0.01,
file = NULL,
width = 6,
height = 4,
dpi = 500,
...
)
Arguments
data |
Data. |
index |
Type of graph: "cor" (correlation network) or "pcor" (partial correlation network). Defaults to "cor". |
show.label |
Show labels of correlation coefficients and their significance on edges. Defaults to TRUE. |
show.insig |
Show edges with insignificant correlations (p > 0.05). Defaults to FALSE. |
show.cutoff |
Show cut-off values of correlations. Defaults to FALSE. |
faded |
Transparency of edges according to the effect size of correlation. Defaults to FALSE. |
node.text.size |
Scalar on the font size of node (variable) labels. Defaults to 1.2. |
node.group |
A list that indicates which nodes belong together, with each element of list as a vector of integers identifying the column numbers of variables that belong together. |
node.color |
A vector with a color for each element in |
edge.color.pos |
Color for (significant) positive values. Defaults to "#0571B0". |
edge.color.neg |
Color for (significant) negative values. Defaults to "#CA0020". |
edge.color.non |
Color for insignificant values. Defaults to "#EEEEEEEE". |
edge.width.min |
Minimum value of edge strength to scale all edge widths. Defaults to "sig". |
edge.width.max |
Maximum value of edge strength to scale all edge widths. Defaults to NULL. |
edge.label.mrg |
Margin of the background box around the edge label. Defaults to 0.01. |
file |
File name of saved plot ( |
width, height |
Width and height (in inches) of saved plot. Defaults to 6 and 4. |
dpi |
Dots per inch (figure resolution). Defaults to 500. |
... |
Arguments passed on to |
Value
Return a list (class cor.net) of (partial) correlation results and qgraph object.
Examples
# correlation network
cor_net(airquality)
cor_net(airquality, show.insig=TRUE)
# partial correlation network
cor_net(airquality, "pcor")
cor_net(airquality, "pcor", show.insig=TRUE)
# modify ggplot attributes
library(ggplot2)  # for labs()
p = cor_net(airquality, "pcor")
gg = plot(p) # return a ggplot object
gg + labs(title="Partial Correlation Network")
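For readers who want to see where the edge weights of a partial correlation network come from, the standard textbook route is through the inverse of the correlation matrix. The base-R sketch below shows that computation on complete cases; it is an illustration of the general technique, and we do not claim cor_net() computes its values this way or handles missing data the same way.

```r
d = na.omit(airquality)
R = cor(d)                   # zero-order correlations
P = solve(R)                 # precision (inverse correlation) matrix
pcor = -cov2cor(P)           # standardize and flip the sign
diag(pcor) = 1               # a variable's partial r with itself
round(pcor, 2)
```

Each off-diagonal entry is the correlation between two variables after regressing both on all remaining variables, which is what distinguishes the "pcor" network from the "cor" network.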
Convert p values to approximate (pseudo) Bayes Factors (PseudoBF10).
Description
Convert p values to approximate (pseudo) Bayes Factors (PseudoBF10). This transformation has been suggested by Wagenmakers (2022).
Usage
p_to_bf(p, n, log = FALSE, label = FALSE)
Arguments
p |
p value(s). |
n |
Number of observations. |
log |
Return |
label |
Add labels (i.e., names) to returned values. Defaults to FALSE. |
Value
A (named) numeric vector of pseudo Bayes Factors (PseudoBF10).
References
Wagenmakers, E.-J. (2022). Approximate objective Bayes factors from p-values and sample size: The 3p√n rule. PsyArXiv. doi:10.31234/osf.io/egydq
Examples
p_to_bf(0.05, 100)
p_to_bf(c(0.01, 0.05), 100)
p_to_bf(c(0.001, 0.01, 0.05, 0.1), 100, label=TRUE)
p_to_bf(c(0.001, 0.01, 0.05, 0.1), 1000, label=TRUE)
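The rule named in the reference title can be written out directly: for small p values, the Bayes factor in favor of the null is approximated by 3p√n, and BF10 is its reciprocal. The sketch below covers only that branch (p at or below 0.10) and uses a hypothetical helper name; p_to_bf() also accepts larger p values and may differ in details, so treat this as an illustration of the rule rather than a reimplementation.

```r
# Hypothetical helper covering only the p <= 0.10 branch of the rule.
bf10_3psqrtn = function(p, n) {
  stopifnot(all(p > 0 & p <= 0.10), all(n > 0))
  bf01 = 3 * p * sqrt(n)   # approximate evidence for H0 over H1
  1 / bf01                 # invert to get evidence for H1 over H0
}

bf10 = bf10_3psqrtn(c(0.001, 0.01, 0.05), n = 100)
bf10
```

Note that the same p value yields weaker evidence for H1 as n grows, because √n appears in the denominator.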
Simulate data from a multivariate normal distribution.
Description
Simulate data from a multivariate normal distribution.
Usage
sim_data(n, k, cor = NULL, exact = TRUE, seed = NULL)
Arguments
n |
Number of observations (cases). |
k |
Number of variables. Will be ignored if cor is specified as a correlation matrix. |
cor |
A correlation value or correlation matrix of the variables. Defaults to NULL. |
exact |
Ensure that the sample correlation matrix exactly matches the values specified in cor. Defaults to TRUE. |
seed |
Random seed for replicable results. Defaults to NULL. |
Value
Return a data.frame of simulated data.
Examples
d1 = sim_data(n=100, k=5, seed=1)
cor_net(d1)
d2 = sim_data(n=100, k=5, cor=0.2, seed=1)
cor_net(d2)
cor.mat = cor_matrix(
1.0, 0.7, 0.3,
0.7, 1.0, 0.5,
0.3, 0.5, 1.0
)
d3 = sim_data(n=100, cor=cor.mat, seed=1)
cor_net(d3)
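One common way to draw multivariate normal data whose sample correlations match a target matrix exactly is MASS::mvrnorm() with empirical = TRUE. The sketch below shows that approach. MASS appears in this package's Imports, but whether sim_data() is built on this call, and whether exact = TRUE corresponds to empirical = TRUE, is an assumption on our part.

```r
library(MASS)  # for mvrnorm()

set.seed(1)
Sigma = matrix(c(1.0, 0.7, 0.3,
                 0.7, 1.0, 0.5,
                 0.3, 0.5, 1.0), nrow = 3, byrow = TRUE)

# empirical = TRUE makes the sample mean and covariance match mu and Sigma.
x = mvrnorm(n = 100, mu = rep(0, 3), Sigma = Sigma, empirical = TRUE)
d = as.data.frame(x)
round(cor(d), 3)
```

With empirical = FALSE the same call would return a sample whose correlations only approximate Sigma, with sampling error that shrinks as n grows.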
Simulate experiment-like data with independent binary Xs.
Description
Simulate experiment-like data with independent binary Xs.
Usage
sim_data_exp(
n,
r.xy,
approx = TRUE,
tol = 0.01,
max.iter = 30,
verbose = FALSE,
seed = NULL
)
Arguments
n |
Number of observations (cases). |
r.xy |
A vector of expected correlations of each X (binary independent variable: 0 or 1) with Y. |
approx |
Make the sample correlations approximate the values specified in r.xy more closely. Defaults to TRUE. |
tol |
Tolerance of absolute difference between specified and empirical correlations. Defaults to 0.01. |
max.iter |
Maximum number of iterations for approximation. More iterations bring the empirical correlations closer to the specified values, but the absolute differences converge after about 30 iterations. Defaults to 30. |
verbose |
Print information about iterations that satisfy tolerance. Defaults to FALSE. |
seed |
Random seed for replicable results. Defaults to NULL. |
Value
Return a data.frame of simulated data.
Examples
data = sim_data_exp(n=1000, r.xy=c(0.5, 0.3), seed=1)
cor(data) # tol = 0.01
data = sim_data_exp(n=1000, r.xy=c(0.5, 0.3), seed=1,
verbose=TRUE)
cor(data) # print iteration information
data = sim_data_exp(n=1000, r.xy=c(0.5, 0.3), seed=1,
verbose=TRUE, tol=0.001)
cor(data) # more approximate, though not exact
data = sim_data_exp(n=1000, r.xy=c(0.5, 0.3), seed=1,
approx=FALSE)
cor(data) # far less exact
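As a point of comparison, experiment-like data with independent binary predictors and target X-Y correlations can be generated by a simple linear construction. The sketch below is one such construction under our own assumptions (each X is Bernoulli with probability 0.5, and Y is a weighted sum of standardized Xs plus normal noise). It does not reproduce the iterative approximation controlled by approx, tol, and max.iter, and it is not taken from the sim_data_exp() source.

```r
set.seed(1)
n = 100000
r.xy = c(0.5, 0.3)   # target correlations; sum(r.xy^2) must be below 1

# Independent binary predictors with P(X = 1) = 0.5.
X = sapply(r.xy, function(r) rbinom(n, size = 1, prob = 0.5))
colnames(X) = paste0("X", seq_along(r.xy))

# Standardize with the population mean (0.5) and SD (0.5).
Z = (X - 0.5) / 0.5

# Y has unit population variance and cor(X_i, Y) = r.xy[i] in expectation.
Y = as.vector(Z %*% r.xy) + sqrt(1 - sum(r.xy^2)) * rnorm(n)

data = data.frame(X, Y)
round(cor(data), 3)
```

The sample correlations match r.xy only up to sampling error, which is why a large n is used here; an iterative adjustment such as the one described under approx would be needed to tighten the match at smaller sample sizes.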