ergm
ergm
version 4.8.1 (2025-01-21)This vignette provides an introduction to statistical modeling of
network data with Exponential family Random Graph Models
(ERGMs) using ergm
package. It is based on the
ergm
tutorial used in the statnet
workshops,
but covers a subset of that material.
The complete tutorial can be found on the statnet
workshops
page.
A more complete overview of the advanced functionality available in
the ergm
package can be found in Krivitsky et al. (2023).
If you are reading this, you have probably already installed the
ergm
package. But in case you need to do that:
Set a seed for simulations – this is not necessary, but it ensures that you will get the same results as this vignette (if you execute the same commands in the same order).
This is a very brief overview of the modeling framework, as
the primary purpose of this tutorial is to show how to implement
statistical analysis of network data with ERGMs using the
ergm
package. For more detail (and to really understand
ERGMs) please see further reading at the
end of this tutorial.
Exponential-family random graph models (ERGMs) are a general class of models based on exponential-family theory for specifying the tie probability distribution for a set of random graphs or networks. Within this framework, one can—among other tasks:
Define a model for a network that includes covariates representing features like nodal attribute homophily, mutuality, triad effects, and a wide range of other structural features of interest;
Obtain maximum-likehood estimates for the parameters of the specified model for a given data set;
Test the statistical significance of individual coefficients, assess models for convergence and goodness-of-fit and perform various types of model comparison; and
Simulate new networks from the underlying probability distribution implied by the fitted model.
ERGMs are a class of models, like linear regression or GLMs. The general form of the model specifies the probability of the entire network (on the left hand side), as a function of terms that represent network features we hypothesize may occur more or less likely than expected by chance (on the right hand side). The general form of the model can be written as:
\[ P(Y=y)=\frac{\exp(\theta'g(y))}{k(\theta)} \]
where
\(Y\) is the random variable for the state of the network (with realization \(y\)),
\(g(y)\) is a vector of model statistics (“ERGM terms”) for network \(y\),
\(\theta\) is the vector of coefficients for those statistics, and
\(k(\theta)\) represents the quantity in the numerator summed over all possible networks (typically constrained to be all networks with the same node set as \(y\)).
If you’re not familiar with the compact notation here, note that the numerator represents a formula that is linear in the log form:
\[ \log({\exp(\theta'g(y))}) = \theta_1g_1(y) + \theta_2g_2(y)+ ... + \theta_pg_p(y) \] where \(p\) is the number of terms in the model. From this one can more easily observe the analogy to a traditional statistical model: the coefficients \(\theta\) represent the size and direction of the effects of the covariates \(g(y)\) on the overall probability of the network.
The statistics \(g(y)\) can be thought of as the “covariates” in the model. In the network modeling context, these represent network features like density, homophily, triads, etc. In one sense, they are like covariates you might use in other statistical models. But they are different in one important respect: these \(g(y)\) statistics are functions of the network itself – each is defined by the frequency of a specific configuration of dyads observed in the network – so they are not measured by a question you include in a survey (e.g., the income of a node), but instead need to be computed on the specific network you have, after you have collected the data.
As a result, every term in an ERGM must have an associated algorithm
for computing its value for your network. The ergm
package
in statnet
includes about 150 term-computing algorithms. We
will explore some of these terms in this tutorial. You can get the list
of all available terms, and the syntax for using them, by typing:
and you can look up help for a specific term, say,
edges
, by typing:
You can also search for terms with keywords:
## Found 0 matching ergm terms:
For more information, see the vignette on ergm
terms:
One key distinction in model terms is worth keeping in mind: terms are either dyad independent or dyad dependent.
Dyad independent terms (like nodal homophily terms) imply no dependence between dyads—the presence or absence of a tie may depend on nodal attributes, but not on the state of other ties.
Dyad dependent terms (like degree terms, or triad terms), by contrast, imply dependence between dyads. Such terms have very different effects, and much of what is different about network models comes from these terms. They introduce complex cascading effects that can often lead to counter-intuitive and highly non-linear outcomes. In addition, a model with dyad dependent terms requires a different estimation algorithm, so when we use them below you will see some different components in the output.
An overview and discussion of many of these terms can be found in Morris, Handcock, and Hunter (2008).
ergm
termsThere is a statnet
package — ergm.userterms
— that facilitates the writing of new ergm
terms. The
package is available on GitHub, and
installing it will include the tutorial (ergmuserterms.pdf). The
tutorial can also be found in Hunter, Goodreau,
and Handcock (2013), and some introductory slides and
installation instructions from the workshop we teach on coding
ergm
terms can be found here. For the most recent API
available for implementing terms, see the Terms API vignette.
Note that writing up new ergm
terms requires some
knowledge of C and the ability to build R from source. While the latter
is covered in the tutorial, the many environments for building R and the
rapid changes in these environments make these instructions obsolete
quickly.
The ERGM expression for the probability of the entire graph shown above can be re-expressed in terms of the conditional log-odds of a single tie between two actors:
\[ \operatorname{logit}{(Y_{ij}=1|y^{c}_{ij})=\theta'\delta(y_{ij})} \]
where
\(Y_{ij}\) is the random variable for the state of the actor pair \(i,j\) (with realization \(y_{ij}\)), and
\(y^{c}_{ij}\) signifies the complement of \(y_{ij}\), i.e. all dyads in the network other than \(y_{ij}\).
\(\delta(y_{ij})\) is a vector of the “change statistics” for each model term. The change statistic records how the \(g(y)\) term changes if the \(y_{ij}\) tie is toggled on or off. So:
\[ \delta(y_{ij}) = g(y^{+}_{ij})-g(y^{-}_{ij}) \]
where
So \(\delta(y_{ij})\) equals the value of \(g(y)\) when \(y_{ij}=1\) minus the value of \(g(y)\) when \(y_{ij}=0\), but all other dyads are as in \(y\).
This expression shows that the coefficient \(\theta\) can be interpreted as that term’s contribution to the log-odds of an individual tie, conditional on all other dyads remaining the same. The coefficient for each term in the model is multiplied by the number of configurations that tie will create (or remove) for that specific term.
summary
and ergm
functions, and
supporting functionsWe’ll start by running some simple models to demonstrate the most commonly used functions for ERG modeling.
The syntax for specifying a model in the ergm
package
follows R’s formula convention:
\[ my.network \sim my.vector.of.model.terms \]
This syntax is used for both the summary
and
ergm
functions. The summary
function simply
returns the numerical values of the network statistics in the model. The
ergm
function estimates the model with those
statistics.
It is good practice to always run a summmary
command on
a model before fitting it with ergm
. This is the ERGM
equivalent of performing some descriptive analysis on your covariates.
This can help you make sure you understand what the term represents, and
it can help to flag potential problems that will lead to poor modeling
results.
Network data can come in many different forms – as adjacency
matrices, edgelists and data frames. For use with ergm
these need to be converted into a network
class object. The
network
package provides the utilities for this, and more
information can be found in the examples and vignette there.
Here, we will use some of the network objects included with the
ergm
package.
We’ll start with Padgett’s data on Renaissance Florentine families for our first example. As with all data analysis, we start by looking at our data using graphical and numerical descriptives.
data(florentine) # loads flomarriage and flobusiness data
flomarriage # Look at the flomarriage network properties (uses `network`), esp. the vertex attributes
## Network attributes:
## vertices = 16
## directed = FALSE
## hyper = FALSE
## loops = FALSE
## multiple = FALSE
## bipartite = FALSE
## total edges= 20
## missing edges= 0
## non-missing edges= 20
##
## Vertex attribute names:
## priorates totalties vertex.names wealth
##
## No edge attributes
plot(flomarriage,
main="Florentine Marriage",
cex.main=0.8,
label = network.vertex.names(flomarriage)) # Plot the network
wealth <- flomarriage %v% 'wealth' # %v% references vertex attributes
wealth
## [1] 10 36 55 44 20 32 8 42 103 48 49 3 27 10 146 48
plot(flomarriage,
vertex.cex=wealth/25,
main="Florentine marriage by wealth", cex.main=0.8) # Plot the network with vertex size proportional to wealth
We begin with a simple model, containing only one term that
represents the total number of edges in the network, \(\sum{y_{ij}}\). The name of this
ergm
term is edges
, and when included in an
ERGM its coefficient controls the overall density of the network.
## edges
## 20
## Starting maximum pseudolikelihood estimation (MPLE):
## Obtaining the responsible dyads.
## Evaluating the predictor and response matrix.
## Maximizing the pseudolikelihood.
## Finished MPLE.
## Evaluating log-likelihood at the estimate.
## Call:
## ergm(formula = flomarriage ~ edges)
##
## Maximum Likelihood Results:
##
## Estimate Std. Error MCMC % z value Pr(>|z|)
## edges -1.6094 0.2449 0 -6.571 <1e-04 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Null Deviance: 166.4 on 120 degrees of freedom
## Residual Deviance: 108.1 on 119 degrees of freedom
##
## AIC: 110.1 BIC: 112.9 (Smaller is better. MC Std. Err. = 0)
This simple model specifies a single homogeneous probability for all
ties, which is captured by the coefficient of the edges
term. How should we interpret this coefficient? The easiest way is to
return to the logit form of the ERGM. The log-odds that a tie is present
is \[\begin{align*}
\operatorname{logit}(p(y)) &= \theta \times \delta(g(y)) \\
& = -1.61 \times \mbox{change in the number of ties}\\
& = -1.61 \times 1
\end{align*}\] for every tie, since the addition of any tie to
the network always increases the total number of ties by 1.
The corresponding probability is obtained by taking the expit, or inverse logit, of \(\theta\): \[\begin{align*} & = \exp(-1.61)/ (1+\exp(-1.61))\\ & = 0.17 \end{align*}\] This probability corresponds to the density we observe in the flomarriage network: there are \(20\) ties and \(\binom{16}{2} = (16 \times 15)/2 = 120\) dyads, so the probability of a tie is \(20/120 = 0.17\).
Let’s add a term often thought to be a measure of “clustering”: the
number of completed triangles in the network, or \(\sum{y_{ij}y_{ik}y_{jk}} \div 3\).
The name for this ergm
term is triangle
.
This is an example of a dyad dependent term: The status of any
triangle containing dyad \(y_{ij}\)
depends on the status of dyads of the form \(y_{ik}\) and \(y_{jk}\). This means that any model
containing the ergm
term triangle
has the
property that dyads are not probabilistically independent of one
another. As a result, the estimation algorithm automatically changes to
MCMC, and because this is a form of stochastic estimation your results
may differ slightly.
## edges triangle
## 20 3
## Call:
## ergm(formula = flomarriage ~ edges + triangle)
##
## Monte Carlo Maximum Likelihood Results:
##
## Estimate Std. Error MCMC % z value Pr(>|z|)
## edges -1.6525 0.3502 0 -4.718 <1e-04 ***
## triangle 0.1676 0.5766 0 0.291 0.771
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Null Deviance: 166.4 on 120 degrees of freedom
## Residual Deviance: 108.1 on 118 degrees of freedom
##
## AIC: 112.1 BIC: 117.7 (Smaller is better. MC Std. Err. = 0.01244)
Now, how should we interpret coefficients?
The conditional log-odds of two actors having a tie, keeping the rest of the network fixed, is \[ -1.65 \times\mbox{change in the number of ties} + 0.17 \times\mbox{change in number of triangles.} \]
For a tie that will create no triangles, the conditional log-odds is: \(-1.65\).
if one triangle: \(-1.65 + 0.17 = -1.48\)
if two triangles: \(-1.65 + 2 \times0.17 = -1.32\)
the corresponding probabilities are 0.16, 0.18, and 0.20.
Let’s take a closer look at the ergm object that the function outputs:
## [1] "ergm"
## [1] "coefficients" "sample" "iterations"
## [4] "MCMCtheta" "loglikelihood" "gradient"
## [7] "hessian" "covar" "failure"
## [10] "newnetwork" "coef.init" "est.cov"
## [13] "coef.hist" "stats.hist" "steplen.hist"
## [16] "control" "etamap" "MCMCflag"
## [19] "nw.stats" "call" "network"
## [22] "ergm_version" "info" "MPLE_is_MLE"
## [25] "drop" "offset" "estimable"
## [28] "formula" "reference" "constraints"
## [31] "obs.constraints" "estimate" "estimate.desc"
## [34] "null.lik" "mle.lik"
## edges triangle
## -1.6524648 0.1676376
We saw earlier that wealth appeared to be associated with higher
degree in this network. We can use ergm
to test this.
Wealth is a nodal covariate, so we use the ergm
term
nodecov
.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 3.00 17.50 39.00 42.56 48.25 146.00
# plot(flomarriage,
# vertex.cex=wealth/25,
# main="Florentine marriage by wealth",
# cex.main=0.8) # network plot with vertex size proportional to wealth
summary(flomarriage~edges+nodecov('wealth')) # observed statistics for the model
## edges nodecov.wealth
## 20 2168
## Starting maximum pseudolikelihood estimation (MPLE):
## Obtaining the responsible dyads.
## Evaluating the predictor and response matrix.
## Maximizing the pseudolikelihood.
## Finished MPLE.
## Evaluating log-likelihood at the estimate.
## Call:
## ergm(formula = flomarriage ~ edges + nodecov("wealth"))
##
## Maximum Likelihood Results:
##
## Estimate Std. Error MCMC % z value Pr(>|z|)
## edges -2.594929 0.536056 0 -4.841 <1e-04 ***
## nodecov.wealth 0.010546 0.004674 0 2.256 0.0241 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Null Deviance: 166.4 on 120 degrees of freedom
## Residual Deviance: 103.1 on 118 degrees of freedom
##
## AIC: 107.1 BIC: 112.7 (Smaller is better. MC Std. Err. = 0)
And yes, there is a significant positive wealth effect on the probability of a tie.
Question: What does the value of the nodecov statistic represent?
How do we interpret the coefficients here? Note that the wealth effect operates on both nodes in a dyad. The conditional log-odds of a tie between two actors is: \[ -2.59\times\mbox{change in the number of ties} + 0.01\times\mbox{the wealth of node 1} + 0.01\times\mbox{the wealth of node 2} \] or \[ -2.59\times\mbox{change in the number of ties} + 0.01\times\mbox{the sum of the wealth of the two nodes}. \]
This model specification does not include a term for homophily by
wealth, i.e., a term accounting for similarity in wealth of the two end
nodes of a potential tie. It just specifies a relation between wealth
and mean degree. To specify homophily on wealth, you could use the
ergm
term absdiff
.
Let’s try a larger network, a simulated mutual friendship network
based on one of the schools from the AddHealth study. Here, we’ll
examine the homophily in friendships by grade and race. Both are
discrete attributes so we use the ergm
term
nodematch
.
## Network attributes:
## vertices = 205
## directed = FALSE
## hyper = FALSE
## loops = FALSE
## multiple = FALSE
## bipartite = FALSE
## total edges= 203
## missing edges= 0
## non-missing edges= 203
##
## Vertex attribute names:
## Grade Race Sex
##
## No edge attributes
fauxmodel.01 <- ergm(mesa ~edges +
nodefactor('Grade') + nodematch('Grade',diff=T) +
nodefactor('Race') + nodematch('Race',diff=T))
## Observed statistic(s) nodematch.Race.Black and nodematch.Race.Other are at their smallest attainable values. Their coefficients will be fixed at -Inf.
## Starting maximum pseudolikelihood estimation (MPLE):
## Obtaining the responsible dyads.
## Evaluating the predictor and response matrix.
## Maximizing the pseudolikelihood.
## Finished MPLE.
## Evaluating log-likelihood at the estimate.
## Call:
## ergm(formula = mesa ~ edges + nodefactor("Grade") + nodematch("Grade",
## diff = T) + nodefactor("Race") + nodematch("Race", diff = T))
##
## Maximum Likelihood Results:
##
## Estimate Std. Error MCMC % z value Pr(>|z|)
## edges -8.0538 1.2561 0 -6.412 < 1e-04 ***
## nodefactor.Grade.8 1.5201 0.6858 0 2.216 0.026663 *
## nodefactor.Grade.9 2.5284 0.6493 0 3.894 < 1e-04 ***
## nodefactor.Grade.10 2.8652 0.6512 0 4.400 < 1e-04 ***
## nodefactor.Grade.11 2.6291 0.6563 0 4.006 < 1e-04 ***
## nodefactor.Grade.12 3.4629 0.6566 0 5.274 < 1e-04 ***
## nodematch.Grade.7 7.4662 1.1730 0 6.365 < 1e-04 ***
## nodematch.Grade.8 4.2882 0.7150 0 5.997 < 1e-04 ***
## nodematch.Grade.9 2.0371 0.5538 0 3.678 0.000235 ***
## nodematch.Grade.10 1.2489 0.6233 0 2.004 0.045111 *
## nodematch.Grade.11 2.4521 0.6124 0 4.004 < 1e-04 ***
## nodematch.Grade.12 1.2987 0.6981 0 1.860 0.062824 .
## nodefactor.Race.Hisp -1.6659 0.2963 0 -5.622 < 1e-04 ***
## nodefactor.Race.NatAm -1.4725 0.2869 0 -5.132 < 1e-04 ***
## nodefactor.Race.Other -2.9618 1.0372 0 -2.856 0.004296 **
## nodefactor.Race.White -0.8488 0.2958 0 -2.869 0.004112 **
## nodematch.Race.Black -Inf 0.0000 0 -Inf < 1e-04 ***
## nodematch.Race.Hisp 0.6912 0.3451 0 2.003 0.045153 *
## nodematch.Race.NatAm 1.2482 0.3550 0 3.517 0.000437 ***
## nodematch.Race.Other -Inf 0.0000 0 -Inf < 1e-04 ***
## nodematch.Race.White 0.3140 0.6405 0 0.490 0.623947
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Null Deviance: 28987 on 20910 degrees of freedom
## Residual Deviance: 1798 on 20889 degrees of freedom
##
## AIC: 1836 BIC: 1987 (Smaller is better. MC Std. Err. = 0)
##
## Warning: The following terms have infinite coefficient estimates:
## nodematch.Race.Black nodematch.Race.Other
Note that two of the coefficients are estimated as -Inf (the nodematch coefficients for race Black and Other). Why is this?
##
## Black Hisp NatAm Other White
## 6 109 68 4 18
## Black Hisp NatAm Other White
## Black 0 8 13 0 5
## Hisp 8 53 41 1 22
## NatAm 13 41 46 0 10
## Other 0 1 0 0 0
## White 5 22 10 0 4
## Note: Marginal totals can be misleading for undirected mixing matrices.
The problem is that there are very few students in the Black and Other race categories, and these few students form no within-group ties. The empty cells are what produce the -Inf estimates.
Note that we would have caught this earlier if we had looked at the \(g(y)\) stats at the beginning:
summary(mesa ~edges +
nodefactor('Grade') + nodematch('Grade',diff=T) +
nodefactor('Race') + nodematch('Race',diff=T))
## edges nodefactor.Grade.8 nodefactor.Grade.9
## 203 75 65
## nodefactor.Grade.10 nodefactor.Grade.11 nodefactor.Grade.12
## 36 49 28
## nodematch.Grade.7 nodematch.Grade.8 nodematch.Grade.9
## 75 33 23
## nodematch.Grade.10 nodematch.Grade.11 nodematch.Grade.12
## 9 17 6
## nodefactor.Race.Hisp nodefactor.Race.NatAm nodefactor.Race.Other
## 178 156 1
## nodefactor.Race.White nodematch.Race.Black nodematch.Race.Hisp
## 45 0 53
## nodematch.Race.NatAm nodematch.Race.Other nodematch.Race.White
## 46 0 4
Moral: It’s important to check the descriptive statistics of a model in the observed network before fitting the model.
See also the ergm
terms nodemix
and
mm
for fitting mixing patterns other than homophily on
discrete nodal attributes.
Let’s try a model for a directed network, and examine the tendency
for ties to be reciprocated (“mutuality”). The ergm
term
for this is mutual
. We’ll fit this model to the third wave
of the classic Sampson Monastery data, and we’ll start by taking a look
at the network.
## [1] "faux.mesa.high" "fauxmodel.01" "flobusiness" "flomarriage"
## [5] "flomodel.01" "flomodel.02" "flomodel.03" "mesa"
## [9] "samplk1" "samplk2" "samplk3" "terms"
## [13] "wealth"
## Network attributes:
## vertices = 18
## directed = TRUE
## hyper = FALSE
## loops = FALSE
## multiple = FALSE
## bipartite = FALSE
## total edges= 56
## missing edges= 0
## non-missing edges= 56
##
## Vertex attribute names:
## cloisterville group vertex.names
##
## No edge attributes
## edges mutual
## 56 15
The plot now shows the direction of a tie, and the \(g(y)\) statistics for this model in this network are 56 total ties and 15 mutual dyads. This means 30 of the 56 ties are reciprocated, i.e., they are part of dyads in which both directional ties are present.
## Call:
## ergm(formula = samplk3 ~ edges + mutual)
##
## Monte Carlo Maximum Likelihood Results:
##
## Estimate Std. Error MCMC % z value Pr(>|z|)
## edges -2.1259 0.2133 0 -9.968 <1e-04 ***
## mutual 2.2540 0.4886 0 4.613 <1e-04 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Null Deviance: 424.2 on 306 degrees of freedom
## Residual Deviance: 267.6 on 304 degrees of freedom
##
## AIC: 271.6 BIC: 279 (Smaller is better. MC Std. Err. = 0.3466)
There is a strong and significant mutuality effect. The coefficients for the edges and mutual terms roughly cancel for a mutual tie, so the conditional log-odds of a mutual tie are about zero, which means the probability is about 50%. (Do you see why a log-odds of zero corresponds to a probability of 50%?) By contrast, a non-mutual tie has a conditional log-odds of -2.16, or 10% probability.
Triangle terms in directed networks can have many different
configurations, given the directional ties. Many of these configurations
are coded as ergm
terms (and we’ll talk about these more
below).
It is important to distinguish between the absence of a tie and the
absence of data on whether a tie exists. The former is an observed zero,
whereas the latter is unobserved. You should not code both of these as
“0”. The ergm
package recognizes and handles missing data
appropriately, as long as you identify the data as missing. Let’s
explore this with a simple example.
Start by estimating an ergm on a network with two missing ties, where both ties are identified as missing.
missnet <- network.initialize(10,directed=F) # initialize an empty net with 10 nodes
missnet[1,2] <- missnet[2,7] <- missnet[3,6] <- 1 # add a few ties
missnet[4,6] <- missnet[4,9] <- missnet[5,6] <- NA # mark a few dyads missing
summary(missnet)
## Network attributes:
## vertices = 10
## directed = FALSE
## hyper = FALSE
## loops = FALSE
## multiple = FALSE
## bipartite = FALSE
## total edges = 6
## missing edges = 3
## non-missing edges = 3
## density = 0.06666667
##
## Vertex attributes:
## vertex.names:
## character valued attribute
## 10 valid vertex names
##
## No edge attributes
##
## Network adjacency matrix:
## 1 2 3 4 5 6 7 8 9 10
## 1 0 1 0 0 0 0 0 0 0 0
## 2 1 0 0 0 0 0 1 0 0 0
## 3 0 0 0 0 0 1 0 0 0 0
## 4 0 0 0 0 0 NA 0 0 NA 0
## 5 0 0 0 0 0 NA 0 0 0 0
## 6 0 0 1 NA NA 0 0 0 0 0
## 7 0 1 0 0 0 0 0 0 0 0
## 8 0 0 0 0 0 0 0 0 0 0
## 9 0 0 0 NA 0 0 0 0 0 0
## 10 0 0 0 0 0 0 0 0 0 0
# plot missnet with missing dyads colored red.
tempnet <- missnet
tempnet[4,6] <- tempnet[4,9] <- tempnet[5,6] <- 1
missnetmat <- as.matrix(missnet)
missnetmat[is.na(missnetmat)] <- 2
plot(tempnet,label = network.vertex.names(tempnet),
edge.col = missnetmat)
## edges
## 3
## Starting maximum pseudolikelihood estimation (MPLE):
## Obtaining the responsible dyads.
## Evaluating the predictor and response matrix.
## Maximizing the pseudolikelihood.
## Finished MPLE.
## Evaluating log-likelihood at the estimate.
## Call:
## ergm(formula = missnet ~ edges)
##
## Maximum Likelihood Results:
##
## Estimate Std. Error MCMC % z value Pr(>|z|)
## edges -2.5649 0.5991 0 -4.281 <1e-04 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Null Deviance: 58.22 on 42 degrees of freedom
## Residual Deviance: 21.61 on 41 degrees of freedom
##
## AIC: 23.61 BIC: 25.35 (Smaller is better. MC Std. Err. = 0)
The coefficient equals -2.56, which corresponds to a probability of 7.14%. Our network has 3 ties, out of the 42 non-missing nodal pairs (10 choose 2 minus 3): 3/42 = 7.14%. So our estimate represents the probability of a tie in the observed sample.
Now let’s assign those missing ties the (observed) value “0” and
check how the value of the coefficient will change. Can you predict
whether it will get bigger or smaller? Can you calculate it directly
before checking the output of an ergm
fit? Let’s see what
happens.
missnet_bad <- missnet # create network with missing dyads set to 0
missnet_bad[4,6] <- missnet_bad[4,9] <- missnet_bad[5,6] <- 0
# fit an ergm to the network with missing dyads set to 0
summary(missnet_bad)
## Network attributes:
## vertices = 10
## directed = FALSE
## hyper = FALSE
## loops = FALSE
## multiple = FALSE
## bipartite = FALSE
## total edges = 3
## missing edges = 0
## non-missing edges = 3
## density = 0.06666667
##
## Vertex attributes:
## vertex.names:
## character valued attribute
## 10 valid vertex names
##
## No edge attributes
##
## Network adjacency matrix:
## 1 2 3 4 5 6 7 8 9 10
## 1 0 1 0 0 0 0 0 0 0 0
## 2 1 0 0 0 0 0 1 0 0 0
## 3 0 0 0 0 0 1 0 0 0 0
## 4 0 0 0 0 0 0 0 0 0 0
## 5 0 0 0 0 0 0 0 0 0 0
## 6 0 0 1 0 0 0 0 0 0 0
## 7 0 1 0 0 0 0 0 0 0 0
## 8 0 0 0 0 0 0 0 0 0 0
## 9 0 0 0 0 0 0 0 0 0 0
## 10 0 0 0 0 0 0 0 0 0 0
## Starting maximum pseudolikelihood estimation (MPLE):
## Obtaining the responsible dyads.
## Evaluating the predictor and response matrix.
## Maximizing the pseudolikelihood.
## Finished MPLE.
## Evaluating log-likelihood at the estimate.
## Call:
## ergm(formula = missnet_bad ~ edges)
##
## Maximum Likelihood Results:
##
## Estimate Std. Error MCMC % z value Pr(>|z|)
## edges -2.6391 0.5976 0 -4.416 <1e-04 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Null Deviance: 62.38 on 45 degrees of freedom
## Residual Deviance: 22.04 on 44 degrees of freedom
##
## AIC: 24.04 BIC: 25.85 (Smaller is better. MC Std. Err. = 0)
The coefficient is smaller now because the missing ties are counted
as “0”, and this translates to a conditional tie probability of
6.67%.
It’s a small difference in this case (and a small network, with little
missing data).
MORAL: If you have missing data on ties, be sure to identify them by assigning the “NA” code. This is particularly important if you’re reading in data as an edgelist, as all dyads without edges are implicitly set to “0” in this case.
When dyad dependent terms are in the model, the computational
algorithms in ergm
use MCMC (with a Metropolis-Hastings
sampler) to estimate the parameters.
For these models, it is important to assess model convergence before
interpreting the model results – before evaluating statistical
significance, interpreting coefficients, or assessing goodness of
fit.
To do this, we use the function mcmc.diagnostics
.
Below we show a simple example of a model that converges, and how to use the MCMC diagnostics to identify this.
We will first consider a simple dyadic dependent model where the algorithm works using the program defaults, with a degree(1) term that captures whether there are more (or less) degree 1 nodes than we would expect, given the density.
## edges degree1
## 15 3
## Call:
## ergm(formula = flobusiness ~ edges + degree(1))
##
## Monte Carlo Maximum Likelihood Results:
##
## Estimate Std. Error MCMC % z value Pr(>|z|)
## edges -2.1038 0.2800 0 -7.514 <1e-04 ***
## degree1 -0.5628 0.7402 0 -0.760 0.447
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Null Deviance: 166.36 on 120 degrees of freedom
## Residual Deviance: 89.43 on 118 degrees of freedom
##
## AIC: 93.43 BIC: 99.01 (Smaller is better. MC Std. Err. = 0.03369)
## Sample statistics summary:
##
## Iterations = 7168:131072
## Thinning interval = 512
## Number of chains = 1
## Sample size per chain = 243
##
## 1. Empirical mean and standard deviation for each variable,
## plus standard error of the mean:
##
## Mean SD Naive SE Time-series SE
## edges 0.4486 3.994 0.25621 0.2562
## degree1 -0.4033 1.511 0.09693 0.1115
##
## 2. Quantiles for each variable:
##
## 2.5% 25% 50% 75% 97.5%
## edges -7 -2 0 3 7
## degree1 -3 -1 -1 1 3
##
##
## Are sample statistics significantly different from observed?
## edges degree1 (Omni)
## diff. 0.44855967 -0.403292181 NA
## test stat. 1.75077918 -3.615942344 13.284101750
## P-val. 0.07998395 0.000299257 0.001657501
##
## Sample statistics cross-correlations:
## edges degree1
## edges 1.0000000 -0.4396489
## degree1 -0.4396489 1.0000000
##
## Sample statistics auto-correlation:
## Chain 1
## edges degree1
## Lag 0 1.00000000 1.00000000
## Lag 512 0.08269575 0.13740188
## Lag 1024 -0.04222338 -0.06091563
## Lag 1536 -0.08709292 0.02710151
## Lag 2048 -0.08232951 -0.08106464
## Lag 2560 -0.04677418 -0.09764930
##
## Sample statistics burn-in diagnostic (Geweke):
## Chain 1
##
## Fraction in 1st window = 0.1
## Fraction in 2nd window = 0.5
##
## edges degree1
## -0.0638761398 -0.0009104091
##
## Individual P-values (lower = worse):
## edges degree1
## 0.9490689 0.9992736
## Joint P-value (lower = worse): 0.9982535
##
## Note: MCMC diagnostics shown here are from the last round of
## simulation, prior to computation of final parameter estimates.
## Because the final estimates are refinements of those used for
## this simulation run, these diagnostics may understate model
## performance. To directly assess the performance of the final
## model on in-model statistics, please use the GOF command:
## gof(ergmFitObject, GOF=~model).
What this shows is statistics from the final iteration of the MCMC chain, on the left as a “traceplot” (the deviation of the statistic value in each sampled network from the observed value), and on the right as the distribution of the sample statistic deviations.
This is what you want to see in the MCMC diagnostics: a “fuzzy caterpillar”. In the last section we’ll look at some models that don’t converge properly, and how to use MCMC diagnostics to identify and address this. There are many control parameters for the MCMC algorithm (“help(control.ergm)”), to improve model convergence.
Once we have estimated the coefficients of an ERGM, the model is completely specified. It defines a probability distribution across all networks of this size. If the model is a good fit to the observed data, then networks drawn from this distribution will be more likely to “resemble” the observed data.
## [1] "network.list"
## Number of Networks: 10
## Model: flomarriage ~ edges + nodecov("wealth")
## Reference: ~Bernoulli
## Constraints: ~. ~. - observed
## Stored network statistics:
## edges nodecov.wealth
## [1,] 23 2259
## [2,] 22 2316
## [3,] 27 2506
## [4,] 26 2668
## [5,] 24 2468
## [6,] 20 1907
## [7,] 22 2326
## [8,] 13 1671
## [9,] 27 3054
## [10,] 20 2397
## attr(,"monitored")
## [1] FALSE FALSE
## Number of Networks: 10
## Model: flomarriage ~ edges + nodecov("wealth")
## Reference: ~Bernoulli
## Constraints: ~. ~. - observed
## $coefficients
## edges nodecov.wealth
## -2.59492903 0.01054591
##
## $control
## Control parameter list generated by 'control.simulate.formula' or equivalent. Non-empty parameters:
## MCMC.burnin: 16384
## MCMC.interval: 1024
## MCMC.scale: 1
## MCMC.prop: ~sparse + .triadic
## MCMC.prop.weights: "default"
## MCMC.batch: 0
## MCMC.effectiveSize.damp: 10
## MCMC.effectiveSize.maxruns: 1000
## MCMC.effectiveSize.burnin.pval: 0.2
## MCMC.effectiveSize.burnin.min: 0.05
## MCMC.effectiveSize.burnin.max: 0.5
## MCMC.effectiveSize.burnin.nmin: 16
## MCMC.effectiveSize.burnin.nmax: 128
## MCMC.effectiveSize.burnin.PC: FALSE
## MCMC.effectiveSize.burnin.scl: 1024
## MCMC.maxedges: Inf
## MCMC.runtime.traceplot: FALSE
## network.output: "network"
## parallel: 0
## parallel.version.check: TRUE
## parallel.inherit.MT: FALSE
## MCMC.samplesize: 10
## obs.MCMC.mul: 0.25
## obs.MCMC.samplesize.mul: 0.5
## obs.MCMC.interval.mul: 0.5
## obs.MCMC.burnin.mul: 0.5
## obs.MCMC.prop: ~sparse + .triadic
## obs.MCMC.prop.weights: "default"
## MCMC.save_networks: TRUE
##
## $response
## [1] NA
##
## $class
## [1] "network.list"
##
## $stats
## edges nodecov.wealth
## [1,] 23 2259
## [2,] 22 2316
## [3,] 27 2506
## [4,] 26 2668
## [5,] 24 2468
## [6,] 20 1907
## [7,] 22 2326
## [8,] 13 1671
## [9,] 27 3054
## [10,] 20 2397
## attr(,"monitored")
## [1] FALSE FALSE
##
## $formula
## flomarriage ~ edges + nodecov("wealth")
## attr(,".Basis")
## Network attributes:
## vertices = 16
## directed = FALSE
## hyper = FALSE
## loops = FALSE
## multiple = FALSE
## bipartite = FALSE
## total edges= 20
## missing edges= 0
## non-missing edges= 20
##
## Vertex attribute names:
## priorates totalties vertex.names wealth
##
## No edge attributes
##
## $constraints
## $constraints[[1]]
## ~.
## <environment: base>
##
## $constraints[[2]]
## ~. - observed
## <environment: base>
##
##
## $reference
## ~Bernoulli
## <environment: base>
# are the simulated stats centered on the observed stats?
rbind("obs"=summary(flomarriage~edges+nodecov("wealth")),
"sim mean"=colMeans(attr(flomodel.03.sim, "stats")))
## edges nodecov.wealth
## obs 20.0 2168.0
## sim mean 22.4 2357.2
## Network attributes:
## vertices = 16
## directed = FALSE
## hyper = FALSE
## loops = FALSE
## multiple = FALSE
## bipartite = FALSE
## total edges= 23
## missing edges= 0
## non-missing edges= 23
##
## Vertex attribute names:
## priorates totalties vertex.names wealth
##
## No edge attributes
plot(flomodel.03.sim[[1]],
label= flomodel.03.sim[[1]] %v% "vertex.names",
vertex.cex = (flomodel.03.sim[[1]] %v% "wealth")/25)
Voila. Of course, yours will look somewhat different.
ERGMs can be seen as generative models when they represent the
process that governs the global patterns of tie prevalence from a local
perspective: the perspective of the nodes involved in the particular
micro-configurations represented by the ergm
terms in the
model. The locally generated processes in turn aggregate up to produce
characteristic global network properties, even though these global
properties are not explicit terms in the model.
One test of whether a local model “fits the data” is therefore how well it reproduces the observed global network properties that are not in the model. We do this by choosing a network statistic that is not in the model, and comparing the value of this statistic observed in the original network to the distribution of values we get in simulated networks from our model, using the gof function.
The gof function is a bit different than the
summary, ergm, and
simulate functions, in that it currently only takes 3
ergm
terms as arguments: degree, esp (edgwise share
partners), and distance (geodesic distances). Each of these terms
captures an aggregate network distribution, at either the node level
(degree), the edge level (esp), or the dyad level (distance).
##
## Goodness-of-fit for degree
##
## obs min mean max MC p-value
## degree0 1 0 1.35 5 1.00
## degree1 4 0 3.14 9 0.72
## degree2 2 1 4.30 8 0.22
## degree3 6 0 3.21 8 0.18
## degree4 2 0 2.04 6 1.00
## degree5 0 0 1.08 4 0.74
## degree6 1 0 0.53 3 0.82
## degree7 0 0 0.24 3 1.00
## degree8 0 0 0.03 1 1.00
## degree9 0 0 0.07 1 1.00
## degree10 0 0 0.01 1 1.00
##
## Goodness-of-fit for edgewise shared partner
##
## obs min mean max MC p-value
## esp0 12 6 11.85 19 1.00
## esp1 7 0 6.48 13 0.98
## esp2 1 0 1.80 7 1.00
## esp3 0 0 0.25 3 1.00
##
## Goodness-of-fit for minimum geodesic distance
##
## obs min mean max MC p-value
## 1 20 11 20.38 30 0.88
## 2 35 12 34.89 64 1.00
## 3 32 6 26.22 44 0.58
## 4 15 0 10.86 25 0.56
## 5 3 0 3.30 16 0.80
## 6 0 0 1.05 12 1.00
## 7 0 0 0.33 8 1.00
## 8 0 0 0.10 3 1.00
## 9 0 0 0.02 2 1.00
## Inf 15 0 22.85 85 1.00
##
## Goodness-of-fit for model statistics
##
## obs min mean max MC p-value
## edges 20 11 20.38 30 0.88
## nodecov.wealth 2168 1259 2218.74 3163 0.90
## Starting maximum pseudolikelihood estimation (MPLE):
## Obtaining the responsible dyads.
## Evaluating the predictor and response matrix.
## Maximizing the pseudolikelihood.
## Finished MPLE.
## Evaluating log-likelihood at the estimate.
## Warning in gof.formula(object = object$formula, coef = coef, GOF = GOF, :
## No parameter values given, using 0.
For a good example of model exploration and fitting for the Add Health Friendship networks, see Goodreau, Kitts, and Morris (2009). For more technical details on the approach, see Hunter, Goodreau, and Handcock (2008).
When a model is not a good representation of the observed network, the simulated networks produced in the MCMC chains may be far enough away from the observed network that the estimation process is affected. In the worst case scenario, the simulated networks will be so different that the algorithm fails altogether. When this happens, it basically means the model you specified would not have produced the network you observed. Some models, we now know, would almost never produce an interesting network with this density – this is what we call “model degneracy.”
For more detailed discussion of model degeneracy in the ERGM context, see Handcock (2003), Snijders et al. (2006), and Schweinberger (2011).
In that worst case scenario, we end up not being able to obtain
coefficent estimates, so we can’t use the GOF function to identify how
the model simulations deviate from the observed data. We can, however,
still use the MCMC diagnostics to observe what is happening with the
simulation algorithm, and this (plus some experience and intuition about
the behavior of ergm
terms) can help us improve the model
specification.
The computational algorithms in ergm
use MCMC to
estimate the likelihood function. Part of this process involves
simulating a set of networks to approximate unknown components of the
likelihood.
When a model is not a good representation of the observed network the estimation process may be affected. In the worst case scenario, the simulated networks will be so different from the observed network that the algorithm fails altogether. This can occur for two general reasons. First, the simulation algorithm may fail to converge, and the sampled networks are thus not from the specified distribution. Second, the model parameters used to simulate the networks are too different from the MLE, so even though the simulation algorithm is producing a representative sample of networks, this is not the sample that would be produced under the MLE.
For more detailed discussions of model degeneracy in the ERGM context, see the papers in J Stat Software v. 24. (link is available online at (www.statnet.org))
We can use diagnostics to see what is happening with the simulation algorithm, and these can lead us to ways to improve it.
We will first consider a simulation where the algorithm works. To understand the algorithm, consider
fit <- ergm(flobusiness~edges+degree(1),
control=control.ergm(MCMC.interval=1, MCMC.burnin=1000, seed=1))
This runs a version with every network returned. Let us look at the diagnostics produced:
Let’s look more carefully at a default model fit:
Now let us look at a more interesting case, using a larger network:
## Starting maximum pseudolikelihood estimation (MPLE):
## Obtaining the responsible dyads.
## Evaluating the predictor and response matrix.
## Maximizing the pseudolikelihood.
## Finished MPLE.
## Starting Monte Carlo maximum likelihood estimation (MCMLE):
## Iteration 1 of at most 60:
## Error in ergm.MCMLE(init, s, s.obs, control = control, verbose = verbose, : Number of edges in a simulated network exceeds that in the observed by a factor of more than 20. This is a strong indicator of model degeneracy or a very poor starting parameter configuration. If you are reasonably certain that neither of these is the case, increase the MCMLE.density.guard control.ergm() parameter.
Very interesting. This model produced degenerate networks. You could have gotten some more feedback about this during the fitting, by using:
One approach to solving model degeneracy would be to modify the parameters of MCMC algorithm. As one example, increasing the MCMC sample size can sometimes help:
fit <- ergm(magnolia~edges+triangle,seed=1,
control = control.ergm(seed=1, MCMC.samplesize=20000),
verbose=T)
mcmc.diagnostics(fit, center=F)
But it does not solve the problem in this case, and in general this type of degeneracy is unlikely to be fixed by tuning the MCMC parameters.
How about trying a different model specification: use GWESP, the more robust approach to modeling triangles? (For a technical introduction to GWESP see Hunter 2007; for a more intuitive description and empirical application see Goodreau Kitts and Morris 2009 in the references.)
Note that we are running with a somewhat lower precision than the default, to save time.
fit <- ergm(magnolia~edges+gwesp(0.5,fixed=T)+nodematch('Grade')+nodematch('Race')+
nodematch('Sex'),
control = control.ergm(seed=1, MCMLE.MCMC.precision=2))
mcmc.diagnostics(fit)
Still degenerate, though a bit better. Let’s try adding some dyad-independent terms that can help to restrict the effects of the tie dependence.
One more try…
fit <- ergm(magnolia~edges+gwesp(0.25,fixed=T)+nodematch('Grade')+nodematch('Race')+
nodematch('Sex'),
control = control.ergm(seed=1, MCMLE.MCMC.precision=2),
verbose=T)
## Evaluating network in model.
## Initializing unconstrained Metropolis-Hastings proposal: 'ergm:MH_SPDyad'.
## Initializing model...
## Model initialized.
## Using initial method 'MPLE'.
## Fitting initial model.
## Starting maximum pseudolikelihood estimation (MPLE):
## Obtaining the responsible dyads.
## Evaluating the predictor and response matrix.
## Maximizing the pseudolikelihood.
## Finished MPLE.
## Starting Monte Carlo maximum likelihood estimation (MCMLE):
## Density guard set to 19563 from an initial count of 974 edges.
##
## Iteration 1 of at most 60 with parameter:
## edges gwesp.fixed.0.25 nodematch.Grade nodematch.Race
## -9.6939224 1.6624034 2.8170436 0.9870588
## nodematch.Sex
## 0.6188166
## Starting unconstrained MCMC...
## Back from unconstrained MCMC.
## New interval = 1024.
## Average estimating function values:
## edges gwesp.fixed.0.25 nodematch.Grade nodematch.Race
## 39.65879 -31.62220 36.40682 40.39108
## nodematch.Sex
## -14.76378
## Starting MCMLE Optimization...
## 1 Optimizing with step length 0.5942.
## Using lognormal metric (see control.ergm function).
## Using log-normal approx (no optim)
## The log-likelihood improved by 2.1519.
## Estimating equations are not within tolerance region.
##
## Iteration 2 of at most 60 with parameter:
## edges gwesp.fixed.0.25 nodematch.Grade nodematch.Race
## -9.7204189 1.7335310 2.7682215 0.9552951
## nodematch.Sex
## 0.6785759
## Starting unconstrained MCMC...
## Back from unconstrained MCMC.
## New interval = 1024.
## Average estimating function values:
## edges gwesp.fixed.0.25 nodematch.Grade nodematch.Race
## 8.880759 -25.666729 6.411924 12.514905
## nodematch.Sex
## -20.403794
## Starting MCMLE Optimization...
## 1 Optimizing with step length 1.0000.
## Using lognormal metric (see control.ergm function).
## Using log-normal approx (no optim)
## The log-likelihood improved by 2.4055.
## Distance from origin on tolerance region scale: 2.39901 (previously 7.958435).
## Estimating equations are not within tolerance region.
##
## Iteration 3 of at most 60 with parameter:
## edges gwesp.fixed.0.25 nodematch.Grade nodematch.Race
## -9.7676025 1.7931174 2.7314946 0.9050333
## nodematch.Sex
## 0.7765053
## Starting unconstrained MCMC...
## Back from unconstrained MCMC.
## New interval = 2048.
## Average estimating function values:
## edges gwesp.fixed.0.25 nodematch.Grade nodematch.Race
## -0.3056478 -4.3497703 -2.5116279 -1.6976744
## nodematch.Sex
## -0.1362126
## Starting MCMLE Optimization...
## 1 Optimizing with step length 1.0000.
## Using lognormal metric (see control.ergm function).
## Using log-normal approx (no optim)
## Starting MCMC s.e. computation.
## The log-likelihood improved by 0.0293.
## Distance from origin on tolerance region scale: 0.02916117 (previously 2.983332).
## Test statistic: T^2 = 54.57368, with 5 free parameter(s) and 101.2789 degrees of freedom.
## Convergence test p-value: < 0.0001. Converged with 99% confidence.
## Finished MCMLE.
## Evaluating log-likelihood at the estimate. Initializing model to obtain the list of dyad-independent terms...
## Fitting the dyad-independent submodel...
## Dyad-independent submodel MLE has likelihood -6528.714 at:
## [1] -10.0127658 0.0000000 3.2310453 1.1964585 0.8843782 0.0000000
## Bridging between the dyad-independent submodel and the full model...
## Setting up bridge sampling...
## Initializing model and proposals...
## Model and proposals initialized.
## Using 16 bridges: Running theta=[-9.7925991, 1.7438864, 2.7560607, 0.9200732, 0.7786069, 0.0000000].
## Running theta=[-9.8068034, 1.6313776, 2.7867049, 0.9379045, 0.7854308, 0.0000000].
## Running theta=[-9.8210077, 1.5188688, 2.8173491, 0.9557358, 0.7922548, 0.0000000].
## Running theta=[-9.8352120, 1.4063600, 2.8479932, 0.9735671, 0.7990788, 0.0000000].
## Running theta=[-9.8494163, 1.2938512, 2.8786374, 0.9913984, 0.8059027, 0.0000000].
## Running theta=[-9.8636206, 1.1813424, 2.9092816, 1.0092297, 0.8127267, 0.0000000].
## Running theta=[-9.8778249, 1.0688336, 2.9399257, 1.0270610, 0.8195506, 0.0000000].
## Running theta=[-9.8920292, 0.9563248, 2.9705699, 1.0448924, 0.8263746, 0.0000000].
## Running theta=[-9.9062335, 0.8438160, 3.0012141, 1.0627237, 0.8331985, 0.0000000].
## Running theta=[-9.9204378, 0.7313072, 3.0318582, 1.0805550, 0.8400225, 0.0000000].
## Running theta=[-9.9346422, 0.6187984, 3.0625024, 1.0983863, 0.8468465, 0.0000000].
## Running theta=[-9.9488465, 0.5062896, 3.0931466, 1.1162176, 0.8536704, 0.0000000].
## Running theta=[-9.9630508, 0.3937808, 3.1237907, 1.1340489, 0.8604944, 0.0000000].
## Running theta=[-9.9772551, 0.2812720, 3.1544349, 1.1518802, 0.8673183, 0.0000000].
## Running theta=[-9.9914594, 0.1687632, 3.1850790, 1.1697115, 0.8741423, 0.0000000].
## Running theta=[-10.0056637, 0.0562544, 3.2157232, 1.1875429, 0.8809662, 0.0000000].
## .
## Bridge sampling finished. Collating...
## Bridging finished.
##
## This model was fit using MCMC. To examine model diagnostics and
## check for degeneracy, use the mcmc.diagnostics() function.
## Sample statistics summary:
##
## Iterations = 69632:1298432
## Thinning interval = 4096
## Number of chains = 1
## Sample size per chain = 301
##
## 1. Empirical mean and standard deviation for each variable,
## plus standard error of the mean:
##
## Mean SD Naive SE Time-series SE
## edges -0.3056 42.47 2.448 6.993
## gwesp.fixed.0.25 -4.3498 39.42 2.272 7.525
## nodematch.Grade -2.5116 40.73 2.348 7.188
## nodematch.Race -1.6977 37.20 2.144 5.995
## nodematch.Sex -0.1362 35.79 2.063 6.932
##
## 2. Quantiles for each variable:
##
## 2.5% 25% 50% 75% 97.5%
## edges -92.00 -26.00 -3.000 26.00 81.50
## gwesp.fixed.0.25 -76.19 -28.29 -4.756 20.46 73.31
## nodematch.Grade -83.50 -25.00 -8.000 24.00 72.50
## nodematch.Race -76.50 -26.00 0.000 23.00 71.00
## nodematch.Sex -81.50 -21.00 0.000 24.00 65.00
##
##
## Are sample statistics significantly different from observed?
## edges gwesp.fixed.0.25 nodematch.Grade nodematch.Race
## diff. -0.30564784 -4.3497703 -2.5116279 -1.6976744
## test stat. -0.04370733 -0.5780192 -0.3494084 -0.2832045
## P-val. 0.96513769 0.5632512 0.7267827 0.7770201
## nodematch.Sex (Omni)
## diff. -0.13621262 NA
## test stat. -0.01964918 11.03863445
## P-val. 0.98432323 0.06932526
##
## Sample statistics cross-correlations:
## edges gwesp.fixed.0.25 nodematch.Grade nodematch.Race
## edges 1.0000000 0.8574745 0.9484522 0.9388034
## gwesp.fixed.0.25 0.8574745 1.0000000 0.8732103 0.8460917
## nodematch.Grade 0.9484522 0.8732103 1.0000000 0.8935610
## nodematch.Race 0.9388034 0.8460917 0.8935610 1.0000000
## nodematch.Sex 0.8799087 0.7700025 0.8369442 0.8386091
## nodematch.Sex
## edges 0.8799087
## gwesp.fixed.0.25 0.7700025
## nodematch.Grade 0.8369442
## nodematch.Race 0.8386091
## nodematch.Sex 1.0000000
##
## Sample statistics auto-correlation:
## Chain 1
## edges gwesp.fixed.0.25 nodematch.Grade nodematch.Race
## Lag 0 1.0000000 1.0000000 1.0000000 1.0000000
## Lag 4096 0.7056675 0.6133532 0.7692439 0.7257324
## Lag 8192 0.5814604 0.4474297 0.6312852 0.5762295
## Lag 12288 0.4581094 0.3358273 0.5063366 0.4849884
## Lag 16384 0.3681041 0.2895174 0.3933437 0.3967814
## Lag 20480 0.3242175 0.2516283 0.3442952 0.3352525
## nodematch.Sex
## Lag 0 1.0000000
## Lag 4096 0.7418177
## Lag 8192 0.6622365
## Lag 12288 0.5366724
## Lag 16384 0.4686849
## Lag 20480 0.4199578
##
## Sample statistics burn-in diagnostic (Geweke):
## Chain 1
##
## Fraction in 1st window = 0.1
## Fraction in 2nd window = 0.5
##
## edges gwesp.fixed.0.25 nodematch.Grade nodematch.Race
## -1.829251 -0.693051 -1.334354 -1.347646
## nodematch.Sex
## -1.788516
##
## Individual P-values (lower = worse):
## edges gwesp.fixed.0.25 nodematch.Grade nodematch.Race
## 0.06736203 0.48827755 0.18208798 0.17777223
## nodematch.Sex
## 0.07369277
## Joint P-value (lower = worse): 0.1495111
##
## Note: MCMC diagnostics shown here are from the last round of
## simulation, prior to computation of final parameter estimates.
## Because the final estimates are refinements of those used for
## this simulation run, these diagnostics may understate model
## performance. To directly assess the performance of the final
## model on in-model statistics, please use the GOF command:
## gof(ergmFitObject, GOF=~model).
Success! Of course, in real life one might have a lot more trial and error.
Degeneracy is often an indicator of a poorly specified model. It is
not a property of all ERGMs, but it is associated with some
dyadic-dependent terms, in particular, the reduced homogenous Markov
specifications (e.g., 2-stars and triangle terms). For a good technical
discussion of unstable terms see Schweinberger
(2011). For a discussion of alternative terms that exhibit more
stable behavior see Snijders et al. (2006)
and for the gwesp
term (and the curved exponential family
terms in general) see Hunter and Handcock
(2006).
One of the most powerful features of ERGMs is that they can be used to estimate models from from egocentrically sampled data, and the fitted models can then be used to simulate complete networks (of any size) that will have the properties of the original network that are observed and represented in the model.
The egocentric estimation/simulation framework extends to temporal ERGMs (“TERGMs”) as well, with the minimal addition of an estimate of partnership duration. This makes it possible to simulate complete dynamic networks from a single cross-sectional egocentrically sampled network. For an example of what you can do with this, check out the network movie we developed to explore the impact of dynamic network structure on HIV transmission, see https://statnet.org/movies/ .
While the ergm
package can be used with egocentric data,
we recommend instead to use the package ergm.ego
. This
package includes accurate statistical inference and many utilities that
simplify the task of reading in the data, conducting exploratory
analyses, calculating the sample “target statistics”, and specifying
model options.
We have a workshop/tutorial for ergm.ego
at the statnet Workshops site.
statnet
is a suite of packages that are designed to work
together, and provide tools for a wide range of different types of
network data analysis. Examples include temporal network models and
dynamic network vizualizations, multilevel network modeling, latent
cluster models and network diffusion and epidemic models. Development is
ongoing, and there are new packages, and new functionality added to
existing packages on a regular basis.
All of these packages can be downloaded from CRAN. For more detailed
information, please visit the statnet
webpage www.statnet.org.
Packages developed by statnet team that are not covered in this tutorial:
sna
– classical social network analysis utilitiestsna
– descriptive statistics for temporal network
datatergm
– temporal ERGMs for dynamic networksergm.ego
– estimation/simulation of ergms from
egocentrically sampled dataergm.count
– models for tie count network dataergm.rank
– models for tie rank network dataergm.multi
– models of multilayer networks and for
samples of networksrelevent
– relational event models for networkslatentnet
– latent space and latent cluster
analysisdegreenet
– MLE estimation for degree distributions
(negative binomial, Poisson, scale-free, etc.)networksis
– simulation of bipartite networks with
given degree distributionsndtv
– network movie makerEpiModel
– network modeling of infectious disease and
social diffusion processesergm
There are now a number of excellent packages developed by others that
extend the functionality of statnet. The easiest way to find these is to
look at the “reverse depends” of the ergm
package on CRAN.
Examples include:
Bergm
– Bayesian Exponential Random Graph Modelsbtergm
– Temporal Exponential Random Graph Models by
Bootstrapped Pseudolikelihoodhergm
– hierarchical ERGMs for multi-level network
dataxergm
– extensions to ERGM modelingstatnet
development teamPavel N. Krivitsky p.krivitsky@unsw.edu.au
Martina Morris morrism@u.washington.edu
Mark S. Handcock handcock@stat.ucla.edu
David R. Hunter dhunter@stat.psu.edu
Carter T. Butts buttsc@uci.edu
Steven M. Goodreau goodreau@u.washington.edu
Skye Bender-deMoll skyebend@skyeome.net
Samuel M. Jenness samuel.m.jenness@emory.edu
The best place to start is the special issue of the Journal of
Statistical Software (JSS) devoted to statnet
: link
The nine papers in this issue cover a wide range of theoretical and
practical topics related to ERGMs, and their implementation in
statnet
.
HOWEVER: Note that this issue was written in 2008. The statnet code base has evolved considerably since that time, so some of the syntax specified in the articles may no longer work (in most cases because it has been replace with something better).
An overview of most recent update, ergm 4
, can be found
at Krivitsky et al. (2023).
For social scientists, a good introductory application paper is Goodreau, Kitts, and Morris (2009).
Handcock (2003)
Schweinberger (2011)
Snijders et al. (2006)
Hunter (2007)
Krivitsky and Handcock (2014)
Krivitsky, Handcock, and Morris (2011)
Krivitsky and Morris (2017)