Abstract
This document aims at introducing various GMA-SMA models, most notably those from Forst et al (2019) for varietal mixtures, notably by showing how plant-plant interactions are encoded into the design matrices.suppressPackageStartupMessages(library(plantmix))
The phenotypic variation among varietal mixtures can be decomposed into contributions of general and specific mixing abilities, noted GMA and SMA respectively. Note that these models tackle the case where only a single phenotypic observation is available per plot of a mixed stand: there is no access to genotypic performances within the mixture.
Forst et al (2019) introduced three models to jointly analyze stands of any order, that is, monovarietal stands (order 1) and mixed stands (order > 1). This article also distinguished between inter-genotypic interactions, noted \(SMA_{ij}\), and intra-genotypic interactions, noted \(SMA_{ii}\). Moreover, all models were framed into the formalism of linear mixed models, GMAs and SMAs being random effects, in order to deal with incomplete designs.
All three models have a contribution of \(GMA\), but they distinguish from each other based on the kind of interactions that are explicitly modeled:
model 1 assumes no interaction (no \(SMA\));
model 2 assumes inter-genotypic interactions (\(SMA_{ij}\)) in mixed stands and intra-genotypic interactions (\(SMA_{ii}\)) in monovarietal stands only, both kinds of interactions coming from a single distribution;
model 3 assumes inter-genotypic interactions in mixed stands (\(SMA_{ij}\)) and intra-genotypic interactions (\(SMA_{ii}\)) in both monovarietal and mixed stands, and here again with both kinds of interactions coming from a single distribution.
Other models are also conceivable, such as:
2’, a variant of model 2: \(SMA_{ij}\) in mixed stands and no other interactions;
2’’, another variant of model 2: \(SMA_{ij}\) in mixed stands and \(SMA_{ii}\) in monovarietal stands, but with a different distribution for each kind of interactions;
3’, a variant of model 3: \(SMA_{ij}\) in mixed stands and \(SMA_{ii}\) in both monovarietal and mixed stands, but with a different distribution for each kind of interactions.
\(n\) is the index of a given plot that can contain a monovarietal or a mixed stand;
\(K(n)\) is the number of genotypes in plot \(n\) (1 if monovarietal stand, >1 if mixed);
\(y_{nb}\) is the response variable corresponding to the yield of plot \(n\) in block \(b\);
\(\mu\) is the intercept;
\(\alpha_b\) is the effect of block \(b\), modeled as “fixed”;
\(GMA_{k(n)}\) is the general mixture ability of genotype \(k\) in plot \(n\), modeled as “random”;
\(SMA_{k(n)k'(n)}\) is the specific mixture ability of the genotype pair \(k\) and \(k'\) in plot \(n\), modeled as “random”.
Only first-order (i.e., pairwise) interactions are taken into account; high-order interactions are ignored.
Genotypes in mixed stands are assumed to be sowned in equal proportions, \(1 / K(n)\), and to be harvested in the same proportions (but the proportions can be adapted as indicated at the end of section 4.7 in the article).
The magnitude of interactions is inversely proportional to mixture size.
There are only GMA effects and no SMA effect:
\[ y_{nb} = \mu + \alpha_b + \frac{1}{K(n)} \sum_{k(n)=1}^{K(n)} GMA_{k(n)} + \epsilon_{nb} \]
There are both GMA and SMA effects:
in mixed stands, SMAs only contain inter-genotypic interactions;
monovarietal stands also have SMAs, corresponding to intra-genotypic interactions.
\[ y_{nb} = \mu + \alpha_b + \frac{1}{K(n)} \sum_{k(n)=1}^{K(n)} GMA_{k(n)} + \frac{1}{C_{K(n)}^2} \sum_{k(n)=1}^{K(n)-1} \sum_{k'(n)=k(n)+1}^{K(n)} SMA_{k(n)k'(n)} + \epsilon_{nb} \]
where \(C_{K(n)}^2\) corresponds to the number of combinations of \(K(n)\) genotypes taken two at a time (used for mixed stands only).
Note that interactions are not oriented, i.e., \(SMA_{k(n)k'(n)} = SMA_{k'(n)k(n)}\).
There are also both GMA and SMA effects:
in mixed stands, SMAs contains inter-genotypic interactions as well as intra-genotypic interactions;
monovarietal stands also have SMAs, corresponding to intra-genotypic interactions.
\[ y_{nb} = \mu + \alpha_b + \frac{1}{K(n)} \sum_{k(n)=1}^{K(n)} GMA_{k(n)} + \frac{1}{K(n)^2} \sum_{k(n)=1}^{K(n)} \sum_{k'(n)=1}^{K(n)} SMA_{k(n)k'(n)} + \epsilon_{nb} \]
All three models can be written in vector form:
\[ \boldsymbol{y} = X \, \boldsymbol{\beta} + Z_G \, \boldsymbol{GMA} + (\, Z_S \, \boldsymbol{SMA} \, ) + \boldsymbol{\epsilon} \]
where vectors are in bold.
The design matrices are \(X\), \(Z_G\) and \(Z_S\). They make the correspondence between \(\boldsymbol{y}\) and the levels of the effects, respectively \(\boldsymbol{\beta}\) (fixed), \(\boldsymbol{GMA}\) (random) and \(\boldsymbol{SMA}\) (random). All three matrices have as many rows as the length of \(\boldsymbol{y}\).
\(\boldsymbol{GMA} \sim \mathcal{N}(\boldsymbol{0}, \sigma^2_{GMA} A_1)\) where \(A_1\) is known
\(\boldsymbol{SMA} \sim \mathcal{N}(\boldsymbol{0}, \sigma^2_{SMA} A_2)\) for both \(SMA_{ij}\) and \(SMA_{ii}\) where \(A_2\) is also known
\(\boldsymbol{\epsilon} \sim \mathcal{N}\boldsymbol{0}, \sigma^2_e Id)\)
model 2’, with \(SMA_{ij}\) in mixed stands and no other interactions: same as model 2 except that there is no \(SMA_{ii}\)
model 2’’, with \(SMA_{ij}\) in mixed stands and \(SMA_{ii}\) only in monovarietal stands, with a different distribution for each kind of interactions: same as model 2 except that \(\boldsymbol{SMA}_{ij} \sim \mathcal{N}(\boldsymbol{0}, \sigma^2_{SMA_{ij}} A_2)\) and \(\boldsymbol{SMA}_{ii} \sim \mathcal{N}(\boldsymbol{0}, \sigma^2_{SMA_{ii}} A_2)\)
model 3’, with \(SMA_{ij}\) in mixed stands and \(SMA_{ii}\) in both monovarietal and mixed stands, with a different distribution for each kind of interactions: same as model 3 except that \(\boldsymbol{SMA}_{ij} \sim \mathcal{N}(\boldsymbol{0}, \sigma^2_{SMA_{ij}} A_2)\) and \(\boldsymbol{SMA}_{ii} \sim \mathcal{N}(\boldsymbol{0}, \sigma^2_{SMA_{ii}} A_2)\)
For pedagogical purposes, let us imagine a design with three
genotypes (g1, g2 and g3)
containing one monovarietal stand (stand1), one two-way
mixture (stand2) and one three-way mixture
(stand3).
| id | genos | type | order | |
|---|---|---|---|---|
| 1 | stand1 | g1 | monovarietal | 1 |
| 2 | stand2 | g1-g2 | mixed | 2 |
| 3 | stand3 | g1-g2-g3 | mixed | 3 |
\(Z_G\) has as many columns as the number of genotypes in the experimental design. From the example data frame above, \(Z_G\) hence has 3 rows and 3 columns.
Weights:
for a monovarietal stand: \(1 \, / \, K(n) = 1 \, / \, 1\)
for a binary mixture: \(1 \, / \, K(n) = 1 \, / \, 2\)
for a ternary mixture: \(1 \, / \, K(n) = 1 \, / \, 3\)
etc
| g1 | g2 | g3 | |
|---|---|---|---|
| g1 | 1.00 | 0.00 | 0.00 |
| g1-g2 | 0.50 | 0.50 | 0.00 |
| g1-g2-g3 | 0.33 | 0.33 | 0.33 |
\(Z_S\) has as many columns as the total number of possible intra-genotypic interactions plus the total number of possible (pairwise) inter-genotypic interactions. From the example data frame above, \(Z_S\) hence has \(3\) rows and \(3 + C_3^2 = 6\) columns.
Weights:
for a binary mixture: \(1 \, / \, C_2^2 = 1 \, / \, 1\)
for a ternary mixture: \(1 \, / \, C_3^2 = 1 \, / \, 3\)
for a quaternary mixture: \(1 \, / \, C_4^2 = 1 \, / \, 6\)
etc
| g1-g1 | g1-g2 | g1-g3 | g2-g3 | |
|---|---|---|---|---|
| g1 | 1 | 0.00 | 0.00 | 0.00 |
| g1-g2 | 0 | 1.00 | 0.00 | 0.00 |
| g1-g2-g3 | 0 | 0.33 | 0.33 | 0.33 |
Weights:
for a binary mixture: \(1 \, / \, K(n)^2 = 1 \, / \, 4\)
for a ternary mixture: \(1 \, / \, K(n)^2 = 1 \, / \, 9\)
| g1-g1 | g1-g2 | g1-g3 | g2-g2 | g2-g3 | g3-g3 | |
|---|---|---|---|---|---|---|
| g1 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| g1-g2 | 0.25 | 0.50 | 0.00 | 0.25 | 0.00 | 0.00 |
| g1-g2-g3 | 0.11 | 0.22 | 0.22 | 0.11 | 0.22 | 0.11 |
Same as model 2 but no \(SMA_{ii}\):
| g1-g2 | g1-g3 | g2-g3 | |
|---|---|---|---|
| g1 | 0.00 | 0.00 | 0.00 |
| g1-g2 | 1.00 | 0.00 | 0.00 |
| g1-g2-g3 | 0.33 | 0.33 | 0.33 |
Same as model 2 but \(SMA_{ii}\) and \(SMA_{ij}\) with different distributions:
Design matrix for \(SMA_{ij}\) only:
| g1-g2 | g1-g3 | g2-g3 | |
|---|---|---|---|
| g1 | 0.00 | 0.00 | 0.00 |
| g1-g2 | 1.00 | 0.00 | 0.00 |
| g1-g2-g3 | 0.33 | 0.33 | 0.33 |
Design matrix for \(SMA_{ii}\) only:
| Z_S_ii | |
|---|---|
| g1 | 1 |
| g1-g2 | 0 |
| g1-g2-g3 | 0 |
Same as model 3 but \(SMA_{ii}\) and \(SMA_{ij}\) with different distributions:
Design matrix for \(SMA_{ij}\) only:
| g1-g2 | g1-g3 | g2-g3 | |
|---|---|---|---|
| g1 | 0.00 | 0.00 | 0.00 |
| g1-g2 | 0.50 | 0.00 | 0.00 |
| g1-g2-g3 | 0.22 | 0.22 | 0.22 |
Design matrix for \(SMA_{ii}\) only:
| g1-g1 | g2-g2 | g3-g3 | |
|---|---|---|---|
| g1 | 1.00 | 0.00 | 0.00 |
| g1-g2 | 0.25 | 0.25 | 0.00 |
| g1-g2-g3 | 0.11 | 0.11 | 0.11 |
For the purpose of illustration, varietal mixtures will be assembled from a panel of 25 genotypes. Only 75 binary, balanced mixtures will be observed, among all 300 possible mixtures. Phenotypes will be simulated according to model 2 (see above), organized into a field trial with each mixture present in each block.
Ignored below, but available in the inference functions:
using data in monovarietal conditions while analyzing mixture data;
taking genomic relationships into account.
nbGenos <- 25
levGenos <- sprintf(
fmt = paste0("geno%0", floor(log10(nbGenos)) + 1, "i"),
1:nbGenos
)
nbMixes <- 75 # only binary and balanced
design <- getDesignBinaryVarMix(levGenos, nbMixes, seed = 12345)
tmp <- getMixturesPerGeno(getMixtureList(design$combs))
table(sapply(tmp, length)) # each genotype is in the same nb of mixtures -> balanced design
##
## 6
## 25
nbBlocks <- 3
levBlocks <- LETTERS[1:nbBlocks]
dat <- do.call(rbind, lapply(levBlocks, function(block) {
data.frame(
stand = paste0(design$combs$comp1, "_", design$combs$comp2),
block = block,
stringsAsFactors = TRUE
)
}))
str(dat)
## 'data.frame': 225 obs. of 2 variables:
## $ stand: Factor w/ 75 levels "geno01_geno03",..: 1 7 13 15 8 22 2 9 35 32 ...
## $ block: Factor w/ 3 levels "A","B","C": 1 1 1 1 1 1 1 1 1 1 ...
The design matrices can be generated with the mkZGMA and
mkZSMA functions. See also mkAllZSMA to make
the SMA design matrices all at once.
listContr <- list(block = "contr.sum")
X <- model.matrix(~ 1 + block, data = dat, contrasts = listContr)
Z_GMA <- mkZGMA(dat, "stand", sep = "_")
Z_SMA <- mkZSMA(dat, "stand", sep = "_", inc_SMA_ii = "no")
truth <- list(
"intercept" = 80,
"var_GMA" = 10,
"var_SMA" = 2,
"var_error" = 1
)
set.seed(1234)
truth[["blockEffs"]] <- sample(x = c(-1, 1), size = nbBlocks - 1, replace = TRUE) *
rnorm(n = nbBlocks - 1, mean = 3, sd = 5)
truth[["GMA"]] <- rnorm(n = nbGenos, mean = 0, sd = sqrt(truth$var_GMA))
truth[["SMA"]] <- rnorm(n = nbMixes, mean = 0, sd = sqrt(truth$var_SMA))
truth[["errors"]] <- rnorm(n = nrow(dat), mean = 0, sd = sqrt(truth$var_error))
y <- X %*% c(truth$intercept, truth$blockEffs) +
Z_GMA %*% truth$GMA + Z_SMA %*% truth$SMA +
truth$errors
dat$yield <- y[, 1]
boxplot(yield ~ block, data = dat, las = 1, main = "Simulated data")
Models 1 and 2 will be fitted and compared.
In this first vignette, only the lme4 package will be
used, but see the second vignette that shows how to use TMB
allowing to provide a genomic relationship matrix and to quantify the
uncertainty around estimates of variance components.
listZ1 <- list("GMA" = mkZGMA(dat, "stand", sep = "_"))
system.time(
fit1 <- fitGMASMA(yield ~ 1 + block, dat, listZ1, pkg = "lme4", contrasts = listContr)
)
## user system elapsed
## 0.146 0.004 0.155
listZ2 <- list(
"GMA" = mkZGMA(dat, "stand", sep = "_"),
"SMA" = mkZSMA(dat, "stand", sep = "_", inc_SMA_ii = "no")
)
system.time(
fit2 <- fitGMASMA(yield ~ 1 + block, dat, listZ2, pkg = "lme4", contrasts = , listContr)
)
## user system elapsed
## 0.174 0.000 0.175
BLUEs <- fixef(fit2)
data.frame(
"true" = c(truth$intercept, truth$blockEffs),
"estim" = BLUEs
)
## true estim
## (Intercept) 80.000000 83.278413
## blockB 4.387146 4.134128
## blockC 8.422206 -17.217861
estV <- as.data.frame(lme4::VarCorr(fit2))
data.frame(
"true" = c(truth$var_GMA, truth$var_SMA, truth$var_error),
"estim" = c(estV$vcov[estV$grp == "GMA"], estV$vcov[estV$grp == "SMA"], estV$vcov[estV$grp == "Residual"]),
row.names = c("GMA", "SMA", "error")
)
## true estim
## GMA 10 8.276224
## SMA 2 1.391127
## error 1 1.009792
BLUPs <- ranef(fit2)
cor(truth$GMA, BLUPs$GMA[, "(Intercept)"])
## [1] 0.8504539
cor(truth$SMA, BLUPs$SMA[, "(Intercept)"])
## [1] 0.7285536
op <- par(mfrow = c(1, 2))
for (MA in c("GMA", "SMA")) {
plot(BLUPs[[MA]][, "(Intercept)"], truth[[MA]],
xlab = paste0("BLUP(", MA, ")"), ylab = paste0("true ", MA),
main = "Accuracy with lme4", las = 1, pch = 19
)
abline(a = 0, b = 1, v = 0, h = 0, lty = 2)
abline(lm(truth[[MA]] ~ BLUPs[[MA]][, "(Intercept)"]), col = "red")
}
par(op)
fits <- list("mod1" = fit1, "mod2" = fit2)
t(summarizeGMASMAs(fits))
## mod1 mod2
## AIC 882.5341 823.1463
## R2 0.9800357 0.9922544
## RMSE 1.353726 0.8431985
## var.GMA 9.047029 8.276224
## var.SMA NA 1.391127
## var.SMAij NA NA
## var.SMAii NA NA
## var.err 2.067555 1.009792
As expected, model 2 is better than model 1 on this example (as it was used to simulate it).
See the article for more details:
t1 <- proc.time()
t1 - t0
## user system elapsed
## 3.262 0.096 3.372
print(sessionInfo(), locale = FALSE)
## R version 4.5.1 (2025-06-13)
## Platform: x86_64-pc-linux-gnu
## Running under: Debian GNU/Linux 12 (bookworm)
##
## Matrix products: default
## BLAS: /usr/local/R/4.5.1/lib/R/lib/libRblas.so
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.11.0 LAPACK version 3.11.0
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] plantmix_1.0.2 lme4_1.1-37 Matrix_1.7-3 ggplot2_3.5.2 knitr_1.50
##
## loaded via a namespace (and not attached):
## [1] gtable_0.3.6 jsonlite_2.0.0 dplyr_1.1.4 compiler_4.5.1
## [5] Rcpp_1.1.0 tidyselect_1.2.1 dichromat_2.0-0.1 jquerylib_0.1.4
## [9] splines_4.5.1 scales_1.4.0 boot_1.3-31 yaml_2.3.10
## [13] fastmap_1.2.0 lattice_0.22-7 R6_2.6.1 generics_0.1.4
## [17] igraph_2.1.4 rbibutils_2.3 MASS_7.3-65 tibble_3.3.0
## [21] nloptr_2.2.1 minqa_1.2.8 TMB_1.9.17 bslib_0.9.0
## [25] pillar_1.11.0 RColorBrewer_1.1-3 rlang_1.1.7 cachem_1.1.0
## [29] xfun_0.52 sass_0.4.10 cli_3.6.5 withr_3.0.2
## [33] magrittr_2.0.3 Rdpack_2.6.4 digest_0.6.37 grid_4.5.1
## [37] lifecycle_1.0.5 nlme_3.1-168 reformulas_0.4.1 vctrs_0.6.5
## [41] evaluate_1.0.4 glue_1.8.0 farver_2.1.2 rmarkdown_2.29
## [45] tools_4.5.1 pkgconfig_2.0.3 htmltools_0.5.8.1
Need a high-speed mirror for your open-source project?
Contact our mirror admin team at info@clientvps.com.
This archive is provided as a free public service to the community.
Proudly supported by infrastructure from VPSPulse , RxServers , BuyNumber , UnitVPS , OffshoreName and secure payment technology by ArionPay.