According to the article Albert & Siddhartha (1993), a possible model is to assume the existence of an underlying latent variable related to our observed binary variable using the following proposition :
zij=αiX′iβj+W′iλj+ϵij, with ϵij∼N(0,1) ∀ij and such as : yij={1 if zij>00 otherwise.⇒{yij|zij∼Bernoulli(θij) with θij=Φ(αi+X′iβj+W′iλj)where Φ correspond to the repartition functionof the reduced centred normal distribution.
P(yij=1)=P(zij>0)=P(αi+X′iβj+W′iλj+ϵij>0)=P(ϵij>−(αi+X′iβj+W′iλj) )=P(ϵij≤αi+X′iβj+W′iλj)=Φ(αi+X′iβj+W′iλj)
In the same way:
P(yij=0)=P(zij≤0)=P(ϵij≤−(αi+X′iβj+W′iλj) )=P(ϵij>αi+X′iβj+W′iλj)=1−Φ(αi+X′iβj+W′iλj)
with the following parameters and priors :
Latent variables: Wi=(Wi1,…,Wiq) where q is the number of latent variables considered, which has to be fixed by the user (by default q=2). We assume that Wi∼N(0,Iq) and we define the associated coefficients: λj=(λj1,…,λjq)′. We use a prior distribution N(μλ,Vλ) for each lambda not concerned by constraints to 0 on upper diagonal and to strictly positive values on diagonal.
Explanatory variables:
The corresponding regression coefficients for each species j are noted : βj=(βj0,βj1,…,βjp)′ where βj0 represents the intercept for species j which is assume to be a fixed effect.
αi represents the random effect of site i such as αi∼N(0,Vα) and we assumed that Vα∼IG(shape=0.5,rate=0.005) as prior distribution by default.
We go back to a model of the form: Z′=Xβ+ϵ to estimate the posterior distributions of betas, lambdas and latent variables Wi of the model. For example concerning λj, we define Z′ij=Zij−αi−X′iβj such as Z′ij=W′iλj+ϵij so Z′ij | Wi , λj ∼N(W′iλj,1).
In this case we can use the following proposition:
{Y | β∼Nn(Xβ,In)β∼Np(m,V)⇒{β|Y∼Np(m∗,V∗) with m∗=(V−1+X′X)−1(V−1m+X′Y)V∗=(V−1+X′X)−1.
p(β | Y)∝p(Y | β) p(β)∝1(2π)n2exp(−12(Y−Xβ)′(Y−Xβ))1(2π)p2|V|12exp(−12(β−m)′V−1(β−m))∝exp(−12((β−m)′V−1(β−m)+(Y−Xβ)′(Y−Xβ)))∝exp(−12(β′V−1β+m′V−1m−m′V−1β−β′V−1m+Y′Y+β′X′Xβ−Y′Xβ−β′X′Y))∝exp(−12(β′(V−1+X′X)β−β′(V−1m+X′Y)−(Y′X+m′V−1)β+m′V−1m+Y′Y))∝exp(−12(β′(V−1+X′X)β−β′(V−1m+X′Y)−(X′Y+V−1m)′β+m′V−1m+Y′Y))∝exp(−12(β−(V−1+X′X)−1(V−1m+X′Y))′(V−1+X′X)(β−(V−1+X′X)−1(V−1m+X′Y))−(V−1m+X′Y)′(V−1+X′X)−1(V−1m+X′Y)+m′V−1m+Y′Y)∝exp(−12(β−(V−1+X′X)−1(V−1m+X′Y)⏟m∗)′(V−1+X′X)⏟V∗−1(β−(V−1+X′X)−1(V−1m+X′Y)))
Actually, we use that proposition to estimate betas, lambdas and gammas if species traits data are provided.
About the posterior distribution of the random site effects (αi)i=1,…,nsite, we can use a transformation of the form Z′ij=αi+ϵij, with Z′ij=Zij−W′iλj−X′iβj so Z′ij | Wi, λj, βj, αi ∼N(αi,1). We then use the following proposition:
{x | θ∼N(θ, σ2)θ∼N(μ0,τ02)σ2 known⇒{θ| x∼N(μ1,τ12) with μ1=τ02μ0+xσ2τ0−2+σ−2τ1−2=τ0−2+σ−2.
p(θ | x)∝p(x | θ) p(θ)∝1(2πσ2)12exp(−12σ2(x−θ)2)1(2πτ02)12exp(−12τ02(θ−μ0)2)∝exp(−12τ02(θ−μ0)2−12σ2(x−θ)2)∝exp(−12τ02(θ2−2μ0θ)−12σ2(θ2−2xθ))∝exp(−12(θ2(τ0−2+σ−2)−2μ0θτ0−2−2xθσ−2))∝exp(−12(τ0−2+σ−2)−1(θ2−2θμ0τ0−2+xσ−2τ0−2+σ−2))
Concerning posterior distribution of Vα, the variance of random site
effects (αi)i=1,…,nsite, we use the
following proposition :
If {x | σ2∼Nn(θ, σ2In)σ2∼IG(a,b)θ known⇒{σ2|x∼IG(a′,b′) with a′=a+n2b′=12n∑i=1(xi−θ)2+b.
p(σ2 | x)∝p(x | σ2) p(σ2)∝1(2πσ2)n2exp(−12σ2(x−θ)′(x−θ))baΓ(a)(σ2)−(a+1)exp(−bσ2)∝(σ2)−(n2+a⏟a′+1)exp(−1σ2(b+12n∑i=1(xi−θ)2)⏟b′)
In the Bayesian framework, Gibbs’ algorithm produces a realization of the parameter θ=(θ1,…,θm) according to the a posteriori law Π(θ | x) as soon as we are able to express the conditional laws: Π(θi|θ1,…,θi−1,θi+1,…,θm,x) for i=1,…,m.
Gibbs sampling consists of:
Initialization : arbitrary choice of θ(0)=(θ(0)1,…,θ(0)m).
Iteration t : Genererate θ(t) as follows :
θ(t)1∼Π(θ1 |θ(t−1)2,…,θ(t−1)m,x)
θ(t)2∼Π((θ2 | (θ(t)1,θ(t−1)3,…,θ(t−1)m,x)
θ(t)m∼Π(θm | θ(t)1,…,θ(t)m−1,x)
Successive iterations of this algorithm generate the states of a Markov chain {θ(t),t>0} to values in Rm, we show that this chain admits an invariant measure which is the a posteriori law.
For a sufficiently large number of iterations, the vector θ obtained can thus be considered as a realization of the joint a posteriori law Π(θ | x).
Consequently, the implementation of a Gibbs sampler requires the knowledge of the a posteriori distributions of each of the parameters conditionally to the other parameters of the model, which can be deduced from the conjugated priors formulas in the case of the probit model but are not explicitly expressible in the case where a logit or log link function is used.
The algorithm used in jSDM_binomial_probit()
function to
estimate the parameters of the probit model is therefore as follows:
Initialize all parameters to 0 for example, except the diagonal values of Λ initialized at 1 and V(0)α=1.
Gibbs sampler: at each iteration t for t=1,…,NGibbs we repeat each of these steps :
Generate the latent variable Z(t)=(Z(t)ij)j=1,…,Ji=1,…,I such that Z(t)ij∼{N(α(t−1)i+Xiβ(t−1)j+W(t−1)iλ(t−1)j, 1) right truncated by 0 if yij=0N(α(t−1)i+Xiβ(t−1)j+W(t−1)iλ(t−1)j, 1) left truncated by 0 if yij=1 , the latent variable is thus initialized at the first iteration by generating it according to these centered normal laws.
If species traits data are provided, generate the effects of species-specific traits on species’ responses γ(t)=(γ(t)rk)r=0,…,ntk=0,…,p such as : γ(t)rk |β(t−1)1k,…,β(t−1)Jk∼N(m⋆,V⋆), with m⋆=(V−1γrk+T′rTr)−1(V−1γrkμγrk+Tr(β(t−1)k−∑r′≠rTr′γ(t−1)r′k) and V⋆=(V−1γrk+T′rTr)−1.
Generate the fixed species effects β(t)j=(β(t)j0,β(t)j1,…,β(t)jp)′ for j=1,…,J such as : β(t)j | Z(t),W(t−1)1,α(t−1)1,…,W(t−1)I,α(t−1),,λ(t−1)j1,…,λ(t−1)jqI∼Np+1(m⋆,V⋆), with m⋆=(V−1β+X′X)−1(V−1βμβj+X′Z⋆j) and V⋆=(V−1β+X′X)−1, where Z⋆j=(Z⋆1j,…,Z⋆Ij)′ such as Z⋆ij=Z(t)ij−W(t−1)iλ(t−1)j−α(t−1)i.
Generate the the loading factors related to latent variables λ(t)j=(λ(t)j1,…,λ(t)jq)′ for j=1,…,J such as : λ(t)jl | Z(t),β(t)j,α(t−1),W(t−1),λ(t−1)1,…,λ(t−1)l−1,λ(t−1)l+1,…,λ(t−1)q∼N(m⋆,V⋆), with m⋆=(V−1λ+W(t−1)l′W(t−1)l)−1(V−1λμλ+W(t−1)l′Z⋆j) and V⋆=(V−1λ+W(t−1)l′W(t−1)l)−1, where Z⋆j=(Z⋆1j,…,Z⋆Ij)′ such as Z⋆ij=Z(t)ij−Xiβ(t)j−α(t−1)i−∑l′≠lWil′λjl′. In order to constrain the diagonal values of Λ=(λjl)l=1,…,qj=1,…,J to positive values and make the matrix lower triangular, the values of λ(t)j are simulated according to the following conditions: λ(t)jl∼{P such as P(λjl=0)=1 if l>j,N(m⋆,V⋆) left truncated by 0 if l=j,N(m⋆,V⋆) if l<j.
Generate the latent variables (or unmeasured predictors) W(t)i for i=1,…,I according to : W(t)i | Z(t),λ(t),β(t),α(t−1)i∼Nq((Iq+Λ(t)′Λ(t))−1(Λ(t)′Z⋆i),(Iq+Λ(t)′Λ(t))−1), where Z⋆i=(Z⋆i1,…,Z⋆iJ) such as Z⋆ij=Z(t)ij−α(t−1)i−Xiβ(t)j.
Generate the random site effects α(t)i for i=1,…,I selon : αi| Z(t),λ(t),β(t),W(t)i∼N(∑Jj=1Z(t)ij−Xiβ(t)j−W(t)iλ(t)jV(t−1)α−1+J,(1V(t−1)α+J)−1)
Generate the variance of random site effects V(t)α according to: V(t)α | α(t)1,…,α(t)I∼IG(shape=0.5+I2,rate=0.005+12I∑i=1(α(t)i)2)
In the same way as for the probit model, the logit model can be
defined by means of a latent variable: Zij=αi+Xiβj+Wiλj+ϵij for i=1,…,I et j=1,…,J, with ϵij∼logistique(0,1) iid and such as: yij={1 if Zij>00 else However in this case the a priori
distributions of the latent variable and the parameters are not
conjugated, we are not able to use the properties of the conjugated
priors, so modelling using a latent variable is irrelevant.
In this case it is assumed that yij |θij∼Binomial(ni,θij), with
probit(θij)=αi+Xiβj+Wiλj and ni the number of visits to the site
i.
Therefore, the parameters of this model will be sampled by estimating
their conditional a posteriori distributions using an adaptive
Metropolis algorithm.
An a priori distribution is determined for each parameter of
the model :
Vα∼IG(shape=0.5,rate=0.005) with rate=1scale,βjk∼{N(μβjk,Vβk) for j=1,…,J and k=0,…,p,if species traits data are
provided where μβjk=∑ntr=0tjr.γrk and γrk∼N(μγrk,Vγrk) for r=0,…,nt and k=0,…,p.N(μβk,Vβk) for j=1,…,J and k=0,…,p,if species traits data are not
providedλjl∼{N(μλl,Vλl)if l<jN(μλl,Vλl) left truncated by 0if l=jP such as P(λjl=0)=1if l>j for j=1,…,J and l=1,…,q.
This algorithm belongs to the MCMC methods and allows to obtain a
realization of the parameter θ=(θ1,…,,θm)
according to their conditional a posteriori distributions Π(θi|θ1,…,θi−1,θi+1,…,θm,x),
for i=1,…,m known to within a
multiplicative constant.
It is called adaptive because the variance of the conditional
instrumental density used is adapted according to the number of
acceptances in the last iterations.
Initialization : θ(0)=(θ(0)1,…,θ(0)m) arbitrarily set, the acceptance numbers (nAi)i=1,…,m are initialized at 0 and the variances (σ2i)i=1,…,m are initialized at 1.
Iteration t : for i=1,…,m
Generate θ⋆i∼q(θ(t−1)i,.), with conditional instrumental density q(θ(t−1)i,θ⋆i) symmetric, we will choose a law N(θ(t−1)i,σ2i) for example.
Calculate the probability of acceptance : γ=min(1,Π(θ⋆i | θ(t−1)1,…,θ(t−1)i−1,θ(t−1)i+1,…,θ(t−1)m,x)Π(θ(t−1)i | θ(t−1)1,…,θ(t−1)i−1,θ(t−1)i+1,…,θ(t−1)m,x)).
θ(t)i={θ⋆i with probability γ if we are in this case the acceptance number becomes : nAi←nAi+1θ(t−1)i with probability 1−γ.
During the burn-in, every DIV iteration, with DIV={100 if NGibbs≥1000NGibbs10 else , where NGibbs is the total number of
iterations performed.
The variances are modified as a function of the acceptance numbers as
follows for i=1,…,m :
The acceptance rate is calculated : rAi=nAiDIV.
The variances are adapted according to the acceptance rate and a fixed constant Ropt : σi←{σi(2−1−rAi1−Ropt) if rAi≥Roptσi2−1−rAi1−Ropt else
We reset the acceptance numbers : nAi←0.
Every NGibbs10 iteration, average acceptance rates are calculated and displayed mA=1m∑i=1,…,mrAi.
An adaptive Metropolis algorithm is used to sample the model parameters according to their conditional a posteriori distributions estimated to within one multiplicative constant.
First we define the f function
that calculates the likelihood of the model as a function of the
estimated parameters:
f:λj,βj,αi,Wi,Xi,yij,ni→f(λj,βj,αi,Wi,Xi,yij,ni)=L(θij) - Compute logit(θij)=αi+Xiβj+Wiλj.
Compute θij=11+exp(−logit(θij)).
Return \mathrm{L}(\theta_{ij})= p(y_{ij} \ | \ \theta_{ij},n_i)= \dbinom{n_i}{y_{ij}}(\theta_{ij})^{y_{ij}}(1-\theta_{ij})^{n_i-y_{ij}}.
We repeat those steps for i=1,\ldots,I et j=1,\ldots,J, and then we define \theta = \left(\theta{ij}\right)_{i=1,\ldots
I}^{j= 1,\ldots,J}.
This allows us to calculate the likelihood of the model: \mathrm{L}(\theta)= \prod\limits_{\substack{1\leq
i\leq I \\ 1 \leq j\leq I}}\mathrm{L}(\theta_{ij}).
According to Bayes’ formula we have \mathrm{p}(\theta \ | \ Y) \propto \Pi(\theta) \mathrm{L}(\theta). We thus use the following relations to approach the conditional a posteriori densities of each of the parameters with \Pi(.) the densities corresponding to their a priori laws. \begin{aligned} & p(\beta_{jk} \ | \ \beta_{j0},\beta_{j1},\ldots,\beta_{jk-1},\beta_{jk+1},\ldots,\beta_{jp}, \lambda_j,\alpha_1,\ldots,\alpha_I, W_1,\ldots,W_I,Y) \propto \Pi(\beta_{jk})\prod\limits_{1\leq i\leq I} \mathrm{L}(\theta_{ij})\\ &p(\lambda_{jl} \ | \ \lambda_{j1},\ldots,\lambda_{jl-1},\lambda_{jl+1},\ldots,\lambda_{jq}, \beta_j,\alpha_1,\ldots,\alpha_I, W_1,\ldots,W_I,Y) \propto \Pi(\lambda_{jl}) \prod\limits_{1\leq i \leq I}\mathrm{L}(\theta_{ij})\\ &p(W_{il} \ | \ W_{i1},\ldots,W_{il-1},W_{il+1},\ldots,W_{iq},\alpha_i,\beta_1,\ldots,\beta_J,\lambda_1,\ldots, \lambda_J,Y) \propto \Pi(W_{il}) \prod\limits_{1\leq j\leq J}\mathrm{L}(\theta_{ij})\\ &p(\alpha_i \ | \ W_i,\beta_1,\ldots,\beta_J,\lambda_1,\ldots, \lambda_j,V_{\alpha},Y) \propto \Pi(\alpha_i \ | \ V_{\alpha}) \prod\limits_{1\leq j\leq J}\mathrm{L}(\theta_{ij})\\ & \text{, for $i=1,\ldots,I$, $j=1,\ldots,J$, $k=1,\ldots,p$ and $l=1,\ldots,q$. } \end{aligned}
The algorithm implemented in jSDM_binomial_logit()
on
the basis of Rosenthal (2009) and Roberts & Rosenthal (2001) articles, to estimate the
parameters of the logit model is the following :
Definition of constants N_{Gibbs}, N_{burn}, N_{thin} and R_{opt} such that N_{Gibbs} corresponds to the number of
iterations performed by the algorithm, N_{burn} to the number of iterations
required for the burn-in or warm-up time,
N_{samp}=
\dfrac{N_{Gibbs}-N_{burn}}{N_{thin}} corresponding to the
number of estimated values retained for each parameter. Indeed we record
the estimated parameters at certain iterations in order to obtain N_{samp} values, allowing us to represent
a a \ posteriori distribution for
each parameter.
We set R_{opt} the optimal
acceptance ratio used in the adaptive Metropolis algorithms implemented
for each parameter of the model.
Initialize all parameters to 0 for example, except the diagonal values of \Lambda initialized at 1 and V_{\alpha}^{(0)}=1. The acceptance number of each parameter is initialized to 0 and the variances of their conditional instrumental densities take the value 1.
Gibbs sampler at each iteration t for t=1,\ldots,N_{Gibbs} we repeat each of these steps:
Generate the random site effects \alpha_i^{(t)} for i=1,\ldots,I according to an adaptive
Metropolis algorithm that simulates \alpha_i^\star \sim
\mathcal{N}(\alpha_i^{(t-1)},\sigma_{\alpha_i}^2) and then
calculates the acceptance rate as follows:
\gamma =min\left(1, \
\dfrac{\Pi\left(\alpha_i^\star \ | \
V_{\alpha}^{(t-1)}\right)\prod\limits_{1\leq j\leq
J}\left(\alpha_i^\star, W_i^{(t-1)},\beta_j^{(t-1)}, \lambda_j^{(t-1)},
X_i,y_{ij},n_i\right)}{\Pi\left(\alpha_i^{(t-1)} \ | \
V_{\alpha}^{(t-1)}\right)\prod\limits_{1\leq j\leq
J}f\left(\alpha_i^{(t-1)}, W_i^{(t-1)},\beta_j^{(t-1)},
\lambda_j^{(t-1)}, X_i,y_{ij},n_i\right)}\right).
Generate the variance of random site effects V_\alpha^{(t)} according to: V_\alpha^{(t)} \ | \ \alpha_1^{(t)},\ldots,\alpha_I^{(t)} \sim \mathcal{IG}\left( \text{shape}=0.5 + \frac{I}{2}, \text{rate}=0.005 + \frac{1}{2}\sum\limits_{i=1}^I \left(\alpha_i^{(t)}\right)^2\right)
Generate the latent variables (or unmeasured predictors) W_{il}^{(t)} for i=1,\ldots,I and l=1,\ldots,q according to an adaptive Metropolis algorithm that simulates W_{il}^\star \sim \mathcal{N}(W_{il}^{(t-1)}, \sigma_{W_{il}}^2)and then calculates the acceptance rate as follows:
\gamma = min\left(1,\ \dfrac{\Pi\left(W_{il}^\star\right)\prod\limits_{1\leq j\leq J}f\left(W_{il}^\star, \alpha_i^{(t)},\beta_j^{(t-1)}, \lambda_j^{(t-1)},X_i,y_{ij},n_i\right)} {\Pi\left(W_{il}^{(t-1)}\right)\prod\limits_{1\leq j\leq J}f\left(W_{il}^{(t-1)}, \alpha_i^{(t)},\beta_j^{(t-1)}, \lambda_j^{(t-1)}, X_i,y_{ij},n_i\right)}\right). * If species traits data are provided, generate the effects of species-specific traits on species’ responses \gamma^{(t)}=\left(\gamma_{rk}^{(t)}\right)^{r=0,\ldots,nt}_{k=0,\ldots,p} such as : \gamma_{rk}^{(t)} \ | \beta_{1k}^{(t-1)}, \ldots, \beta_{Jk}^{(t-1)} \sim \mathcal{N}(m^\star,V^\star) \text{, with } m^\star = (V_{\gamma_{rk}}^{-1} + T_r'T_r)^{-1}(V_{\gamma_{rk}}^{-1}\mu_{\gamma_{rk}} + T_r\left(\beta_k^{(t-1)} - \sum\limits_{r' \neq r} T_{r'} \gamma_{r'k}^{(t-1)} \right) \text{ and } V^\star = \left(V_{\gamma_{rk}}^{-1}+ T_r'T_r\right)^{-1}.
\gamma = min\left(1,\dfrac{\Pi\left(\beta_{jk}^\star\right)\prod\limits_{1\leq i\leq I}f\left(\beta_{j0}^{(t)},\small{\ldots},\beta_{jk-1}^{(t)},\beta_{jk}^\star,\beta_{jk+1}^{(t-1)},\small{\ldots}, \beta_{jp}^{(t-1)},\lambda_j^{(t-1)}, \alpha_1^{(t)},W_1^{(t)},\small{\ldots},\alpha_I^{(t)}, W_I^{(t)},X_i,y_{ij},n_i\right)} {\Pi\left(\beta_{jk}^{(t-1)}\right)\prod\limits_{1\leq i\leq I}f\left(\beta_{j0}^{(t)},\small{\ldots},\beta_{jk-1}^{(t)},\beta_{jk}^{(t-1)},\beta_{jk+1}^{(t-1)},\small{\ldots}, \beta_{jp}^{(t-1)},\lambda_j^{(t-1)}, \alpha_1^{(t)},W_1^{(t)}, \small{\ldots},\alpha_I^{(t)}, W_I^{(t)},X_i,y_{ij},n_i\right)}\right).
According to the article Hui (2016), we can use the Poisson distribution for the analysis of multivariate abundance data, with estimation performed using Bayesian Markov chain Monte Carlo methods.
In this case, it is assumed that y_{ij} \sim \mathcal{P}oisson(\theta_{ij}), with \mathrm{log}(\theta_{ij}) = \alpha_i + X_i\beta_j+ W_i\lambda_j.
We therefore consider abundance data with a response variable noted : Y=(y_{ij})^{i=1,\ldots,nsite}_{j=1,\ldots,nsp} such as :
y_{ij}=\begin{cases} 0 & \text{if species $j$ has been observed as absent at site $i$}\\ n & \text{if $n$ individuals of the species $j$ have been observed at the site $i$}. \end{cases}
In this case, we cannot use the properties of the conjugate priors, therefore, the parameters of this model will be sampled by estimating their conditional a posteriori distributions using an adaptive Metropolis algorithm in the Gibbs sampler, in the same way as for the logit model.
We use the same algorithm as before by replacing the logit link
function by a log link function and the binomial distribution by a
poisson’s law to calculate the likelihood of the model in the function
jSDM_poisson_log()
.