Welcome to ClientVPS Mirrors

Nonparametric model

Nonparametric model

library(serosv)
library(magrittr)

Local estimation by polynomial

Proposed model

Within the local polynomial framework, the linear predictor \(\eta(a)\) is approximated locally at one particular value \(a_0\) for age by a line (local linear, degree \(p=1\)) or a parabola (local quadratic, degree \(p=2\)).

For a general degree \(p\), the linear predictor for a neighbor of \(a_0\), labeled \(a_i\) is equivalent to the Taylor approximation

\[ \eta(a_i) = \eta(a_0) + \eta^{(1)}(a_0)(a_i - a_0) + \frac{\eta^{(2)}(a_0)}{2}(a_i - a_0)^2 + ... + \frac{\eta^{(p)}(a_0)}{p!}(a_i - a_0)^p \]

\(\eta(a_i)\) can be estimated by maximizing

\[ \Sigma_{i=1}^{N} \ell_i \{Y_i, g^{-1} (\beta_0 + \beta_1(a_i-a_0)+ \beta_2(a_i-a_0)^2 ... + \beta_p(a_i-a_0)^p) \} K_h(a_i - a_0) \]

Where:

The estimator for the \(k\)-th derivative of \(\eta(a_0)\), for \(k = 0,1,…,p\) (degree of local polynomial) is thus

\[ \hat{\eta}^{(k)}(a_0) = k!\hat{\beta}_k(a_0) \]

The estimator for the prevalence at age \(a_0\) is then given by

\[ \hat{\pi}(a_0) = g^{-1}\{ \hat{\beta}_0(a_0) \} \]

The estimator for the force of infection at age \(a_0\) by assuming \(p \ge 1\) is as followed

\[ \hat{\lambda}(a_0) = \hat{\beta}_1(a_0) \delta \{ \hat{\beta}_0 (a_0) \} \]

Refer to Chapter 7.1 of the book by Hens et al. (2012) for a more detailed explanation of the method.

Fitting data

mump <- mumps_uk_1986_1987

To fit a local estimation by polynomials, use lp_model() function with model parameters include:

lp <- lp_model(mump, kern="tcub", nn = 0.5, deg=2)
lp
#> Local polynomial model 
#> 
#> Input type:  aggregated 
#> Configs:  nn=0.5, bandwidth(h)=0, degree=2, kernel=tcub 
#> 
#> Call:
#> locfit(formula = y ~ lp(age, deg = deg, nn = nn, h = h), family = "binomial", 
#>     kern = kern)
#> 
#> Number of observations:          44 
#> Family:  Logistic 
#> Fitted Degrees of freedom:       6.463 
#> Residual scale:                  1
plot(lp)

The users can also provide h or nn as a numeric vector, in which case the package will use the parameter value that minimize the generalized cross-validation (GCV) criterion.

# ----- Tune nearest neighbor (nn) parameter
lp_model(mump, kern="tcub", nn = seq(0.2, 0.8, 0.1), deg=2) %>% 
  plot()


# ----- Tune bandwidth (h) parameter
lp_model(mump, kern="tcub", h = seq(5,25), deg=2) %>% 
  plot()

Alternatively, use plot_gcv() to show GCV curves for the nearest neighbor method (left) and constant bandwidth (right) for manual parameter selection.

plot_gcv(
   mump$age, mump$pos, mump$tot,
   nn_seq = seq(0.2, 0.8, by=0.1),
   h_seq = seq(5, 25, by=1)
 )

Based on the plot, we can then fit the model using the parameter value that give the lowest GCV (nn = 0.3 and h = 14)

lp_model(mump, kern="tcub", nn = 0.3, deg=2) %>% 
  plot()


lp_model(mump, kern="tcub", h = 14, deg=2)  %>% 
  plot()

Hens, Niel, Ziv Shkedy, Marc Aerts, Christel Faes, Pierre Van Damme, and Philippe Beutels. 2012. Modeling Infectious Disease Parameters Based on Serological and Social Contact Data: A Modern Statistical Perspective. Statistics for Biology and Health. Springer New York. https://doi.org/10.1007/978-1-4614-4072-7.

Need a high-speed mirror for your open-source project?
Contact our mirror admin team at info@clientvps.com.

This archive is provided as a free public service to the community.
Proudly supported by infrastructure from VPSPulse , RxServers , BuyNumber , UnitVPS , OffshoreName and secure payment technology by ArionPay.