README

The dsmmR R package allows the user to define, estimate and simulate Drifting semi-Markov models (DSMMs) under different model specifications.

Installation

# Install the released version from CRAN
install.packages('dsmmR')
# Or the development version from GitHub
# install.packages("devtools")
devtools::install_github("Mavrogiannis-Ioannis/dsmmR")

High-level documentation

Theory overview

Drifting semi-Markov models are best suited to capturing non-homogeneities that evolve in a linear (or polynomial) way. For example, this approach accounts for non-homogeneities that arise from the intrinsic evolution of the system or from the interactions between the system and the environment.
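To make the idea of a linear drift concrete, the sketch below interpolates between two transition matrices. It assumes the degree-1 case, in which the matrix at instance t out of n is the convex combination (1 - t/n) * p_0 + (t/n) * p_1. The matrices and the helper drift_p() are made up for illustration and are not part of dsmmR.

```r
# Two hypothetical transition matrices of the embedded Markov chain
# (zero diagonal, rows summing to 1).
p_0 <- matrix(c(0,   0.2, 0.8,
                0.5, 0,   0.5,
                0.9, 0.1, 0), ncol = 3, byrow = TRUE)
p_1 <- matrix(c(0,   0.6, 0.4,
                0.3, 0,   0.7,
                0.5, 0.5, 0), ncol = 3, byrow = TRUE)

# Illustrative helper (not a dsmmR function): linear drift at instance t of n.
drift_p <- function(t, n, p_0, p_1) {
  (1 - t / n) * p_0 + (t / n) * p_1
}

n <- 100
p_mid <- drift_p(50, n, p_0, p_1)  # halfway between p_0 and p_1
print(p_mid)
rowSums(p_mid)                     # every row still sums to 1
```

Because each interpolated matrix is a convex combination of two stochastic matrices, it remains a valid transition matrix at every instance.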

For a detailed introduction to Drifting semi-Markov models, see the package documentation through ?dsmmR.

For an extensive description of this approach, consider visiting the complete documentation of the package on the official CRAN page.

Estimation

The easiest way to use dsmmR is through the main function fit_dsmm(), in the non-parametric case. This function estimates a Drifting semi-Markov model from a sequence of states (i.e. a character vector in R). Example data is included in the package, in the form of the DNA sequence lambda. Some parameters need to be specified before calling fit_dsmm(), most notably the polynomial degree and the model of our choice. The model is chosen by defining whether the sojourn time distributions f and the transition matrices p are drifting or not.

# Loading the package
library(dsmmR)

# Obtaining the sequence
data("lambda", package = "dsmmR")
sequence <- c(lambda)

# Obtaining the states
states <- sort(unique(sequence))

# Defining the polynomial degree
degree <- 1 # we define a linear evolution in time (state jumps of the embedded Markov chain)

# Defining the model 
f_is_drifting <- TRUE # sojourn time distributions are drifting in time (state jumps of the EMC)
p_is_drifting <- FALSE # transition matrices are not drifting in time (state jumps of the EMC)
# When f is drifting and p is not drifting, we have Model 3.

# Fitting the drifting semi-Markov model on the sequence.
fitted_model <- fit_dsmm(sequence = sequence,
                         states = states,
                         degree = degree,
                         f_is_drifting = f_is_drifting,
                         p_is_drifting = p_is_drifting)

For more details about the estimation, consider viewing the extended documentation through ?fit_dsmm.
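A semi-Markov description of a sequence separates the successive distinct states (the embedded Markov chain) from the time spent in each of them (the sojourn times). As a rough illustration of this split, the sketch below applies base R's rle() to a short made-up sequence. It is only meant to build intuition and is not a description of how fit_dsmm() is implemented.

```r
# A short, made-up sequence of states (not the lambda data).
sequence <- c("a", "a", "c", "c", "c", "g", "a", "a", "t", "t")

# Run-length encoding separates the visited states from the time spent in each.
runs <- rle(sequence)
emc <- runs$values             # embedded Markov chain
sojourn_times <- runs$lengths  # sojourn time in each visited state

# Empirical transition counts of the embedded Markov chain.
states <- sort(unique(sequence))
transitions <- table(factor(head(emc, -1), levels = states),
                     factor(tail(emc, -1), levels = states))
print(emc)
print(sojourn_times)
print(transitions)
```

The diagonal of the count table is zero by construction, since the embedded Markov chain never jumps from a state to itself.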

Simulation

After fitting a DSMM (or defining it through nonparametric_dsmm() or parametric_dsmm()), we can simulate a sequence from that DSMM. This is pretty straightforward:

sim_seq <- simulate(fitted_model)

Since we follow an object-oriented approach, the previous object fitted_model is the only necessary argument.
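For intuition about what such a simulation involves, here is a minimal base R sketch under strong simplifying assumptions: the transition matrix drifts linearly between two hypothetical matrices, and every sojourn time is geometric with a fixed parameter. The function simulate_sketch() and all values are illustrative; the simulate() method of the package is more general and is not reproduced here.

```r
states <- c("a", "b", "c")
p_0 <- matrix(c(0,   0.2, 0.8,
                0.5, 0,   0.5,
                0.9, 0.1, 0), ncol = 3, byrow = TRUE)
p_1 <- matrix(c(0,   0.6, 0.4,
                0.3, 0,   0.7,
                0.5, 0.5, 0), ncol = 3, byrow = TRUE)

# Illustrative helper (not a dsmmR function): n jumps of the embedded chain.
simulate_sketch <- function(n, states, p_0, p_1, geom_prob = 0.5, seed = 1) {
  set.seed(seed)
  emc <- character(n + 1)
  sojourn <- integer(n + 1)
  emc[1] <- sample(states, 1)
  for (t in 0:(n - 1)) {
    # Sojourn time in the current state, shifted so that it is at least 1.
    sojourn[t + 1] <- rgeom(1, geom_prob) + 1L
    # Transition matrix at instance t, drifting linearly from p_0 to p_1.
    p_t <- (1 - t / n) * p_0 + (t / n) * p_1
    emc[t + 2] <- sample(states, 1, prob = p_t[match(emc[t + 1], states), ])
  }
  sojourn[n + 1] <- rgeom(1, geom_prob) + 1L
  # Expand each visited state by its sojourn time.
  rep(emc, times = sojourn)
}

n <- 20
sim <- simulate_sketch(n, states, p_0, p_1)
head(sim, 10)
```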

Drifting semi-Markov kernel

Because of the high dimension of the DSM kernel, a separate function is provided to compute it. You can obtain the DSM kernel through the command:

kernel <- get_kernel(fitted_model)

The dimensionality of the DSM kernel can be reduced further through the arguments of the function.
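To see why the kernel is a high-dimensional object, the base R sketch below builds a toy kernel for a single instance, assuming the usual factorisation of a semi-Markov kernel into a transition probability and a conditional sojourn time probability, q(u, v, l) = p(u, v) * f(u, v, l). All values are made up, and this does not reproduce the output format of get_kernel(); in the drifting case the kernel additionally depends on the instance t.

```r
states <- c("a", "b", "c")
s <- length(states)
klim <- 3  # maximum sojourn time considered in this toy example

# Hypothetical transition matrix of the embedded Markov chain at one instance.
p <- matrix(c(0,   0.4, 0.6,
              0.5, 0,   0.5,
              0.3, 0.7, 0), ncol = s, byrow = TRUE)

# Hypothetical sojourn time distribution: the same probabilities over
# l = 1, 2, 3 for every pair of states (u, v).
f <- array(rep(c(0.5, 0.3, 0.2), each = s * s), dim = c(s, s, klim))

# Kernel entry q[u, v, l] = p[u, v] * f[u, v, l].
q <- array(p, dim = c(s, s, klim)) * f

q[1, 2, ]         # joint probabilities of moving from "a" to "b" after l steps
apply(q, 1, sum)  # for each current state, the kernel sums to 1
```

Even in this toy case the kernel has s * s * klim entries for one instance, which grows quickly with the number of states, the maximum sojourn time and the number of instances.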

Defining drifting semi-Markov models

We can put all the previous concepts together in a showcase of parametric estimation. First, we define the drifting transition matrices and the drifting sojourn time distributions. Then, we create a dsmm_parametric object, simulate a sequence from it, and finally estimate a drifting semi-Markov model from that simulated sequence.

For more information, consider the documentation through ?parametric_dsmm and ?nonparametric_dsmm.

library(dsmmR)

states <- c("a", "b", "c")
s <- length(states)
degree <- 1

p_dist_1 <- matrix(c(0,   0.4,  0.6,
                     0.5, 0,    0.5,
                     0.3, 0.7,  0   ), ncol = s, byrow = TRUE)
p_dist_2 <- matrix(c(0,   0.55, 0.45,
                     0.25, 0,   0.75,
                     0.5, 0.5,  0   ), ncol = s, byrow = TRUE)
p_dist <- array(c(p_dist_1, p_dist_2), dim = c(s, s, degree + 1))
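Before passing such an array to a model definition, it can help to check it by hand. The following sketch rebuilds p_dist with the values above and verifies, using base R only, that every slice has a zero diagonal and rows summing to 1. The checks are a suggestion of ours and are independent of whatever validation the package performs.

```r
states <- c("a", "b", "c")
s <- length(states)
degree <- 1

p_dist_1 <- matrix(c(0,    0.4,  0.6,
                     0.5,  0,    0.5,
                     0.3,  0.7,  0), ncol = s, byrow = TRUE)
p_dist_2 <- matrix(c(0,    0.55, 0.45,
                     0.25, 0,    0.75,
                     0.5,  0.5,  0), ncol = s, byrow = TRUE)
p_dist <- array(c(p_dist_1, p_dist_2), dim = c(s, s, degree + 1))

# One check per slice of the array (third dimension).
rows_ok <- apply(p_dist, 3, function(p) isTRUE(all.equal(rowSums(p), rep(1, s))))
diag_ok <- apply(p_dist, 3, function(p) all(diag(p) == 0))
print(rows_ok)
print(diag_ok)
```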

Let us also consider the case where only the parameters of the distributions modeling the sojourn times are drifting across the sequence. Note that distributions like the Negative Binomial and the Discrete Weibull require two parameters, which we define in two matrices for each distribution.

f_dist_1 <- matrix(c(NA,     "nbinom",   "unif",
                     "geom", NA,         "pois",
                     "pois", "dweibull", NA    ), nrow = s, ncol = s, byrow = TRUE)
f_dist_1_pars_1 <- matrix(c(NA,  4,   3,
                            0.7, NA,  5,
                            3,   0.6, NA), nrow = s, ncol = s, byrow = TRUE)
f_dist_1_pars_2 <- matrix(c(NA,  0.5, NA,
                            NA,  NA,  NA,
                            NA,  0.8, NA), nrow = s, ncol = s, byrow = TRUE)
f_dist_2 <- f_dist_1 
f_dist_2_pars_1 <- matrix(c(NA,  3,   5,
                            0.3, NA,  2,
                            5,   0.3, NA), nrow = s, ncol = s, byrow = TRUE)
f_dist_2_pars_2 <- matrix(c(NA,  0.4, NA,
                            NA,  NA,  NA,
                            NA,  0.5, NA), nrow = s, ncol = s, byrow = TRUE)

f_dist <- array(c(f_dist_1, f_dist_2), dim = c(s, s, degree + 1))
f_dist_pars <- array(c(f_dist_1_pars_1, f_dist_1_pars_2,
                       f_dist_2_pars_1, f_dist_2_pars_2), 
                     dim = c(s, s, 2, degree + 1))
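The four-way layout of f_dist_pars is easy to get wrong, so the sketch below rebuilds it with the values above and reads back a few entries. The indices are, in order: previous state, next state, parameter number, and which of the degree + 1 distributions is meant. Only base R indexing is used.

```r
s <- 3
degree <- 1

f_dist_1_pars_1 <- matrix(c(NA,  4,   3,
                            0.7, NA,  5,
                            3,   0.6, NA), nrow = s, ncol = s, byrow = TRUE)
f_dist_1_pars_2 <- matrix(c(NA,  0.5, NA,
                            NA,  NA,  NA,
                            NA,  0.8, NA), nrow = s, ncol = s, byrow = TRUE)
f_dist_2_pars_1 <- matrix(c(NA,  3,   5,
                            0.3, NA,  2,
                            5,   0.3, NA), nrow = s, ncol = s, byrow = TRUE)
f_dist_2_pars_2 <- matrix(c(NA,  0.4, NA,
                            NA,  NA,  NA,
                            NA,  0.5, NA), nrow = s, ncol = s, byrow = TRUE)

f_dist_pars <- array(c(f_dist_1_pars_1, f_dist_1_pars_2,
                       f_dist_2_pars_1, f_dist_2_pars_2),
                     dim = c(s, s, 2, degree + 1))

# Both parameters of the "nbinom" sojourn time from "a" to "b".
nbinom_first  <- f_dist_pars[1, 2, , 1]  # under the first distribution
nbinom_second <- f_dist_pars[1, 2, , 2]  # under the second distribution

# The "geom" sojourn time from "b" to "a" has one parameter; the second is NA.
geom_first <- f_dist_pars[2, 1, , 1]

print(nbinom_first)
print(nbinom_second)
print(geom_first)
```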

Then, defining a dsmm_parametric object is done simply through the function parametric_dsmm():

dsmm_model <- parametric_dsmm(
    model_size = 10000,
    states = states,
    initial_dist = c(0.6, 0.3, 0.1),
    degree = degree,
    p_dist = p_dist,
    f_dist = f_dist,
    f_dist_pars = f_dist_pars,
    p_is_drifting = TRUE,
    f_is_drifting = TRUE
)

sim_seq <- simulate(dsmm_model, klim = 30, seed = 1)

fitted_model <- fit_dsmm(sequence = sim_seq,
                         states = states,
                         degree = degree,
                         f_is_drifting = TRUE,
                         p_is_drifting = TRUE,
                         estimation = 'parametric',
                         f_dist = f_dist)

print(fitted_model$dist$p_drift, digits = 2)

, , p_0

     a    b    c
a 0.00 0.40 0.60
b 0.51 0.00 0.49
c 0.27 0.73 0.00

, , p_1

     a    b    c
a 0.00 0.54 0.46
b 0.23 0.00 0.77
c 0.51 0.49 0.00

print(fitted_model$dist$f_drift_parameters, digits = 2)

, , 1, fpars_0

     a    b   c
a   NA 3.66 3.0
b 0.65   NA 4.8
c 3.09 0.62  NA

, , 2, fpars_0

   a    b  c
a NA 0.46 NA
b NA   NA NA
c NA 0.84 NA

, , 1, fpars_1

     a    b   c
a   NA 2.74 5.0
b 0.31   NA 2.1
c 5.02 0.29  NA

, , 2, fpars_1

   a    b  c
a NA 0.38 NA
b NA   NA NA
c NA 0.50 NA
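One simple way to judge the fit is to compare the printed estimates with the values used to define the model. The sketch below copies the two-digit estimates of the transition matrices from the output above into plain matrices and computes the largest absolute deviation from p_dist_1 and p_dist_2. It works on the rounded, printed values only, so the result is approximate.

```r
# True transition matrices used to define the model.
p_dist_1 <- matrix(c(0,    0.4,  0.6,
                     0.5,  0,    0.5,
                     0.3,  0.7,  0), ncol = 3, byrow = TRUE)
p_dist_2 <- matrix(c(0,    0.55, 0.45,
                     0.25, 0,    0.75,
                     0.5,  0.5,  0), ncol = 3, byrow = TRUE)

# Estimates copied from the printed output (rounded to two digits).
est_p_0 <- matrix(c(0,    0.40, 0.60,
                    0.51, 0,    0.49,
                    0.27, 0.73, 0), ncol = 3, byrow = TRUE)
est_p_1 <- matrix(c(0,    0.54, 0.46,
                    0.23, 0,    0.77,
                    0.51, 0.49, 0), ncol = 3, byrow = TRUE)

true_p <- array(c(p_dist_1, p_dist_2), dim = c(3, 3, 2))
est_p  <- array(c(est_p_0, est_p_1), dim = c(3, 3, 2))

# Largest absolute deviation for p_0 and p_1 respectively.
max_abs_err <- round(apply(abs(est_p - true_p), 3, max), 2)
print(max_abs_err)
```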

Developer version: 1.0.6
