---
title: "Augmented Dynamic Adaptive Model"
author: "Ivan Svetunkov"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Augmented Dynamic Adaptive Model}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
bibliography: library.bib
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.align="center",
  fig.height=4,
  fig.width=6,
  fig.path='Figs/',
  fig.show='hold',
  warning=FALSE,
  message=FALSE
)
```

This vignette explains briefly how to use the function `adam()` and the related `auto.adam()` in `smooth` package. It does not aim at covering all aspects of the function, but focuses on the main ones.

ADAM is Augmented Dynamic Adaptive Model. It is a model that underlies ETS, ARIMA and regression, connecting them in a unified framework. The underlying model for ADAM is a Single Source of Error state space model, which is explained in detail separately in an [online textbook](https://openforecast.org/adam/).

The main philosophy of `adam()` function is to be agnostic of the provided data. This means that it will work with `ts`, `msts`, `zoo`, `xts`, `data.frame`, `numeric` and other classes of data. The specification of seasonality in the model is done using a separate parameter `lags`, so you are not obliged to transform the existing data to something specific, and can use it as is. If you provide a `matrix`, or a `data.frame`, or a `data.table`, or any other multivariate structure, then the function will use the first column for the response variable and the others for the explanatory ones. One thing that is currently assumed in the function is that the data is measured at a regular frequency. If this is not the case, you will need to introduce missing values manually.

In order to run the experiments in this vignette, we need to load the following packages:

```{r load_libraries, message=FALSE, warning=FALSE}
require(greybox)
require(smooth)
```

## ADAM ETS
First and foremost, ADAM implements ETS model, although in a more flexible way than [@Hyndman2008b]: it supports different distributions for the error term, which are regulated via `distribution` parameter. By default, the additive error model relies on Normal distribution, while the multiplicative error one assumes Inverse Gaussian. If you want to reproduce the classical ETS, you would need to specify `distribution="dnorm"`. Here is an example of ADAM ETS(MMM) with Normal distribution on `AirPassengers` data:

```{r}
testModel <- adam(AirPassengers, "MMM", lags=c(1,12), distribution="dnorm",
                  h=12, holdout=TRUE)
summary(testModel)
plot(forecast(testModel,h=12,interval="prediction"))
```

You might notice that the summary contains more than what is reported by other `smooth` functions. This one also produces standard errors for the estimated parameters based on Fisher Information calculation. Note that this is computationally expensive, so if you have a model with more than 30 variables, the calculation of standard errors might take plenty of time. As for the default `print()` method, it will produce a shorter summary from the model, without the standard errors (similar to what `es()` does):

```{r}
testModel
```

Also, note that the prediction interval in case of multiplicative error models are approximate. It is advisable to use simulations instead (which is slower, but more accurate):

```{r}
plot(forecast(testModel,h=18,interval="simulated"))
```

If you want to do the residuals diagnostics, then it is recommended to use `plot` function, something like this (you can select, which of the plots to produce):

```{r}
par(mfcol=c(3,4))
plot(testModel,which=c(1:11))
par(mfcol=c(1,1))
plot(testModel,which=12)
```

By default ADAM will estimate models via maximising likelihood function. But there is also a parameter `loss`, which allows selecting from a list of already implemented loss functions (again, see documentation for `adam()` for the full list) or using a function written by a user. Here is how to do the latter on the example of `BJsales`:

```{r}
lossFunction <- function(actual, fitted, B){
  return(sum(abs(actual-fitted)^3))
}
testModel <- adam(BJsales, "AAN", silent=FALSE, loss=lossFunction,
                  h=12, holdout=TRUE)
testModel
```

Note that you need to have parameters actual, fitted and B in the function, which correspond to the vector of actual values, vector of fitted values on each iteration and a vector of the optimised parameters.

`loss` and `distribution` parameters are independent, so in the example above, we have assumed that the error term follows Normal distribution, but we have estimated its parameters using a non-conventional loss because we can. Some of distributions assume that there is an additional parameter, which can either be estimated or provided by user. These include Asymmetric Laplace (`distribution="dalaplace"`) with `alpha`, Generalised Normal and Log-Generalised normal (`distribution=c("gnorm","dlgnorm")`) with `shape` and Student's T (`distribution="dt"`) with `nu`:
```{r}
testModel <- adam(BJsales, "MMN", silent=FALSE, distribution="dgnorm", shape=3,
                  h=12, holdout=TRUE)
```

The model selection in ADAM ETS relies on information criteria and works correctly only for the `loss="likelihood"`. There are several options, how to select the model, see them in the description of the function: `?adam()`. The default one uses branch-and-bound algorithm, similar to the one used in `es()`, but only considers additive trend models (the multiplicative trend ones are less stable and need more attention from a forecaster):

```{r}
testModel <- adam(AirPassengers, "ZXZ", lags=c(1,12), silent=FALSE,
                  h=12, holdout=TRUE)
testModel
```

Note that the function produces point forecasts if `h>0`, but it won't generate prediction interval. This is why you need to use `forecast()` method (as shown in the first example in this vignette).

Similarly to `es()`, function supports combination of models, but it saves all the tested models in the output for a potential reuse. Here how it works:

```{r}
testModel <- adam(AirPassengers, "CXC", lags=c(1,12),
                  h=12, holdout=TRUE)
testForecast <- forecast(testModel,h=18,interval="semiparametric", level=c(0.9,0.95))
testForecast
plot(testForecast)
```

Yes, now we support vectors for the levels in case you want to produce several. In fact, we also support side for prediction interval, so you can extract specific quantiles without a hustle:

```{r}
forecast(testModel,h=18,interval="semiparametric", level=c(0.9,0.95,0.99), side="upper")
```

A brand new thing in the function is the possibility to use several frequencies (double / triple / quadruple / ... seasonal models). 
In order to show how it works, we will generate an artificial time series, inspired by half-hourly electricity demand using `sim.gum()` function:

```{r}
ordersGUM <- c(1,1,1)
lagsGUM <- c(1,48,336)
initialGUM1 <- -25381.7
initialGUM2 <- c(23955.09, 24248.75, 24848.54, 25012.63, 24634.14, 24548.22, 24544.63, 24572.77,
                 24498.33, 24250.94, 24545.44, 25005.92, 26164.65, 27038.55, 28262.16, 28619.83,
                 28892.19, 28575.07, 28837.87, 28695.12, 28623.02, 28679.42, 28682.16, 28683.40,
                 28647.97, 28374.42, 28261.56, 28199.69, 28341.69, 28314.12, 28252.46, 28491.20,
                 28647.98, 28761.28, 28560.11, 28059.95, 27719.22, 27530.23, 27315.47, 27028.83,
                 26933.75, 26961.91, 27372.44, 27362.18, 27271.31, 26365.97, 25570.88, 25058.01)
initialGUM3 <- c(23920.16, 23026.43, 22812.23, 23169.52, 23332.56, 23129.27, 22941.20, 22692.40,
                 22607.53, 22427.79, 22227.64, 22580.72, 23871.99, 25758.34, 28092.21, 30220.46,
                 31786.51, 32699.80, 33225.72, 33788.82, 33892.25, 34112.97, 34231.06, 34449.53,
                 34423.61, 34333.93, 34085.28, 33948.46, 33791.81, 33736.17, 33536.61, 33633.48,
                 33798.09, 33918.13, 33871.41, 33403.75, 32706.46, 31929.96, 31400.48, 30798.24,
                 29958.04, 30020.36, 29822.62, 30414.88, 30100.74, 29833.49, 28302.29, 26906.72,
                 26378.64, 25382.11, 25108.30, 25407.07, 25469.06, 25291.89, 25054.11, 24802.21,
                 24681.89, 24366.97, 24134.74, 24304.08, 25253.99, 26950.23, 29080.48, 31076.33,
                 32453.20, 33232.81, 33661.61, 33991.21, 34017.02, 34164.47, 34398.01, 34655.21,
                 34746.83, 34596.60, 34396.54, 34236.31, 34153.32, 34102.62, 33970.92, 34016.13,
                 34237.27, 34430.08, 34379.39, 33944.06, 33154.67, 32418.62, 31781.90, 31208.69,
                 30662.59, 30230.67, 30062.80, 30421.11, 30710.54, 30239.27, 28949.56, 27506.96,
                 26891.75, 25946.24, 25599.88, 25921.47, 26023.51, 25826.29, 25548.72, 25405.78,
                 25210.45, 25046.38, 24759.76, 24957.54, 25815.10, 27568.98, 29765.24, 31728.25,
                 32987.51, 33633.74, 34021.09, 34407.19, 34464.65, 34540.67, 34644.56, 34756.59,
                 34743.81, 34630.05, 34506.39, 34319.61, 34110.96, 33961.19, 33876.04, 33969.95,
                 34220.96, 34444.66, 34474.57, 34018.83, 33307.40, 32718.90, 32115.27, 31663.53,
                 30903.82, 31013.83, 31025.04, 31106.81, 30681.74, 30245.70, 29055.49, 27582.68,
                 26974.67, 25993.83, 25701.93, 25940.87, 26098.63, 25771.85, 25468.41, 25315.74,
                 25131.87, 24913.15, 24641.53, 24807.15, 25760.85, 27386.39, 29570.03, 31634.00,
                 32911.26, 33603.94, 34020.90, 34297.65, 34308.37, 34504.71, 34586.78, 34725.81,
                 34765.47, 34619.92, 34478.54, 34285.00, 34071.90, 33986.48, 33756.85, 33799.37,
                 33987.95, 34047.32, 33924.48, 33580.82, 32905.87, 32293.86, 31670.02, 31092.57,
                 30639.73, 30245.42, 30281.61, 30484.33, 30349.51, 29889.23, 28570.31, 27185.55,
                 26521.85, 25543.84, 25187.82, 25371.59, 25410.07, 25077.67, 24741.93, 24554.62,
                 24427.19, 24127.21, 23887.55, 24028.40, 24981.34, 26652.32, 28808.00, 30847.09,
                 32304.13, 33059.02, 33562.51, 33878.96, 33976.68, 34172.61, 34274.50, 34328.71,
                 34370.12, 34095.69, 33797.46, 33522.96, 33169.94, 32883.32, 32586.24, 32380.84,
                 32425.30, 32532.69, 32444.24, 32132.49, 31582.39, 30926.58, 30347.73, 29518.04,
                 29070.95, 28586.20, 28416.94, 28598.76, 28529.75, 28424.68, 27588.76, 26604.13,
                 26101.63, 25003.82, 24576.66, 24634.66, 24586.21, 24224.92, 23858.42, 23577.32,
                 23272.28, 22772.00, 22215.13, 21987.29, 21948.95, 22310.79, 22853.79, 24226.06,
                 25772.55, 27266.27, 28045.65, 28606.14, 28793.51, 28755.83, 28613.74, 28376.47,
                 27900.76, 27682.75, 27089.10, 26481.80, 26062.94, 25717.46, 25500.27, 25171.05,
                 25223.12, 25634.63, 26306.31, 26822.46, 26787.57, 26571.18, 26405.21, 26148.41,
                 25704.47, 25473.10, 25265.97, 26006.94, 26408.68, 26592.04, 26224.64, 25407.27,
                 25090.35, 23930.21, 23534.13, 23585.75, 23556.93, 23230.25, 22880.24, 22525.52,
                 22236.71, 21715.08, 21051.17, 20689.40, 20099.18, 19939.71, 19722.69, 20421.58,
                 21542.03, 22962.69, 23848.69, 24958.84, 25938.72, 26316.56, 26742.61, 26990.79,
                 27116.94, 27168.78, 26464.41, 25703.23, 25103.56, 24891.27, 24715.27, 24436.51,
                 24327.31, 24473.02, 24893.89, 25304.13, 25591.77, 25653.00, 25897.55, 25859.32,
                 25918.32, 25984.63, 26232.01, 26810.86, 27209.70, 26863.50, 25734.54, 24456.96)
y <- sim.gum(orders=ordersGUM, lags=lagsGUM, nsim=1, frequency=336, obs=3360,
             measurement=rep(1,3), transition=diag(3), persistence=c(0.045,0.162,0.375),
             initial=cbind(initialGUM1,initialGUM2,initialGUM3))$data
```

We can then apply ADAM to this data:

```{r}
testModel <- adam(y, "MMdM", lags=c(1,48,336), initial="backcasting",
                  silent=FALSE, h=336, holdout=TRUE)
testModel
```

Note that the more lags you have, the more initial seasonal components the function will need to estimate, which is a difficult task. This is why we used `initial="backcasting"` in the example above - this speeds up the estimation by reducing the number of parameters to estimate. Still, the optimiser might not get close to the optimal value, so we can help it. First, we can give more time for the calculation, increasing the number of iterations via `maxeval`  (the default value is 40 iterations for each estimated parameter, e.g. $40 \times 5 = 200$ in our case):

```{r}
testModel <- adam(y, "MMdM", lags=c(1,48,336), initial="backcasting",
                  silent=FALSE, h=336, holdout=TRUE, maxeval=10000)
testModel
```

This will take more time, but will typically lead to more refined parameters. You can control other parameters of the optimiser as well, such as `algorithm`, `xtol_rel`, `print_level` and others, which are explained in the documentation for `nloptr` function from nloptr package (run `nloptr.print.options()` for details). Second, we can give a different set of initial parameters for the optimiser, have a look at what the function saves:

```{r eval=FALSE, echo=TRUE}
testModel$B
```

and use this as a starting point for the reestimation (e.g. with a different algorithm):

```{r}
testModel <- adam(y, "MMdM", lags=c(1,48,336), initial="backcasting",
                  silent=FALSE, h=336, holdout=TRUE, B=testModel$B)
testModel
```

If you are ready to wait, you can change the initialisation to the `initial="optimal"`, which in our case will take much more time because of the number of estimated parameters - 389 for the chosen model. The estimation process in this case might take 20 - 30 times more than in the example above.

In addition, you can specify some parts of the initial state vector or some parts of the persistence vector, here is an example:

```{r}
testModel <- adam(y, "MMdM", lags=c(1,48,336), initial="backcasting",
                  silent=TRUE, h=336, holdout=TRUE, persistence=list(beta=0.1))
testModel
```


The function also handles intermittent data (the data with zeroes) and the data with missing values. This is partially covered in the vignette on the [oes()](oes.html) function. Here is a simple example:

```{r}
testModel <- adam(rpois(120,0.5), "MNN", silent=FALSE, h=12, holdout=TRUE,
                  occurrence="odds-ratio")
testModel
```

Finally, `adam()` is faster than `es()` function, because its code is more efficient and it uses a different optimisation algorithm with more finely tuned parameters by default. Let's compare:
```{r}
adamModel <- adam(AirPassengers, "CCC",
                  h=12, holdout=TRUE)
esModel <- es(AirPassengers, "CCC",
              h=12, holdout=TRUE)
"adam:"
adamModel
"es():"
esModel
```


# ADAM ARIMA
As mentioned above, ADAM does not only contain ETS, it also contains ARIMA model, which is regulated via `orders` parameter. If you want to have a pure ARIMA, you need to switch off ETS, which is done via `model="NNN"`:

```{r}
testModel <- adam(BJsales, "NNN", silent=FALSE, orders=c(0,2,2),
                  h=12, holdout=TRUE)
testModel
```

Given that both models are implemented in the same framework, they can be compared using information criteria.

The functionality of ADAM ARIMA is similar to the one of `msarima` function in `smooth` package, although there are several differences.

First, changing the `distribution` parameter will allow switching between additive / multiplicative models. For example, `distribution="dlnorm"` will create an ARIMA, equivalent to the one on logarithms of the data:

```{r}
testModel <- adam(AirPassengers, "NNN", silent=FALSE, lags=c(1,12),
                  orders=list(ar=c(1,1),i=c(1,1),ma=c(2,2)), distribution="dlnorm",
                  h=12, holdout=TRUE)
testModel
```

Second, if you want the model with intercept / drift, you can do it using `constant` parameter:

```{r}
testModel <- adam(AirPassengers, "NNN", silent=FALSE, lags=c(1,12), constant=TRUE,
                  orders=list(ar=c(1,1),i=c(1,1),ma=c(2,2)), distribution="dnorm",
                  h=12, holdout=TRUE)
testModel
```

If the model contains non-zero differences, then the constant acts as a drift. Third, you can specify parameters of ARIMA via the `arma` parameter in the following manner:

```{r}
testModel <- adam(AirPassengers, "NNN", silent=FALSE, lags=c(1,12),
                  orders=list(ar=c(1,1),i=c(1,1),ma=c(2,2)), distribution="dnorm",
                  arma=list(ar=c(0.1,0.1), ma=c(-0.96, 0.03, -0.12, 0.03)),
                  h=12, holdout=TRUE)
testModel
```

Finally, the initials for the states can also be provided, although getting the correct ones might be a challenging task (you also need to know how many of them to provide; checking `testModel$initial` might help):

```{r}
testModel <- adam(AirPassengers, "NNN", silent=FALSE, lags=c(1,12),
                  orders=list(ar=c(1,1),i=c(1,1),ma=c(2,0)), distribution="dnorm",
                  initial=list(arima=AirPassengers[1:24]),
                  h=12, holdout=TRUE)
testModel
```

If you work with ADAM ARIMA model, then there is no such thing as "usual" bounds for the parameters, so the function will use the `bounds="admissible"`, checking the AR / MA polynomials in order to make sure that the model is stationary and invertible (aka stable).

Similarly to ETS, you can use different distributions and losses for the estimation. **Note that the order selection for ARIMA is done in `auto.adam()` function, not in the `adam()`!** However, if you do `orders=list(..., select=TRUE)` in `adam()`, it will call `auto.adam()` and do the selection.

Finally, ARIMA is typically slower than ETS, mainly because its initial states are more difficult to estimate due to an increased complexity of the model. If you want to speed things up, use `initial="backcasting"` and reduce the number of iterations via `maxeval` parameter.

# ADAM ETSX / ARIMAX / ETSX+ARIMA
Another important feature of ADAM is introduction of explanatory variables. Unlike in `es()`, `adam()` expects a matrix for `data` and can work with a formula. If the latter is not provided, then it will use all explanatory variables. Here is a brief example:

```{r}
BJData <- cbind(BJsales,BJsales.lead)
testModel <- adam(BJData, "AAN", h=18, silent=FALSE)
```

If you work with data.frame or similar structures, then you can use them directly, ADAM will extract the response variable either assuming that it is in the first column or from the provided formula (if you specify one via `formula` parameter). Here is an example, where we create a matrix with lags and leads of an explanatory variable:

```{r}
BJData <- cbind(as.data.frame(BJsales),as.data.frame(xregExpander(BJsales.lead,c(-7:7))))
colnames(BJData)[1] <- "y"
testModel <- adam(BJData, "ANN", h=18, silent=FALSE, holdout=TRUE, formula=y~xLag1+xLag2+xLag3)
testModel
```

Similarly to `es()`, there is a support for variables selection, but via the `regressors` parameter instead of `xregDo`, which will then use `stepwise()` function from `greybox` package on the residuals of the model:

```{r}
testModel <- adam(BJData, "ANN", h=18, silent=FALSE, holdout=TRUE, regressors="select")
```

The same functionality is supported with ARIMA, so you can have, for example, ARIMAX(0,1,1), which is equivalent to ETSX(A,N,N):
```{r}
testModel <- adam(BJData, "NNN", h=18, silent=FALSE, holdout=TRUE, regressors="select", orders=c(0,1,1))
```

The two models might differ because they have different initialisation in the optimiser and different bounds for parameters (ARIMA relies on invertibility condition, while ETS does the usual (0,1) bounds by default). It is possible to make them identical if the number of iterations is increased and the initial parameters are the same. Here is an example of what happens, when the two models have exactly the same parameters:

```{r}
BJData <- BJData[,c("y",names(testModel$initial$xreg))];
testModel <- adam(BJData, "NNN", h=18, silent=TRUE, holdout=TRUE, orders=c(0,1,1),
                  initial=testModel$initial, arma=testModel$arma)
testModel
names(testModel$initial)[1] <- names(testModel$initial)[[1]] <- "level"
testModel2 <- adam(BJData, "ANN", h=18, silent=TRUE, holdout=TRUE,
                   initial=testModel$initial, persistence=testModel$arma$ma+1)
testModel2
```

Another feature of ADAM is the time varying parameters in the SSOE framework, which can be switched on via `regressors="adapt"`:

```{r}
testModel <- adam(BJData, "ANN", h=18, silent=FALSE, holdout=TRUE, regressors="adapt")
testModel$persistence
```

Note that the default number of iterations might not be sufficient in order to get close to the optimum of the function, so setting `maxeval` to something bigger might help. If you want to explore, why the optimisation stopped, you can provide `print_level=41` parameter to the function, and it will print out the report from the optimiser. In the end, the default parameters are tuned in order to give a reasonable solution, but given the complexity of the model, they might not guarantee to give the best one all the time.

Finally, you can produce a mixture of ETS, ARIMA and regression, by using the respective parameters, like this:

```{r}
testModel <- adam(BJData, "AAN", h=18, silent=FALSE, holdout=TRUE, orders=c(1,0,0))
summary(testModel)
```

This might be handy, when you explore a high frequency data, want to add calendar events, apply ETS and add AR/MA errors to it.

Finally, if you estimate ETSX or ARIMAX model and want to speed things up, it is recommended to use `initial="backcasting"`, which will then initialise dynamic part of the model via backcasting and use optimisation for the parameters of the explanatory variables:

```{r}
testModel <- adam(BJData, "AAN", h=18, silent=TRUE, holdout=TRUE, initial="backcasting")
summary(testModel)
```

# Auto ADAM
While the original `adam()` function allows selecting ETS components and explanatory variables, it does not allow selecting the most suitable distribution and / or ARIMA components. This is what `auto.adam()` function is for.

In order to do the selection of the most appropriate distribution, you need to provide a vector of those that you want to check:

```{r}
testModel <- auto.adam(BJsales, "XXX", silent=FALSE,
                       distribution=c("dnorm","dlaplace","ds"),
                       h=12, holdout=TRUE)
testModel
```

This process can also be done in parallel on either the automatically selected number of cores (e.g. `parallel=TRUE`) or on the specified by user (e.g. `parallel=4`):

```{r eval=FALSE, echo=TRUE}
testModel <- auto.adam(BJsales, "ZZZ", silent=FALSE, parallel=TRUE,
                       h=12, holdout=TRUE)
```

If you want to add ARIMA or regression components, you can do it in the exactly the same way as for the `adam()` function. Here is an example of ETS+ARIMA:

```{r}
testModel <- auto.adam(BJsales, "AAN", orders=list(ar=2,i=0,ma=0), silent=TRUE,
                       distribution=c("dnorm","dlaplace","ds","dgnorm"),
                       h=12, holdout=TRUE)
testModel
```

However, this way the function will just use ARIMA(2,0,0) and fit it together with ETS(A,A,N). If you want it to select the most appropriate ARIMA orders from the provided (e.g. up to AR(2), I(1) and MA(2)), you need to add parameter `select=TRUE` to the list in `orders`:

```{r}
testModel <- auto.adam(BJsales, "XXN", orders=list(ar=2,i=2,ma=2,select=TRUE),
                       distribution="default", silent=FALSE,
                       h=12, holdout=TRUE)
testModel
```

Knowing how to work with `adam()`, you can use similar principles, when dealing with `auto.adam()`. Just keep in mind that the provided `persistence`, `phi`, `initial`, `arma` and `B` won't work, because this contradicts the idea of the model selection.

Finally, there is also the mechanism of automatic outliers detection, which extracts residuals from the best model, flags observations that lie outside the prediction interval of the width `level` in sample and then refits `auto.adam()` with the dummy variables for the outliers. Here how it works:
```{r}
testModel <- auto.adam(AirPassengers, "PPP", silent=FALSE, outliers="use",
                       distribution="default",
                       h=12, holdout=TRUE)
testModel
```

If you specify `outliers="select"`, the function will create leads and lags 1 of the outliers and then select the most appropriate ones via the `regressors` parameter of adam.

If you want to know more about ADAM, you are welcome to visit [the online textbook](https://www.openforecast.org/adam/) (this is a work in progress at the moment).