--- title: "Unveiling Causal Relationships in Time Series Data" author: "Stavros Stavroglou, Athanasios Pantelous, Hui Wang" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Pattern Causality between two series} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( warning = FALSE, collapse = TRUE, comment = "#>" ) ``` This vignette demonstrates advanced techniques for examining causal relationships between time series using the `patterncausality` package. We will focus on three key aspects: 1. **Cross-validation methods:** To rigorously assess the robustness of our findings, ensuring they are not mere artifacts of the data. 2. **Parameter optimization:** To fine-tune our analysis for the most accurate and reliable results. 3. **Visualization of causality relationships:** To provide clear and intuitive insights into the causal connections between time series. Through cross-validation, we aim to understand: * **Reliability of results:** How dependable are our conclusions? * **Robustness across different sample sizes:** Do our findings hold true regardless of the amount of data used? * **Stability of causality patterns:** Are the identified causal relationships consistent over time and across different data subsets? ## Cross-Validation: Ensuring the Reliability of Causal Inference To demonstrate the application of cross-validation, we will begin by importing a climate dataset from the `patterncausality` package. ```{r message = FALSE} library(patterncausality) data(climate_indices) ``` Now, let's apply cross-validation to evaluate the robustness of pattern causality. We will use the Pacific North American (PNA) and North Atlantic Oscillation (NAO) climate indices as our example time series. ```{r} set.seed(123) X <- climate_indices$PNA Y <- climate_indices$NAO result <- pcCrossValidation( X = X, Y = Y, numberset = seq(100, 500, by = 10), E = 3, tau = 2, metric = "euclidean", h = 1, weighted = FALSE ) print(result$results) ``` To better visualize the results, we will use the `plot` function to generate a line chart. ```{r} plot(result) ``` As you can see from the plot, the location of the causality tends to stabilize as the sample size increases. This indicates that our method is effective at capturing the underlying patterns and causal connections within the time series. In this tutorial, you've learned how to use cross-validation to assess the reliability of time series causality and how to use visualization tools to better understand the results. ## Cross-Validation: Convergence of Pattern Causality Now, let's examine the cross-validation process when the `random` parameter is set to `FALSE`. This approach uses a systematic sampling method rather than random sampling. ```{r} set.seed(123) X <- climate_indices$PNA Y <- climate_indices$NAO result_non_random <- pcCrossValidation( X = X, Y = Y, numberset = seq(100, 500, by = 100), E = 3, tau = 2, metric = "euclidean", h = 1, weighted = FALSE, random = FALSE ) print(result_non_random$results) ``` We can also visualize the results of the non-random cross-validation: ```{r} plot(result_non_random) ``` By comparing the results of the random and non-random cross-validation, you can gain a deeper understanding of how different sampling methods affect the stability and reliability of the causality analysis. ## Cross-Validation with Bootstrap Analysis To obtain more robust results and understand the uncertainty in our causality measures, we can use bootstrap sampling in our cross-validation analysis. This approach repeatedly samples the data with replacement and provides statistical summaries of the causality measures. ```{r} set.seed(123) X <- climate_indices$PNA Y <- climate_indices$NAO result_boot <- pcCrossValidation( X = X, Y = Y, numberset = seq(100, 500, by = 100), E = 3, tau = 2, metric = "euclidean", h = 1, weighted = FALSE, random = TRUE, bootstrap = 10 # Perform 100 bootstrap iterations ) ``` The bootstrap analysis provides several statistical measures for each sample size: - Mean: Average causality measure across bootstrap samples - 5% and 95% quantiles: Confidence intervals for the causality measure - Median: Central tendency measure robust to outliers Let's examine the results: ```{r} print(result_boot$results) ``` We can visualize the bootstrap results using the plot function, which now shows confidence intervals: ```{r} plot(result_boot, separate = TRUE) ``` The shaded area in the plot represents the range between the 5th and 95th percentiles of the bootstrap samples, providing a measure of uncertainty in our causality estimates. The solid line shows the median value, which is more robust to outliers than the mean.