---
title: "Parallel Computing in Pattern Causality Analysis"
author: "Stavros Stavroglou, Athanasios Pantelous, Hui Wang"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Parallel Computing in Pattern Causality}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  warning = FALSE,
  collapse = TRUE,
  comment = "#>",
  fig.width = 8,
  fig.height = 6
)
```

Pattern causality analysis is computationally intensive, especially for complex systems and large datasets. This vignette shows how to use the parallel computing options in the `patterncausality` package to reduce computation time.

## Key Benefits of Parallel Computing

The parallel computing features in this package are particularly effective for:

1. **Bootstrap Analysis**:
   - Distributing bootstrap iterations across multiple cores
   - Ideal for uncertainty quantification
   - Significant speed improvements for large numbers of iterations

2. **Matrix Computations**:
   - Processing large causality matrices efficiently
   - Handling multiple time series simultaneously
   - Reducing computation time for system-wide analyses

3. **Cross-validation Studies**:
   - Parallel processing of different sample sizes
   - Efficient handling of repeated computations
   - Improved performance for robustness analysis

## Performance Comparison: Sequential vs Parallel Computing

The examples below time the same analysis on one core and on several cores, so the speed-up can be measured directly:

```{r message = FALSE}
library(patterncausality)
data(climate_indices)
```

## Create test data

```{r}
X <- climate_indices$PNA
Y <- climate_indices$NAO
```

## Function to measure execution time

```{r}
# Run the cross-validation with a given number of cores and
# return the elapsed wall-clock time in seconds
run_cv_test <- function(n_cores) {
  start_time <- Sys.time()
  result <- pcCrossValidation(
    X = X,
    Y = Y,
    numberset = c(100, 200, 300, 400, 500),
    E = 3,
    tau = 2,
    metric = "euclidean",
    h = 1,
    weighted = FALSE,
    random = TRUE,
    bootstrap = 100,
    n_cores = n_cores,
    verbose = TRUE
  )
  end_time <- Sys.time()
  return(difftime(end_time, start_time, units = "secs"))
}
```

## Compare sequential vs parallel

```r
# Leave one core free; never request fewer than one core,
# even if detectCores() returns 1 or NA
n_par <- max(1, parallel::detectCores() - 1, na.rm = TRUE)

time_seq <- run_cv_test(1)
time_par <- run_cv_test(n_par)

cat("Sequential computation time:", time_seq, "seconds\n")
cat("Parallel computation time:", time_par, "seconds\n")
cat("Speed-up factor:", as.numeric(time_seq) / as.numeric(time_par), "x\n")
```

## Matrix Analysis with Multiple Time Series

When analyzing causality between multiple time series, every pair of series has to be evaluated, so parallel computing can significantly reduce computation time:

```r
# Create larger test dataset (seeded so the timings are reproducible)
set.seed(42)
n_series <- 20
n_points <- 1000
test_data <- matrix(rnorm(n_series * n_points), ncol = n_series)
colnames(test_data) <- paste0("Series_", 1:n_series)

# Function to measure execution time
run_matrix_test <- function(n_cores) {
  start_time <- Sys.time()
  result <- pcMatrix(
    dataset = test_data,
    E = 3,
    tau = 2,
    metric = "euclidean",
    h = 1,
    weighted = FALSE,
    n_cores = n_cores,
    verbose = TRUE
  )
  end_time <- Sys.time()
  return(difftime(end_time, start_time, units = "secs"))
}

# Compare sequential vs parallel
n_par <- max(1, parallel::detectCores() - 1, na.rm = TRUE)
time_seq <- run_matrix_test(1)
time_par <- run_matrix_test(n_par)

cat("Sequential computation time:", time_seq, "seconds\n")
cat("Parallel computation time:", time_par, "seconds\n")
cat("Speed-up factor:", as.numeric(time_seq) / as.numeric(time_par), "x\n")
```

## Understanding Parallel Performance

### Key Factors Affecting Speed-up

1. **Data Characteristics**
   - Size of time series
   - Number of series
   - Sample sizes in cross-validation
   - Number of bootstrap iterations

2. **Hardware Considerations**
   - Number of CPU cores
   - Available memory
   - Operating system (Windows/Linux/macOS)

3. **Analysis Type**
   - Bootstrap analysis: Excellent parallelization potential
   - Matrix computation: Good for large matrices
   - Cross-validation: Depends on sample sizes

### Best Practices for Optimal Performance

```r
# Get available cores
n_cores <- parallel::detectCores()

# Use n_cores - 1 for computation, but at least one core
recommended_cores <- max(1, n_cores - 1, na.rm = TRUE)
cat("Recommended number of cores:", recommended_cores, "\n")

# Example of memory-efficient parallel computation
result <- pcCrossValidation(
  X = X,
  Y = Y,
  numberset = c(100, 200, 300),
  E = 3,
  tau = 2,
  bootstrap = 50,
  n_cores = 2,  # Use modest number of cores for memory efficiency
  verbose = TRUE
)
```

## System-Specific Considerations

### Windows Systems

- Uses PSOCK clusters
- Slightly higher overhead
- Consider using fewer cores

### Linux/Mac Systems

- Uses FORK clusters
- Better parallel performance
- Can utilize more cores effectively

### Memory Usage Guidelines

- Monitor system memory during computation
- Reduce core count if memory pressure is high
- Consider batch processing for very large datasets

## Conclusion

Parallel computing in pattern causality analysis can provide significant performance improvements, especially for:

- Large-scale bootstrap analysis
- Multi-series causality matrices
- Extensive cross-validation studies

Choose parallel computing parameters based on:

- Your system capabilities
- Dataset characteristics
- Analysis requirements
- Available computational resources

For optimal results, always monitor system performance and adjust parameters accordingly.
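
As a minimal sketch of what adjusting the core count to system limits could look like, the helper below picks a worker count from the detected cores, the number of independent tasks, and a rough memory budget. `pick_cores()` and all of its arguments are hypothetical names used for illustration; they are not part of `patterncausality`, and the per-worker memory figure is an assumption you would need to measure on your own data.

```r
# Hypothetical helper (not part of patterncausality): choose a worker count
# from the detected cores, the number of tasks, and a rough memory budget.
pick_cores <- function(n_tasks,
                       detected = parallel::detectCores(),
                       reserve = 1,
                       mem_available_gb = Inf,
                       mem_per_worker_gb = 1) {
  # detectCores() can return NA on some platforms; fall back to one core
  if (is.na(detected)) detected <- 1L

  # Leave `reserve` cores free for the rest of the system
  by_cpu <- max(1, detected - reserve)

  # Workers may each hold their own copy of the data, so cap the count
  # by the assumed memory budget (no cap when the budget is Inf)
  by_mem <- max(1, floor(mem_available_gb / mem_per_worker_gb))

  # More workers than tasks only adds start-up overhead
  as.integer(min(by_cpu, by_mem, max(1, n_tasks)))
}

# Example: 100 bootstrap iterations, 8 detected cores, 4 GB free,
# assuming roughly 1.5 GB per worker
pick_cores(n_tasks = 100, detected = 8,
           mem_available_gb = 4, mem_per_worker_gb = 1.5)
```

With these example numbers the memory budget is the binding constraint, so the helper returns 2 rather than 7. The result can be passed as the `n_cores` argument of `pcCrossValidation()` or `pcMatrix()`.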