--- title: "Geographical Convergent Cross Mapping (GCCM)" author: "Wenbo Lv" date: "2025-01-11" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{GCCM} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ## 1. The principle of geographical convergent cross mapping (GCCM) Takens’ theory proves that, for a dynamic system $\phi$, if its trajectory converges to an attractor manifold $M$, which are consisted by a bounded and invariant set of states, then the mapping between $\phi$ and M can be built and time series observations of $\phi$ can be used to construct $M$. According to the generalized embedding theorem, for a compact $d$-dimensional manifold $M$ and a set of observation functions $\left$, the map $\psi_{\phi,h} = \left$ is an embedding of $M$ with $L = 2d + 1$. Here embedding means a one-to-one map resolving all singularities of the original manifold. The elements $h_i$ can be lags of observations from single time series observations, lags of observations from multiple time series, or multiple observation functions. The first two constructions are only special cases of the third one. By taking the measured values at one specific unit and its neighbors (named as spatial lags in spatial statistics) as a set of observation functions, $\psi_{\phi,h} \left(x,s\right) = \left$ is a embedding, where $s$ is the focal unit currently under investigation and $s\left(i\right)$ is its $i$-th order of spatial lags. $h_s\left(x\right)$ and $h_{s\left(i\right)}\left(x\right)$ are their observation functions respectively. (Hereinafter, we will use $\psi \left(x,s\right)$ to present $\psi_{\phi,h} \left(x,s\right)$ for short). For two spatial variables $X$ and $Y$ on the same set of spatial units, their values and spatial lags can be regarded as observation functions reading values from each spatial unit. As the spatial lags in each order contain more than one spatial units, the observation function can be set as the mean of the spatial units or other summary functions considering the spatial direction, to assure the one-to-one mapping of the original manifold $M$. The cross-mapping prediction is defined as: $$ \hat{Y}_s \mid M_x = \sum\limits_{i=1}^{L+1} \left(\omega_{si}Y_{si} \mid M_x \right) $$ where $s$ represents a spatial unit at which the value of $Y$ needs to be predicted, $\hat{Y}_s$ is the prediction result, $L$ is the number of dimensions of the embedding, $si$ is the spatial unit used in the prediction, $Y_{si}$ is the observation value at $si$ and simultaneously the first component of a state in $M_y$, noted as $\psi\left(y,s_i\right)$. In further, $\psi\left(y,s_i\right)$ is determined by its one-to-one mapping point $\psi\left(x,s_i\right)$, which is in turn one of the $L+1$ nearest neighbors of the focal state in $M_x$. $\omega_{si}$ is the corresponding weight defined as: $$ \omega_{si} \mid M_x = \frac{weight \left(\psi\left(x,s_i\right),\psi\left(x,s\right)\right)}{\sum_{i=1}^{L+1}weight \left(\psi\left(x,s_i\right),\psi\left(x,s\right)\right)} $$ where $weight \left(\ast,\ast\right)$ is the weight function between two states in the shadow manifold, defined as: $$ weight \left(\psi\left(x,s_i\right),\psi\left(x,s\right)\right) = \exp \left(- \frac{dis \left(\psi\left(x,s_i\right),\psi\left(x,s\right)\right)}{dis \left(\psi\left(x,s_1\right),\psi\left(x,s\right)\right)} \right) $$ where $\exp$ is the exponential function and $dis \left(\ast,\ast\right)$ represents the distance function between two states in the shadow manifold defined as: $$ dis \left(\psi\left(x,s_i\right),\psi\left(x,s\right)\right) = \frac{1}{L} \left(\left|h_{si}\left(x\right)-h_{s}\left(x\right)\right| + \sum_{k=1}^{L-1}abs \left[h_{si\left(k\right)}\left(x\right),h_{s\left(k\right)}\left(x\right)\right]\right) $$ Note that the absolute value distance is used here. The skill of cross-mapping prediction is measured by the Pearson correlation coefficient between the true observations and corresponding predictions: $$ \rho = \frac{Cov\left(Y,\hat{Y}\right)}{\sqrt{Var\left(Y\right) Var\left(\hat{Y}\right)}} $$ The prediction skill $\rho$ varies by setting different sizes of libraries, which means the quantity of observations used in reconstruction of the shadow manifold. We can use the convergence of $\rho$ to infer the causal associations. For GCCM, the convergence means that $\rho$ increases with the size of libraries and is statistically significant when the library becomes largest. And the confidence interval of $\rho$ can be estimated based the $z$-statistics with the normal distribution: $$ t = \rho \sqrt{\frac{n-2}{1-\rho^2}} $$ where $n$ is the number of observations to be predicted, and $$ z = \frac{1}{2} \ln \left(\frac{1+\rho}{1-\rho}\right) $$ ## 2. Examples ### 2.1 Install the `spEDM` package ```r install.packages("spEDM", dep = TRUE) ``` Load the `spEDM` package: ``` r library(spEDM) ``` ### 2.2 An example of lattice data about county-level population density Load data and package: ``` r popd_nb = spdep::read.gal(system.file("extdata/popdensity_nb.gal", package = "spEDM")) ## Warning in spdep::read.gal(system.file("extdata/popdensity_nb.gal", package = "spEDM")): ## neighbour object has 4 sub-graphs popd_nb ## Neighbour list object: ## Number of regions: 2806 ## Number of nonzero links: 15942 ## Percentage nonzero weights: 0.2024732 ## Average number of links: 5.681397 ## 4 disjoint connected subgraphs popdensity = readr::read_csv(system.file("extdata/popdensity.csv", package = "spEDM")) ## Rows: 2806 Columns: 7 ## ── Column specification ─────────────────────────────────────────────────────────────────────── ## Delimiter: "," ## dbl (7): x, y, popDensity, DEM, Tem, Pre, slop ## ## ℹ Use `spec()` to retrieve the full column specification for this data. ## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message. popdensity ## # A tibble: 2,806 × 7 ## x y popDensity DEM Tem Pre slop ## ## 1 117. 30.5 780. 8 17.4 1528. 0.452 ## 2 117. 30.6 395. 48 17.2 1487. 0.842 ## 3 117. 30.8 261. 49 16.0 1456. 3.56 ## 4 116. 30.1 258. 23 17.4 1555. 0.932 ## 5 116. 30.5 211. 101 16.3 1494. 3.34 ## 6 117. 31.0 386. 10 16.6 1382. 1.65 ## 7 117. 30.2 350. 23 17.5 1569. 0.346 ## 8 117. 30.7 470. 22 17.1 1493. 1.88 ## 9 117. 30.6 1226. 11 17.4 1526. 0.208 ## 10 116. 30.9 137. 598 13.9 1458. 5.92 ## # ℹ 2,796 more rows popd_sf = sf::st_as_sf(popdensity, coords = c("x","y"), crs = 4326) popd_sf ## Simple feature collection with 2806 features and 5 fields ## Geometry type: POINT ## Dimension: XY ## Bounding box: xmin: 74.9055 ymin: 18.2698 xmax: 134.269 ymax: 52.9346 ## Geodetic CRS: WGS 84 ## # A tibble: 2,806 × 6 ## popDensity DEM Tem Pre slop geometry ## * ## 1 780. 8 17.4 1528. 0.452 (116.912 30.4879) ## 2 395. 48 17.2 1487. 0.842 (116.755 30.5877) ## 3 261. 49 16.0 1456. 3.56 (116.541 30.7548) ## 4 258. 23 17.4 1555. 0.932 (116.241 30.104) ## 5 211. 101 16.3 1494. 3.34 (116.173 30.495) ## 6 386. 10 16.6 1382. 1.65 (116.935 30.9839) ## 7 350. 23 17.5 1569. 0.346 (116.677 30.2412) ## 8 470. 22 17.1 1493. 1.88 (117.066 30.6514) ## 9 1226. 11 17.4 1526. 0.208 (117.171 30.5558) ## 10 137. 598 13.9 1458. 5.92 (116.208 30.8983) ## # ℹ 2,796 more rows ``` Run GCCM: ``` r startTime = Sys.time() pd_res = gccm(data = popd_sf, cause = "Pre", effect = "popDensity", libsizes = seq(10, 2800, by = 100), E = 3, k = 4, nb = popd_nb, progressbar = FALSE) endTime = Sys.time() print(difftime(endTime,startTime, units ="mins")) ## Time difference of 19.59433 mins pd_res ## libsizes popDensity->Pre Pre->popDensity ## 1 10 0.01130395 0.06607465 ## 2 110 0.03306905 0.22689926 ## 3 210 0.04821180 0.28196395 ## 4 310 0.06475999 0.30785228 ## 5 410 0.07922017 0.33124862 ## 6 510 0.09428975 0.36226576 ## 7 610 0.10951603 0.39693375 ## 8 710 0.12249230 0.42302102 ## 9 810 0.13594244 0.44164444 ## 10 910 0.14767067 0.45689198 ## 11 1010 0.15863133 0.46853529 ## 12 1110 0.16852300 0.47657811 ## 13 1210 0.17808456 0.48683486 ## 14 1310 0.18761109 0.49750321 ## 15 1410 0.19697038 0.50708625 ## 16 1510 0.20507772 0.51660029 ## 17 1610 0.21264081 0.52744922 ## 18 1710 0.21969130 0.53845530 ## 19 1810 0.22618694 0.54912681 ## 20 1910 0.23207300 0.55982092 ## 21 2010 0.23780738 0.57043510 ## 22 2110 0.24300452 0.58089481 ## 23 2210 0.24773394 0.59104138 ## 24 2310 0.25208726 0.60092354 ## 25 2410 0.25628422 0.61050007 ## 26 2510 0.26039117 0.62015838 ## 27 2610 0.26436854 0.62986485 ## 28 2710 0.26826114 0.63868014 ``` Visualize the result: ``` r plot(pd_res,xlimits = c(0, 2800)) ``` ![**Figure 1**. The cross-mapping prediction outputs between population density and county-level Precipitation.](../man/figures/gccm/fig1-1.png) ### 2.3 An example of grid data about farmland NPP Load data and package: ``` r npp = terra::rast(system.file("extdata/npp.tif", package = "spEDM")) npp ## class : SpatRaster ## dimensions : 404, 483, 3 (nrow, ncol, nlyr) ## resolution : 10000, 10000 (x, y) ## extent : -2625763, 2204237, 1877078, 5917078 (xmin, xmax, ymin, ymax) ## coord. ref. : CGCS2000_Albers ## source : npp.tif ## names : npp, pre, tem ## min values : 164.00, 384.3409, -47.8194 ## max values : 16606.33, 23878.3555, 263.6938 terra::plot(npp, nc = 3, mar = rep(0.1,4), oma = rep(0.1,4), axes = FALSE, legend = FALSE) ``` ![**Figure 2**. Maps of farmland NPP and climate factors.](../man/figures/gccm/fig2-1.png) *To save the computation time, we will aggregate the data by 3 times and select 3000 non-NA pixels to predict:* ``` r npp = terra::aggregate(npp, fact = 3, na.rm = TRUE) terra::global(npp,"isNA") ## isNA ## npp 14815 ## pre 14766 ## tem 14766 terra::ncell(npp) ## [1] 21735 namat = terra::as.matrix(!is.na(npp[[1]]), wide = TRUE) pred = terra::rowColFromCell(npp,which(namat)) dim(pred) ## [1] 6920 2 set.seed(42) indices = sample(nrow(pred), size = 3000, replace = FALSE) pred = pred[indices,] ``` Run GCCM: ``` r startTime = Sys.time() npp_res = gccm(data = npp, cause = "pre", effect = "npp", libsizes = seq(10,130,5), E = 2, k = 5, RowCol = pred, progressbar = FALSE) endTime = Sys.time() print(difftime(endTime,startTime, units ="mins")) ## Time difference of 24.49013 mins npp_res ## libsizes npp->pre pre->npp ## 1 10 0.049237491 0.07312462 ## 2 15 0.063939456 0.10307461 ## 3 20 0.080009483 0.12935898 ## 4 25 0.097903289 0.14718470 ## 5 30 0.104758061 0.15943448 ## 6 35 0.111686523 0.17026742 ## 7 40 0.119169359 0.18985868 ## 8 45 0.132677547 0.21607528 ## 9 50 0.148968939 0.24764600 ## 10 55 0.165193950 0.28759045 ## 11 60 0.173868473 0.30697776 ## 12 65 0.179777011 0.31209341 ## 13 70 0.157231842 0.28574340 ## 14 75 0.108711472 0.24817243 ## 15 80 0.058977718 0.22701677 ## 16 85 -0.003501071 0.23364337 ## 17 90 -0.022853416 0.23577169 ## 18 95 -0.042780283 0.34364441 ## 19 100 -0.034298266 0.72561637 ## 20 105 -0.070605553 0.77999128 ``` Visualize the result: ``` r plot(npp_res,xlimits = c(9, 101),ylimits = c(-0.05,1)) ## Warning: Removed 1 row containing missing values or values outside the scale range (`geom_line()`). ## Removed 1 row containing missing values or values outside the scale range (`geom_line()`). ``` ![**Figure 3**. The cross-mapping prediction outputs between farmland NPP and precipitation.](../man/figures/gccm/fig3-1.png)