---
title: "Information Consistency-Based Measures for Spatial Stratified Heterogeneity"
author: "Wenbo Lv"
date: "2024-12-01"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{sshicm}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---



  

## 1. Introduction to `sshicm` package

### 1.1 The `sshicm` package can be used to address the following issues:

- Information consistency-based measures of spatial stratified heterogeneity intensity for continuous and nominal variables.

- Strength of spatial pattern associations based on information consistency measures.

### 1.2 Example data in the `sshicm` package

#### baltim data

"baltim" consists of [Baltimore home sale prices and hedonics][5]. In total, there are 221 instances in "baltim" data. The explanatory variables are whether it is a detached unit (DWELL), whether it has a patio (PATIO), whether it has a fireplace (FIREPL), whether it has air conditioning (AC), and whether the dwelling is in Baltimore County (CITCOU, while the target variable is the sale price of the home (PRICE).


#### cinc data

"cinc" is derived from [the 2008 Cincinnati Crime + Socio-Demographics dataset][6]. It includes spatial data on 457 objects located on an irregular lattice. The explanatory variables are male population (MALE), female population (FEMALE), median age (MEDIAN_AGE), average family size (AVG_FAMSIZ), and population density (DENSITY), while the target variable is the existence of theft (THEFT_D).

![**Figure 1**. Maps of the baltim and cinc data sets. ([Bai et al. 2023][2])](../man/figures/sshicm/sshicm_example_data.jpg){width=500px}

### 1.3 Functions in the `sshicm` package

#### Two functions for vector-type inputs of dependent and independent variables.

- `sshic()` for a continuous dependent variable

- `sshin()` for a nominal dependent variable

#### Regression-style data frame modeling function

The function `sshicm()` takes a formula and a data frame and returns the results for all explanatory variables in a single call. Set the `type` parameter to `IC` (continuous) or `IN` (nominal) to specify whether the dependent variable is continuous or nominal.

## 2. The principle of measuring spatial stratified heterogeneity based on information consistency

**Note: All explanatory variables must be discretized in advance or inherently be discrete nominal variables.**
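As an illustration of the note above, a continuous explanatory variable can be converted into strata before it is used. The sketch below uses quantile breaks in base R. The helper name `discretize_quantile()` and the choice of `k = 5` classes are assumptions made for this example and are not part of `sshicm`; other discretization schemes may suit a given data set better.

```r
# Hypothetical helper (not part of sshicm): assign each value of a
# continuous variable to one of up to k quantile-based classes.
discretize_quantile = function(x, k = 5) {
  probs = seq(0, 1, length.out = k + 1)
  # unique() guards against duplicated break points when x has many ties
  edges = unique(stats::quantile(x, probs = probs, na.rm = TRUE, names = FALSE))
  # labels = FALSE returns integer class codes instead of interval labels
  cut(x, breaks = edges, include.lowest = TRUE, labels = FALSE)
}

x = c(2.1, 3.5, 0.7, 8.9, 5.2, 6.6, 1.4, 9.8, 4.3, 7.0)
strata = discretize_quantile(x, k = 5)
table(strata)
```

The resulting integer codes can then serve as the stratification variable in place of the original continuous values.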

### 2.1 When the dependent variable is a continuous variable:

$$
I_{C}\left(d,s\right) = \sum_{s_{i} \in S}p\left(s_{i}\right)\frac{ \arctan \left(\textbf{RelE} \left( f_{d_{i}} \mid \mid  f \right) \right)}{\pi / 2}
$$

where $d_i$ is the random variable corresponding to the target variable in stratum $s_i$, and $f_{d_i}$ and $f$ are the density functions of $d_i$ and $d$, respectively. Additionally, $\textbf{RelE} \left( f_{d_{i}} \mid \mid  f \right)$ is the relative entropy of $f_{d_i}$ with respect to $f$. Because relative entropy is non-negative, dividing its arctangent by $\pi / 2$ maps each term into $[0, 1)$.

$$
\textbf{RelE} \left( f_{d_{i}} \mid \mid  f \right) = H \left(f_{d_{i}} , f\right) - H \left(f_{d_{i}}\right) = \sum_{k = 1}^{n} f_{d_{i}}\left(x_k\right) \log \frac{1}{f\left(x_k\right)} - \sum_{k = 1}^{n} f_{d_{i}}\left(x_k\right) \log \frac{1}{f_{d_{i}}\left(x_k\right)} = \sum_{k = 1}^{n} f_{d_{i}}\left(x_k\right) \log \frac{f_{d_{i}}\left(x_k\right)}{f\left(x_k\right)}
$$
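To make the two formulas above concrete, here is a minimal base R sketch. It assumes the densities are approximated by histogram bin proportions on a shared set of breaks and that $p\left(s_i\right)$ is the share of observations in stratum $s_i$. The names `rel_entropy()` and `ic_sketch()` are hypothetical, and this is not necessarily how `sshic()` is implemented internally.

```r
# Relative entropy of p with respect to q on shared bins.
# Bins where p is zero contribute nothing to the sum.
rel_entropy = function(p, q) {
  keep = p > 0
  sum(p[keep] * log(p[keep] / q[keep]))
}

# Hypothetical sketch of I_C (assumption: histogram-based density estimates).
ic_sketch = function(d, s, breaks = "Sturges") {
  edges = graphics::hist(d, breaks = breaks, plot = FALSE)$breaks
  bin = cut(d, breaks = edges, include.lowest = TRUE)
  f = as.numeric(table(bin)) / length(d)        # overall bin proportions
  strata = split(bin, s, drop = TRUE)           # bins of d within each stratum
  sum(vapply(strata, function(b) {
    f_i = as.numeric(table(b)) / length(b)      # within-stratum proportions
    w = length(b) / length(d)                   # stratum weight p(s_i)
    w * atan(rel_entropy(f_i, f)) / (pi / 2)
  }, numeric(1)))
}

d = c(1:10, 101:110)
s_sep = rep(c("a", "b"), each = 10)    # strata separate low and high values
s_mix = rep(c("a", "b"), times = 10)   # strata mix low and high values
ic_sketch(d, s_sep)
ic_sketch(d, s_mix)
```

In this toy example the separating stratification yields the larger value, since each stratum's distribution differs strongly from the overall one.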

### 2.2 When the dependent variable is a nominal variable:

$$
I_{N}\left(d,s\right) = \frac{I \left(d,s\right)}{I \left(d\right)} = 
\frac{I \left(d\right) - I \left(d \mid s\right)}{I \left(d\right)} = 
1 - \frac{\sum_{s_i \in S}\sum_{x \in V_d} p\left(s_i,x\right) \log p\left(x \mid s_i\right)}{\sum_{x \in V_d} p\left(x\right) \log p\left(x\right)}
$$

where $p\left(x\right)$ is the probability of observing $x$ in $U$, $p\left(s_i,x\right)$ is the probability of observing $s_i$ and $x$ in $U$, and $p\left(x \mid s_i\right)$ is the probability of observing $x$ given that the stratum is $s_i$.
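The nominal measure can be sketched in base R from a contingency table of strata against target values. The names `shannon()` and `in_sketch()` are hypothetical, empirical frequencies stand in for the probabilities, and the code is an illustration of the formula rather than the internals of `sshin()`.

```r
# Shannon entropy of a probability vector; zero entries are skipped.
shannon = function(p) {
  p = p[p > 0]
  -sum(p * log(p))
}

# Hypothetical sketch of I_N = 1 - I(d | s) / I(d).
# Note: the result is undefined (NaN) when d takes a single value.
in_sketch = function(d, s) {
  joint = table(s, d)
  joint = joint / sum(joint)       # empirical p(s_i, x)
  p_s = rowSums(joint)             # empirical p(s_i)
  h_d = shannon(colSums(joint))    # I(d)
  # I(d | s) as the weighted average of within-stratum entropies
  h_d_given_s = sum(vapply(seq_along(p_s), function(i) {
    if (p_s[i] == 0) return(0)
    p_s[i] * shannon(joint[i, ] / p_s[i])
  }, numeric(1)))
  1 - h_d_given_s / h_d
}

d_nom = rep(c("x", "y"), each = 10)
s_same = rep(c("a", "b"), each = 10)    # strata coincide with the classes
s_indep = rep(c("a", "b"), times = 10)  # strata carry no information on d_nom
in_sketch(d_nom, s_same)
in_sketch(d_nom, s_indep)
```

The first call illustrates the upper end of the measure (strata fully determine the target), and the second the lower end (the target is distributed identically in every stratum).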


## 3. Examples of the `sshicm` package

```r
install.packages("sshicm", dependencies = TRUE)
```



``` r
library(sshicm)
```



``` r
baltim = sf::read_sf(system.file("extdata/baltim.gpkg",package = "sshicm"))
sshicm(PRICE ~ .,baltim,type = "IC")
## # A tibble: 5 × 3
##   Variable     Ic      Pv
##   <chr>     <dbl>   <dbl>
## 1 DWELL    0.648  0.00801
## 2 AC       0.223  0.0591 
## 3 PATIO    0.168  0.556  
## 4 FIREPL   0.135  0.667  
## 5 CITCOU   0.0898 0.988
```


``` r
cinc = sf::read_sf(system.file("extdata/cinc.gpkg",package = "sshicm"))
sshicm(THEFT_D ~ .,cinc,type = "IN")
## # A tibble: 5 × 3
##   Variable        In      Pv
##   <chr>        <dbl>   <dbl>
## 1 DENSITY    0.776   0.0681 
## 2 MEDIAN_AGE 0.228   0.0230 
## 3 MALE       0.0367  0      
## 4 AVG_FAMSIZ 0.0205  0.00300
## 5 FEMALE     0.00584 0.0200
```


``` r
ntds = gdverse::NTDs
sshicm(incidence ~ watershed + elevation + soiltype,data = ntds)
## # A tibble: 3 × 3
##   Variable     Ic     Pv
##   <chr>     <dbl>  <dbl>
## 1 elevation 0.293 0.0250
## 2 watershed 0.177 0.0521
## 3 soiltype  0.117 0.0671
```


## References



Wang, J., Haining, R., Zhang, T., Xu, C., Hu, M., Yin, Q., … Chen, H. (2024). Statistical Modeling of Spatially Stratified Heterogeneous Data. Annals of the American Association of Geographers, 114(3), 499–519. [https://doi.org/10.1080/24694452.2023.2289982][1].

Bai, H., Wang, H., Li, D., & Ge, Y. (2023). Information Consistency-Based Measures for Spatial Stratified Heterogeneity. Annals of the American Association of Geographers, 113(10), 2512–2524. [https://doi.org/10.1080/24694452.2023.2223700][2].

Wang, J., Li, X., Christakos, G., Liao, Y., Zhang, T., Gu, X., & Zheng, X. (2010). Geographical Detectors‐Based Health Risk Assessment and its Application in the Neural Tube Defects Study of the Heshun Region, China. International Journal of Geographical Information Science, 24(1), 107–127. [https://doi.org/10.1080/13658810802443457][3].

Wang, J. F., Zhang, T. L., & Fu, B. J. (2016). A Measure of Spatial Stratified Heterogeneity. Ecological Indicators, 67, 250–256. [https://doi.org/10.1016/j.ecolind.2016.02.052][4].



&nbsp; 

[1]: https://doi.org/10.1080/24694452.2023.2289982
[2]: https://doi.org/10.1080/24694452.2023.2223700
[3]: https://doi.org/10.1080/13658810802443457
[4]: https://doi.org/10.1016/j.ecolind.2016.02.052
[5]: https://geodacenter.github.io/data-and-lab/baltim/
[6]: https://geodacenter.github.io/data-and-lab/walnut_hills/

&nbsp; 
&nbsp;