In the following we will demonstrate an idealized workflow based on a subset of the WorldPop data set that is delivered together with this package. You can follow along the code snippets below to reproduce the results. Please note that to reduce the time it takes to process this vignette, we will not download any resources from the internet. In a real use case, thus processing time might substantially increase because resources have to be downloaded and real portfolios might be larger than the one created in this example.
This vignette assumes that you have already followed the steps in Installation and have familiarized yourself with the terminology used in the package. If you are unfamiliar with the terminology used here, please head over to the Terminology article to learn about the most important concepts.
The idealized workflow for using {mapme.biodiversity}
consists of the following steps:
'POLYGON'
First, we will load the {mapme.biodiversity}
and the
{sf}
package for handling spatial vector data. For tabular
data handling, we will also load the {dplyr}
and
{tidyr}
packages. Then, we will read an internal GeoPackage
which includes the geometry of a protected area in the Dominican
Republic from the WDPA database.
library(mapme.biodiversity)
library(sf)
library(dplyr)
library(tidyr)
aoi_path <- system.file("extdata", "sierra_de_neiba_478140.gpkg", package = "mapme.biodiversity")
(aoi <- read_sf(aoi_path))
#> Simple feature collection with 1 feature and 4 fields
#> Geometry type: MULTIPOLYGON
#> Dimension: XY
#> Bounding box: xmin: -71.80933 ymin: 18.57668 xmax: -71.33201 ymax: 18.69931
#> Geodetic CRS: WGS 84
#> # A tibble: 1 × 5
#> WDPAID NAME DESIG_ENG ISO3 geom
#> <dbl> <chr> <chr> <chr> <MULTIPOLYGON [°]>
#> 1 478140 Sierra de Neiba National Park DOM (((-71.76134 18.66333, -71.76067 1…
The sf-object contains a single object of geometry type
'MULTIPOLYGON'
. The {mapme.biodiversity}
package, however, only supports geometries of type
'POLYGON'
, thus we need to cast the geometry before we
advance. The resulting sf
object also contains some
metadata, that will be retained throughout the complete workflow.
Because some of the casted geometries represent artefacts of the
digitization process, in this example we will subset to include only the
largest polygon.
(aoi <- st_cast(aoi, to = "POLYGON")[1, ])
#> Warning in st_cast.sf(aoi, to = "POLYGON"): repeating attributes for all
#> sub-geometries for which they may not be constant
#> Simple feature collection with 1 feature and 4 fields
#> Geometry type: POLYGON
#> Dimension: XY
#> Bounding box: xmin: -71.80731 ymin: 18.58134 xmax: -71.33268 ymax: 18.69799
#> Geodetic CRS: WGS 84
#> # A tibble: 1 × 5
#> WDPAID NAME DESIG_ENG ISO3 geom
#> <dbl> <chr> <chr> <chr> <POLYGON [°]>
#> 1 478140 Sierra de Neiba National Park DOM ((-71.76202 18.66333, -71.74668 18…
In the following, we will simulate a portfolio consisting of several
polygons (assets, in the jargon of this package). To this end, we create
smaller polygons within the original extent of the main polygon. This
way, we can showcase the behavior of the
{mapme.biodiversity}
package for portfolios that contain
multiple assets. We will only select single assets with geometry type
'POLYGON'
that lie within the original boundary of the
protected area.
aoi_gridded <- st_make_grid(
x = st_bbox(aoi),
n = c(10, 10),
square = FALSE
) %>%
st_intersection(aoi) %>%
st_as_sf() %>%
mutate(geom_type = st_geometry_type(x)) %>%
filter(geom_type == "POLYGON") %>%
select(-geom_type, geom = x) %>%
st_as_sf()
metanames <- names(st_drop_geometry(aoi))
aoi_gridded[metanames] <- st_drop_geometry(aoi)
plot(aoi_gridded)
Now, we are ready to initiate a portfolio object containing multiple
assets. We use the mapme_options()
function and set some
arguments, such as the output directory, that are important for the
subsequent processing.
# copying package internal resource to a temporary location
outdir <- file.path(tempdir(), "mapme.biodiversity")
dir.create(outdir)
resource_dir <- system.file("res", package = "mapme.biodiversity")
file.copy(resource_dir, outdir, recursive = TRUE)
#> [1] TRUE
mapme_options(
outdir = file.path(outdir, "res"),
verbose = TRUE
)
The outdir
argument points towards a directory on the
local file system of your machine. All downloaded resources will be
written to respective directories nested within outdir
.
Once you request a specific resource for your portfolio, only those
files will be downloaded that are missing to match its spatio-temporal
extent. This behavior is beneficial, e.g. in case you share the
outdir
between different projects to ensure that only
resources matching your current portfolio are returned.
The verbose
logical controls whether or not the package
will print informative messages during the calculations. Note, that even
if set to FALSE
, the package will inform users about any
potential errors or warnings.
You can check which indicators are available via the
available_indicators()
function:
available_indicators()
#> # A tibble: 26 × 3
#> name description resources
#> <chr> <chr> <list>
#> 1 active_fire_counts Number of detected fires by NASA FIRMS <tibble>
#> 2 active_fire_properties Extraction of properties of fires detected … <tibble>
#> 3 biome Areal statistics of biomes from TEOW <tibble>
#> 4 deforestation_drivers Areal statistics of deforestation drivers <tibble>
#> 5 drought_indicator Relative wetness statistics based on NASA G… <tibble>
#> 6 ecoregion Areal statstics of ecoregions based on TEOW <tibble>
#> 7 elevation Statistics of elevation based on NASA SRTM <tibble>
#> 8 fatalities Number of fatalities by group of conflict b… <tibble>
#> 9 gsw_change Statistics of the surface water change laye… <tibble>
#> 10 gsw_occurrence Areal statistic of surface water based on o… <tibble>
#> # ℹ 16 more rows
available_indicators("population_count")
#> # A tibble: 1 × 3
#> name description resources
#> <chr> <chr> <list>
#> 1 population_count Statistic of population counts <tibble [1 × 5]>
Say, we are interested in the population_count
indicator. We can learn more about this indicator and its required
resources by using either of the commands below or, if you are viewing
the online version, head over to the population_count
documentation.
By inspecting the help page we learned that this indicator requires
the worldpop
resource and it requires to specify two extra
arguments: the population statistic to calculate and the eninge to be
used for the calculation (learn more about engines here).
With that information at hand, we can start to retrieve the required
resource. We can learn about all available resources using the
available_resources()
function:
available_resources()
#> # A tibble: 23 × 5
#> name description licence source type
#> <chr> <chr> <chr> <chr> <chr>
#> 1 chirps Climate Hazards Group … CC - u… https… rast…
#> 2 esalandcover Copernicus Land Monito… CC-BY … https… rast…
#> 3 fritz_et_al Drivers of deforestati… CC-BY … https… rast…
#> 4 gfw_emissions Global Forest Watch - … CC-BY … https… rast…
#> 5 gfw_lossyear Global Forest Watch - … CC-BY … https… rast…
#> 6 gfw_treecover Global Forest Watch - … CC-BY … https… rast…
#> 7 global_surface_water_change Global Surface Water -… https:… https… rast…
#> 8 global_surface_water_occurrence Global Surface Water -… https:… https… rast…
#> 9 global_surface_water_recurrence Global Surface Water -… https:… https… rast…
#> 10 global_surface_water_seasonality Global Surface Water -… https:… https… rast…
#> # ℹ 13 more rows
available_resources("worldpop")
#> # A tibble: 1 × 5
#> name description licence source type
#> <chr> <chr> <chr> <chr> <chr>
#> 1 worldpop WorldPop - Unconstrained Global Mosaics 2000 - … CC-BY … https… rast…
For the purpose of this vignette, we are going to download the
worldpop
resource. We can get more detailed information
about a given resource, by using either of the commands below to open up
the help page. If you are viewing the online version of this
documentation, you can simply head over to the worldpop
resource documentation.
We can now make the worldpop
resource available for our
portfolio. We will use a common interface that is used for all
resources, called get_resources()
. We have to specify our
portfolio object and supply one or more resource functions with their
respective arguments. This will then download the matching resources to
the output directory specified earlier.
aoi_gridded <- get_resources(x = aoi_gridded, get_worldpop(years = 2010:2015))
#> Skipping existing files in output directory.
In case you want to download more than one resource, you can use the same interface and the resources will be made available sequentially. Required arguments for a resource are simply added as usual:
The next step consists of calculating specific indicators. Note that
each indicator requires one or more resources that were made available
via the get_resources()
function explained above. You will
have to re-run this function in every new R session, but note that data
that is already available will not be re-downloaded.
Here, we are goingto calculate the population_count
indicator which is based on the worldpop
resource. Since
the resource has been made available in the previous step, we can
continue requesting the calculation of our desired indicator. Note the
command below would issue an error in case a required resource has not
been made available via get_resources()
beforehand.
aoi_gridded <- calc_indicators(
aoi_gridded,
calc_population_count(engine = "zonal", stats = "sum")
)
#> Found a column named 'assetid'. Overwritting its values with a unique identifier.
Now let’s take a look at the results. We will select only some of the metadata and the output indicator column to get a clearer picture of what has happened.
(aoi_gridded <- aoi_gridded %>% select(assetid, population_count))
#> Simple feature collection with 23 features and 2 fields
#> Geometry type: POLYGON
#> Dimension: XY
#> Bounding box: xmin: -71.80731 ymin: 18.58134 xmax: -71.33268 ymax: 18.69799
#> Geodetic CRS: WGS 84
#> # A tibble: 23 × 3
#> assetid population_count geom
#> <int> <list> <POLYGON [°]>
#> 1 1 <tibble [6 × 2]> ((-71.78358 18.65871, -71.78358 18.67725, -71.79993…
#> 2 2 <tibble [6 × 2]> ((-71.79993 18.6867, -71.78358 18.67725, -71.75984 …
#> 3 3 <tibble [6 × 2]> ((-71.76202 18.66333, -71.74668 18.64602, -71.74431…
#> 4 4 <tibble [6 × 2]> ((-71.71238 18.62893, -71.71238 18.63615, -71.73611…
#> 5 5 <tibble [6 × 2]> ((-71.75984 18.69367, -71.75984 18.69095, -71.73611…
#> 6 6 <tibble [6 × 2]> ((-71.71429 18.68985, -71.73611 18.67725, -71.73611…
#> 7 7 <tibble [6 × 2]> ((-71.66702 18.63736, -71.68865 18.64985, -71.71238…
#> 8 8 <tibble [6 × 2]> ((-71.70975 18.68944, -71.68865 18.67725, -71.67141…
#> 9 9 <tibble [6 × 2]> ((-71.67141 18.68721, -71.68865 18.67725, -71.68865…
#> 10 10 <tibble [6 × 2]> ((-71.64095 18.64971, -71.64119 18.64985, -71.64178…
#> # ℹ 13 more rows
We obtained a new listed column in our sf
-object that is
named like the requested indicator. For each asset in our portfolio,
this column contains a tibble with 6 rows and two columns. Let’s have a
closer look at one of these objects.
aoi_gridded$population_count[10]
#> [[1]]
#> # A tibble: 6 × 2
#> population_count_sum year
#> <dbl> <chr>
#> 1 31.9 2010
#> 2 43.3 2011
#> 3 32.8 2012
#> 4 32.9 2013
#> 5 36.5 2014
#> 6 38.0 2015
For each asset, the result is a tibble in long format indicating the
population sum per year (make sure to read the detailed indicator
documentation via ?population_count
). Let’s quickly
visualize the results for a single asset:
If you wish to conduct your statistical analysis in R, you can use
{tidyr}
functionality to unnest one or multiple columns.
Especially for large portfolios, it is usually a good idea to keep the
geometry information in a separated variable to keep the size of the
data object relatively small.
geometries <- select(aoi_gridded, assetid)
aoi_gridded %>%
st_drop_geometry() %>%
tidyr::unnest(population_count)
#> # A tibble: 138 × 3
#> assetid population_count_sum year
#> <int> <dbl> <chr>
#> 1 1 305. 2010
#> 2 1 313. 2011
#> 3 1 324. 2012
#> 4 1 321. 2013
#> 5 1 286. 2014
#> 6 1 330. 2015
#> 7 2 217. 2010
#> 8 2 218. 2011
#> 9 2 224. 2012
#> 10 2 241. 2013
#> # ℹ 128 more rows
{mapme.biodiversity}
follows the parallelization
paradigm of the {future}
package. That means that you as a user are in the control if and how you
would like to set up parallel processing. Currently,
{mapme.biodiversity}
supports parallel processing on the
asset level of the calc_indicators()
function only. We also
currently assume that parallel processing is done on the cores of a
single machine. In future developments, we would like to support
distributed processing. If you are working on a distributed use-cases,
please contact the developers, e.g. via the discussion
board or mail.
To process e.g. 6 assets in parallel and report a progress bar you will have to set up the following in your code:
library(future)
library(progressr)
plan(multisession, workers = 6) # set up parallel plan with 6 concurrent threads
with_progress({
aoi_gridded <- calc_indicators(
aoi_gridded,
calc_population_count(
engine = "zonal",
stats = "sum"
)
)
})
plan(sequential) # close child processes
Note, that the above code uses future::multisession()
as
the parallel backend. This backend will resolve the calculation in
multiple background R sessions. You should use that backend if you are
operating on Windows, using R Studio or otherwise are not sure about
which backend to use. In case you are operating on a system that allows
process forking and are not using R Studio, consider using
future::multicore()
for more efficient parallel
processing.
You can use the write_portfolio()
function to save a
processed portfolio object to disk as a GeoPackage. This allows sharing
your data with others who might not be using R, but any other geospatial
software. Simply point towards a non-existing file on your local disk to
write the portfolio. The function will create an individual table for
all processed indicators. Via the read_portfolio()
function, a portfolio which has been written to disk in such a way can
be read back into R. However, users should note that the portfolio-wide
arguments that were set with mapme_options()
are
not reconstructed. Thus if you wish to continue to use
{mapme.biodiversity}
functionality on such a portfolio
object, make sure to re-run init_portfolio()` on it.