Welcome to the neonSoilFlux package! This vignette will
guide you through the process of using this package to acquire and
compute soil CO2 fluxes at different sites in the National
Ecological Observatory Network.
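You can think about this package working in two primary phases:
1. Acquiring the environmental data for a NEON site (acquire_neon_data). This includes soil water content, soil temperature, soil CO2 concentration, and atmospheric pressure, along with the soil sampling (megapit) data.
2. Computing the soil fluxes from those data (compute_neon_flux).
We split the workflow into these two functions to save time and because they are fundamentally different processes. Acquiring the NEON data makes
use of the neonUtilities package.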
This package takes the guesswork out of which data products to
collect, which should shorten the workflow needed. We rely heavily on the
tidyverse philosophy for computation and coding here.
An overview of the package is also presented in
neonSoilFlux: An R Package for Continuous Sensor-Based
Estimation of Soil CO2 Fluxes, published in Methods in Ecology
and Evolution.
Figure: Overview of the neonSoilFlux R package. a) Acquire: Data
are obtained for a given NEON location and horizontal sensor location,
which includes soil water content, soil temperature, CO\(_{2}\) concentration, and atmospheric
pressure. All data are screened for quality assurance; if gap-filling of
missing data occurs, it is flagged for the user. b)
Harmonize: Any belowground data are then harmonized to
the same depth as CO\(_{2}\) concentrations using linear regression.
c) Compute: The flux across a given depth is computed
via Fick’s law, denoted with \(F_{ijk}\), where \(i\), \(j\), and \(k\) are each either 0 or 1, denoting the layers
the flux is computed across (\(i\) =
closest to surface, \(k\) = deepest).
\(F_{000}\) represents a flux estimate where the gradient \(dC/dz\) is the slope of a linear regression
of CO\(_{2}\) with depth.
NEON now requires an API token to access their data. You can find information about acquiring a token at https://www.neonscience.org/resources/learning-hub/tutorials/api-token-setup.
Once you have a NEON API token, you can set it with the function
neon_api_token:
Load up the relevant libraries:
Let’s say we want to acquire the NEON soil data at the
SJER site during June 2022:
The function acquire_neon_data requires two inputs: the NEON site and the month to download.
As the data are acquired, various messages from the
loadByProduct function of the neonUtilities
package are shown; this is normal. Products are acquired for each
spatial location (horizontalPosition) and vertical depth
(verticalPosition) at a NEON site.
Outputs for acquire_neon_data are two nested data
frames:
site_data: This contains three variables:
  the measurement name (one of soilCO2concentration, VSWC (soil water content), soilTemp (soil temperature), and staPres (atmospheric pressure)).
  monthly_mean: contains the mean value of the measurement at each horizontal position and vertical depth. We compute the monthly mean using a bootstrapped technique.
  data: contains the stacked variables acquired from neonUtilities: the horizontal and vertical positions, timestamp (in UTC), associated values, and the QF flag (0 = pass, 1 = fail, LINK).
site_megapit: the nested data frame of the soil
sampling data, found here LINK. This data table is
essentially what is reported back from acquiring the data product from
NEON.
For each data product, the acquire_neon_data function
also performs two additional checks, including a correction applied with
swc_correct. Information regarding this
correction is found here: LINK.
Once updated sensors are installed in the future we will deprecate this
function.
The function acquire_neon_data has additional input
options that may be useful for your work:
token: The string of the NEON API token. The default is
NULL, but you can supply an API token directly. Instructions for acquiring a
NEON token are at https://www.neonscience.org/resources/learning-hub/tutorials/api-token-setup.
time_frequency: Will you be using 30 minute
("30_minute") or 1 minute ("1_minute")
recorded data? The current default is 30 minutes. 1 minute data is
implemented, but has not been sufficiently tested (and it also requires
a lot of memory).
provisional: Should you use provisional data when
downloading? This option is useful if you are accessing data that is not
part of the most current NEON
data release (i.e. the current year). Defaults to FALSE.
depth_chop: This is useful if you want to only compute
fluxes with measurement levels to a certain depth. There are typically 8
measurement levels below ground. Currently set to NULL (all
levels). The provided integer must be greater than 4 (top 4
levels).
With the resulting output from acquire_neon_data, you
can then unnest the different data frames to make plots. The following
code plots the timeseries of volumetric soil water content across all
spatial locations at SJER:
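A minimal sketch of the unnest-then-plot pattern, run on a small synthetic stand-in for site_data so that it works offline. The column names measurement, startDateTime and value, and all numbers, are assumptions made for illustration; inspect the real output with names() before adapting it.

```r
library(dplyr)
library(tidyr)
library(tibble)
library(ggplot2)

# Synthetic stand-in for out_env_data$site_data (hypothetical values and
# column names; the real nested data frame has more columns).
t0 <- as.POSIXct("2022-06-01 00:00:00", tz = "UTC")
toy_site_data <- tibble(
  measurement = c("VSWC", "soilTemp"),
  data = list(
    tibble(
      horizontalPosition = rep(c("001", "002"), each = 3),
      verticalPosition = "501",
      startDateTime = rep(t0 + c(0, 1800, 3600), times = 2),
      value = c(0.10, 0.11, 0.12, 0.20, 0.21, 0.19)
    ),
    tibble(
      horizontalPosition = "001",
      verticalPosition = "501",
      startDateTime = t0,
      value = 18.5
    )
  )
)

# Keep the soil water content row, then expand its nested data frame.
vswc <- toy_site_data %>%
  filter(measurement == "VSWC") %>%
  unnest(cols = data)

# One line per horizontal position, one panel per vertical position.
vswc_plot <- ggplot(vswc, aes(x = startDateTime, y = value,
                              colour = horizontalPosition)) +
  geom_line() +
  geom_point() +
  facet_wrap(~verticalPosition) +
  labs(x = "Time (UTC)", y = "VSWC", colour = "horizontalPosition")
```

Printing vswc_plot draws the figure. With real data the same filter and unnest steps should apply, with the toy column names swapped for the ones in your download.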
The monthly mean is used when a given measurement fails the final QF
checks. The code for this function was provided by
Zoey Werbin. A monthly mean is computed at each
replicate location (horizontalPosition) and soil depth
when there are at least 15 days of
measurements.
Assume you have a vector of measurements \(\vec{y}\), standard errors \(\vec{\sigma}\), and expanded uncertainties \(\vec{\epsilon}\) (all of length \(M\)) that pass the QF checks in a given month. By definition, the expanded uncertainty \(\vec{\epsilon}\) includes a 95% confidence interval, so \(\sigma_{i}\leq\epsilon_{i}\) for every entry. Additionally, we define the bias \(\vec{b}=\sqrt{\left(\vec{\epsilon}\right)^{2}-\left(\vec{\sigma}\right)^{2}}\) (computed elementwise) to be the quadrature difference between the expanded uncertainty and the standard error.
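As a quick numeric check of the bias definition, the sketch below computes \(\vec{b}\) elementwise in base R. The vector names and values are made up for illustration and are not NEON data.

```r
# Hypothetical standard errors and expanded uncertainties (same units as y).
sigma <- c(0.3, 0.6, 0.5)
eps   <- c(0.5, 1.0, 1.3)

# The expanded uncertainty should never be smaller than the standard error,
# otherwise the square root below would be undefined.
stopifnot(all(sigma <= eps))

# Bias as the quadrature difference, computed elementwise.
bias <- sqrt(eps^2 - sigma^2)
bias
```

Each entry satisfies bias^2 + sigma^2 = eps^2, which is what "quadrature difference" means here.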
We generate a bootstrap sample of the mean \(\overline{y}\) and standard error \(\overline{s}\) in the following way. In our case we set the number of bootstrap samples \(N\) to 5000. Individual entries \(\overline{y}_{i}\) and \(\overline{s}_{i}\) are determined by the following:
R will recycle the
vector \(\vec{y}\) so that this sample
is of length \(M\). We will call the
sample of \(\vec{y}\) \(\vec{x}\).
Once that is complete, the reported monthly mean and standard deviation are \(\overline{\overline{y}}\) and \(\overline{s}\).
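One plausible reading of this resampling scheme can be sketched in base R. This is not the package's code: the function name bootstrap_monthly_mean, the way noise and bias are drawn, and the final summaries (mean of the replicate means, mean of the replicate standard deviations) are all assumptions made for illustration.

```r
# Hypothetical helper: resample a month of QF-passing measurements.
# y     = measurements, sigma = standard errors,
# eps   = expanded uncertainties (all of length M).
bootstrap_monthly_mean <- function(y, sigma, eps, n_boot = 5000) {
  stopifnot(length(y) == length(sigma), length(y) == length(eps))
  stopifnot(all(sigma <= eps))
  m <- length(y)
  bias <- sqrt(eps^2 - sigma^2)

  draws <- vapply(seq_len(n_boot), function(i) {
    # Assumption: draw one standard error and one bias, independently.
    sigma_j <- sigma[sample.int(m, 1)]
    bias_k <- bias[sample.int(m, 1)]
    # Perturbed copy of y (the sample called x in the text).
    x <- y + rnorm(m, mean = 0, sd = sigma_j) + bias_k
    c(mean(x), sd(x))
  }, numeric(2))

  # Assumption: summarise the N replicates by their averages.
  list(mean = mean(draws[1, ]), sd = mean(draws[2, ]))
}

set.seed(42)
y_toy <- 20 + sin(seq(0, 3, length.out = 30))  # made-up measurements
result <- bootstrap_monthly_mean(
  y = y_toy,
  sigma = rep(0.10, 30),
  eps = rep(0.25, 30)
)
result
```

In this toy case every bias equals sqrt(0.25^2 - 0.10^2), so the reported mean sits close to mean(y_toy) plus that constant; with real data the spread of sigma and eps would widen the result.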
Once we have out_env_data from
acquire_neon_data, we then compute the fluxes at this
site:
out_fluxes <- compute_neon_flux(
  input_site_env = out_env_data$site_data,
  input_site_megapit = out_env_data$site_megapit
)
The resulting data frame out_fluxes has the following
variables:
startDateTime: Time period of measurement (as
POSIXct)
horizontalPosition: Sensor location where flux is
computed
flux_compute: A nested tibble with soil flux gradients
computed via different diffusivities at different measurement depths.
See below.
surface_diffusivity: Computation of surface diffusivity
(see below)
soilCO2concentrationMeanQF: QF flag for soil CO2
concentration across all vertical depths at the given horizontal
position: 0 = no issues, 1 = monthly mean used in measurement, 2 = QF
fail
VSWCMeanQF: QF flag for volumetric soil water content
(VSWC) across all vertical depths at the given horizontal position: 0 =
no issues, 1 = monthly mean used in measurement, 2 = QF fail
soilTempMeanQF: QF flag for soil temperature across all
vertical depths at the given horizontal position: 0 = no issues, 1 =
monthly mean used in measurement, 2 = QF fail
staPresMeanQF: QF flag for atmospheric pressure at the
given horizontal position: 0 = no issues, 1 = monthly mean used in
measurement, 2 = QF fail
A QF fail occurs when a monthly mean could not be computed for a measurement. If any of the input variables (soil CO2, VSWC, soil temperature, and atmospheric pressure) has a QF fail, then all flux calculations fail at that horizontal position.
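The rule that a single QF fail invalidates every flux at a horizontal position can be sketched in base R. The data frame below is a made-up example using the flag columns listed above; the column flux_fails is a hypothetical name and is not part of the package output.

```r
# Made-up QF flags for three horizontal positions
# (0 = no issues, 1 = monthly mean used, 2 = QF fail).
qf <- data.frame(
  horizontalPosition = c("001", "002", "003"),
  soilCO2concentrationMeanQF = c(0, 1, 0),
  VSWCMeanQF = c(0, 0, 2),
  soilTempMeanQF = c(0, 0, 0),
  staPresMeanQF = c(0, 1, 0)
)

qf_cols <- c("soilCO2concentrationMeanQF", "VSWCMeanQF",
             "soilTempMeanQF", "staPresMeanQF")

# A position fails as soon as any one of its input flags equals 2.
qf$flux_fails <- rowSums(qf[qf_cols] == 2) > 0
qf[, c("horizontalPosition", "flux_fails")]
```

Position 002 still yields fluxes because a flag of 1 only signals that the monthly mean was substituted, whereas position 003 fails because of its VSWC flag.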
The nested data frame flux_compute has the following
structure:
diffus_method: The type of diffusivity used to
compute fluxes. Currently implemented are “Marshall” or
“Millington-Quirk”
flux: The calculated soil flux (\(\mu\)mol m\(^{-2}\)
s\(^{-1}\))
flux_err: The calculated flux error (by
quadrature)
gradient: The computed CO\(_{2}\) flux gradient
(\(\mu\)mol m\(^{-3}\)
m\(^{-1}\))
gradient_error: The computed CO\(_{2}\) flux
gradient error (by quadrature)
method: Each site has three measurement layers, so
we denote the flux with a three-digit subscript \(F_{ijk}\), where the indicator variables \(i\), \(j\), and \(k\) show whether a given layer was used
(written in order of increasing depth, so \(i\) is closest to the surface and \(k\) is deepest). \(F_{000}\) denotes the estimate whose gradient is the slope of a linear regression of CO\(_{2}\) with depth.
r2: The \(R^{2}\) value from the linear
regression for \(F_{000}\). Otherwise it is
NA.
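To make the relationship between gradient, flux, method and r2 concrete, here is a base R sketch of Fick's law, \(F = -D \, dC/dz\), for three layers. It is not the package's implementation: the function names, the depths (negative values in metres below the surface), the concentrations and the single diffusivity value are all assumptions chosen for illustration.

```r
# Made-up inputs: depths in m (negative = below ground), CO2 in umol m^-3,
# and one diffusivity in m^2 s^-1 applied to every layer pair.
z <- c(-0.02, -0.06, -0.16)
co2 <- c(20000, 30000, 50000)
diffusivity <- 4e-6

# Hypothetical helper: flux across two chosen layers from a
# finite-difference gradient (e.g. layers 1 and 2 for F110).
fick_flux_two_layer <- function(z, co2, diffusivity, layers) {
  stopifnot(length(layers) == 2)
  gradient <- diff(co2[layers]) / diff(z[layers])
  -diffusivity * gradient
}

# Hypothetical helper: F000-style estimate, with the gradient taken as the
# slope of a linear regression of CO2 against depth.
fick_flux_regression <- function(z, co2, diffusivity) {
  fit <- lm(co2 ~ z)
  gradient <- unname(coef(fit)["z"])
  list(
    flux = -diffusivity * gradient,
    gradient = gradient,
    r2 = summary(fit)$r.squared
  )
}

f110 <- fick_flux_two_layer(z, co2, diffusivity, layers = c(1, 2))
f011 <- fick_flux_two_layer(z, co2, diffusivity, layers = c(2, 3))
f101 <- fick_flux_two_layer(z, co2, diffusivity, layers = c(1, 3))
f000 <- fick_flux_regression(z, co2, diffusivity)
```

Because CO2 increases with depth in this toy profile, the gradient is negative and every flux comes out positive, i.e. directed towards the surface under this sign convention. Only the regression-based estimate has an R-squared, which mirrors why r2 is NA for the other methods.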
The nested data frame surface_diffusivity has the
following structure:
zOffset: The depth that diffusivity is computed
at.
diffusivity: The calculated diffusivity (m\(^{2}\)
s\(^{-1}\))
diffusExpUncert: The calculated diffusivity uncertainty
(by quadrature)
diffus_method: The type of diffusivity used to compute
fluxes. Currently implemented are “Marshall” or “Millington-Quirk”
You can see the distribution of the QF flags for each environmental
measurement with env_fingerprint_plot:
The resulting plot has rows corresponding to the replicate plots
(horizontalPosition), and columns corresponding to the
different environmental measurements used when computing fluxes.
Similarly, you can see the distribution of QF flags for each
diffusivity and flux computation with
flux_fingerprint_plot. Because there are two different
diffusivities implemented (“Marshall” or “Millington-Quirk”), that
option needs to be passed to flux_fingerprint_plot:
# Fingerprint plot for Marshall method:
flux_fingerprint_plot(
  input_fluxes = out_fluxes,
  input_diffus_method = "Marshall")
# Fingerprint plot for Millington-Quirk method:
flux_fingerprint_plot(
  input_fluxes = out_fluxes,
  input_diffus_method = "Millington-Quirk")
(The default method is "Marshall"). The resulting plot
has rows corresponding to the replicate plots
(horizontalPosition), and columns corresponding to the
vertical levels used when computing the gradient for the soil flux.