library("proximetricsR")
2026-06-25
Spectral preprocessing is a critical step in near-infrared (NIR) calibration workflows. Raw spectral data often contains systematic variations, noise, and artifacts that can obscure the true relationship between spectra and reference properties. The proximetricsR package provides a flexible, composable system for building and applying preprocessing pipelines.
The system has three parts:
- Constructors (prep_*() functions): create specifications for individual preprocessing steps
- Recipes (preprocess_recipe()): assemble multiple constructors into an ordered sequence
- Execution (process()): apply the recipe to spectral data matrices
This separation of specification and execution enables reproducible, device-aware preprocessing pipelines that can be stored, shared, and applied consistently.
library("proximetricsR")
data("NIRcannabis")
X <- NIRcannabis$spc
The prep_*() functions create preprocessing step objects. Each constructor validates its parameters and encodes algorithm-specific information. The order in which constructors are passed to preprocess_recipe() defines the execution order.
prep_resample()
Resampling interpolates spectra to a new wavelength or wavenumber grid.
ProxiMate mode (user-defined grid):
prep_resample(grid = c(1001, 1700, 2))
- prep_resample
min_wav: 1001; max_wav: 1700; resolution: 2
ProxiScout mode (NeoSpectra standard grid):
prep_resample(grid = "proxiscout")
- prep_resample
Resampling is often the first step to standardize wavelength grids across different instruments.
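As a rough illustration of what resampling onto a user-defined grid involves, the base R sketch below interpolates one synthetic spectrum onto a grid built from a (min, max, resolution) triple. The interpolation method used by prep_resample() is not stated here, so linear interpolation with stats::approx(), the helper name resample_linear(), and the original 3 nm grid are all assumptions made for illustration.

```r
# Hypothetical helper: linear interpolation of one spectrum onto a new grid.
# This is a base R sketch, not the proximetricsR implementation.
resample_linear <- function(y, wav, grid) {
  new_wav <- seq(grid[1], grid[2], by = grid[3])
  # approx() returns NA for target points outside the range of wav
  stats::approx(x = wav, y = y, xout = new_wav)$y
}

wav <- seq(1001, 1700, by = 3)        # assumed original grid
y <- 0.5 + 0.001 * (wav - 1001)       # synthetic, perfectly linear "spectrum"
new_wav <- seq(1001, 1700, by = 2)
y_new <- resample_linear(y, wav, grid = c(1001, 1700, 2))
length(y_new)
```

Because the synthetic spectrum is linear, the interpolated values fall exactly on the original line, which makes the sketch easy to check.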
prep_smooth()
Smoothing reduces high-frequency noise while preserving spectral features.
Savitzky-Golay (ProxiScout compatible):
prep_smooth(w = 11, p = 3, algorithm = "savitzky-golay")
- prep_smooth
w: 11; p: 3; algorithm: 'savitzky-golay'
Moving average (ProxiMate compatible):
prep_smooth(w = 7, algorithm = "moving-average")
- prep_smooth
w: 7; algorithm: 'moving-average'
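To make the moving-average idea concrete, here is a minimal base R sketch of a centered moving average with an odd window. It assumes that points where the window runs past the ends are simply dropped. The helper name moving_average() is hypothetical and the package may handle edges differently.

```r
# Centered moving average with an odd window w; a sketch, not the package code.
moving_average <- function(y, w) {
  stopifnot(w > 0, w %% 2 == 1)
  sm <- stats::filter(y, rep(1 / w, w), sides = 2)
  # filter() leaves NA where the window runs off either end; drop those points
  as.numeric(sm[!is.na(sm)])
}

set.seed(1)
y <- sin(seq(0, pi, length.out = 50)) + rnorm(50, sd = 0.05)
y_sm <- moving_average(y, w = 7)
length(y_sm)   # 50 - (7 - 1) = 44 points remain
```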
Parameters:
- w: window size (must be a positive odd integer)
- p: polynomial order for Savitzky-Golay (must be < w)
prep_snv()
SNV (Standard Normal Variate) normalizes each spectrum independently by centering and scaling:
\[SNV_i = \frac{x_i - \bar{x}_i}{s_i}\]
where \(\bar{x}_i\) and \(s_i\) are the mean and standard deviation of the \(i\)-th spectrum.
prep_snv()
- prep_snv
SNV corrects for additive baseline offsets and multiplicative scaling effects (e.g., path length variations) within each spectrum, and is device-agnostic.
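The SNV formula can be sketched in a few lines of base R. The function name snv() and the synthetic data are illustrative only, and this is not the package's implementation. The example also checks that adding a constant offset and applying a positive scale factor to every spectrum leaves the SNV result unchanged.

```r
# Row-wise SNV: subtract each spectrum's mean and divide by its standard deviation.
snv <- function(spectra) {
  (spectra - rowMeans(spectra)) / apply(spectra, 1, sd)
}

set.seed(42)
spectra <- matrix(rnorm(3 * 20), nrow = 3)   # 3 synthetic spectra, 20 points each
shifted <- 0.3 + 2 * spectra                 # offset plus multiplicative factor

spectra_snv <- snv(spectra)
same <- isTRUE(all.equal(spectra_snv, snv(shifted)))
same
```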
prep_derivative()
Derivatives enhance spectral differences and reduce baseline effects.
Savitzky-Golay (ProxiScout):
prep_derivative(m = 1, w = 11, p = 3, algorithm = "savitzky-golay")
- prep_derivative
m: 1; w: 11; p: 3; algorithm: 'savitzky-golay'
Gap-Segment (ProxiScout):
prep_derivative(m = 2, w = 9, p = 3, algorithm = "gap-segment")
- prep_derivative
m: 2; w: 9; p: 3; algorithm: 'gap-segment'
NIRWise PLUS compatible (ProxiMate):
prep_derivative(m = 1, w = 5, p = 11, algorithm = "nwp")
- prep_derivative
m: 1; w: 5; p: 11; algorithm: 'nwp'
Parameters:
- m: derivative order (1 or 2)
- w: window/gap size (positive odd integer)
- p: polynomial order (Savitzky-Golay) or smoothing window (gap-segment, nwp)
- algorithm: choice of method
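For the Savitzky-Golay case, the roles of m, w and p can be illustrated by deriving the filter coefficients from a local least-squares polynomial fit. The sketch below uses only base R. The helper sg_coef() is hypothetical, it assumes unit spacing between points, and it says nothing about how the gap-segment or nwp algorithms are computed.

```r
# Savitzky-Golay coefficients for the m-th derivative, window w, polynomial order p.
# Assumes equally spaced points with unit spacing; a sketch, not the package code.
sg_coef <- function(m, w, p) {
  stopifnot(w %% 2 == 1, p < w, m <= p)
  z <- seq(-(w - 1) / 2, (w - 1) / 2)
  A <- outer(z, 0:p, `^`)                      # local polynomial design matrix
  # Row m + 1 of the least-squares solution yields the m-th polynomial coefficient;
  # multiplying by m! turns it into the m-th derivative at the window centre.
  factorial(m) * solve(crossprod(A), t(A))[m + 1, ]
}

cf <- sg_coef(m = 1, w = 5, p = 2)

# Apply to y = x^2, whose exact first derivative is 2x
y <- (1:10)^2
d <- sapply(3:8, function(i) sum(cf * y[(i - 2):(i + 2)]))
d
```

With w = 5 and p = 2 the coefficients reduce to the familiar (-2, -1, 0, 1, 2) / 10, and the filtered values reproduce 2x at the interior points.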
prep_detrend()
Detrending removes wavelength-dependent baseline effects by fitting and removing a polynomial trend (ProxiScout only):
prep_detrend(p = 2)
- prep_detrend
p: 2
Parameters:
- p: polynomial order (default 2)
For the full Barnes et al. (1989) procedure (SNV + detrending), chain prep_snv() before prep_detrend().
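A minimal base R sketch of polynomial detrending, and of chaining SNV before it, might look as follows. The helper names snv() and detrend(), the synthetic spectra, and the 3 nm wavelength grid are assumptions for illustration. The package's own fitting details are not shown in this document.

```r
# Row-wise SNV (see the formula above)
snv <- function(X) (X - rowMeans(X)) / apply(X, 1, sd)

# Fit a polynomial of order p to each spectrum and keep the residuals
detrend <- function(X, wav, p = 2) {
  out <- t(apply(X, 1, function(y) residuals(lm(y ~ poly(wav, p)))))
  dimnames(out) <- dimnames(X)
  out
}

wav <- seq(1001, 1700, by = 3)                        # assumed grid
curved <- 0.2 + 1e-4 * (wav - 1001) + 2e-7 * (wav - 1001)^2
peak <- 0.05 * exp(-((wav - 1450) / 30)^2)
X_demo <- rbind(curved, curved + peak)

X_dt <- detrend(X_demo, wav)          # pure quadratic baseline is removed entirely
X_barnes <- detrend(snv(X_demo), wav) # SNV followed by detrending
max(abs(X_dt[1, ]))
```

The first synthetic spectrum is a pure second-order baseline, so its detrended values are zero up to floating-point error, while the second retains its absorption peak.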
prep_transform()
Convert between reflectance and absorbance using Beer’s Law (ProxiScout only):
\[A = -\log_{10}(R)\]
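The formula and its inverse are simple to express in base R. The two helper names below are hypothetical and only illustrate the arithmetic, with reflectance given as a fraction between 0 and 1.

```r
# Reflectance (as a fraction) to absorbance and back; illustrative helpers only.
to_absorbance <- function(R) -log10(R)
to_reflectance <- function(A) 10^(-A)

R <- c(1, 0.5, 0.1, 0.01)
A <- to_absorbance(R)
A                       # 0, about 0.301, 1, 2
R_back <- to_reflectance(A)
```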
prep_transform(to = "absorbance")
- prep_transform
to: 'absorbance'
Parameters:
- to: target unit ("absorbance" or "reflectance")
prep_wav_trim()
Retain only a specified wavelength band and/or remove constant-valued edge columns (ProxiScout only):
prep_wav_trim(band = c(1000, 1800), trim_constant_edges = TRUE)
- prep_wav_trim
band: 1000, 1800; trim_constant_edges: TRUE
Parameters:
- band: wavelength range to retain (or c() to skip)
- trim_constant_edges: remove zero or constant-valued columns at edges
The preprocess_recipe() function combines constructors into an ordered pipeline. Order matters: preprocessing steps are applied in the order specified.
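A small base R experiment shows why order matters. It applies SNV and a first difference (a crude stand-in for a derivative step) to one synthetic spectrum in both orders. The helper snv_row() and the synthetic spectrum are illustrative assumptions.

```r
# SNV for a single spectrum
snv_row <- function(y) (y - mean(y)) / sd(y)

wav <- seq(1001, 1700, by = 3)
y <- 0.4 + 0.0005 * (wav - 1001) + 0.1 * exp(-((wav - 1400) / 40)^2)

a <- diff(snv_row(y))   # SNV first, then first difference
b <- snv_row(diff(y))   # first difference first, then SNV

isTRUE(all.equal(a, b)) # FALSE: the two orders give different results
```

In the second ordering the result has zero mean and unit standard deviation by construction, whereas in the first the differenced values keep the overall slope of the scaled spectrum.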
Different BUCHI devices support different preprocessing steps:
ProxiMate supports:
- prep_resample() with user-defined grids
- prep_smooth() with moving-average algorithm
- prep_snv()
- prep_derivative() with nwp algorithm
ProxiScout supports:
- prep_resample() with "proxiscout" grid
- prep_smooth() with savitzky-golay algorithm
- prep_snv()
- prep_derivative() with savitzky-golay or gap-segment algorithms
- prep_detrend()
- prep_transform()
- prep_wav_trim()
Single preprocessing step (SNV only):
SNV is device-agnostic, so device is optional:
recipe_snv <- preprocess_recipe(prep_snv())
recipe_snv
Spectral preprocessing recipe (device: "unspecified"):
- Step 1: prep_snv
Multiple steps (requires device):
recipe_ps <- preprocess_recipe(
  prep_smooth(w = 7, p = 1, algorithm = "savitzky-golay"),
  prep_snv(),
  prep_derivative(m = 1, w = 5, p = 2, algorithm = "savitzky-golay"),
  device = "proxiscout"
)
recipe_ps
Spectral preprocessing recipe (device: "proxiscout"):
- Step 1: prep_smooth
w: 7; p: 1; algorithm: 'savitzky-golay'
- Step 2: prep_snv
- Step 3: prep_derivative
m: 1; w: 5; p: 2; algorithm: 'savitzky-golay'
ProxiMate-specific recipe:
recipe_pm <- preprocess_recipe(
  prep_smooth(w = 7, algorithm = "moving-average"),
  prep_snv(),
  prep_derivative(m = 1, w = 5, p = 11, algorithm = "nwp"),
  device = "proximate"
)
recipe_pm
Spectral preprocessing recipe (device: "proximate"):
- Step 1: prep_smooth
w: 7; algorithm: 'moving-average'
- Step 2: prep_snv
- Step 3: prep_derivative
m: 1; w: 5; p: 11; algorithm: 'nwp'
Recipes validate that all steps are compatible with the specified device and raise informative errors if not.
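To give a feel for this kind of check, the sketch below encodes part of the support lists above as a lookup table and rejects unsupported steps. Everything here (the supported table, check_step(), and the error wording) is hypothetical and is not taken from the package, whose internal validation is not shown in this document. prep_resample() is left out for brevity.

```r
# Hypothetical lookup table derived from the device support lists above.
# character(0) marks steps that take no algorithm argument.
supported <- list(
  proximate = list(
    prep_smooth = "moving-average",
    prep_snv = character(0),
    prep_derivative = "nwp"
  ),
  proxiscout = list(
    prep_smooth = "savitzky-golay",
    prep_snv = character(0),
    prep_derivative = c("savitzky-golay", "gap-segment"),
    prep_detrend = character(0),
    prep_transform = character(0),
    prep_wav_trim = character(0)
  )
)

check_step <- function(step, algorithm = NULL, device) {
  allowed <- supported[[device]]
  if (!step %in% names(allowed)) {
    stop(sprintf("%s is not supported on device '%s'", step, device))
  }
  if (!is.null(algorithm) && !algorithm %in% allowed[[step]]) {
    stop(sprintf("algorithm '%s' of %s is not supported on device '%s'",
                 algorithm, step, device))
  }
  invisible(TRUE)
}

ok <- check_step("prep_derivative", "nwp", device = "proximate")
msg <- tryCatch(
  check_step("prep_detrend", device = "proximate"),
  error = function(e) conditionMessage(e)
)
msg
```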
process()
The process() function executes a recipe on spectral data:
X_snv <- process(X, recipe_snv)
dim(X_snv)
[1] 80 234
X_ps <- process(X, recipe_ps)
dim(X_ps)
[1] 80 224
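The drop from 234 to 224 columns is consistent with each windowed step discarding (w - 1) / 2 columns at each edge, although the text does not state this explicitly. Under that assumption, the arithmetic can be checked with a one-line hypothetical helper:

```r
# Assumed rule: a window of size w removes (w - 1) columns in total.
cols_after <- function(n_in, windows) n_in - sum(windows - 1)

# recipe_ps uses a smoothing window of 7 and a derivative window of 5
n_out <- cols_after(234, windows = c(7, 5))
n_out   # 224
```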
The applied recipe is stored as an attribute and can be retrieved:
applied_recipe <- attr(X_ps, "preprocess_recipe")
applied_recipe
Spectral preprocessing recipe (device: "proxiscout"):
- Step 1: prep_smooth
w: 7; p: 1; algorithm: 'savitzky-golay'
- Step 2: prep_snv
- Step 3: prep_derivative
m: 1; w: 5; p: 2; algorithm: 'savitzky-golay'
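One general caveat about attributes in base R is worth illustrating: on a plain matrix, subsetting with `[` keeps only dim and dimnames, so custom attributes are lost. The sketch below uses a plain list as a stand-in for a recipe. Whether objects returned by process() behave the same way depends on their class, which is not described here.

```r
# A plain matrix carrying a stand-in "recipe" as an attribute
M <- matrix(1:6, nrow = 2)
attr(M, "preprocess_recipe") <- list(steps = "prep_snv", device = "unspecified")

kept <- attr(M, "preprocess_recipe")$steps   # still available on the full object

# Subsetting a plain matrix drops custom attributes
M_sub <- M[1, , drop = FALSE]
lost <- is.null(attr(M_sub, "preprocess_recipe"))
lost
```

If the attribute is needed after subsetting, a simple approach is to retrieve it first and re-attach it to the subset.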
A typical ProxiMate workflow for fat/protein prediction:
recipe_pm_fat <- preprocess_recipe(
  prep_smooth(w = 7, algorithm = "moving-average"),
  prep_snv(),
  prep_derivative(m = 1, w = 5, p = 11, algorithm = "nwp"),
  device = "proximate"
)
X_fat_prep <- process(X, recipe_pm_fat)
head(X_fat_prep[, 1:5])
        1025    1028    1031    1034    1037
[1,] -0.0137 -0.0138 -0.0138 -0.0135 -0.0131
[2,] -0.0208 -0.0213 -0.0215 -0.0215 -0.0212
[3,] -0.0184 -0.0186 -0.0187 -0.0185 -0.0181
[4,] -0.0122 -0.0124 -0.0124 -0.0122 -0.0119
[5,] -0.0170 -0.0171 -0.0170 -0.0167 -0.0161
[6,] -0.0134 -0.0135 -0.0135 -0.0132 -0.0128
ProxiScout instruments benefit from additional preprocessing steps:
recipe_ps_full <- preprocess_recipe(
  prep_resample(grid = "proxiscout"),
  prep_smooth(w = 7, p = 1, algorithm = "savitzky-golay"),
  prep_snv(),
  prep_detrend(p = 2),
  prep_derivative(m = 1, w = 5, p = 2, algorithm = "savitzky-golay"),
  device = "proxiscout"
)
X_ps_full <- process(X, recipe_ps_full)
dim(X_ps_full)
[1] 80 102
Sometimes less is more. A minimal recipe with only SNV:
recipe_minimal <- preprocess_recipe(prep_snv())
X_minimal <- process(X, recipe_minimal)
Select a specific wavelength range for ProxiScout:
recipe_band <- preprocess_recipe(
  prep_wav_trim(band = c(1100, 1600)),
  prep_smooth(w = 5, p = 1, algorithm = "savitzky-golay"),
  prep_snv(),
  device = "proxiscout"
)
X_band <- process(X, recipe_band)
colnames(X_band)[c(1, ncol(X_band))]
[1] "1106" "1592"
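The first and last column names can be reproduced with plain column subsetting, assuming the input grid runs from 1001 to 1700 nm in 3 nm steps (234 points, matching the column count shown earlier) and that a 5-point smoothing window drops two columns at each edge. Both assumptions are inferred from the printed outputs rather than stated in the text.

```r
wav <- seq(1001, 1700, by = 3)   # assumed input grid, 234 points
S <- matrix(0, nrow = 2, ncol = length(wav),
            dimnames = list(NULL, as.character(wav)))

# Keep only columns inside the band
keep <- wav >= 1100 & wav <= 1600
S_band <- S[, keep, drop = FALSE]
band_range <- range(as.numeric(colnames(S_band)))   # 1100 1598

# A 5-point window then removes (5 - 1) / 2 = 2 columns per edge
trimmed <- colnames(S_band)[3:(ncol(S_band) - 2)]
edges <- trimmed[c(1, length(trimmed))]
edges   # "1106" "1592"
```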
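Preprocessing steps affect each other, so their order matters. The fullest recipe shown above applies them in the order resample, smooth, SNV, detrend, derivative.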
Always specify device = "proximate" or device = "proxiscout" when building recipes (except for SNV-only recipes). This ensures recipes are portable and the preprocessing is compatible with the target device.
Store recipes alongside calibration models to ensure preprocessing is applied identically during prediction. The process() function attaches the recipe as an attribute for downstream tracking.
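One simple way to keep a recipe and a model together is to bundle them in a list and serialize it with saveRDS(). In the sketch below both the "model" and the "recipe" are plain-list stand-ins, since real calibration and recipe objects are not constructed here.

```r
# Stand-ins for a fitted calibration model and a preprocessing recipe
bundle <- list(
  model = list(coefficients = c(0.1, 0.2)),
  recipe = list(steps = c("prep_smooth", "prep_snv"), device = "proxiscout")
)

path <- tempfile(fileext = ".rds")
saveRDS(bundle, path)

# At prediction time, restore both objects together
restored <- readRDS(path)
round_trip_ok <- identical(restored, bundle)
round_trip_ok
```

Keeping both objects in one file avoids a model being paired with the wrong preprocessing at prediction time.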
Preprocessing parameters are typically tuned during model development.
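A candidate grid for such tuning can be enumerated with expand.grid() and filtered by the constraints given earlier (odd w, p < w). The extra condition m <= p is a general Savitzky-Golay requirement and is added here as an assumption. The candidate values themselves are arbitrary examples.

```r
# Enumerate candidate Savitzky-Golay derivative settings (example values only)
candidates <- expand.grid(w = c(5, 7, 9, 11), p = 1:3, m = 1:2)

# Keep combinations that satisfy the parameter constraints
valid <- candidates[
  candidates$w %% 2 == 1 &
    candidates$p < candidates$w &
    candidates$m <= candidates$p,
]
nrow(valid)   # 20 of the 24 combinations remain
```

Each remaining row could then be turned into a recipe and compared by cross-validated prediction error.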
The preprocessing recipe system in proximetricsR provides a structured, reproducible approach to spectral preprocessing.
This design enables seamless integration with calibration workflows and ensures preprocessing is applied consistently from model development through deployment on BUCHI devices.