---
title: "ProxiMate: Structure of the applications"
author:
  - name: Leonardo Ramirez-Lopez and Claudio Orellano
    email: ramirez-lopez.l@buchi.com
    affiliation: Data Science Department, BUCHI Labortechnik AG, Flawil, Switzerland
date: today
clean: true
bibliography: ["proximetricsR.bib"]
biblio-style: "apalike"
link-citations: true
format:
  html:
    toc: true
    toc-depth: 3
    toc-location: left
    number-sections: true
    code-overflow: wrap
    smooth-scroll: true
    html-math-method: mathjax
vignette: >
  %\VignetteIndexEntry{ProxiMate: Structure of the applications}
  %\VignetteEncoding{UTF-8}
  %\VignetteEngine{quarto::html}
---

# Introduction

The `proximetricsR` package can be used to build and/or update NIR applications that are ready to be consumed by the [ProxiMate series of NIR sensors](https://www.buchi.com/en/products/instruments/proximate) manufactured by BUCHI Labortechnik AG. Once an application is installed in a ProxiMate device, it can be used to predict the properties of a given matrix using the spectral models contained in that application.

This package builds upon the standard structure of the ProxiMate applications, which are conventionally developed with the compiled-executable software "NIRWise PLUS" offered by BUCHI. Therefore, the files output by `proximetricsR` follow the same structure as those output by "NIRWise PLUS". No changes or improvements to these output files were made in the development of `proximetricsR`.

# Structure of the ProxiMate predictive applications

ProxiMate applications can be described as a collection of predictive models that, along with other metadata, are packed into a single file that can be installed in a ProxiMate sensor. Once an application is imported, the sensor can be used to predict the properties of the samples the application was built for. A ProxiMate application comprises the following files:

- Calibration data file (.tsv): a file containing the main data used to build the predictive models.
- Local data file (.tsv): a file where the spectral data (and, where available, property data) collected by the user are stored.
- Calibration model files (.cal): these files contain a predictive model along with the spectral processing methods to be applied before the model is used for prediction. The application contains one .cal file for every model.
- Project files (.prj): these files contain information about the model parameters and results. The application contains one .prj file for every model.
- Report files (.rtf): these files contain a summary of the model calibration results.
- Application metadata file (.nad): a file containing some parameters of the application (e.g. scanning time, standard operating procedure, property units, etc.).
- Application file (.nax): a single "container" file where all the previous files are stored, as shown in @fig-filestructure.

```{r}
#| label: fig-filestructure
#| out-width: "80%"
#| fig-cap: "Files inside the application file"
#| echo: false
#| fig-align: center
#| fig-retina: 0.85
#| out-extra: 'style="background-color: #FFFFFF; border: 10px solid transparent; padding:0px; display: inline-block;"'
knitr::include_graphics("file_structure.jpg")
```

## Calibration data file (.tsv)

This is the main input file. It contains the predictor (spectra) and response (properties) data used to calibrate the models. The data are stored as a tab-separated table. Every row in this table represents a single spectral measurement of a sample along with its associated data (e.g. property values, date, etc.). This table is usually exported directly from the ProxiMate devices and typically contains columns with the following fields:

- `ROW`: a number indicating the row number.
- `Check`: a logical value, `true` or `false`. Note that these values are written in lowercase letters and are not `R` logical values; `R` therefore reads them as character strings.
  However, they are interpreted by ProxiMate systems as logical values.
- `Date`: the date on which the measurement was collected (day/month/year hour:min:sec, e.g. `17/12/2020 10:06:25`).
- `SNR`: a character string with the serial number(s) of the detector(s) in the NIR sensor device. An additional serial number is provided if the measurements were made with a sensor that includes a detector for measuring spectra in the visible range of the electromagnetic spectrum (e.g. `918FG118;1502091`).
- `ID`: a character string indicating the sample name.
- `Barcode`: a character string with metadata of the sample. It is a placeholder for data that are typically read from barcode scanners.
- `Note`: a character string with metadata of the sample.
- `Result`: a string containing numbers separated by semicolons. Each number indicates the predicted value of a property. This information is present if the samples in the file were measured by the ProxiMate sensor using a predictive application. This column is not used by `proximetricsR`.
- `Reference`: a string containing numbers separated by semicolons. Each number indicates the reference value of a property. This information is present if the user of the ProxiMate sensor entered these values in the instrument. This column is not used by `proximetricsR`.
- Properties: multiple columns that contain the reference values of the corresponding properties. For each property, there is a single column with the property name as its header. The name of each property is assigned by the user.
- `Begin`: a string indicating the time at which the measurement started (hour:min:sec, e.g. `10:06:25`).
- `End`: a string indicating the time at which the measurement ended (hour:min:sec, e.g. `10:06:40`).
- `Recipe`: a string indicating the matrix or the application name.
- `Composition`: an empty column (not used).
- `Images`: an empty column (not used).
- Spectral data: the spectral data in ProxiMate devices are collected with diode-array detectors [see @workman2001commercial for a description of this type of technology]. These detectors record the spectral information at different photodiode pixels. The .tsv file contains the absorbance data collected at each pixel by the detectors. Each pixel represents a specific wavelength (in nm). To obtain the wavelength of each pixel, a polynomial function needs to be applied to the pixel index. The following columns contain all the necessary spectral information:

    - `#X1`: index of the first pixel(s). If there are two detectors (visible and NIR), this field contains two numbers: the index of the first pixel of the visible detector followed by the index of the first pixel of the NIR detector. For example, the value `823, 4` indicates that the index of the first pixel is `823` for the visible detector and `4` for the NIR detector. The NIR pixel indices are zero-based; the corresponding one-based index is therefore `5`.
    - `#X2`: index of the last pixel(s). If there are two detectors (visible and NIR), this field contains two numbers: the index of the last pixel of the visible detector followed by the index of the last pixel of the NIR detector. For example, the value `1074, 272` indicates that the index of the last pixel is `1074` for the visible detector and `272` for the NIR detector. The NIR pixel indices are zero-based; the corresponding one-based index is therefore `273`.
    - `#X3`: a string with the set(s) of polynomial coefficients. If there are two detectors (visible and NIR), this field contains two sets of coefficients; otherwise it contains only one set, corresponding to the NIR detector. Within a set, the coefficients are separated by semicolons; the two sets (when both are present) are separated by a comma.
      The polynomial order is inferred from the number of values in a set, and the coefficients are arranged from the highest to the lowest degree. For example, the value `0;0;-7.586146E-05;2.12726;-1301.079, 2.04E-10;-1.28E-07;2.80E-05;-4.76E-03;3.89;880.06` represents the coefficients of a fourth-degree polynomial for the visible part (before the comma) and a fifth-degree polynomial for the near-infrared part (after the comma). Using the example values for `#X1` to `#X3` above, the first wavelength of both the NIR and the visible range can be obtained as follows:

::: {.indented-block}

```{r}
#| eval: true
# NIR pixel indices are zero-based: add 1 to obtain the one-based index
first_pixel_nir <- 4
first_pixel_count_nir <- first_pixel_nir + 1

first_wavelength_nir <-
  2.04E-10 * first_pixel_count_nir^5 +
  -1.28E-07 * first_pixel_count_nir^4 +
  2.80E-05 * first_pixel_count_nir^3 +
  -4.76E-03 * first_pixel_count_nir^2 +
  3.89 * first_pixel_count_nir +
  880.06
first_wavelength_nir

# The visible pixel index is used as given in `#X1`
first_pixel_count_vis <- 823

first_wavelength_vis <-
  0.0 * first_pixel_count_vis^4 +
  0.0 * first_pixel_count_vis^3 +
  -7.586146E-05 * first_pixel_count_vis^2 +
  2.12726 * first_pixel_count_vis +
  -1301.079
first_wavelength_vis
```

A sequence of numbers can be used to obtain all the wavelengths at once. This sequence must start with the index of the first pixel and end with the index of the last pixel.
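
The coefficients do not have to be typed by hand: the `#X3` string can also be parsed and the polynomial evaluated programmatically. The following is only a minimal sketch. `parse_coefficients()` and `evaluate_polynomial()` are hypothetical helpers written for this illustration (they are not functions of `proximetricsR`), and the sketch assumes, as described above, that the coefficients run from the highest to the lowest degree and that only the NIR pixel indices are zero-based.

```r
# Hypothetical helper: split a '#X3'-style string into numeric coefficient sets.
# Sets are separated by commas; coefficients within a set by semicolons.
parse_coefficients <- function(x3) {
  sets <- strsplit(x3, ",", fixed = TRUE)[[1]]
  lapply(sets, function(set) {
    as.numeric(strsplit(trimws(set), ";", fixed = TRUE)[[1]])
  })
}

# Hypothetical helper: evaluate a polynomial with Horner's scheme.
# `coefficients` are ordered from the highest to the lowest degree.
evaluate_polynomial <- function(coefficients, pixel) {
  result <- numeric(length(pixel))
  for (coefficient in coefficients) {
    result <- result * pixel + coefficient
  }
  result
}

# Example value of '#X3' used in this section (visible set, then NIR set)
x3 <- "0;0;-7.586146E-05;2.12726;-1301.079, 2.04E-10;-1.28E-07;2.80E-05;-4.76E-03;3.89;880.06"
coefficient_sets <- parse_coefficients(x3)

# Visible pixel indices are used as given; NIR indices are zero-based, so add 1
vis_wavelengths <- evaluate_polynomial(coefficient_sets[[1]], 823:1074)
nir_wavelengths <- evaluate_polynomial(coefficient_sets[[2]], (4:272) + 1)
```

Horner's scheme is used here only because it handles any polynomial order without spelling out each power; the explicit form used elsewhere in this section is equivalent.
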
Continuing the example with the values of `#X1` to `#X3` given above, all wavelengths can be obtained at once as follows:

```{r}
#| eval: true
# NIR: zero-based pixel indices from `#X1` to `#X2`, shifted to one-based
pixel_sequence_nir <- 4:272
pixel_sequence_count_nir <- pixel_sequence_nir + 1

wavelengths_nir <-
  2.04E-10 * pixel_sequence_count_nir^5 +
  -1.28E-07 * pixel_sequence_count_nir^4 +
  2.80E-05 * pixel_sequence_count_nir^3 +
  -4.76E-03 * pixel_sequence_count_nir^2 +
  3.89 * pixel_sequence_count_nir +
  880.06

# First and last five NIR wavelengths
wavelengths_nir[1:5]
wavelengths_nir[(length(wavelengths_nir) - 4):length(wavelengths_nir)]

# Visible: pixel indices are used as given
pixel_sequence_vis <- 823:1074

wavelengths_vis <-
  0.0 * pixel_sequence_vis^4 +
  0.0 * pixel_sequence_vis^3 +
  -7.586146E-05 * pixel_sequence_vis^2 +
  2.12726 * pixel_sequence_vis +
  -1301.079

# First and last five visible wavelengths
wavelengths_vis[1:5]
wavelengths_vis[(length(wavelengths_vis) - 4):length(wavelengths_vis)]
```

:::

- [ ]{style="color:white;" .no-marker}
    - Spectral data: the spectral data of each pixel are stored in the columns named with a hash character followed by a number (e.g. `#3`). Note that these numbers do not represent the pixel index.

## Local data file (.tsv)

This file has the same structure as the "Calibration data file". The only difference is that this local file is used to store spectra measured by the user of the application. A spectrum is written to this file only if one or more of its reference (response) values are manually entered by the user directly on the instrument.

## Calibration model files (.cal)

These files store both the instructions for spectral pre-processing and the parameters of the calibrated models. Each file contains a single model, i.e. if an application contains _n_ predictive models, there will be _n_ .cal files in the application. These files are used by the sensor instruments to predict the response variables in the application.

## Project files (.prj)

These files store all the final results obtained when a calibration model is built.
A project file can be imported into the NIRWise PLUS software to visualize the calibration model results and the settings used. Note that these files are not used to generate predictions; their purpose is limited to the visualization and review of the models. A project file may contain the pre-processed spectra, the data matrices generated during the calibration process, information about the type of model validation, the outlier detection method used, the response residuals, etc.

## Report files (.rtf)

These files are in rich text format. Each one contains a report on the calibration results for a single response variable in the application. This report includes information such as the original .tsv file used for the calibration, the number of observations used and their indices in the .tsv table, the standard error of calibration (SEC), the coefficient of determination ($R^2$), etc.

## Application metadata file (.nad)

This file contains application metadata such as the application name (the name that is shown when the application is imported into a sensor), the sample measurement geometry, the measurement time, the creation date, the standard operating procedure, the additive (offset) and multiplicative (slope) adjustments to the predicted response values, the outlier detection parameters, etc.

## Application file (.nax)

This file acts as a container for all the files described above. It is in fact a [ZIP file](https://en.wikipedia.org/wiki/ZIP_(file_format)) used to pack and compress the application files. The file and folder structure inside a container with _n_ predictive models can be described as follows:

```
.nax
│   .nad
│
└───Calibrations
│       ..cal
│       ..prj
│       ..rtf
│       ...
│       ..cal
│       ..prj
│       ..rtf
│
└───Data
│   │   .tsv
│   └───Local
│       .tsv
```

# References {-}