1 Smithsonian Tropical Research Institute, Balboa, Ancón, Republic of Panama; 2 Corporación Geológica ARES, Bogotá, Colombia; 3 Servicio Geológico Colombiano, Bogotá, Colombia.
SDAR is a fast and consistent tool for plotting and facilitating the analysis of stratigraphic and sedimentological data, designed to plot detailed stratigraphic sections and to perform quantitative stratigraphic analyses.
Stratigraphic Columns (SC) are the most useful and common way to represent field descriptions of rock sequences and well logs (e.g., grain size, thickness of rock packages, fossil content and lithological components). In these representations, the width of the SC varies with grain size (i.e., the wider the stratum, the coarser the rock; Miall 1990; Tucker 2011), and the thickness of each layer is represented on the vertical axis of the diagram. Typically, these representations are drawn manually using vector graphics editors (e.g., Adobe Illustrator®, CorelDRAW®, Inkscape). Several software packages now plot SCs automatically, but versatile open-source tools are lacking, and it remains difficult both to store and to analyse stratigraphic information.
This document presents Stratigraphic Data Analysis in R (SDAR), an analytical package designed both for plotting and for facilitating the analysis of stratigraphic data in R (R Core Team 2019). SDAR uses simple stratigraphic data and takes advantage of the flexible plotting tools available in R to produce detailed SCs. The main benefits of SDAR are:
To install the SDAR package from CRAN:
install.packages("SDAR")
The standard workflow in SDAR consists of
saltarin_beds
To explore the functionalities of SDAR, we will use the publicly available dataset of the Saltarin well. saltarin_beds is the example dataset available within SDAR; it gives a lithologic description of borehole Saltarin 1A, located in the Llanos Basin in eastern Colombia (4.612 N, 70.495 W). The stratigraphic well Saltarin 1A drilled 671 meters of the Miocene succession of the eastern Llanos Basin, corresponding to the Carbonera (124.1 m; 407.1 ft), Leon (105.1 m; 344.8 ft), and Guayabo Formations (441.8 m; 1449.5 ft) (Bayona et al. 2008). The Saltarin core was described at a scale of 1:50 to identify grain-size trends, sedimentary structures, clast composition, lamination thickness, bioturbation patterns, and macrofossils, all of which are used to identify individual lithofacies and for sedimentological and stratigraphic analyses (Jaramillo et al. 2017).
The command data(saltarin_beds) loads the dataset saltarin_beds into the current R session.
library(SDAR) # Load SDAR library
data(saltarin_beds) # load Saltarin demo dataset
class(saltarin_beds)
#> [1] "data.frame"
# check the content and the structure of the saltarin_beds dataset
nrow(saltarin_beds) # number of rock layers
#> [1] 686
ncol(saltarin_beds) # number of variables recording composition and texture description of each layer
#> [1] 22
names(saltarin_beds) # variable names of composition and texture description of each layer
#> [1] "bed_number" "base" "top"
#> [4] "rock_type" "prim_litho" "grain_size"
#> [7] "prim_litho_percent" "sec_litho" "grain_size_sec_litho"
#> [10] "sec_litho_percent" "base_contact" "grading"
#> [13] "grain_size_base" "grain_size_top" "sorting"
#> [16] "roundness" "matrix" "cement"
#> [19] "fabric" "munsell_color" "Rcolor"
#> [22] "notes"
Note that saltarin_beds is a data frame with 686 layers (rows) and 22 variables (columns) storing the thickness, composition, and texture description of each layer, in the format suggested by SDAR (for details on the specific types of data required by SDAR, see the SDAR_data_model vignette).
To draw a stratigraphic layer in SDAR, the minimum information required for each layer is bed_number, thickness (defined by a base and a top), rock_type, prim_litho, and grain_size. In summary, a table with the structure presented in Table 1 must be provided.
This example is from a borehole core, where depths are measured down from the surface; therefore, "base" is greater than "top".
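Table 1: Minimum fields required for each bed.

bed_number | base | top | rock_type | prim_litho | grain_size |
---|---|---|---|---|---|
1 | 671 | 670.2 | sedimentary | claystone | clay |
2 | 670.2 | 669.4 | covered | | |
3 | 669.4 | 669.18 | sedimentary | sandstone | medium sand |
4 | 669.18 | 667.6 | sedimentary | limestone | wackestone |
5 | 667.6 | 667.2 | sedimentary | conglomerate | boulder |
6 | 667.2 | 666.2 | sedimentary | shale | silt |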
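For readers assembling their own data, the mandatory fields of Table 1 can be typed directly into a data frame. This is a minimal base-R sketch: the object name beds is illustrative, and leaving prim_litho and grain_size as NA for the covered row is an assumption to check against the SDAR_data_model vignette. It also shows that thickness is not stored but follows from base minus top.

```r
# Sketch only: the mandatory columns of Table 1 as a base-R data frame.
# Depths are measured down from the surface, so base > top for each bed.
beds <- data.frame(
  bed_number = 1:6,
  base       = c(671, 670.2, 669.4, 669.18, 667.6, 667.2),
  top        = c(670.2, 669.4, 669.18, 667.6, 667.2, 666.2),
  rock_type  = c("sedimentary", "covered", rep("sedimentary", 4)),
  # Assumption: covered intervals carry no lithology or grain size (NA).
  prim_litho = c("claystone", NA, "sandstone", "limestone", "conglomerate", "shale"),
  grain_size = c("clay", NA, "medium sand", "wackestone", "boulder", "silt"),
  stringsAsFactors = FALSE
)

# Thickness is implied by the depth interval; round away floating-point noise.
beds$thickness <- round(beds$base - beds$top, 2)
stopifnot(all(beds$base > beds$top))   # every bed has a positive thickness
beds[, c("bed_number", "thickness")]
```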
# header of the mandatory fields of "saltarin_beds" dataset to draw a graphic log using SDAR
head(saltarin_beds[,1:6])
#> bed_number base top rock_type prim_litho grain_size
#> 1 1 671.00 670.20 sedimentary claystone clay
#> 2 2 670.20 669.40 sedimentary siltstone silt
#> 3 3 669.40 669.18 sedimentary siltstone silt
#> 4 4 669.18 667.60 sedimentary claystone clay
#> 5 5 667.60 667.20 sedimentary siltstone silt
#> 6 6 667.20 666.20 sedimentary siltstone silt
NOTE: The SDAR project includes the development of a graphic user interface to connect this R package with a database management system; for this reason the structure of the data and headers (column names) should be followed in order to match the database structure.
To improve communication between geoscientists, SDAR implements conventions defined by sedimentologists for drawing lithology patterns and for describing grain size, color, and other attributes. Details on the information required to define a layer and the sources of the implemented conventions are provided in the vignette "SDAR data model".
vignette("SDAR_data_model")
A template of the data format used by SDAR is provided in the SDAR repository as a Microsoft Excel spreadsheet, SDAR_v0.95_beds_template.xlsx. This is the format suggested by SDAR for storing the thickness, composition, and texture description of rock layers (beds). Each bed should be entered as a row, with a column for each parameter recorded for that bed (e.g., thickness, lithology, grain size).
The simplest way to get your stratigraphic data into R for use with SDAR is to fill out the SDAR beds Excel template and import this file into R. Several functions can load Excel files into R; the steps below import an Excel file using the readxl package.
To install the readxl package from CRAN:
install.packages("readxl")
To import an Excel file, either set your working directory to the folder containing the file (for example, with setwd()), or pass the full path of the file to the read_excel function.
library(readxl) # load the readxl package
my_beds <- read_excel("file_name.xlsx") # file in your working directory
my_beds <- read_excel("Path where your Excel file is stored/file_name.xlsx") # setting the full path
# Notice that the separator between folders is a forward slash (/), as on Linux and Mac systems.
# On Windows, use either forward slashes or double backslashes (\\).
my_beds <- read_excel("C:\\Users\\john\\Desktop\\File_name.xlsx") # full path example on Windows
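One way to avoid typing separators by hand is base R's file.path(), which joins path components with a forward slash that R accepts on every platform. This is a small sketch: "john", "Desktop", and "file_name.xlsx" are placeholders for your own path, and the existence check is an optional convenience rather than something readxl requires.

```r
# Sketch: build the path with file.path() instead of typing separators.
# "john", "Desktop" and "file_name.xlsx" are placeholders for your own path.
excel_path <- file.path("C:", "Users", "john", "Desktop", "file_name.xlsx")
excel_path

# Check that the file exists before reading, so a typo fails with a clear message.
if (file.exists(excel_path)) {
  my_beds <- readxl::read_excel(excel_path)
} else {
  message("File not found: ", excel_path)
}
```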
The Saltarin well example dataset is also available in Excel format in the package folder inst/extdata. To locate inst/extdata/SDAR_v0.95_beds_saltarin.xlsx in the installed package, call system.file("extdata", "SDAR_v0.95_beds_saltarin.xlsx", package = "SDAR").
# Read the SDAR beds external data example (Excel file format)
library(readxl)
fpath <- system.file("extdata", "SDAR_v0.95_beds_saltarin.xlsx", package = "SDAR")
beds_data <- read_excel(fpath)
nrow(beds_data) # number of rock layers
#> [1] 686
names(beds_data) # variable names of composition and texture description of each layer
#> [1] "bed_number" "base" "top"
#> [4] "rock_type" "prim_litho" "grain_size"
#> [7] "prim_litho_percent" "sec_litho" "grain_size_sec_litho"
#> [10] "sec_litho_percent" "base_contact" "grading"
#> [13] "grain_size_base" "grain_size_top" "sorting"
#> [16] "roundness" "matrix" "cement"
#> [19] "fabric" "munsell_color" "Rcolor"
#> [22] "notes"
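When a package is installed, the contents of inst/ are copied to the package's top level, so the call uses "extdata" rather than "inst/extdata". By default, system.file() returns an empty string instead of an error when the file or the package is missing. read_excel() would then fail with a less obvious message. The sketch below uses system.file()'s mustWork argument to fail early. The helper name locate_extdata is hypothetical, and the demonstration uses the base package stats so it runs without SDAR installed.

```r
# Hypothetical helper: fail immediately if the file is not in the installed package.
locate_extdata <- function(file, package) {
  system.file("extdata", file, package = package, mustWork = TRUE)
}

# Without mustWork, a missing file silently yields an empty string.
system.file("extdata", "no_such_file.xlsx", package = "stats")   # returns ""

# With mustWork = TRUE, the same lookup raises an error instead.
try(locate_extdata("no_such_file.xlsx", package = "stats"))
```

With SDAR installed, locate_extdata("SDAR_v0.95_beds_saltarin.xlsx", "SDAR") should return the same path as fpath above.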
strata class
Validating data means checking whether a dataset meets all the requirements it must fulfill, and the strata function makes it easy to check whether your stratigraphic data satisfy the SDAR data model. The SDAR package introduces a new S4 object class called strata to store stratigraphic data. This S4 class gives a rigorous definition of a strata object.
A valid object of this S4 class meets all the requirements specified in the definition (e.g., the columns must be named bed_number, base, top, rock_type, prim_litho, and grain_size, and base and top must be numeric). Defining this S4 class reduces errors: the class records the type of information the object contains and checks its validity (Wickham 2014).
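The idea of a class that checks itself can be illustrated with the methods package. The following is a hypothetical class called beds_sketch, not SDAR's strata class. Its validity function runs when new() creates an object and rejects tables that are missing the mandatory columns or that have non-numeric base or top values.

```r
library(methods)

# Hypothetical S4 class (NOT SDAR's strata class): the validity function
# runs when new() creates the object and rejects malformed bed tables.
required_cols <- c("bed_number", "base", "top",
                   "rock_type", "prim_litho", "grain_size")

setClass("beds_sketch",
  slots = c(beds = "data.frame"),
  validity = function(object) {
    d <- object@beds
    missing_cols <- setdiff(required_cols, names(d))
    if (length(missing_cols) > 0)
      return(paste("missing columns:", paste(missing_cols, collapse = ", ")))
    if (!is.numeric(d$base) || !is.numeric(d$top))
      return("'base' and 'top' must be numeric")
    TRUE
  }
)

good <- data.frame(bed_number = 1:2, base = c(2, 1), top = c(1, 0),
                   rock_type = "sedimentary", prim_litho = "sandstone",
                   grain_size = "fine sand")
b <- new("beds_sketch", beds = good)        # passes validation

bad <- good
bad$base <- as.character(bad$base)          # simulate a column read as text
msg <- tryCatch(new("beds_sketch", beds = bad),
                error = function(e) conditionMessage(e))
msg                                         # explains why the object is invalid
```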
The strata function provides an additional argument, datum, which allows users to define the horizontal reference datum. The options are "base" and "top": "base" is used when thickness is measured up from the bottom of, e.g., an outcrop section; "top" is used when depths are measured down from the surface, e.g., in boreholes and cores. The default option is datum = "top".
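The two datum conventions describe the same beds with the vertical axis flipped. The hypothetical helper below, which is not an SDAR function, only illustrates that relationship. SDAR is told which convention applies through datum, so this conversion is not a required preprocessing step. It rewrites borehole depths (values increasing downwards) as heights above the deepest point (values increasing upwards). After the conversion top > base, and bed thicknesses are unchanged.

```r
# Hypothetical helper (not an SDAR function): depths (datum = "top")
# re-expressed as heights above the deepest point (datum = "base").
depth_to_height <- function(beds) {
  ref <- max(beds$base)          # deepest point becomes height 0
  out <- beds
  out$base <- ref - beds$base    # lower boundary, now the smaller value
  out$top  <- ref - beds$top     # upper boundary, now the larger value
  out
}

core <- data.frame(bed_number = 1:3,
                   base = c(671, 670.2, 669.4),
                   top  = c(670.2, 669.4, 669.18))
depth_to_height(core)
```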
# strata function automatically validates the inputted dataset
# and returns a strata class object.
validated_beds <- strata(saltarin_beds)
#> 'beds data has been validated successfully'
# check the class of the object generated by the strata function
class(validated_beds)
#> [1] "strata"
#> attr(,"package")
#> [1] "SDAR"
The previous chunk of code validated the input dataset saltarin_beds and returned a new strata class object, validated_beds. The absence of warnings or errors (the message 'beds data has been validated successfully') means that the information in each row (bed/layer) of the input data satisfies the expectations of the SDAR data model (an error would occur, for example, if sandstone were misspelled). By default, all errors and warnings are printed to the R console when the validation rules are checked against the input data. For example, the error Error: Check row numbers 3, 7. values (sandtone, mudston) are 'prim_litho' not register in 'litho.table'. is raised because sandstone and mudstone are misspelled in rows 3 and 7. Overlapping beds/layers are not allowed; if beds overlap, the strata function prints an error and returns a data frame with the overlapping intervals.
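The overlap rule can be illustrated with a short base-R check. This is a sketch of the idea, not SDAR's own validation code, and find_overlaps is a hypothetical name. After sorting by top, the function compares each bed with the next deeper one. If any two beds overlap, at least one such adjacent pair also overlaps, so an empty result means the table has no overlaps at all. Beds that only share a boundary are not counted as overlapping.

```r
# Hypothetical helper (not SDAR's implementation): adjacent pairs of beds
# whose depth ranges overlap; with datum = "top" each bed spans [top, base].
find_overlaps <- function(beds, tol = 1e-9) {
  b <- beds[order(beds$top), ]                  # shallowest bed first
  n <- nrow(b)
  hit <- which(b$top[-1] < b$base[-n] - tol)    # next bed starts above this base
  data.frame(upper_bed  = b$bed_number[hit],
             lower_bed  = b$bed_number[hit + 1],
             upper_base = b$base[hit],
             lower_top  = b$top[hit + 1])
}

beds <- data.frame(bed_number = 1:4,
                   base = c(10, 8, 6, 4.5),
                   top  = c(8, 6, 4, 2))
find_overlaps(beds)   # bed 4 (2 to 4.5) overlaps bed 3 (4 to 6)
```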
To validate data from an outcrop or stratigraphic section, set the parameter datum = "base".
# datum = "base" must be selected when stratigraphic distance above datum
# increases upwards (toward younger levels, as a stratigraphic section).
# my_outcrop_beds stands for your own outcrop beds table (same columns as saltarin_beds)
outcrop_validated_beds <- strata(my_outcrop_beds, datum = "base")
strata class
In this version of the SDAR package, the methods associated with the strata class are plot and summary.
Once the stratigraphic data are loaded into R and successfully validated as a strata class object, we can plot them to visualise the information. The plot method provides different outputs depending on the parameter settings. The summary method displays a synopsis of the content of the strata object, including the total number of layers, the thickness of the studied section, and the number of layers by lithology type and grain size.
strata class
The minimal information required to plot a stratigraphic column using SDAR is a table with the structure presented in Table 1. Once the dataset has been validated as a strata class object, calling plot automatically dispatches to the plot method plot.strata.
# Code to generate example presented in Figure 1.
library(SDAR) # load SDAR library
data(saltarin_beds) # load Saltarin beds dataset
validated_beds <- strata(saltarin_beds) # validates the saltarin_beds dataset
plot(validated_beds) # plot a stratigraphic log with the SDAR default options
# The default parameters are: `datum = "top"`, `data.units = "feet"`,
# `scale = 100`, and `barscale = 2`
Figure 1: Output example of the plot method for a strata class object. The Saltarin dataset was previously validated as a strata class object; here it is plotted using the default parameters.
The scale parameter lets users choose the drawing scale (vertical scaling of the graphic log), from 1:1 to any desired scale (e.g., 1:50, 1:200, 1:500). The data.units parameter specifies the unit of measure of the stratigraphic thickness in the input data (i.e., the thickness measured in the field): the user defines whether the data were measured in meters or feet (default: "feet"). The barscale parameter sets the spacing, in data units, of the thickness marks and labels on the bar scale.
# Code to generate example presented in Figure 2.
plot(validated_beds, data.units="meters", scale=300, barscale=5)
# Plot the Saltarin dataset at a 1:300 scale in meters (the unit used when the Saltarin well was
# described), with thickness marks and labels every 5 meters. By default, the bar scale is
# plotted on the left side of the lithology track.
Figure 2: Saltarin dataset plotted with data.units = "meters", scale = 300 (1:300), and barscale = 5.
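The relation between scale, data.units, and paper length is standard scale arithmetic. It also explains why a finer scale makes the output page taller. The hypothetical helper below, which is not part of SDAR, only computes the length the section occupies at 1:scale. It does not model SDAR's actual margins or tracks. It assumes 1 ft = 0.3048 m.

```r
# Hypothetical helper (plain scale arithmetic, not SDAR internals): length,
# in centimetres, that a thickness occupies on paper at a 1:scale drawing.
drawn_length_cm <- function(thickness, data.units = c("feet", "meters"),
                            scale = 100) {
  data.units <- match.arg(data.units)
  metres <- if (data.units == "feet") thickness * 0.3048 else thickness
  100 * metres / scale            # metres to cm, reduced by the scale
}

drawn_length_cm(671, "meters", scale = 300)  # whole Saltarin well: about 224 cm
drawn_length_cm(5, "meters", scale = 300)    # one barscale = 5 step: about 1.7 cm
```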
Because the stratigraphic information is stored in numerical format, SDAR can draw a specific interval of a given outcrop section or borehole log. The plot parameters that provide this functionality are subset.base and subset.top.
# Code to generate the example presented in Figure 3.
plot(validated_beds, data.units="meters", subset.base=614, subset.top=597)
Figure 3: The beds included in the stratigraphic interval defined by the subset.base and subset.top parameters [614 - 597 meters] are plotted.
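Selecting an interval reduces to a comparison on the numeric base and top columns. The text does not say how SDAR draws beds that straddle subset.base or subset.top. This hypothetical base-R sketch keeps any bed that lies at least partly inside the interval, with depths measured down (datum = "top"). Beds that only touch a limit are excluded.

```r
# Hypothetical base-R selection (not SDAR's code): keep beds lying at least
# partly inside [subset.top, subset.base], with depths measured down.
subset_interval <- function(beds, subset.base, subset.top) {
  stopifnot(subset.base > subset.top)
  keep <- beds$top < subset.base & beds$base > subset.top
  beds[keep, ]
}

beds <- data.frame(bed_number = 1:4,
                   base = c(620, 614, 605, 597),
                   top  = c(614, 605, 597, 590))
subset_interval(beds, subset.base = 614, subset.top = 597)   # beds 2 and 3
```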
Grain size is often not constant throughout a rock layer; for that reason, geologists include the grain-size variation in detailed field descriptions. Usually, grain size is described at the bottom and at the top of the layer. Grading commonly consists of an upward decrease in grain size (normal grading); however, certain sedimentary processes result in an upward increase in grain size (inverse grading). When grading is normal or inverse, the grain sizes at the base and top must be provided in the format presented in Table 2.
To include and represent grading information in SDAR, the columns grading, grain_size_base, and grain_size_top must be included in the beds/layers table.
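Table 2: Beds table including grading information.

bed_number | base | top | rock_type | prim_litho | grain_size | grading | grain_size_base | grain_size_top |
---|---|---|---|---|---|---|---|---|
1 | 671 | 670.2 | sedimentary | claystone | clay | | | |
2 | 670.2 | 669.4 | covered | | | | | |
3 | 669.4 | 669.18 | sedimentary | sandstone | medium sand | normal | coarse sand | fine / medium sand |
4 | 669.18 | 667.6 | sedimentary | limestone | wackestone | normal | packstone | wackestone |
5 | 667.6 | 667.2 | sedimentary | conglomerate | boulder | inverse | cobble | boulder |
6 | 667.2 | 666.2 | sedimentary | shale | silt | | | |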
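Normal grading implies that the base is coarser than the top, and inverse grading implies the opposite. This gives a simple consistency check on the grading columns. The sketch below is hypothetical and not part of SDAR. It ranks only the single Udden-Wentworth clastic classes. Mixed classes such as "fine / medium sand" and carbonate terms such as packstone or wackestone, which Table 2 also uses, are deliberately not covered and return NA.

```r
# Hypothetical check (not part of SDAR): normal grading means the base is
# coarser than the top, inverse grading the opposite. Terms missing from
# this Udden-Wentworth ordering give NA.
wentworth <- c("clay", "silt", "very fine sand", "fine sand", "medium sand",
               "coarse sand", "very coarse sand", "granule", "pebble",
               "cobble", "boulder")

grading_ok <- function(grading, size_base, size_top) {
  rank_base <- match(size_base, wentworth)
  rank_top  <- match(size_top, wentworth)
  ifelse(grading == "normal", rank_base > rank_top,
         ifelse(grading == "inverse", rank_base < rank_top, NA))
}

# The third bed is labelled normal but coarsens upwards, so it is flagged FALSE.
grading_ok(grading   = c("normal", "inverse", "normal"),
           size_base = c("coarse sand", "cobble", "fine sand"),
           size_top  = c("medium sand", "boulder", "coarse sand"))
```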
The previous sections presented how SDAR represents the information associated with beds. This section presents how SDAR integrates interval attributes (e.g., bioturbation, sedimentary structures).
An interval is defined over a stratigraphic range by a base and a top. The main requirement for defining an interval is that the recorded geological feature (e.g., sedimentary structures, bioturbation, unit name, fossil content) is present throughout the defined stratigraphic range.
To define intervals, the user must provide a stratigraphic base, a top, and the recorded feature of each interval, as presented in Table 3. Each row in this table describes a stratigraphic interval and the feature recorded in it (for details on the specific types of data required by SDAR, see the SDAR_data_model vignette). The interval features that can be integrated in this SDAR version are: bioturbation, fossils, sedimentary structures, other symbols, samples, core numbers, lithostratigraphy, and chronostratigraphy.
base | top | index |
---|---|---|
669.4 | 669.2 | intense |
668.6 | 668.2 | moderate |
665.2 | 665.0 | moderate |
661.4 | 659.9 | low |
637.5 | 637.0 | low |

base | top | sed_structure |
---|---|---|
671 | 670.2 | cross bedding |
671.5 | 671.5 | climbing ripples |
669.4 | 669.18 | lenticular lamination |
668.2 | 667.6 | normal grading |
667.2 | 666.2 | wavy lamination |
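Interval tables can be sanity-checked before plotting. The hypothetical helper below is not an SDAR function. It assumes depths are measured down (datum = "top") and that base == top marks a point feature, as in the climbing-ripples row. It flags intervals whose base lies above their top, and intervals that fall outside the described section. It is run here on the bioturbation rows shown above.

```r
# Hypothetical check (not an SDAR function) for interval tables, with depths
# measured down: base must be >= top (base == top marks a point feature),
# and every interval should fall inside the described section.
check_intervals <- function(x, section_top = 0, section_base) {
  problems <- character(0)
  if (any(x$base < x$top))
    problems <- c(problems, "base above top")
  if (any(x$top < section_top | x$base > section_base))
    problems <- c(problems, "interval outside the section")
  problems
}

bioturbation <- data.frame(
  base  = c(669.4, 668.6, 665.2, 661.4, 637.5),
  top   = c(669.2, 668.2, 665.0, 659.9, 637.0),
  index = c("intense", "moderate", "moderate", "low", "low")
)
check_intervals(bioturbation, section_base = 671)   # character(0): no problems
```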
A template of the data format used by SDAR for intervals is provided in the SDAR repository as a Microsoft Excel spreadsheet, SDAR_v0.95_intervals_template.xlsx. This is the format suggested by SDAR for storing interval information (e.g., bioturbation, sedimentary structures).
To import a sheet from an Excel file, either set your working directory to the folder containing the file (for example, with setwd()) or pass the full path of the file to the read_excel function. Then specify the sheet to read with the sheet argument, either as a string (the name of the sheet) or as a number (the position of the sheet).
# Specify sheet by its name
my_int_data <- read_excel("file_name.xlsx", sheet= "data") # on your working directory
my_int_data <- read_excel("Path where your Excel file is stored/file_name.xlsx", sheet= "data") # full path
# Specify sheet by its index
my_int_data <- read_excel("file_name.xlsx", sheet= 1)
# Notice that the separator between folders is a forward slash (/), as on Linux and Mac systems.
# On Windows, use either forward slashes or double backslashes (\\).
# full path example in windows systems
my_int_data <- read_excel("C:\\Users\\john\\Desktop\\File_name.xlsx", sheet= "data")
The Saltarin intervals dataset is also available in Excel format in the package folder inst/extdata. To locate inst/extdata/SDAR_v0.95_intervals_saltarin.xlsx in the installed package, call system.file("extdata", "SDAR_v0.95_intervals_saltarin.xlsx", package = "SDAR").
# Read the bioturbation external data example (Saltarin intervals Excel file format)
fpath <- system.file("extdata", "SDAR_v0.95_intervals_saltarin.xlsx", package = "SDAR")
bioturbation_data <- read_excel(fpath, sheet = "bioturbation") # import bioturbation sheet
nrow(bioturbation_data) # number of bioturbated intervals
#> [1] 151
bioturbation_data # header of Saltarin bioturbation dataset
#> # A tibble: 151 x 3
#> base top index
#> <dbl> <dbl> <chr>
#> 1 669.4 669.2 intense
#> 2 668.6 668.2 moderate
#> 3 665.15 664.95 moderate
#> 4 661.4 659.9 low
#> # … with 147 more rows
Import the remaining sheets of the Saltarin intervals dataset:
# import core_number data
core_number_data <- read_excel(fpath, sheet = "core_number")
# import samples data
samples_data <- read_excel(fpath, sheet = "samples")
# import sedimentary structures data
sed_structures_data <- read_excel(fpath, sheet = "sed_structures")
# import fossils data
fossils_data <- read_excel(fpath, sheet = "fossils")
# import other symbols data
other_symbols_data <- read_excel(fpath, sheet = "other_symbols")
# import lithostratigraphy data
litho_data <- read_excel(fpath, sheet = "lithostra")
# import chronostratigraphy data
crono_data <- read_excel(fpath, sheet = "chronostra")
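As an alternative to one read_excel() call per sheet, readxl's excel_sheets() lists the sheet names, so the whole workbook can be read into a named list. This is a sketch, and read_all_sheets is a hypothetical helper name. It returns NULL when the path is empty (for example, if SDAR is not installed and system.file() returns "") or when readxl is unavailable.

```r
# Sketch: read every sheet of a workbook into a named list in one call,
# using readxl::excel_sheets() and readxl::read_excel().
read_all_sheets <- function(path) {
  if (!nzchar(path) || !requireNamespace("readxl", quietly = TRUE))
    return(NULL)
  sheets <- readxl::excel_sheets(path)
  setNames(lapply(sheets, function(s) readxl::read_excel(path, sheet = s)),
           sheets)
}

fpath <- system.file("extdata", "SDAR_v0.95_intervals_saltarin.xlsx",
                     package = "SDAR")
intervals <- read_all_sheets(fpath)
names(intervals)   # should include the sheet names read individually above
```

Each element can then be passed to plot in the same way as the individual objects, e.g. bioturbation = intervals$bioturbation.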
The plot parameters allow users to add features to the graphic log (e.g., sedimentary structures, fossil content, unit names). Each additional feature is displayed as symbols, a graphic bar, or points on the right or left side of the lithological column. Figure 4 shows how SDAR represents interval attributes.
# Code to generate example presented in Figure 4.
plot(validated_beds, data.units="meters",
subset.base=664, subset.top=649,
bioturbation=bioturbation_data,
fossils=fossils_data,
sed.structures=sed_structures_data,
other.sym=other_symbols_data,
samples=samples_data,
ncore=core_number_data,
lithostrat=litho_data,
chronostrat=crono_data,
symbols.size=0.8)
# To keep this example fast, only a subset of the data is plotted. To plot the complete
# Saltarin well dataset, remove the subset.base=664 and subset.top=649 parameters.
Figure 4: Graphic log of the Saltarin well for the [664 - 649 meters] interval, showing symbol features (e.g., sedimentary structures, fossil content, samples), bioturbation, and the lithostratigraphic and chronostratigraphic framework.
Figures 1-4 present examples of graphic logs generated automatically with the SDAR package after the stratigraphic information has been correctly loaded and validated in R. Graphic logs generated by SDAR are exported as PDF files (fully editable in any vector drawing application). Each log is drawn on a single page, and the paper size is updated automatically when the vertical scale changes or when different sets of attributes are plotted on the right or left side of the lithological column (check the working directory for the PDF output file).
If you see problems with the PDF output, the problem is much more likely to be in your viewer than in R. Try another viewer if possible; browsers such as Mozilla Firefox and Google Chrome provide excellent rendering engines for PDF files.
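The idea of a single-page vector PDF whose height follows the drawing scale can be sketched with base R graphics. This is a hypothetical illustration, not SDAR's plotting code. The width column is an assumed stand-in for grain size, depths are in metres, and the one-inch margin allowance is arbitrary.

```r
# Hypothetical sketch (base R graphics, not SDAR's plotting code): one PDF
# page whose height follows the 1:scale drawing of the section, in metres.
draw_column_pdf <- function(beds, file, scale = 100, width_in = 4) {
  depth_range <- range(c(beds$top, beds$base))
  drawn_cm <- 100 * diff(depth_range) / scale       # section length on paper
  height_in <- drawn_cm / 2.54 + 1                  # plus 1 inch for margins
  pdf(file, width = width_in, height = height_in)   # vector output device
  on.exit(dev.off())
  par(mar = c(0.5, 4, 0.5, 0.5))
  plot.new()
  plot.window(xlim = c(0, 1), ylim = rev(depth_range))  # depth grows downwards
  rect(0, beds$base, beds$width, beds$top, col = "grey80")
  axis(2, las = 1)
  invisible(height_in)
}

beds <- data.frame(base = c(10, 8), top = c(8, 0),
                   width = c(0.4, 0.8))   # width: hypothetical grain-size proxy
f <- tempfile(fileext = ".pdf")
draw_column_pdf(beds, f, scale = 100)
file.exists(f)
```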
strata class data
This section presents the summary method. When summary is called on a strata class object, the results are printed in the R console. The summary function displays a synopsis of the content of the strata object: the total number of layers, the thickness of the SC, the thickness of covered intervals, and the thickness, percentage, and number of layers for each lithology type in the studied SC. With grain.size = TRUE, the same information is also given by grain size. The results of running summary on the example dataset are printed below.
summary(validated_beds)
#>
#> Number of beds: 610
#> Number of covered intervals 76
#>
#> Thickness of the section: 671.0
#> Thickness of covered intervals: 77.9
#>
#> Summary by lithology:
#>
#> Thickness Percent (%) Number beds
#> sandstone 233.3 34.77 330
#> claystone 211.6 31.53 130
#> siltstone 143.4 21.37 138
#> coal 3.1 0.46 8
#> conglomerate 1.8 0.27 4
#> covered 77.9 11.61 76
summary(validated_beds, grain.size=TRUE)
#>
#> Number of beds: 610
#> Number of covered intervals 76
#>
#> Thickness of the section: 671.0
#> Thickness of covered intervals: 77.9
#>
#> Summary by lithology:
#>
#> Thickness Percent (%) Number beds
#> sandstone 233.3 34.77 330
#> claystone 211.6 31.53 130
#> siltstone 143.4 21.37 138
#> coal 3.1 0.46 8
#> conglomerate 1.8 0.27 4
#> covered 77.9 11.61 76
#>
#> Summary by Grain Size:
#>
#> Thickness Percent (%) Number beds
#> clay 194.0 28.92 123
#> clay / silt 43.7 6.51 28
#> silt 88.6 13.21 89
#> silt / very fine sand 88.3 13.16 101
#> very fine sand 71.6 10.68 122
#> very fine / fine sand 32.4 4.83 49
#> fine sand 27.5 4.10 37
#> fine / medium sand 20.3 3.03 18
#> medium sand 9.2 1.37 11
#> medium / coarse sand 5.6 0.83 8
#> coarse sand 5.5 0.82 15
#> coarse / very coarse sand 3.7 0.55 3
#> very coarse / granule 1.5 0.22 3
#> granule 1.1 0.16 3
#> covered 77.9 11.61 76
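The percentages above are relative to the total section thickness, including covered intervals (for sandstone, 233.3 / 671.0 × 100 ≈ 34.77 %). The hypothetical base-R sketch below reproduces that kind of per-lithology synopsis on a small made-up table. It is not SDAR's summary method. It only shows how thickness, percentage, and bed counts relate.

```r
# Hypothetical base-R sketch of a per-lithology synopsis (not SDAR's
# summary method), on a small made-up beds table with depths measured down.
beds <- data.frame(
  base       = c(10, 8, 7.5, 4, 2),
  top        = c(8, 7.5, 4, 2, 0),
  rock_type  = c("sedimentary", "covered", "sedimentary", "sedimentary", "sedimentary"),
  prim_litho = c("sandstone", NA, "claystone", "sandstone", "siltstone"),
  stringsAsFactors = FALSE
)
beds$thickness <- beds$base - beds$top
# Covered intervals have no lithology, so they get their own group.
beds$litho <- ifelse(beds$rock_type == "covered", "covered", beds$prim_litho)

thick  <- tapply(beds$thickness, beds$litho, sum)
n_beds <- tapply(beds$thickness, beds$litho, length)
by_litho <- data.frame(
  thickness   = as.vector(thick),
  percent     = round(100 * as.vector(thick) / sum(beds$thickness), 2),
  number_beds = as.vector(n_beds),
  row.names   = names(thick)
)
by_litho[order(-by_litho$thickness), ]
```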
This project was sponsored by Carlos Jaramillo (Smithsonian Tropical Research Institute). Financial support for this research was provided by COLCIENCIAS (partly funding the master's studies of the main author), the Fundación para la Investigación de la Ciencia y la Tecnología del Banco de la República (Colombia), Corporación Geológica ARES (Colombia), the Smithsonian Tropical Research Institute, the Anders Foundation, the 1923 Fund, and Gregory D. and Jennifer Walston Johnson.
The Saltarin 1A well dataset used in this analysis was provided by Alejandro Mora of HOCOL S.A.
Bayona, G., Valencia, A., Mora, A., Rueda, M., Ortiz, J., and Montenegro, O. (2008). Estratigrafia y procedencia de las rocas del Mioceno en la parte distal de la cuenca antepais de los Llanos de Colombia. Geologia Colombiana, 33, 23-46.
Jaramillo, C., Romero, I., D'Apolito, C., Bayona, G., Duarte, E., Louwye, S., Escobar, J., Luque, J., Carrillo-Briceno, J., Zapata, V., Mora, A., Schouten, S., Zavada, M., Harrington, G., Ortiz, J., and Wesselingh, F. (2017). Miocene Flooding Events of Western Amazonia. Science Advances, 3, e1601693.
Miall, A. D. (1990). Principles of Sedimentary Basin Analysis. Springer-Verlag.
R Core Team (2014). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.
Tucker, M. E. (2011). Sedimentary Rocks in the Field: A Practical Guide. Geological Field Guide. Wiley.
Wickham, H. (2014). Advanced R. Chapman & Hall/CRC The R Series. Chapman and Hall/CRC.