loggit is an easy-to-use, yet powerful, ndjson logger. It is very fast, has zero external dependencies, and can be as straightforward or as integral as you want to make it.
R has a selection of built-in functions for handling different exceptions, or special cases where diagnostic messages are provided and/or function execution is halted because of an error. However, R itself provides nothing to record these diagnostics for later review; useRs are left with what is printed to the console as their only means of analyzing what went wrong in their code. There are some slightly hacky ways of capturing this console output, such as redirecting it to a text file with sink(), or repeatedly calling cat() on the same exception messages that are passed to existing handler calls. But there are two main issues with these approaches:
1. The console output is not at all easy to parse, so a user cannot quickly identify the causes of failure without manually scanning through it.
2. Even if the user tries to structure a text file output, they would likely have to ensure consistency in that output across all their work, and there is still the issue of parsing that text file into a familiar, usable format.
Enter: JSON
For those unaware: JSON is a lightweight, portable (standardized) data format that is easy for both humans and machines to read and write. An excerpt from the introduction at json.org:
JSON (JavaScript Object Notation) is a lightweight data-interchange format. It is easy for humans to read and write. It is easy for machines to parse and generate. It is based on a subset of the JavaScript Programming Language, Standard ECMA-262 3rd Edition - December 1999. JSON is a text format that is completely language independent but uses conventions that are familiar to programmers of the C-family of languages, including C, C++, C#, Java, JavaScript, Perl, Python, and many others. These properties make JSON an ideal data-interchange language.
Basically, you can think of JSON objects like you would think of lists in R: a set of named key-value pairs. Since an R data.frame is itself a list (of equal-length columns), logs written by loggit are easily retrievable as data frames. This means you can analyze your log data right from your R code!
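To make the list analogy concrete, here is a minimal base R sketch. The field names mirror the log entries shown later in this document, and the JSON string is assembled by hand purely for illustration (no escaping is attempted); it is not meant to show how loggit itself serializes entries.

```r
# A log entry expressed as a named R list (values are illustrative).
entry <- list(
  timestamp = "2021-02-27T20:37:41-0600",
  log_lvl   = "INFO",
  log_msg   = "This is a message"
)

# Hand-built JSON object: each list name becomes a key, each element a value.
# Assumption: values are simple strings that need no JSON escaping.
json_line <- paste0(
  "{",
  paste0('"', names(entry), '": "', unlist(entry), '"', collapse = ", "),
  "}"
)

# The same list converts directly to a one-row data frame.
entry_df <- as.data.frame(entry, stringsAsFactors = FALSE)
```

Each additional entry would become one more row, which is why a log of such objects maps naturally onto a data frame.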
What loggit does a bit differently is write logs as newline-delimited JSON (ndjson). Instead of a JSON file that looks like this:
[
  {
    "key1": "value1"
  },
  {
    "key2": "value2"
  }
]
loggit’s logs containing the same data will instead put each object on its own line:
{"key1": "value1"}
{"key2": "value2"}
This makes log entries very fast to write to disk, while keeping them machine-parsable, human-readable, and ideal for log stream collection systems (like the stdout of your terminal, or a container in Docker or Kubernetes).
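The practical upshot of the one-object-per-line layout can be sketched in base R. Here append_entry() is a hypothetical helper written for this illustration, not a loggit function: adding an entry is a plain append, and reading the log back is a line-by-line read, with no enclosing array to re-open or re-close.

```r
# Hypothetical helper: append one JSON object as a single line.
append_entry <- function(json_line, path) {
  cat(json_line, "\n", file = path, sep = "", append = TRUE)
}

# Use a temporary file so the sketch leaves nothing behind in your project.
log_file <- tempfile(fileext = ".log")

append_entry('{"key1": "value1"}', log_file)
append_entry('{"key2": "value2"}', log_file)

# Each element of the result is one complete, independently parsable entry.
entries <- readLines(log_file)
```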
loggit
To write a log entry using loggit via its exception handlers, you just load loggit, set its log file location, and use the same handlers you always do:
library(loggit)
set_logfile("/path/to/my/log/directory/loggit.log") # loggit enforces no specific file extension
message("This is a message")
#> {"timestamp": "2021-02-27T20:37:41-0600", "log_lvl": "INFO", "log_msg": "This is a message"}
#> This is a message
warning("This is a warning")
#> {"timestamp": "2021-02-27T20:37:41-0600", "log_lvl": "WARN", "log_msg": "This is a warning"}
#> Warning in warning("This is a warning"): This is a warning
# stop("This is a critical error, so I'm not actually going to run it in this vignette")
You can see that the handlers will print both the loggit-generated log entry and their base default output. To have only the JSON printed, wrap the call in the appropriate suppressor (i.e. suppressMessages() or suppressWarnings()). To have only the base text printed, pass echo = FALSE to the handler.
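The two options just described might look like the following sketch. It assumes the installed loggit version exports message() with the echo argument described above; the requireNamespace() guard is an addition for this illustration so that the snippet does nothing when loggit is not installed.

```r
# Guarded so this sketch is a no-op when loggit is not installed.
has_loggit <- requireNamespace("loggit", quietly = TRUE)

if (has_loggit) {
  # Per the description above: the JSON entry is still echoed,
  # while the base message text is suppressed.
  suppressMessages(loggit::message("JSON entry only"))

  # Per the description above: the base message text is printed,
  # while the JSON echo is turned off.
  loggit::message("Base text only", echo = FALSE)
}
```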
And… that’s it! You’ve introduced human-readable, machine-parsable logging into your workflow!
However, surely you want more control over your logs.
Behind the scenes, loggit’s core function, also called loggit(), is executed right before the base handlers with some sane defaults. However, the loggit() function is also exported for use by the developer:
loggit("INFO", "This is also a message")
#> {"timestamp": "2021-02-27T20:37:41-0600", "log_lvl": "INFO", "log_msg": "This is also a message"}
loggit("WARN", "This is also a warning")
#> {"timestamp": "2021-02-27T20:37:41-0600", "log_lvl": "WARN", "log_msg": "This is also a warning"}
loggit("ERROR", "This is an error, but it won't stop your code from running like `stop()` does")
#> {"timestamp": "2021-02-27T20:37:41-0600", "log_lvl": "ERROR", "log_msg": "This is an error__COMMA__ but it won't stop your code from running like `stop()` does"}
“But why wouldn’t I just use the handlers instead?”
Because loggit() exposes much greater flexibility to the user, by way of custom fields.
loggit(
"INFO",
"This is a message",
but_maybe = "you want more fields?",
sure = "why not?",
like = 2,
or = 10,
what = "ever"
)
#> {"timestamp": "2021-02-27T20:37:41-0600", "log_lvl": "INFO", "log_msg": "This is a message", "but_maybe": "you want more fields?", "sure": "why not?", "like": "2", "or": "10", "what": "ever"}
Since JSON is considered semi-structured data (sometimes called “schema-on-read”), you can log any custom fields you like, as inconsistently as you like. It all just ends up as text in a file, with no column structure to worry about.
So, loggit’s log format is a special type of JSON. JSON objects are like lists, and so are data.frames. To allow for the most flexibility, the read_logs() function is available to you, which reads in the currently-set log file as a data frame:
read_logs()
#>                  timestamp log_lvl
#> 1 2021-02-27T20:37:41-0600   ERROR
#> 2 2021-02-27T20:37:41-0600    INFO
#> 3 2021-02-27T20:37:41-0600    WARN
#> 4 2021-02-27T20:37:41-0600    INFO
#> 5 2021-02-27T20:37:41-0600    WARN
#> 6 2021-02-27T20:37:41-0600   ERROR
#> 7 2021-02-27T20:37:41-0600    INFO
#>                                                                         log_msg
#> 1                             Means differ! (actual = 5.8658, expected = 5.8433
#> 2                                                             This is a message
#> 3                                                             This is a warning
#> 4                                                        This is also a message
#> 5                                                        This is also a warning
#> 6 This is an error, but it won't stop your code from running like `stop()` does
#> 7                                                             This is a message
#>               but_maybe     sure like or what
#> 1
#> 2
#> 3
#> 4
#> 5
#> 6
#> 7 you want more fields? why not?    2 10 ever
Notice that read_logs() handles any columnar inconsistencies as mentioned above. If read_logs() finds a field that other entries don’t have, it maps that field to an empty string for the entries that lack it. This was chosen over NAs to allow for consistency on re-write. You can, however, just replace all the empty strings with NA after read, if you want to.
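Swapping the empty strings for NA takes one line of base R. The sketch below uses a small hand-built data frame as a stand-in for the result of read_logs(), so its columns and values are illustrative only.

```r
# Stand-in for the data frame returned by read_logs(); values are illustrative.
logs <- data.frame(
  log_lvl = c("INFO", "WARN", "INFO"),
  log_msg = c("This is a message", "This is a warning", "This is a message"),
  sure    = c("", "", "why not?"),
  stringsAsFactors = FALSE
)

# Replace every empty string with NA, across all columns at once.
logs[logs == ""] <- NA

# Ordinary data frame operations then apply, e.g. counting entries per level.
level_counts <- table(logs$log_lvl)
```

Note that this replaces empty strings everywhere, including any field that was deliberately logged as an empty string.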
You can also pass a file path to read_logs(), and read that loggit log file instead.
The other helpful utilities are as follows:
The default timestamp format is "%Y-%m-%dT%H:%M:%S%z", but you may set it yourself using set_timestamp_format(). Note that this format is ultimately passed to format.Date(), so the supplied format needs to be valid.
You can set the location of the log file using set_logfile(logfile). Similarly, you can retrieve the location of the current log file using get_logfile().
loggit will default to writing to an R temporary directory. As per CRAN policies, a package cannot write to a user’s “home filespace” without approval. Therefore, you need to set the log file before any logs are written to disk, using set_logfile(logfile) (I recommend placing it in your working directory and naming it “loggit.log”). If you are using loggit in your own package, you can wrap this in a call to .onLoad(), so that logging is set on package load. If not, then make the set call as soon as possible (e.g. at the top of your script(s), right after your calls to library()); otherwise, no logs will be written to persistent storage!
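For package authors, the setup described above could be sketched roughly as follows. The custom timestamp format is an arbitrary example, the log file name follows the recommendation above, and the hook body is one possible arrangement under those assumptions rather than a prescribed one. Previewing the format with base format() first is a cheap way to confirm it renders as intended.

```r
# Illustrative custom timestamp format (an assumption, not a loggit default).
ts_format <- "%Y-%m-%d %H:%M:%S"

# Preview the format with base R before handing it to loggit.
example_stamp <- format(Sys.time(), ts_format)

# Package load hook: R calls this with the library path and package name.
# Defining it does not require loggit; the calls run only on package load.
.onLoad <- function(libname, pkgname) {
  # Write logs to "loggit.log" in the working directory, as recommended above.
  loggit::set_logfile(file.path(getwd(), "loggit.log"))
  loggit::set_timestamp_format(ts_format)
}
```

In a plain script, the same two loggit calls would simply go at the top, right after the calls to library().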