```{r setup, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` ```{r} (subject.size.vec <- unique(as.integer(10^seq(0,3.5,l=100)))) (backtrackers <- c( if(requireNamespace("stringi"))atime::atime_grid( ICU=stringi::stri_match(subject, regex=pattern)), atime::atime_grid( PCRE=regexpr(pattern, subject, perl=TRUE), TRE=regexpr(pattern, subject, perl=FALSE)))) backtrackers.result <- atime::atime( N=subject.size.vec, setup={ subject <- paste(rep("a", N), collapse="") pattern <- paste(rep(c("(a)?", "\\1"), each=N), collapse="") }, expr.list=backtrackers) backtrackers.best <- atime::references_best(backtrackers.result) plot(backtrackers.best) ``` The plot above shows that ICU/PCRE/TRE are all exponential in N (subject/pattern size) when the pattern contains backreferences. ```{r} all.exprs <- c( if(requireNamespace("re2"))atime::atime_grid( RE2=re2::re2_match(subject, pattern)), backtrackers) all.result <- atime::atime( N=subject.size.vec, setup={ subject <- paste(rep("a", N), collapse="") pattern <- paste(rep(c("a?", "a"), each=N), collapse="") }, expr.list=all.exprs) all.best <- atime::references_best(all.result) plot(all.best) ``` The plot above shows that ICU/PCRE are exponential time whereas RE2/TRE are polynomial time. Exercise for the reader: modify the above code to use the `seconds.limit` argument so that you can see what happens to ICU/PCRE for larger N (hint: you should see a difference at larger sizes). ## Interpolate at seconds.limit using predict method ```{r} (all.pred <- predict(all.best)) summary(all.pred) ``` The `predict` method above returns a list with a new element named `prediction`, which shows the data sizes that can be computed with a given time budget. The `plot` method is used below, ```{r} plot(all.pred) ``` ## `atime_grid` to compare different engines In the `nc` package there is an `engine` argument which controls which C regex library is used: ```{r} nc.exprs <- atime::atime_grid( list(ENGINE=c( if(requireNamespace("re2"))"RE2", "PCRE", if(requireNamespace("stringi"))"ICU")), nc=nc::capture_first_vec(subject, pattern, engine=ENGINE)) nc.result <- atime::atime( N=subject.size.vec, setup={ rep.collapse <- function(chr)paste(rep(chr, N), collapse="") subject <- rep.collapse("a") pattern <- list(maybe=rep.collapse("a?"), rep.collapse("a")) }, expr.list=nc.exprs) nc.best <- atime::references_best(nc.result) plot(nc.best) ``` The result/plot above is consistent with the previous result.