fozzie_difference_join_rs, fozzie_distance_join_rs, fozzie_interval_join_rs, fozzie_regex_join_rs, and fozzie_string_join_rs functions are no longer exported and their .Rd documentation files have been removed.
tibbles in the Description field to be more generic and formal.
interval_mode = 'real' now handles a mix of integer and double inputs correctly.
by = NULL, the internal common_by function will now print the columns used in the join.
extendr packages.
options(fozzie.nthread = 4), which will be respected by all functions with an nthread argument. By default, the package uses the default from the multithreading Rust library rayon.
tibble, the output result will now be a tibble.
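To illustrate the options(fozzie.nthread = 4) entry, the following Rust sketch shows one way an effective thread count could be resolved. It is not the package's actual code: the function name resolve_nthread, the precedence of a per-call nthread over the global option, and the treatment of 0 as "unset" are all assumptions made for illustration. A result of None stands for deferring to rayon's own default.

```rust
/// Hypothetical helper: pick the thread count to use for one join call.
///
/// `arg` is the per-call `nthread` value (None when the caller did not set it).
/// `global` is the value of the global option (None when the option is unset).
/// Assumption: a per-call value wins over the global option, and 0 means "unset".
/// Returning None means "do not configure anything; use the library default".
fn resolve_nthread(arg: Option<usize>, global: Option<usize>) -> Option<usize> {
    // Drop zero values so they fall through to the next source.
    let arg = arg.filter(|&n| n > 0);
    let global = global.filter(|&n| n > 0);
    // Prefer the explicit argument, then the global option.
    arg.or(global)
}
```

Under these assumptions, resolve_nthread(None, Some(4)) yields Some(4), resolve_nthread(Some(2), Some(4)) yields Some(2), and resolve_nthread(None, None) yields None.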
tibble but not in data.frame.
tibble is now a suggested import.
integer: integer-based join types, with behavior designed to emulate IRanges findOverlaps. Importantly, [1, 2] and [3, 4] would be considered overlapping in this case.
real: real number joins, where there must be some continuous overlap between ranges to be considered matching.
auto: behavior determined by the input column types.
by function should now better resemble the fuzzyjoin implementation. Notes have been added to the internal function signature to acknowledge their contribution.
styler to be more style guide compliant.
nthread argument wherein the user-specified thread count was ignored and the default global thread pool settings were always used. See Issue #7.
fozzie_join functions have been renamed to fozzie_string_join. This will better describe the function behavior and allow us to add other join types in the future. See Issue #9.
fozzie_string_full_join now implements full joins as the union of the left and right fuzzy join. Before this, it was the cartesian product of left and right datasets.
fozzie_difference_join suite of functions now available. This allows joining on numeric distance.
rapidfuzz crate for supported algorithms, as they perform better than prior implementations.
fozzie_left_join(), fozzie_inner_join(), …).
v0.0.5.
fuzzyjoin was a required import for the misspellings dataset.
devtools::check() and R CMD check checks for the first time.
textdistance crate implementation. Those scripts now have a header comment acknowledging the original author.
nthread=2 for compliance with CRAN policies.
par_chunks have replaced equivalent par_iter operations)
stringdist behavior.
nthread parameter.
prefix_weight and max_prefix parameters added. These are similar to the bt and p parameters in the stringdist package, with some differences (prefix_weight is a set number of characters, not a proportion).
jaro method is no longer supported. The default values for the jw and jaro_winkler methods simplify into the Jaro case.
distance_col is live. It can be used to add the string distance of joined fields to the output.
NA in R character fields with a string with the string value “NA”. Tests updated to expect a true NA.
NA strings in all Rust internals that perform fuzzy matches. If one or more values in a pair is NA, the pair is considered a non-match.
inner and left joins. Results were verified against expectations and with the fuzzyjoin package.
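The NA entries can be sketched in Rust as follows. This is a minimal illustration, not the package's internals: the function names levenshtein and fuzzy_match are hypothetical, a missing value is modeled as None, and Levenshtein distance is used only as an example metric. The point is that a pair with a missing side never matches, while the literal text "NA" is compared like any other string.

```rust
/// Levenshtein distance between two strings, counted in Unicode scalar values.
fn levenshtein(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    // prev[j] holds the distance between the first i chars of a and first j chars of b.
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for (i, ca) in a.iter().enumerate() {
        let mut curr = vec![i + 1];
        for (j, cb) in b.iter().enumerate() {
            let cost = if ca == cb { 0 } else { 1 };
            let best = (prev[j] + cost) // substitution or match
                .min(prev[j + 1] + 1) // deletion
                .min(curr[j] + 1); // insertion
            curr.push(best);
        }
        prev = curr;
    }
    prev[b.len()]
}

/// Hypothetical matcher: returns Some(distance) when the pair matches,
/// None when it does not. A missing value (None) on either side is
/// always treated as a non-match.
fn fuzzy_match(a: Option<&str>, b: Option<&str>, max_distance: usize) -> Option<usize> {
    match (a, b) {
        (Some(x), Some(y)) => {
            let d = levenshtein(x, y);
            if d <= max_distance {
                Some(d)
            } else {
                None
            }
        }
        // One or both sides are missing: never a match.
        _ => None,
    }
}
```

In this sketch, fuzzy_match(None, Some("NA"), 10) is a non-match regardless of max_distance, whereas fuzzy_match(Some("NA"), Some("NA"), 0) matches with distance 0 because both sides are real strings.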
Exceptions:
jarowinkler/jw method requires the addition of new parameters for p and dt to be fully customizable. Currently, jaro_winkler defaults to a scaling factor of 0.1 and a maximum prefix of 4. This is consistent with the default of the stringdist method.
jaro algorithm does not actually exist in the stringdist implementation, as it is equivalent to setting p=0.
fuzzy_join API call now includes the how method to specify the join type. inner and left are the currently supported methods. At least right, full, and anti are planned for future releases.
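The relationship between Jaro and Jaro-Winkler described in the exceptions (a scaling factor of 0.1, a maximum prefix of 4, and Jaro being the p=0 case) can be sketched in Rust. This is a textbook-style implementation written for illustration, not the package's or any crate's code. The names jaro, jaro_winkler, scaling_factor, and max_prefix are hypothetical, and the functions return a similarity in [0, 1], whereas stringdist reports a distance (1 minus the similarity).

```rust
/// Jaro similarity between two strings (1.0 = identical, 0.0 = nothing in common).
fn jaro(a: &str, b: &str) -> f64 {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    if a.is_empty() && b.is_empty() {
        return 1.0;
    }
    if a.is_empty() || b.is_empty() {
        return 0.0;
    }
    // Characters match only if they are equal and within this window of each other.
    let window = (a.len().max(b.len()) / 2).saturating_sub(1);
    let mut a_matched = vec![false; a.len()];
    let mut b_matched = vec![false; b.len()];
    let mut matches: usize = 0;
    for i in 0..a.len() {
        let lo = i.saturating_sub(window);
        let hi = (i + window + 1).min(b.len());
        for j in lo..hi {
            if !b_matched[j] && a[i] == b[j] {
                a_matched[i] = true;
                b_matched[j] = true;
                matches += 1;
                break;
            }
        }
    }
    if matches == 0 {
        return 0.0;
    }
    // Count matched characters that appear in a different order in the two strings.
    let mut k: usize = 0;
    let mut out_of_order: usize = 0;
    for i in 0..a.len() {
        if a_matched[i] {
            while !b_matched[k] {
                k += 1;
            }
            if a[i] != b[k] {
                out_of_order += 1;
            }
            k += 1;
        }
    }
    let m = matches as f64;
    let t = (out_of_order / 2) as f64; // transpositions
    (m / a.len() as f64 + m / b.len() as f64 + (m - t) / m) / 3.0
}

/// Jaro-Winkler similarity: Jaro plus a bonus for a shared prefix.
/// With scaling_factor = 0.0 this reduces to plain Jaro.
fn jaro_winkler(a: &str, b: &str, scaling_factor: f64, max_prefix: usize) -> f64 {
    let sim = jaro(a, b);
    // Length of the common prefix, capped at max_prefix characters.
    let prefix_len = a
        .chars()
        .zip(b.chars())
        .take(max_prefix)
        .take_while(|(x, y)| x == y)
        .count();
    sim + prefix_len as f64 * scaling_factor * (1.0 - sim)
}
```

For the classic pair "MARTHA" and "MARHTA", jaro gives about 0.944 and jaro_winkler with scaling_factor = 0.1 and max_prefix = 4 gives about 0.961. With scaling_factor = 0.0 the two functions return the same value.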