tagger_impl to improve performance of tokenize(split = TRUE).
tokenize now warns rather than throwing an error when invalid input is given during partial parsing. With this change, tokenize is no longer aborted entirely even if an invalid string is given; parsing of such strings is simply skipped.
global_idf3.
bind_tf_idf2.
norm=TRUE. Cosine normalization is now performed on tf_idf values, as in the RMeCab package.
tf="itf" and idf="df" options.
pack now preserves the doc_id type when it is a factor.
MECABRC environment variable or ~/.mecabrc to set up dictionaries.
tokenize now skips resetting the output encodings to UTF-8.
split is FALSE.
grain_size argument to tokenize.
bind_lr function.
RcppParallel::parallelFor instead of tbb::parallel_for. There are no user-visible changes.
tokenize can now accept a character vector in addition to a data.frame-like object.
gbs_tokenize is now deprecated. Please use the tokenize function instead.
is_blank.
partial argument to gbs_tokenize and tokenize. This argument controls the partial parsing mode, which, when activated, forces the tokenizer to extract the given chunks of sentences.
posDebugRcpp function.
bind_tf_idf2 can calculate and bind the term frequency, inverse document frequency, and tf-idf of a tidy text dataset.
collapse_tokens, mute_tokens, and lexical_density can be used for handling a tidy text dataset of tokens.
tokenize, it still requires MeCab and its dictionaries installed and available).
tokenize now preserves the original order of docid_field.
bind_tf_idf2 function and is_blank function.
prettify can now extract only the columns specified by col_select.
NEWS.md file to track changes to the package.
tokenize now takes a data.frame as its first argument and returns only a data.frame. The former function, which takes a character vector and returns a data.frame or named list, was renamed to gbs_tokenize.