- Removed the tokenize_tweets() function, which is no longer supported.
- Added the tokenize_ptb() function for Penn Treebank tokenizations (@jrnold) (#12).
- Added chunk_text() to split long documents into pieces (#30).
- The tokenize_tweets() function preserves usernames, hashtags, and URLs (@kbenoit) (#44).
- The stopwords() function has been removed in favor of using the stopwords package (#46).
- … tif package. (#49)
- tokenize_skip_ngrams() has been improved to generate unigrams and bigrams, according to the skip definition (#24).
- … tokenizers supports (@ironholds) (#26).
- tokenize_skip_ngrams() now supports stopwords (#31).
- … NA consistently (#33).
- tokenize_words() gains arguments to preserve or strip punctuation and numbers (#48).
- Fixed tokenize_skip_ngrams() and tokenize_ngrams() to return properly marked UTF8 strings on Windows (@patperry) (#58).
- tokenize_tweets() now removes stopwords prior to stripping punctuation, making its behavior more consistent with tokenize_words() (#76).
- Added the tokenize_character_shingles() tokenizer.
- … tokenize_words() and tokenize_word_stems().