fix R CMD check notes in documentation of R6 classes
text2vec 0.6.5 (2023-10-16)
fix test discovered with Matrix==1.6-2 release
text2vec 0.6.4 (2023-02-15)
update dependency Matrix>=1.5-2, fixes #338
text2vec 0.6.2 (2022-09-11)
removed test which is not needed with Matrix package v 1.5
text2vec 0.6
2019-12-17
breaking change - removed construction of a
vocabulary in parallel on windows
use rsparse package for SVD and GloVe
factorizations
updated RWMD implementation (hopefully bug free)
2018-09-10
breaking change - changed IDF formula - see #280
for details.
2018-05-28
Added postag_lemma_tokenizer() (wrapper around
udpipe::udpipe_annotate). Can be used as a drop-in
replacement for simpler tokenizers in text2vec.
2018-05-25
Made combine_vocabularies() part of public API - see
#260 for details.
2018-05-10
Added coherence() function for comprehensive coherence
metrics. Thanks to Manuel Bickel ( @manuelbickel ) for the contribution.
2018-05-02
Fixed bug in LSA model - document embeddings are now calculated as left
singular vectors multiplied by singular values (not by the square root of
the singular values as before). Thanks to Sloane Simmons ( @singularperturbation ).
The fit_transform and transform methods in the
LDA model now produce the same results. Thanks to @jiunsiew for reporting. LDA also has a new
n_iter_inference parameter, which controls the number of
samples drawn from the converged distribution for document-topic inference. This
leads to more robust document-topic probabilities (reduced variance).
Default value is 10.
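
The LSA fix in this entry can be written out explicitly. The following is only a sketch in standard truncated-SVD notation. The symbols X (document-term matrix), U_k, Σ_k, V_k (rank-k SVD factors) and D (document embeddings) are introduced here for illustration and do not come from the package source.

```latex
% Rank-k truncated SVD of the document-term matrix X (documents in rows)
X \approx U_k \Sigma_k V_k^{\top}

% Document embeddings before the fix: left singular vectors scaled by
% the square root of the singular values
D_{\text{old}} = U_k \Sigma_k^{1/2}

% Document embeddings after the fix: left singular vectors scaled by
% the singular values, which equals projecting X onto the right singular vectors
D_{\text{new}} = U_k \Sigma_k = X V_k
```

The last identity follows from the orthonormality of the right singular vectors. It shows the corrected embedding of a fitted document as a projection of its row of X onto V_k.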
2018-01-17
more numerically robust PMI, LFMD - thanks to @andland. Also adds an
iter column to collocation_stat, which records the
iteration at which the collocation stats (and counters) were
calculated.
text2vec 0.5.1 [2018-01-10]
2018-01-10
removed rank* columns from collocation_stat - they were
never used internally. Users can easily calculate ranks themselves.
2018-01-09
Added Bi-Normal Separation transformation, thanks to Pavel Shashkin
( @pshashk )
Added Dunning’s log-likelihood ratio for collocations, thanks to
Chris Lee ( @Chrisss93 )
Early stopping for collocations learning
2017-12-18
fixed several bugs #219 #217 #205
decreased number of dependencies - no more magrittr,
uuid, tokenizers
removed distributed LDA which didn’t work correctly
models API now follows the mlapi package. No API
changes on text2vec side - we just moved the abstract
scikit-learn-like classes into a separate package in order to
make them more reusable.
text2vec 0.5.0
2017-06-12
Add additional filters to prune_vocabulary - filter by
document counts
Clean up LSA, fixed transform method. Added option to use randomized
SVD algorithm from irlba.
2017-05-17
Improve dist2 performance for RWMD - incorporate ideas
from gensim
PR discussion.
2017-05-17
API breaking change - vocabulary format change -
now plain data.frame with meta-information in attributes
(stopwords, ngram, number of docs, etc).
2017-03-25
No longer relies on RcppModules
API breaking change - removed lda_c
from formats in DTM construction
added ifiles_parallel, itoken_parallel
high-level functions for parallel computing
API breaking change - chunks_number
parameter renamed to n_chunks
2017-01-02
API breaking change - removed
create_corpus from public API, moved co-occurrence-related
options to create_tcm from vectorizers
add ability to add custom weights for co-occurrence statistics
calculations
2016-12-30
Noticeable speedup (1.5x) and an even
more noticeable improvement in memory usage (2x less!)
for create_dtm, create_tcm. The package now
relies on the sparsepp
library for underlying hash maps.
2016-10-30
Collocations - detection of multi-word phrases using different
heuristics - PMI, gensim, LFMD.