cwb_huffcode() and cwb_compress_rdx() did not delete redundant files on Windows. Fixed by temporarily unloading the corpus #89.
cwb_encode() failed if argument s_attributes was an empty list. Fixed, the default value of s_attributes is now list() #90.
cwb_makeall() will not reset the CORPUS_REGISTRY environment variable implicitly if the corpus to process has already been loaded #92.
cwb_makeall(), cwb_huffcode() and cwb_compress_rdx() have a new argument logfile to redirect output to this file. Requires argument quietly to be TRUE #65.
cl_struc_values() does not duplicate registry directories any more #77.
get_region_matrix() reports NA values for negative strucs #87.
region_matrix_to_struc_matrix() returns NA values for regions without nested region, as declared in the documentation #88.
check_strucs() issues a warning if negative values are passed and if the length of the input vector is 0.
ranges_to_cpos() drops rows from input matrix with NA
values and issues a respective warning.
cwb_encode(), cwb_makeall(), cwb_huffcode() and cwb_compress_rdx() perform tilde expansion on the filename provided by argument registry, avoiding a crash #84.
region_to_strucs() to get minimum and maximum struc of s-attribute within region provided. Works also for nested s-attributes.
region_matrix_to_struc_matrix().
cl_cpos2lbound() and cl_cpos2rbound() return NA if corpus position is outside struc for given s-attribute #78.
cl_cpos2lbound() and cl_cpos2rbound() are exposed directly from C++ without R wrappers, improving performance. Using the environment variable ‘CORPUS_REGISTRY’ for argument registry is handled implicitly now.
Rcpp::sourceCpp() or Rcpp::cppFunction().
devtools::install_github("PolMine/RcppCWB"). The missing
ref = "dev" has been inserted.
cwb_encode() crashed if arguments data_dir and vrt_dir included a tilde. Tilde expansion is now applied to these arguments to avoid this #73.
sprintf() with snprintf() to address security issue.
sprintf() #70.
corpus_properties() and corpus_property() do not crash any more if corpus is not loaded or not present #69.
p_attr_default() to programmatically extract default p-attribute #63.
region_matrix_corpus() C++ code that would not show any context at all if s_attribute expansion transgressed start or end of corpus.
region_matrix_corpus() C++ code that would result from not considering that query matches may cover more than one struc of a structural attribute.
corpus_info_file() does not crash if INFO is not defined in the registry file (#62).
sAttribute and pAttribute as s_attribute or p_attribute respectively is now accompanied by a warning that arguments are deprecated.
check_corpus() function distinguishes between whether a corpus is loaded in the CL and/or CQP context.
cwb_huffcode() and cwb_compress_rdx() have argument delete to trigger deleting redundant files after compression (#60).
cqp_load_corpus will internally convert the corpus ID to upper case, as
required in the CQP context (#64).
corpus_data_dir() did not work as intended without explicitly setting the registry argument. Fixed.
corpus_info_file(), corpus_full_name(), corpus_p_attributes(), corpus_s_attributes(), corpus_properties() and corpus_property() to retrieve registry file data.
corpus_registry_dir().
cwb_charsets() reports the charsets supported by CWB.
cl_load_corpus() and cqp_load_corpus() do what the function names suggest.
cl_list_corpora() complements existing function cqp_list_corpora() for the CL context.
skip_blank_lines, strip_whitespace and xml of cwb_encode() open up configuration options of cwb_encode(), overcoming the previously hard-coded equivalent to the command-line option “-xsB” (#38).
cpos_to_id(), .cl_find_corpus() and .cl_new_attribute() are an entry to passing around pointers, rather than re-creating objects whenever switching from R to C.
.s_attr() and .p_attr() return pointers for an s- or p-attribute.
cl_* are now available with pointer as input (e.g. cpos_to_id()).
cqp_drop_subcorpus() function that has been disabled temporarily is usable again (#34).
cqp_query() is now able to process subcorpora.
RcppCWB:::.cqp_subcropus() will construct a subcorpus from a region matrix.
check_corpus() does not reset the registry directory any more, but tries to load the checked corpus if it has not yet been loaded.
s_attr_relationship() will detect whether two s-attributes are siblings, or in a descendent or ancestor relationship.
cwb_encode(), cwb_huffcode(), cwb_makeall() and cwb_compress_rdx() now have an argument quietly to control display of output messages.
cwb_encode() has an argument verbose to control whether the counter on the number of tokens processed is displayed.
cwb_encode() to digest variations of path statements between macOS and Windows are addressed using a reliable normalization of paths with fs::path() (#48).
encoding is checked for the validity of the encoding passed in (#34).
check_cpos() issues a warning if argument cpos is NULL (#21).
cl_cpos2id(), cl_cpos2lbound(), cl_cpos2rbound(), cl_cpos2str() and cl_cpos2struc() will return an empty, zero-length integer vector if argument cpos is NULL (#21).
check_corpus() (used internally by many functions) resulted from slightly differing representations of otherwise identical paths. Using fs::path() for path normalization internally will avoid misleading warning messages.
cqp_get_registry() will now return a fs::path object, as a safeguard for a consistent normalization of paths.
cl_delete_corpus() will now (visibly) return a logical value.
cqp_load_corpus() will return FALSE if corpus has not been loaded successfully.
wrappers.cpp into cl.cpp, cqp.cpp and utils.cpp, so that the code is organized more coherently corresponding to the different logics.
check_cqp_query() renamed to check_query() to avoid a conflict with a function defined in the polmineR package.
cqp_list_subcorpora() returns a character vector. Previously, we just had obscure printed messages.
s_attribute_decode() will not break if s-attribute has
no values (#54).
cl_struc2str() and cl_struc2cpos() may now include negative values; the vectors returned will have NA values at the respective positions. The check against negative values in check_strucs is dropped accordingly.
cwb_encode() function did not declare structural attributes in the registry and mistakenly channeled output for the file to the terminal (#49). Fixed.
cwb_encode() did not reset global variables, which resulted in a set of errors. Solved (#51).
cwb-huffcode.c, cwb-compress-rdx.c and cwb-makeall.c was not in line with the CWB version of the rest of the code (v3.4.14 / SVN revision 1069) but rather v2.2.b99 or v3.0.0. All code changes up to v3.4.14 were reconstructed and implemented (#35). Note that cwb-encode.c was at CWB v3.4.14, as the encoding functionality was exposed at a later stage.
cwb_version() will report the version of
the CWB source code.
cwb_encode() function now has a previously missing argument encoding to state the encoding of the corpus to be indexed.
cwb_encode() now assumes implicitly that input files are XML files and removes blank lines and leading and trailing whitespace. This is equivalent to the option “-xsB” of the command line utility cwb-encode.
cwb_encode() is now a patch of the main() function of cwb-encode.c, so that code in the *.cpp file can be limited to a slim wrapper, limiting the risk that the code in RcppCWB loses touch with CWB upstream development.
_eval.h, _globalvars.h and _cl.h in the ./src directory are autogenerated files now, not to be edited by hand.
cqp_drop_subcorpus() function is temporarily disabled to ensure that the package can be built (#34).
check_corpus() that would trigger resetting the registry unintentionally and potentially falsely.
use_tmp_dir(), normalizePath() is applied on the tempdir() result to avoid confusion with symbolic links on macOS.
cwb_encode() (not yet run on Windows).
cqp_get_registry() that would sometimes result in a wrong return value (i.e. registry path) has been fixed (#14).
cwb_makeall(), an internal check is performed whether the corpus has been loaded already and whether the home directory of the loaded corpus and the one defined in the registry file are identical (#31).
cl_delete_corpus() function crashed when trying to delete a corpus that has not been loaded (#33). The function now aborts gracefully, returning 0 in that case.
corpus_is_loaded() can be used to check whether a corpus is loaded.
cwb_encode() that exposes functionality of cwb-encode CWB
utility.
cl_cpos2lbound() and cl_cpos2rbound() will now accept an integer vector with length > 1 as argument cpos and return a vector with the same length. Useful to speed up iterated queries for left and right boundaries of regions (#19).
cl_struc_values() exposes the corresponding C function of the Corpus Library (CL). The previous implicit assumption that all structural attributes have values can thus be tested. Intended to work with annotations of sentences and paragraphs, i.e. common structural attributes that usually do not have values.
corpus_data_dir() will derive the data directory from the internal C representation of a corpus.
s_attr_regions() will derive regions defined by a structural attribute from the *.rng file. Fastest option for large corpora.
s_attr_is_sibling() and s_attr_is_descendent() test the sibling/descendent relationship of structural attributes.
check_corpus() now includes checks whether the registry provided (argument registry) is identical with the registry defined internally by CQP. The registry is reset if directories are not identical.
s_attribute_decode() method was incomplete for method “Rcpp”. This alternative to the “pure R” approach is now implemented (#2).
method previously setting “wininet” in ./tools/winlibs.R is omitted to avoid the warning “the ‘wininet’ method is deprecated for http:// and https:// URLs” on Windows.
pcre-config to locate header files of PCRE.
cqp_initialize())
get_tmp_registry() will return the whereabouts of this directory.
check_corpus()-function. Problems with the previous implementation that relied on files in the registry directory to ensure the presence of a corpus hopefully do not occur.
cl_charset_name() is exposed, it will return the charset of a corpus. Faster than parsing the registry file again and again.
cl_delete_corpus()-function can remove loaded corpora from memory.
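
Several entries above mention tilde expansion of path arguments (#73, #84) and applying normalizePath() to the tempdir() result. The following is a minimal sketch of what these two steps do, using base R only. It is not the package's internal code, and the path ~/cwb/registry is a hypothetical example chosen for illustration.

```r
# Illustration with base R only; this is NOT RcppCWB's internal code.
# The path below is a hypothetical example.
registry_arg <- "~/cwb/registry"

# path.expand() replaces a leading tilde with the user's home directory,
# so that code which cannot resolve "~" itself receives a full path.
expanded <- path.expand(registry_arg)

# normalizePath() resolves symbolic links, which matters on macOS where
# the temporary directory is typically reached through a symlink.
# mustWork = FALSE avoids an error if the path does not exist.
tmp <- normalizePath(tempdir(), winslash = "/", mustWork = FALSE)

print(expanded)
print(tmp)
```

Passing an already expanded and normalized path to the registry, data_dir or vrt_dir arguments should be harmless, so doing this in user code is a reasonable precaution with older package versions.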