                  ** stringi package NEWS and CHANGELOG **
===============================================================================


* 0.4-1 (2014-12-11) **CRAN**

   * [IMPORTANT CHANGE] `n_max` argument in `stri_split_*()` has been renamed `n`.

   * [IMPORTANT CHANGE] `simplify=FALSE` in `stri_extract_all_*()` and
     `stri_split_*()` now calls `stri_list2matrix()` with `fill=""`.
     `fill=NA_character_` may be obtained by using `simplify=NA`.

   * [IMPORTANT CHANGE, NEW FUNCTIONS] #120: `stri_extract_words` has been
     renamed `stri_extract_all_words` and `stri_locate_boundaries` -
     `stri_locate_all_boundaries` as well as `stri_locate_words` -
     `stri_locate_all_words`. New functions are now available:
     `stri_locate_first_boundaries`, `stri_locate_last_boundaries`,
     `stri_locate_first_words`, `stri_locate_last_words`,
     `stri_extract_first_words`, `stri_extract_last_words`.

   * [IMPORTANT CHANGE] #111: `opts_regex`, `opts_collator`, `opts_fixed`, and
     `opts_brkiter` can now be supplied individually via `...`.
     In other words, you may now simply call e.g.
     `stri_detect_regex(str, pattern, case_insensitive=TRUE)` instead of
     `stri_detect_regex(str, pattern, opts_regex=stri_opts_regex(case_insensitive=TRUE))`.

   * [NEW FEATURE] #110: Fixed pattern search engine's settings can
     now be supplied via `opts_fixed` argument in `stri_*_fixed()`,
     see `stri_opts_fixed()`. A simple (not suitable for natural language
     processing) yet very fast `case_insensitive` pattern matching can be
     performed now. `stri_extract_*_fixed` is again available.

   * [NEW FEATURE] #23: `stri_extract_all_fixed`, `stri_count`, and
     `stri_locate_all_fixed` may now also look for overlapping pattern
     matches, see `?stri_opts_fixed`.

   * [NEW FEATURE] #129: `stri_match_*_regex` gained a `cg_missing` argument.

   * [NEW FEATURE] #117: `stri_extract_all_*()`, `stri_locate_all_*()`,
     `stri_match_all_*()` gained a new argument: `omit_no_match`.
     Setting it to `TRUE` makes these functions compatible with their
     `stringr` equivalents.

   * [NEW FEATURE] #118: `stri_wrap()` gained `indent`, `exdent`, `initial`,
     and `prefix` arguments. Moreover Knuth's dynamic word wrapping algorithm
     now assumes that the cost of printing the last line is zero, see #128.

   * [NEW FEATURE] #122: `stri_subset()` gained an `omit_na` argument.

   * [NEW FEATURE] `stri_list2matrix()` gained an `n_min` argument.

   * [NEW FEATURE] #126: `stri_split()` now is also able to act
     just like `stringr::str_split_fixed()`.

   * [NEW FEATURE] #119:  `stri_split_boundaries()` now have
     `n`, `tokens_only`, and `simplify` arguments. Additionally,
     `stri_extract_all_words()` is now equipped with `simplify` arg.

   * [NEW FEATURE] #116: `stri_paste()` gained a new argument:
     `ignore_null`. Setting it to `TRUE` makes this function more compatible
     with `paste()`.

   * [NEW FEATURE] #114: `stri_paste()`: `ignore_null` arg has been added.

   * [OTHER] #123: `useDynLib` is used to speed up symbol look-up in
     the compiled dynamic library.

   * [BUGFIX]  #94: Run-time errors on Solaris caused by setting
     `-DU_DISABLE_RENAMING=1` -- memory allocation errors in i.a. ICU's
     UnicodeString. This setting also caused some ABSan sanity check
     failures within ICU code.


-------------------------------------------------------------------------------


* 0.3-1 (2014-11-06) **CRAN**

   * [IMPORTANT CHANGE] #87: `%>%` overlapped with the pipe operator from
     the `magrittr` package; now each operator like `%>%` has been renamed `%s>%`.

   * [IMPORTANT CHANGE] #108: Now the BreakIterator (for text boundary analysis)
     may be better controlled via `stri_opts_brkiter()` (see options `type`
     and `locale` which aim to replace now-removed `boundary` and `locale` parameters
     to `stri_locate_boundaries`, `stri_split_boundaries`, `stri_trans_totitle`,
     `stri_extract_words`, `stri_locate_words`).

   * [NEW FUNCTIONS] #109: `stri_count_boundaries()` and `stri_count_words()`
     count the number of text boundaries in a string.

   * [NEW FUNCTIONS] #41: `stri_startswith_*()` and `stri_endswith_*()`
     determine whether a string starts or ends with a given pattern.

   * [NEW FEATURE] #102: `stri_replace_all_*()` gained a `vectorize_all()` parameter,
     which defaults to TRUE for backward compatibility.

   * [NEW FUNCTION] #91: `stri_subset_*()`, a convenient and more efficient
     substitute for `str[stri_detect_*(str, ...)]`, added.

   * [NEW FEATURE] #100: `stri_split_fixed()`, `stri_split_charclass()`,
     `stri_split_regex()`, `stri_split_coll()` gained a `tokens_only` parameter,
     which defaults to `FALSE` for backward compatibility.

   * [NEW FUNCTION] #105: `stri_list2matrix()` converts lists of atomic vectors
     to character matrices, useful in connection with `stri_split`
     and `stri_extract`.

   * [NEW FEATURE] #107: `stri_split_*()` now allow setting an `omit_empty=NA` argument.

   * [NEW FEATURE] #106: `stri_split()` and `stri_extract_all()` gained a `simplify`
     argument (if `TRUE`, then `stri_list2matrix(..., byrow=TRUE)`
     is called on the resulting list.

   * [NEW FUNCTION] #77: `stri_rand_lipsum()` generates
     (pseudo)random dummy *lorem ipsum* text.

   * [NEW FEATURE] #98: `stri_trans_totitle()` gained a `opts_brkiter`
     parameter; it indicates which ICU BreakIterator should be used when
     performing case mapping.

   * [NEW FEATURE] `stri_wrap()` gained a new parameter: `normalize`.

   * [BUGFIX] #86: `stri_*_fixed()`, `stri_*_coll()`, and `stri_*_regex()` could
     give incorrect results if one of search strings were of length 0.

   * [BUGFIX] #99: `stri_replace_all()` did not use the `replacement` arg.

   * [BUGFIX] #112: Some of the objects were not PROTECTed from
     being garbage collected, which might have caused spontaneous SEGFAULTS.

   * [BUGFIX] Some collator's options were not passed correctly to ICU services.

   * [BUGFIX] Memory leaks causes as detected by
     `valgrind --tool=memcheck --leak-check=full` have been removed.

   * [DOCUMENTATION] Significant extensions/clean ups in the stringi manual.


-------------------------------------------------------------------------------


* 0.2-5 (2014-05-16) **CRAN**

   * icudt-dependent examples are no longer run if `icudt` is not available.


* 0.2-4 (2014-05-15) **CRAN**

   * [BUGFIX] Issues with loading of misaligned addresses in `stri_*_fixed()`.


* 0.2-3 (2014-05-14) **CRAN**

   * [IMPORTANT CHANGE] `stri_cmp*()` now do not allow for passing
      `opts_collator=NA`. From now on, `stri_cmp_eq`, `stri_cmp_neq`,
      and the new operators `%===%`, `%!==%`, `%stri===%`, and `%stri!==%`
      are locale-independent operations, which base on code point comparisons.
      New functions `stri_cmp_equiv` and `stri_cmp_nequiv`
      (and from now on also `%==%`, `%!=%`, `%stri==%`, and `%stri!=%`)
      test for canonical equivalence.

   * [IMPORTANT CHANGE] `stri_*_fixed()` search functions now perform
      a locale-independent exact (byte-wise, of course after conversion to UTF-8)
      pattern search. All the Collator-based, locale-dependent search routines
      are now available via `stri_*_coll()`. The reason for this is that
      ICU USearch has currently very poor performance and in many search tasks
      in fact it is sufficient to do exact pattern matching.

   * `stri_*_fixed` now use a tweaked Knuth-Morris-Pratt search algorithm,
      which improves the search performance drastically.


* 0.2-2 (2014-05-01) **devel**

   * [IMPORTANT CHANGE] `stri_enc_nf*()` and `stri_enc_isnf*()` function families
      have been renamed to `stri_trans_nf*()` and `stri_trans_isnf*()`,
      respectively. This is because they deal with text transforming,
      and not with character encoding. Moreover, all such operations may
      also be performed by ICU's Transliterator (see below).

   * [NEW FUNCTION] `stri_trans_general()` and `stri_trans_list()` give access
      to ICU's Transliterator: may be used to perform very general
      text transforms.

   * [NEW FUNCTION `stri_split_boundaries()` utilizes ICU's BreakIterator
      to split strings at specific text boundaries. Moreover,
      stri_locate_boundaries indicates positions of these boundaries.

   * [NEW FUNCTION] `stri_extract_words()` uses ICU's BreakIterator to
      extract all words from a text. Additionally, `stri_locate_words`
      locates start and end positions of words in a text.

   * [NEW FUNCTION] `stri_pad()`, `stri_pad_left()`, `stri_pad_right()`,
      and `stri_pad_both` pad a string with a specific code point.

   * [NEW FUNCTION] `stri_wrap()` breaks paragraphs of text into lines.
     Two algorithms (greedy and minimal raggedness) are available.


* 0.2-1 (2014-04-18) **devel**

   * [IMPORTANT CHANGE] `stri_*_charclass()` search functions now
     rely solely on ICU's UnicodeSet patterns. All previously accepted
     charclass identifiers became invalid. However, new patterns
     should now be more familiar to the users (they are regex-like).
     Moreover, we observe a very nice performance gain.

   * [IMPORTANT CHANGE] `stri_sort()` now does not include `NA`s
     in output vectors by default, for compatibility with `sort()`.
     Moreover, currently none of the input vector's attributes are preserved.

   * [NEW FUNCTION] `stri_unique()` extracts unique elements from
     a character vector.

   * [NEW FUNCTIONS] `stri_duplicated()` and `stri_duplicated_any()`
     determine duplicate elements in a character vector.

   * [NEW FUNCTION] `stri_replace_na()` replaces `NA`s in a character vector
      with a given string, useful for emulating e.g. R's `paste()` behavior.

   * [NEW FUNCTION] `stri_rand_shuffle()` generates a random permutation
     of code points in a string.

   * [NEW FUNCTION] `stri_rand_strings()` generates random strings.

   * [NEW FUNCTIONS] New functions and binary operators for string comparison:
     `stri_cmp_eq()`, `stri_cmp_neq()`, `stri_cmp_lt()`, `stri_cmp_le()`,
     `stri_cmp_gt()`, `stri_cmp_ge()`, `%==%`, `%!=%`, `%<%`, `%<=%`, `%>%`, `%>=%`.

   * [NEW FUNCTION] `stri_enc_mark()` reads declared encodings of character
     strings as seen by stringi.

   * [NEW FUNCTION] `stri_enc_tonative(str)` is an alias to
     `stri_encode(str, NULL, NULL)`.

   * [NEW FEATURE] `stri_order()` and `stri_sort()` now have an additional argument
     `na_last` (defaults to `TRUE` and `NA`, respectively).

   * [NEW FEATURE] `stri_replace_all_charclass()`, `stri_extract_all_charclass()`,
     and `stri_locate_all_charclass()` now have a new arg, `merge`
     (defaults to `FALSE` for backward-compatibility). It may be used
     to e.g. replace sequences of white spaces with a single space.

   * [NEW FEATURE] `stri_enc_toutf8()` now has a new `validate` arg (defaults
     to FALSE for backward-compatibility). It may be used in a (rare) case
     in which a user wants to fix an invalid UTF-8 byte sequence.
     stri_length (among others) now detect invalid UTF-8 byte sequences.

   * [NEW FEATURE] All binary operators `%???%` now also have aliases `%stri???%`.

   * Performance improvements in `StriContainerUTF8` and `StriContainerUTF16`
     (they affect most other functions).

   * Significant performance improvements in `stri_join()`, `stri_flatten()`,
     `stri_cmp()`, `stri_trans_to*()`, and others.

   * Added 3rd mirror site for our icudt binary distribution.

   * `U_MISSING_RESOURCE_ERROR` message in `StriException` now suggests
     calling `stri_install_check()`.

   * [BUGFIX] UTF-8 BOMs are now silently removed from input strings.

   * [BUGFIX] no more attempts to re-encode UTF-8 encoded strings
     if native encoding=UTF-8 in `StriContainerUTF8`.

   * [BUGFIX] possible memory leaks when throwing errors via `Rf_error()`.

   * [BUGFIX] `stri_order()` and `stri_cmp()` could return incorrect results
     for `opts_collator=NA`.

   * [BUGFIX] `stri_sort()` did not guarantee to return strings in UTF-8.


-------------------------------------------------------------------------------


* 0.1-25 (2014-03-12) **CRAN**

    * LICENCE tweaks.

    * Initial CRAN release.


* 0.1-24 (2014-03-11) **devel**

    * Fixed bugs detected with `ASan` and `UBSan`,
      e.g. fixed `CharClass::gcmask` type (`enum` -> `uint32_t`)
      (reported by `UBSan`).

    * Fixed array over-runs detected with `valgrind` in `string8.h`.

    * Fixed unitialized class fields in `StriContainerUTF8`
      (reported by `valgrind`).


* 0.1-23 (2014-03-11) **devel**

    * License changed to BSD-3-clause, COPYRIGHTS updated.

    * icudt is not shipped with stringi anymore;
      it is now downloaded in install.libs.R from one of our servers.

    * New functions: `stri_install_check()`, `stri_install_icudt()`.


* 0.1-22 (2014-02-20) **devel**

   * System ICU is used on systems which do have one (version >= 50 needed).
     ICU is autodetected with `pkg-config` in `./configure`.
     Pass `'--disable-pkg-config'` to `./configure` to force building
     ICU from sources.

   * icudt52b (custom subset) is now shipped with stringi
     (for big-endian, ASCII systems).


* 0.1-21 (2014-02-19) **devel**

   * Fixed some Solaris-related issues while preparing stringi
     for CRAN submission.


* 0.1-20 (2014-02-17) **devel**

   * ICU4C 52.1 sources included (common, i18n, stubdata + icu52dt.dat
     loaded dynamically). Compilation via Makevars.

   * stringi now does not depend on any external libraries.


* 0.1-11 (2013-11-16) **devel**

   * ICU4C is now statically linked on Windows.

   * First OS X binary build.

   * The package is being intensively tested by our students @ FMIS WUT.


* 0.1-10 (2013-11-13) **devel**

   * Using pkg-config via ./configure to look for ICU4C libs.


* 0.1-6 (2013-07-05) **devel**

   * First Windows binary build.

   * Compilation passed on Oracle Sun Studio compiler collection.

   * By now we have implemented most of the functionality
     scheduled for milestone 0.1.


* 0.1-1 (2013-01-05) **devel**

   * The stringi project has been established on GitHub.


-------------------------------------------------------------------------------
