Changes in version 0.7.0 (2017-06-22)

 * Add text_locate(), for searching for specific terms in text,
   reporting the contexts of the occurrences ("Key words in context").

 * Add text_count() and text_detect() for counting term occurrences or
   checking for a term's existence in a text.

 * Add text_types() and text_ntype() for returning the unique types in
   a text, or counting types.

 * Rename tokens() to text_tokens() for consistency; add text_ntoken().

 * Add text_nsentence() for counting sentences.

 * Add term_frame(), reporting term frequencies as a data frame with
   columns "text", "term", and "count".

 * Add transpose argument to term_matrix().

 * Add new version of format.corpus_text() that is faster and aware
   of character widths, in particular, Emoji and East Asian character
   widths.

 * Rename term_counts() "min" and "max" arguments to "min_count" and
   "max_count".

 * Normalize token filter combine, drop, drop_except, stem_except arguments,
   to allow passing cased versions of these arguments.

 * Set combine = abbreviations("english") by default.

 * Fixed bug where "u.s" (a unigram) stems to "u.s" (a bigram), and then
   causes for term_matrix() select.  Thanks to Dmitriy Selivanov for
   reporting: https://github.com/patperry/r-corpus/issues/3 .


Changes in version 0.6.0 (2017-06-06)

 * Rename text_filter() to token_filter().

 * Add "ngrams" options for term_counts() and term_matrix().

 * Add term_counts() "min" and "max" options for excluding terms with
   counts below or above specified limits.

 * Add term_counts() "limit" option to limit the number of reported terms.

 * Add term_counts() "types" option for reporting the types that make up
   a term.

 * Remove "select" argument from token_filter(), but add "select" to
   term_matrix() arguments.

 * Replace sentences() function with text_split(), which has options for
   breaking into multi-sentence blocks or multi-token blocks.

 * Add sentence break suppressions (special handling for abbreviations);
   the default behavior for text_split(, "sentences") is to use a set of
   English abbreviations as suppressions.

 * Add option to treat CR and LF like spaces when determining sentence
   boundaries; this is now the default.

 * Add abbreviations() function with abbreviation lists for English,
   French, German, Italian, Portuguese, and Russian (from the Unicode
   Common Locale Data Repository).

 * Add more refined control over token_filter() drop cateogries:
   merged "kana", and "ideo" into "letter"; split off "punct", "mark",
   and "other" from "symbol".

 * Remove "remove_control", "map_dash", and "remove_space" type
   normalization options from text_filter().

 * Remove "ignore_empty" token filter option.


Changes in version 0.5.1 (2017-05-25)

 * Rename "text" class to "corpus_text" to avoid name classes with grid.
   Thanks to Jeroen Ooms for reporting:
   https://github.com/patperry/corpus/issues/1

 * Rename "jsondata" to "corpus_json" for consistency.

 * Fix bug in read_ndjson() for reading factors with missing values.


Changes in version 0.5.0 (2017-05-23)

 * Add term_counts() function to tabulate term frequencies.

 * Add term_matrix() function to compute a term frequency matrix.

 * Add text_filter() option ("stem_except") to exempt specific terms from
   stemming.

 * Add text_filter() option ("drop") to drop specific terms, along with
   option ("drop_except") to exempt specific terms from dropping.

 * Add text_filter() option ("combine") to combine multi-word phrases like
   "new york city" into a single term.

 * Add text_filter() option ("select") to select specific terms (excluding
   all words that are not on this list).

 * Rename text_filter() options "fold_case", "fold_dash", "fold_quote"
   to "map_case", "map_dash", "map_quote".

 * Add stopwords() function.

 * Make read_ndjson() decode JSON strings as character or factor (according
   to whether "stringsAsFactors" is TRUE) except for fields named "text",
   which get decoded as text objects.


Changes in version 0.4.0 (2017-05-16)

 * Allow read_ndjson() to read from connections, not just files, by
   reading the file contents into memory first. Use this by default
   instead of memory mapping.

 * Rename text_filter() option "drop_empty" to "ignore_empty".

 * Add text_filter() options "drop_symbol", "drop_number", "drop_letter",
   "drop_kana", and "drop_ideo"; these options replace the matched tokens
   with NA.

 * Fix internal function namespace clashes on Linux and other similar
   platforms.


Changes in version 0.3.0 (2017-05-04)

 * Support for serializing dataset and text objects via readRDS() and
   other native routines.  Unfortunately, this support doesn't come for
   free, and the objects take a little bit more memory.

 * More convenient interface for accessing JSON arrays.

 * Rename as.text()/is.text() to as_text()/is_text(); make as_text()
   retain names, work on S3 objects.

 * Add support for stemming via the Snowball library.

 * Rename read_json() to read_ndjson() to not clash with jsonlite.

 * Rename "dataset" type to "jsondata".

 * Make read_ndjson() return a data frame by default, not a "jsondata"
   object.


Changes in version 0.2.0 (2017-04-15)

 * First CRAN release.

 * Added Windows support.

 * Added support for setting names on text objects.

 * Added documentation.


Changes in version 0.1.0 (2017-04-11)

 * First milestone, with support for JSON decoding, text segmentation,
   and text normalization.
