NEWS 
====

Versioning
----------

Releases will be numbered with the following semantic versioning format:

<major>.<minor>.<patch>

And constructed with the following guidelines:

* Breaking backward compatibility bumps the major (and resets the minor 
  and patch)
* New additions without breaking backward compatibility bumps the minor 
  (and resets the patch)
* Bug fixes and misc changes bumps the patch



textclean 0.7.3
----------------------------------------------------------------

Maintenance release to bring package up to date with the lexicon package API changes.

textclean 0.7.0 - 0.7.2
----------------------------------------------------------------

NEW FEATURES

* `match_tokens` added to find all the tokens that match a regex(es) within a
  given text vector.  This useful when combined with the `replace_tokens` 
  function.
  
* Fixed versions of `drop_element`/`keep_element` added to allow for dropping
  elements specified by a known vector rather than a regex.

* The `collapse` and `glue` functions from the **glue** package are reexported
  for easy string manipulation.
  
* `replace_date` added for normalizing dates.

* `replace_time` added for normalizing time stamps.

* `replace_money` added for normalizing money references.

* `mgsub` picks up a `safe` argument using the **mgsub** package as the backend.
  In addition `mgsub_regex_safe` added to make the usage explicit.  The safe mode
  comes at the cost of speed.

IMPROVEMENTS

* `replace_names` drops the replacement of 
    `c('An', 'To', 'Oh', 'So', 'Do', 'He', 'Ha', 'In', 'Pa', 'Un')` which are 
    likely words and not names.
    
* `replace_html` picks ups some additional symbol replacments including:
  `c("&trade;", "&ldquo;", "&rdquo;", "&lsquo;", "&rsquo;", "&bull;", "&middot;", 
  "&sdot;", "&ndash;", "&mdash;", "&ne;", "&frac12;", "&frac14;", "&frac34;", 
  "&deg;", "&larr;", "&rarr;", "&hellip;")`.



textclean 0.6.0 - 0.6.3
----------------------------------------------------------------

NEW FEATURES

* `replace_kern` added to replace a form of informal emphasis in which the
  writer takes words >2 letters long, capitalizes the entire word, and places
  spaces in between each letter.  This was contributed by Stack Overflow's
  @ctwheels: https://stackoverflow.com/a/47438305/1000343.

* `replace_internet_slang` added to replace Internet acronyms and abbreviations
  with machine friendly word equivalents.
  
* `replace_word_elongation` added to replace word elongations (a.k.a. "word 
  lengthening") with the most likely normalized word form.  See 
  http://www.aclweb.org/anthology/D11-105 for details.
  
* `fgsub` added for the ability to match, extract, operate a function over the
  extracted strings, & replace the original matches with the extracted strings.
  This performs similar functionality to `gsubfn::gsubfn` but is less powerful.
  For more powerful needs see the **gsubfn** package.



textclean 0.4.0 - 0.5.1
----------------------------------------------------------------

BUG FIXES

* `replace_grade` did not use `fixed = TRUE` for its call to `mgsub`.  This could
  result in the plus signs being interpreted as meta-characters.  This has been 
  corrected.

NEW FEATURES

* `replace_names` added to remove/replace common first and last names from text 
  data.
  
* `make_plural` added to make a vector of singular noun forms plural.

* `replace_emoji` and `replace_emoji_identifier` added for replacing emojis with
  text or an identifier token for use in the **sentimentr** package.

MINOR FEATURES

* `mgsub_regex` and `mgsub_fixed` to provide wrappers for `mgsub` that makes
  their use apparent without setting the `fixed` command.
  
* `replace_curly_quote` added to replace curly quotes with straight versions.

IMPROVEMENTS

* `replace_non_ascii` now uses `stringi::stri_trans_general` to coerce more 
  non-ASCII characters to ASCII format.
  
* `check_text` now checks for HTML characters/tags.  Thanks to @Peter Gensler
  for suggesting this (see issue #15). 

CHANGES

* `filter_` functions deprecated in favor of `drop_`/`keep_` versions of filter
  functions.  This was change was to address the opposite meaning that **dplyr**'s 
  `filter` has, which retains rows matching a pattern be default.



textclean 0.3.1
----------------------------------------------------------------

BUG FIXES

* `replace_tokens` added to complement `mgsub` for times when the user wants to 
  replace fixed tokens with a single value or remove them entirely.  This yields 
  an optimized solution that is much faster than `mgsub`.

CHANGES

* `mgusb` no longer uses `trim = TRUE` by default.

textclean 0.2.1 - 0.3.0
----------------------------------------------------------------

BUG FIXES

* `check_text` reported to use `replace_incomplete` rather than 
  `add_missing_endmark` when endmark is missing.
  
NEW FEATURES

* The `replace_emoticon`, `replace_grade` and `replace_rating` functions have 
  been moved from the **sentimentr** package to **textclean** as these are 
  cleaning functions.  This makes the functions more modular and generalizable 
  to all types of text cleaning.  These functions are still imported and 
  exported by **sentimentr**.
  
* `replace_html` added to remove html tags and repalce symbols with appropriate
  ASCII symbols.
  
* `add_missing_endmarks` added to detect missing endmarks and replace with the 
  desired symbol.

IMPROVEMENTS

* `replace_number` now uses the *english* package making it faster and more 
  maintainable.  In addition, the function now handles decimal places as well.



textclean 0.1.0 - 0.2.0
----------------------------------------------------------------

BUG FIXES

* `check_text` reported `NA` as non-ASCII.  This has been fixed.

NEW FEATURES

* `check_text` added to report on potential problems in a text vector.

* `replace_ordinal` added to replace ordinal numbers (e.g., 1st) with word 
  representation (e.g., first).
  
* `swap` added to swap two patterns simultaneously.

* `filter_element` added to exclude matching elements from a vector.



textclean 0.0.1 
----------------------------------------------------------------

This package is a collection of tools to clean and process text.  Many of these tools have been taken from the **qdap** package and revamped to be more intuitive, better named, and faster.
