The goal of messy.cats is to make cleaning messy categorical data easier. User-entered categorical character data often suffers from messiness that can be complicated and time-consuming to clean up. When entering data, users often make typos or formatting errors that cause stubborn, hard-to-detect issues in their data. By leveraging string distance measurement tools, messy.cats lets users automate many of the steps involved in cleaning categorical data, so they spend less time and effort fiddling with inconsistent categories.
You can install the released version of messy.cats from GitHub with:
if(!require(devtools)){
install.packages("devtools")
}
devtools::install_github("hkarp1/messy.cats")
This is a basic example which shows you how to solve a common problem:
library(messy.cats)
plant_categories = c("tree", "bush", "herb", "grass")
messy_plant_categories = c("green tree", "red bush", "new herb", "old grass", "young tree", "small bush", "20 herbs", "the grass", "a tree",
"bushes", "herbs", "tall grass")
cat_match(plant_categories, messy_plant_categories)
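To give a feel for the kind of string distance matching described above, here is a minimal sketch in base R. It is not the implementation used by messy.cats: the helper name `closest_category` is hypothetical, and it assumes plain Levenshtein (edit) distance via `utils::adist`, whereas the package may rely on other distance measures and thresholds.

```r
# Hypothetical helper, not part of messy.cats: pair each messy value with the
# clean category that has the smallest edit distance to it.
closest_category <- function(messy, clean) {
  # utils::adist() returns a matrix of generalized Levenshtein distances,
  # with one row per messy value and one column per clean category.
  d <- utils::adist(messy, clean)

  # For each row, take the column index of the smallest distance
  # (ties resolve to the first category listed).
  best <- apply(d, 1, which.min)

  data.frame(
    messy = messy,
    match = clean[best],
    distance = d[cbind(seq_along(messy), best)],
    stringsAsFactors = FALSE
  )
}

result <- closest_category(
  c("bushes", "herbs", "a tree"),
  c("tree", "bush", "herb", "grass")
)
print(result)
```

Raw edit distance works for near-misses such as plurals and short prefixes, but longer additions (for example "young tree") can sit closer to an unrelated category, which is one reason a dedicated tool with tunable distance methods is useful.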