Deduplicates datasets by retaining the most complete and informative records. Identifies duplicated entries based on a specified key column, calculates completeness scores for each row, and compares values within groups. When differences between duplicates exceed a user-defined threshold, records are split into unique IDs; otherwise, they are coalesced into a single, most complete entry. Returns a list containing the original duplicates, the split entries, and the final coalesced dataset. Useful for cleaning survey or administrative data where duplicated IDs may reflect minor data entry inconsistencies.
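The workflow described above can be sketched in a language-neutral way. The following is a minimal Python illustration, not the package's R API: the function names (`dedupe`, `completeness`, `count_conflicts`, `coalesce`), the `id_col` and `threshold` parameters, and the sample records are all hypothetical. It assumes that completeness is the count of non-missing fields, that the "difference" within a group is the number of columns holding conflicting non-missing values, that split records receive a numeric suffix, and that split records also appear in the final dataset; the package itself may define each of these differently.

```python
from collections import defaultdict


def completeness(row, id_col):
    """Count the non-missing values in a row, ignoring the key column."""
    return sum(1 for col, val in row.items() if col != id_col and val is not None)


def count_conflicts(rows, id_col):
    """Count columns where the rows hold more than one distinct non-missing value."""
    columns = [c for c in rows[0] if c != id_col]
    return sum(
        1 for c in columns
        if len({r[c] for r in rows if r[c] is not None}) > 1
    )


def coalesce(rows, id_col):
    """Start from the most complete row and fill its gaps from the others."""
    ranked = sorted(rows, key=lambda r: completeness(r, id_col), reverse=True)
    merged = dict(ranked[0])
    for other in ranked[1:]:
        for col, val in other.items():
            if merged[col] is None:
                merged[col] = val
    return merged


def dedupe(records, id_col, threshold):
    """Group by id_col, then split or coalesce each duplicated group."""
    groups = defaultdict(list)
    for row in records:
        groups[row[id_col]].append(row)

    duplicates, split, final = [], [], []
    for id_value, rows in groups.items():
        if len(rows) == 1:
            final.append(dict(rows[0]))  # not duplicated, keep as-is
            continue
        duplicates.extend(rows)
        if count_conflicts(rows, id_col) > threshold:
            # Assumption: records that differ too much get suffixed IDs.
            for i, row in enumerate(rows, start=1):
                renamed = {**row, id_col: f"{id_value}_{i}"}
                split.append(renamed)
                final.append(renamed)
        else:
            final.append(coalesce(rows, id_col))
    return {"duplicates": duplicates, "split": split, "final": final}


# Made-up sample data: A1 differs only by missing values, B2 conflicts everywhere.
records = [
    {"id": "A1", "age": 34, "sex": "F", "region": None},
    {"id": "A1", "age": 34, "sex": None, "region": "North"},
    {"id": "B2", "age": 51, "sex": "M", "region": "South"},
    {"id": "B2", "age": 19, "sex": "F", "region": "East"},
    {"id": "C3", "age": 40, "sex": "M", "region": None},
]
result = dedupe(records, id_col="id", threshold=1)
for row in result["final"]:
    print(row)
```

In this sketch the two `A1` rows disagree only where one of them is missing a value, so they are merged into one complete record, while the two `B2` rows conflict in three columns, exceed the threshold of 1, and are kept as separate records with distinct IDs.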
| Version: | 0.1.0 |
| Imports: | dplyr, rlang, magrittr |
| Published: | 2025-07-15 |
| DOI: | 10.32614/CRAN.package.pickmax |
| Author: | Sbonelo Chamane [aut, cre] (ORCID: 0000-0001-5350-5203), Musawenkosi Mabaso [aut], Ronel Sewpaul [aut], Sean Jooste [aut], Kutloano Skhosana [aut], Khangelani Zuma [aut] |
| Maintainer: | Sbonelo Chamane <SChamane at hsrc.ac.za> |
| License: | GPL-3 |
| NeedsCompilation: | no |
| CRAN checks: | pickmax results |
| Reference manual: | pickmax.html , pickmax.pdf |
| Package source: | pickmax_0.1.0.tar.gz |
| Windows binaries: | r-devel: pickmax_0.1.0.zip, r-release: pickmax_0.1.0.zip, r-oldrel: pickmax_0.1.0.zip |
| macOS binaries: | r-release (arm64): pickmax_0.1.0.tgz, r-oldrel (arm64): pickmax_0.1.0.tgz, r-release (x86_64): pickmax_0.1.0.tgz, r-oldrel (x86_64): pickmax_0.1.0.tgz |
Please use the canonical form https://CRAN.R-project.org/package=pickmax to link to this page.