Functions for extracting text and tables from PDF-based order documents. The package identifies the language of an order document with an n-gram-based approach and extracts the document text with the R package 'pdftools'. If the PDF contains only an image (because it is a scanned document), the R package 'tesseract' is used for OCR. The package also identifies and extracts order position tables in order documents using a clustering approach.
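To make the n-gram idea in the description concrete, here is a minimal base-R sketch of language identification by character trigram profiles. It is an illustration under stated assumptions, not the package's implementation: the helper names (`char_ngrams`, `ngram_profile`, `cosine_similarity`, `guess_language`), the cosine-similarity scoring, and the two toy reference texts are all hypothetical.

```r
# Hypothetical helpers for illustration; none of these are orderanalyzer functions.

# Split a string into overlapping character n-grams (lower-cased,
# runs of whitespace collapsed to a single space).
char_ngrams <- function(text, n = 3) {
  text <- tolower(gsub("\\s+", " ", text))
  len <- nchar(text)
  if (len < n) return(character(0))
  starts <- seq_len(len - n + 1)
  substring(text, starts, starts + n - 1)
}

# Relative n-gram frequencies as a named numeric vector.
ngram_profile <- function(text, n = 3) {
  counts <- table(char_ngrams(text, n))
  setNames(as.numeric(counts) / sum(counts), names(counts))
}

# Cosine similarity between two profiles; n-grams missing from one
# profile contribute zero to the dot product.
cosine_similarity <- function(p, q) {
  shared <- intersect(names(p), names(q))
  if (length(shared) == 0) return(0)
  sum(p[shared] * q[shared]) / (sqrt(sum(p^2)) * sqrt(sum(q^2)))
}

# Return the name of the reference text whose profile is closest to `text`.
guess_language <- function(text, reference_texts, n = 3) {
  target <- ngram_profile(text, n)
  scores <- vapply(
    reference_texts,
    function(ref) cosine_similarity(target, ngram_profile(ref, n)),
    numeric(1)
  )
  names(scores)[which.max(scores)]
}

# Toy reference corpus (assumption): real profiles would be built from
# much larger samples per language.
reference_texts <- c(
  en = "please deliver the following items to the address stated in this order. the total price includes the delivery and all taxes",
  de = "bitte liefern sie die folgenden artikel an die in dieser bestellung angegebene adresse. der gesamtpreis gilt inklusive lieferung und steuern"
)

guess_language("Wir bestellen hiermit die folgenden Artikel zur Lieferung", reference_texts)
guess_language("We hereby order the following items for delivery", reference_texts)
```

With reference texts this short the scores are only indicative; the sketch is meant to show the mechanics of profile building and comparison.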
| Version: | 1.0.1 |
| Depends: | R (≥ 4.3.0), tidyselect |
| Imports: | data.table, dplyr, matrixcalc, quanteda, rlist, stringr, tibble, tidyr, utils, purrr, digest, lubridate |
| Suggests: | pdftools, tesseract, xml2 |
| Published: | 2026-01-15 |
| DOI: | 10.32614/CRAN.package.orderanalyzer |
| Author: | Michael Scholz [cre, aut], Joerg Bauer [aut] |
| Maintainer: | Michael Scholz <michael.scholz at th-deg.de> |
| License: | GPL-3 |
| NeedsCompilation: | no |
| SystemRequirements: | Tesseract >= 5.0.0, libtesseract-dev (deb), tesseract-devel (rpm), libleptonica-dev (deb), leptonica-devel (rpm), tesseract-ocr-eng (deb), libpoppler-cpp-dev (deb), poppler-cpp-devel (rpm), poppler-data (rpm/deb), libxml2-dev (deb), libxml2-devel (rpm) |
| CRAN checks: | orderanalyzer results |
| Reference manual: | orderanalyzer.html, orderanalyzer.pdf |
| Package source: | orderanalyzer_1.0.1.tar.gz |
| Windows binaries: | r-devel: orderanalyzer_1.0.1.zip, r-release: orderanalyzer_1.0.1.zip, r-oldrel: orderanalyzer_1.0.1.zip |
| macOS binaries: | r-release (arm64): orderanalyzer_1.0.1.tgz, r-oldrel (arm64): orderanalyzer_1.0.1.tgz, r-release (x86_64): orderanalyzer_1.0.1.tgz, r-oldrel (x86_64): orderanalyzer_1.0.1.tgz |
| Old sources: | orderanalyzer archive |
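The description says that text is read with 'pdftools' and that 'tesseract' handles scanned, image-only PDFs. The sketch below shows one way such a fallback could be wired up using the public functions `pdftools::pdf_text()`, `pdftools::pdf_convert()`, `tesseract::tesseract()` and `tesseract::ocr()`. The wrapper `extract_order_text()`, its arguments, and the rule "run OCR only when no page has a text layer" are assumptions for illustration and are not part of orderanalyzer.

```r
# Hypothetical helper (not an orderanalyzer function): try the PDF text layer
# first and fall back to OCR when every page comes back empty.
extract_order_text <- function(pdf_path, ocr_language = "eng", dpi = 300) {
  if (!requireNamespace("pdftools", quietly = TRUE)) {
    stop("Package 'pdftools' is required for this sketch.")
  }

  # pdf_text() returns one character string per page.
  pages <- pdftools::pdf_text(pdf_path)
  if (any(nzchar(trimws(pages)))) {
    return(pages)
  }

  if (!requireNamespace("tesseract", quietly = TRUE)) {
    stop("No text layer found and package 'tesseract' is not installed.")
  }

  # Render each page to a PNG file (written to the working directory),
  # then remove the files again when the function exits.
  images <- pdftools::pdf_convert(pdf_path, format = "png", dpi = dpi)
  on.exit(unlink(images), add = TRUE)

  # Run OCR page by page with the requested language model.
  engine <- tesseract::tesseract(ocr_language)
  vapply(
    images,
    function(img) tesseract::ocr(img, engine = engine),
    character(1),
    USE.NAMES = FALSE
  )
}

# Usage (the file name is a placeholder):
# order_text <- extract_order_text("order.pdf")
```

Both packages appear under Suggests rather than Imports, which is why the sketch checks for them with `requireNamespace()` instead of assuming they are installed. The OCR step also needs the Tesseract system library and language data listed under SystemRequirements.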
Please use the canonical form https://CRAN.R-project.org/package=orderanalyzer to link to this page.