Functions for extracting text and tables from 
  PDF-based order documents. The package provides an n-gram-based approach for identifying 
  the language of an order document and uses the R package 'pdftools' to 
  extract the document's text. If the PDF contains only an image 
  (because it is a scanned document), the R package 'tesseract' 
  is used for OCR. The package also provides functionality for identifying 
  and extracting order position tables in order documents based on a clustering approach.
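
To illustrate what an n-gram-based language check can look like, here is a minimal sketch in base R. It is not the package's implementation: the description does not say which n-gram statistic is used, so this sketch assumes character trigram counts compared by cosine similarity. The function names (`char_ngrams`, `cosine_sim`, `guess_language`) and the two tiny reference texts are hypothetical and only serve the example.

```r
# Hypothetical helper: count character n-grams in a lower-cased string.
char_ngrams <- function(text, n = 3) {
  text <- tolower(gsub("\\s+", " ", text))
  len <- nchar(text)
  if (len < n) return(table(character(0)))
  starts <- seq_len(len - n + 1)
  table(substring(text, starts, starts + n - 1))
}

# Hypothetical helper: cosine similarity between two n-gram count tables.
cosine_sim <- function(a, b) {
  shared <- intersect(names(a), names(b))
  if (length(shared) == 0) return(0)
  dot <- sum(as.numeric(a[shared]) * as.numeric(b[shared]))
  dot / (sqrt(sum(as.numeric(a)^2)) * sqrt(sum(as.numeric(b)^2)))
}

# Return the name of the reference profile most similar to `text`.
guess_language <- function(text, profiles, n = 3) {
  target <- char_ngrams(text, n)
  scores <- vapply(profiles, function(p) cosine_sim(target, p), numeric(1))
  names(scores)[which.max(scores)]
}

# Assumption: toy reference texts; real profiles need far larger corpora.
samples <- c(
  en = "please find attached our order for the following items and the delivery date",
  de = "hiermit bestellen wir die folgenden artikel und bitten um lieferung bis zum"
)
profiles <- lapply(samples, char_ngrams)

guess_language("we would like to order the following items", profiles)
```

With reference texts this short, the result is only reliable for inputs that clearly resemble one of the samples; a practical profile would be built from much more text per language.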
| Version: | 1.0.0 |
| Depends: | R (≥ 4.3.0), tidyselect |
| Imports: | data.table, dplyr, matrixcalc, quanteda, rlist, stringr, tibble, tidyr, utils, purrr, digest, lubridate |
| Suggests: | pdftools, tesseract, xml2 |
| Published: | 2024-12-12 |
| DOI: | 10.32614/CRAN.package.orderanalyzer |
| Author: | Michael Scholz [cre, aut], Joerg Bauer [aut] |
| Maintainer: | Michael Scholz <michael.scholz at th-deg.de> |
| License: | GPL-3 |
| NeedsCompilation: | no |
| SystemRequirements: | Tesseract >= 5.0.0, libtesseract-dev (deb), tesseract-devel (rpm), libleptonica-dev (deb), leptonica-devel (rpm), tesseract-ocr-eng (deb), libpoppler-cpp-dev (deb), poppler-cpp-devel (rpm), poppler-data (rpm/deb), libxml2-dev (deb), libxml2-devel (rpm) |
| CRAN checks: | orderanalyzer results |
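
Because 'pdftools' and 'tesseract' are listed under Suggests rather than Imports, they may not be installed together with the package. The following sketch shows one way to check for them at run time and to fall back to OCR when a PDF has no usable text layer. It is an illustration, not the package's own code: `needs_ocr`, `extract_order_text`, and the 20-character threshold are assumptions made for this example. Only `pdftools::pdf_text()`, `pdftools::pdf_info()`, `pdftools::pdf_convert()`, `tesseract::tesseract()`, and `tesseract::ocr()` are existing functions of those packages.

```r
# Hypothetical helper: TRUE when the extracted text layer is (almost) empty.
# Assumption: fewer than `min_chars` non-whitespace characters means "scanned".
needs_ocr <- function(pages_text, min_chars = 20) {
  sum(nchar(gsub("\\s", "", pages_text))) < min_chars
}

# Hypothetical wrapper: read the text layer first, use OCR only if needed.
extract_order_text <- function(pdf_path, language = "eng", dpi = 300) {
  if (!requireNamespace("pdftools", quietly = TRUE)) {
    stop("Package 'pdftools' is required to read PDF files.")
  }
  pages_text <- pdftools::pdf_text(pdf_path)  # one string per page
  if (!needs_ocr(pages_text)) return(pages_text)

  if (!requireNamespace("tesseract", quietly = TRUE)) {
    stop("Package 'tesseract' is required for OCR of scanned documents.")
  }
  # Render each page to a temporary PNG, then run OCR page by page.
  n_pages <- pdftools::pdf_info(pdf_path)$pages
  images <- pdftools::pdf_convert(
    pdf_path, format = "png", dpi = dpi,
    filenames = file.path(tempdir(), sprintf("page_%03d.png", seq_len(n_pages))),
    verbose = FALSE
  )
  on.exit(unlink(images), add = TRUE)
  engine <- tesseract::tesseract(language)
  vapply(images, function(img) tesseract::ocr(img, engine = engine),
         character(1), USE.NAMES = FALSE)
}
```

The check looks at the document as a whole, so a PDF that mixes text pages with scanned pages would not be sent to OCR. The language passed to `tesseract::tesseract()` must also have its trained data installed on the system, for example through the `tesseract-ocr-eng` package listed under SystemRequirements.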