NEWS

Documentation

Six vignettes added covering the full pipeline: web data, Wikipedia data, regex search, dictionary search, semantic search (RAG), and basic NLP processing.

Basic NLP vignette walks through nlp_split_sentences(), nlp_tokenize_text() (word and Biber methods), and nlp_cast_tokens() stepwise and as a single pipe.

README revamped: tighter intro, API map, RAG/agent positioning, vignette links.

Changes

util_fetch_embeddings() re-added for embedding generation via Hugging Face inference endpoints (reversed 1.1.0 removal; now calls the HF inference API rather than loading models locally).

nlp_cast_tokens() documented and surfaced – flattens the token list from nlp_tokenize_text() into a long-format data frame with optional character spans.

Suggests trimmed: ellmer and unused packages removed.

API and naming

Package is now organized around a four-stage pipeline: Fetch → Read → Process → Search. All functions use a consistent verb_noun pattern.

Acquire: fetch_urls() (from web search), fetch_wiki_urls(), fetch_wiki_refs() — return URLs or metadata, not full text.

Ingest: read_urls() — read content from URLs into R (replaces web_scrape_urls).

Process: nlp_split_*, nlp_tokenize_text(), nlp_index_tokens() (and nlp_roll_chunks() for rolling windows).

Search: Four retrieval options — search_regex() (regex/KWIC), search_index() (BM25), search_vector() (cosine over your own embeddings), search_dict() (dictionary match; replaces ner_extract_entities).

Common parameters standardized: corpus (replaces tif), by (replaces text_hierarchy).

Removed

In-package embedding generation (e.g. Hugging Face API). Use your own embedding pipeline and pass your embedding matrix as the argument to .

Legacy names: web_search, wiki_search, wiki_find_references, web_scrape_urls, ner_extract_entities, sem_nearest_neighbors / sem_search_corpus (replaced by search_vector and search_regex).

textpress 1.0.0

Initial release: URL fetching, URL content reading, NLP processing (split, tokenize, index), and corpus/search utilities.

Need a high-speed mirror for your open-source project?
Contact our mirror admin team at info@clientvps.com.

This archive is provided as a free public service to the community.
Proudly supported by infrastructure from VPSPulse , RxServers , BuyNumber , UnitVPS , OffshoreName and secure payment technology by ArionPay.

Welcome to ClientVPS Mirrors

textpress 1.1.1

Documentation

Changes

textpress 1.1.0

API and naming

Removed

Docs

textpress 1.0.0