Package: textpress 1.1.1

Jason Timm
textpress: A Lightweight and Versatile NLP Toolkit
An R toolkit for building text corpora and searching them. No custom object classes, just plain data frames from start to finish. Covers the full arc from URL to retrieved passage through a consistent four-step API: Fetch, Read, Process, Search. Traditional tools (KWIC, BM25, dictionary matching) sit alongside modern ones (semantic search, LLM-ready chunking), all compatible with the native R pipe ('|>').
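To make the "plain data frames, native pipe" design concrete, here is a minimal base-R sketch of a split-then-search step. It deliberately avoids calling any textpress function, since argument names are not shown on this page. `split_sentences_df` and `search_regex_df` are hypothetical helpers invented for illustration, and the sentence-splitting rule is a naive assumption.

```r
# Hypothetical helpers written in base R only; these are NOT textpress functions.

# Split each document into sentences, returning one row per sentence.
split_sentences_df <- function(corpus) {
  # Naive rule (an assumption): a sentence ends at ., ! or ? followed by whitespace.
  marked <- gsub("([.!?])\\s+", "\\1\n", corpus$text)
  pieces <- strsplit(marked, "\n", fixed = TRUE)
  data.frame(
    doc_id      = rep(corpus$doc_id, lengths(pieces)),
    sentence_id = sequence(lengths(pieces)),
    text        = unlist(pieces),
    stringsAsFactors = FALSE
  )
}

# Keep only the rows whose text matches a regular expression.
search_regex_df <- function(sentences, pattern) {
  hits <- grepl(pattern, sentences$text, perl = TRUE)
  sentences[hits, , drop = FALSE]
}

# A toy two-document corpus held in an ordinary data frame.
corpus <- data.frame(
  doc_id = c("a", "b"),
  text   = c("Cats purr. Dogs bark loudly!", "Birds sing. Some dogs howl."),
  stringsAsFactors = FALSE
)

# Each step takes a data frame and returns a data frame, so the
# native pipe (R >= 4.1) chains them without any custom classes.
hits <- corpus |>
  split_sentences_df() |>
  search_regex_df("(?i)\\bdogs\\b")

print(hits)
```

Because every intermediate result is an ordinary data frame, it can be inspected, filtered, or joined with standard tools at any point in the chain.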
Install 'textpress' in R:

    install.packages('textpress', repos = c('https://jaytimm.r-universe.dev', 'https://cloud.r-project.org'))
Bug tracker: https://github.com/jaytimm/textpress/issues
Pkgdown/docs site: https://jaytimm.github.io
Last updated from: 33c73e76ae. Checks: 7 WARNING, 2 OK. Indexed: yes.
| Target | Result | Time |
|---|---|---|
| linux-devel-x86_64 | WARNING | 170 |
| source / vignettes | OK | 158 |
| linux-release-x86_64 | WARNING | 164 |
| macos-release-arm64 | WARNING | 133 |
| macos-oldrel-arm64 | WARNING | 136 |
| windows-devel | WARNING | 79 |
| windows-release | WARNING | 95 |
| windows-oldrel | WARNING | 73 |
| wasm-release | OK | 108 |
Exports: abbreviations, dict_generations, dict_political, fetch_urls, fetch_wiki_refs, fetch_wiki_urls, nlp_cast_tokens, nlp_index_tokens, nlp_roll_chunks, nlp_split_paragraphs, nlp_split_sentences, nlp_tokenize_text, read_urls, search_dict, search_index, search_regex, search_vector, util_fetch_embeddings
Dependencies: askpass, cli, cpp11, curl, data.table, generics, glue, httr, jsonlite, lattice, lifecycle, lubridate, magrittr, Matrix, mime, openssl, pbapply, pillar, pkgconfig, R6, rlang, rvest, selectr, stringi, stringr, sys, tibble, timechange, utf8, vctrs, xml2
Readme and manuals
Help Manual
| Help page | Topics |
|---|---|
| Common abbreviations for NLP | abbreviations |
| Demo dictionary of generation-name variants for NER | dict_generations |
| Demo dictionary of political / partisan term variants for NER | dict_political |
| Fetch URLs from a search engine | fetch_urls |
| Fetch external citation URLs from Wikipedia article(s) | fetch_wiki_refs |
| Fetch Wikipedia page URLs by search query | fetch_wiki_urls |
| Convert token list to data frame | nlp_cast_tokens |
| Build a BM25 index for ranked keyword search | nlp_index_tokens |
| Roll units into fixed-size chunks with optional context | nlp_roll_chunks |
| Split text into paragraphs | nlp_split_paragraphs |
| Split text into sentences | nlp_split_sentences |
| Tokenize text into a clean token stream | nlp_tokenize_text |
| Read content from URLs | read_urls |
| Exact phrase / MWE matcher | search_dict |
| Search the BM25 index | search_index |
| Search corpus by regex | search_regex |
| Semantic search by cosine similarity | search_vector |
| Fetch embeddings from a Hugging Face inference endpoint | util_fetch_embeddings |
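For readers unfamiliar with the ranking behind the BM25 entries in the table above, the following is a minimal base-R sketch of standard Okapi BM25 scoring. It is not the textpress implementation and does not reflect the signatures of `nlp_index_tokens` or `search_index`. The function name `bm25_scores` and the defaults `k1 = 1.2` and `b = 0.75` are assumptions chosen for illustration.

```r
# Illustrative Okapi BM25 in base R; NOT the textpress implementation.
# `bm25_scores`, `k1`, and `b` are names and defaults chosen for this sketch.
bm25_scores <- function(docs_tokens, query_tokens, k1 = 1.2, b = 0.75) {
  n_docs  <- length(docs_tokens)
  doc_len <- lengths(docs_tokens)
  avg_len <- mean(doc_len)

  scores <- numeric(n_docs)
  for (term in unique(query_tokens)) {
    # Term frequency of `term` in each document.
    tf <- vapply(docs_tokens, function(tokens) sum(tokens == term), numeric(1))
    # Number of documents that contain the term.
    df <- sum(tf > 0)
    # Smoothed inverse document frequency (the "+ 1" variant stays non-negative).
    idf <- log((n_docs - df + 0.5) / (df + 0.5) + 1)
    # Saturating term-frequency component, normalised by document length.
    scores <- scores + idf * tf * (k1 + 1) /
      (tf + k1 * (1 - b + b * doc_len / avg_len))
  }
  scores
}

# Three toy documents, already tokenized.
docs <- list(
  c("the", "cat", "sat", "on", "the", "mat"),
  c("dogs", "chase", "the", "cat"),
  c("birds", "sing", "at", "dawn")
)

scores  <- bm25_scores(docs, c("cat", "mat"))
ranking <- order(scores, decreasing = TRUE)  # document indices, best match first
print(data.frame(doc = ranking, score = scores[ranking]))
```

The first document ranks highest because it matches both query terms, and the rarer term ("mat") carries the larger IDF weight. The third document matches neither term and scores zero.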