
One-liner NLP utilities -- summarize, classify, extract entities, analyze sentiment -- with rule-based fallbacks and HuggingFace backends.


anynlp

NLP utilities that work out of the box — rule-based by default, HuggingFace/LLM-boosted on demand.


anynlp is a graceful-degradation NLP toolkit. Every task — summarization, classification, NER, sentiment, keyword extraction, similarity, chunking, clustering — has a pure-Python rule-based implementation that runs with zero dependencies. Install the optional transformers extra for pretrained HuggingFace models, or the llm extra to route any task through anyllm for state-of-the-art quality. Perfect for RAG pipelines, content analysis, and lightweight text processing.

Built by Viet-Anh Nguyen at NRL.ai.

Why anynlp?

  • One-liner API: anynlp.sentiment("I love this") just works, no downloads
  • Plugin architecture: every task has rules / hf / llm backends, selectable per call
  • Local-first: rule-based backends run entirely in pure Python with zero dependencies
  • Minimal core deps: none. Heavy backends are all optional extras
  • Production-ready: dataclass results, streaming, batch processing

Installation

pip install anynlp

For stronger backends:

pip install anynlp[transformers]    # HuggingFace pretrained models
pip install anynlp[llm]             # route through anyllm (Ollama, OpenAI, ...)
pip install anynlp[sklearn]         # KMeans clustering + TF-IDF speedups
pip install anynlp[all]             # everything

Python 3.8+ supported (tested on 3.8, 3.9, 3.10, 3.11, 3.12, 3.13)

Quick Start

import anynlp

# 1. Extractive summarization (pure Python, zero deps)
summary = anynlp.summarize("Long article text...", max_sentences=3)

# 2. Sentiment analysis (AFINN lexicon with negation handling)
s = anynlp.sentiment("I absolutely do not love this product")
print(s.label, s.score)             # "negative" -0.4

# 3. Named-entity recognition (regex patterns by default)
ents = anynlp.ner("Apple was founded by Steve Jobs in Cupertino in 1976.")
for e in ents:
    print(e.text, e.label)          # "Apple" ORG, "Steve Jobs" PERSON, ...

# 4. Zero-shot classification (keyword overlap by default)
result = anynlp.classify(
    "The GPU crashed during training",
    labels=["hardware", "software", "billing"],
)
print(result.label)                 # "hardware"

# 5. RAG-style chunking
long_text = "Long article text..."  # placeholder; substitute your own document
chunks = anynlp.chunk(long_text, method="recursive", size=500, overlap=50)

Models & Methods

anynlp has 3 backend tiers — pick based on your needs:

1. Rule-based backend (default, zero dependencies)

  • Summarization: Extractive — sentences scored by position, keyword frequency, and length; top-N selected to meet target ratio
  • Classification: Keyword overlap with class label tokens
  • NER: Regex patterns for emails, URLs, phone numbers, dates, money, percent
  • Sentiment: AFINN-style word lexicon with negation handling and intensifiers
  • Keywords: TF-IDF-like scoring relative to a built-in stopword list
  • Similarity: Jaccard similarity on word sets
  • Chunking: Recursive (paragraph→sentence→word), fixed-size, or sentence-boundary methods
  • Clustering: TF-IDF + KMeans
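
Two of the rule-based methods listed above are small enough to sketch in a few lines. The code below is an illustration of the general techniques (a word lexicon with negation and intensifier handling, and Jaccard similarity on word sets), not anynlp's actual implementation. The lexicon entries, the three-token negation window, and the function names are all assumptions made for this example.

```python
import re

# Tiny illustrative lexicon (assumed values); a real AFINN list scores thousands of words.
LEXICON = {"love": 3, "great": 3, "good": 2, "bad": -2, "hate": -3, "terrible": -3}
NEGATIONS = {"not", "no", "never"}
INTENSIFIERS = {"very": 1.5, "absolutely": 1.5, "slightly": 0.5}


def tokenize(text):
    """Lowercase the text and return its word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def lexicon_sentiment(text, window=3):
    """Sum lexicon scores; scale by nearby intensifiers and flip after a nearby negation."""
    tokens = tokenize(text)
    total = 0.0
    for i, tok in enumerate(tokens):
        if tok not in LEXICON:
            continue
        score = float(LEXICON[tok])
        context = tokens[max(0, i - window):i]  # up to `window` tokens before the word
        for prev in context:
            if prev in INTENSIFIERS:
                score *= INTENSIFIERS[prev]
        if any(prev in NEGATIONS for prev in context):
            score = -score
        total += score
    label = "positive" if total > 0 else "negative" if total < 0 else "neutral"
    return label, total


def jaccard(a, b):
    """Jaccard similarity of the two texts' word sets: |A & B| / |A | B|."""
    sa, sb = set(tokenize(a)), set(tokenize(b))
    if not sa and not sb:
        return 0.0  # convention chosen here to avoid dividing by zero
    return len(sa & sb) / len(sa | sb)


print(lexicon_sentiment("I absolutely do not love this product"))
print(jaccard("the cat sat", "the cat ran"))
```

The scores here are raw sums and are not normalized, so they will not match the values anynlp reports.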

2. HuggingFace backend (optional via [transformers])

  • Runs each task through transformers.pipeline() with a pretrained model
  • Models download automatically from the HuggingFace Hub on first use
  • Supports: summarization (BART), classification (zero-shot), NER (BERT), sentiment

3. LLM backend (optional via [llm], uses anyllm)

  • Sends the task to any LLM (Ollama, OpenAI, Anthropic) for the highest quality
  • Structured outputs via JSON mode for entity/keyword extraction
  • Best for: complex documents, multi-language, custom schemas
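
When an LLM returns entities as JSON, the reply still needs to be validated before use, since models occasionally produce malformed or incomplete output. The sketch below shows one defensive way to do that. The reply shape ({"entities": [{"text": ..., "label": ...}]}) and the parse_entities helper are assumptions for illustration; the source does not document the schema anynlp requests.

```python
import json
from typing import List, Tuple


def parse_entities(raw: str) -> List[Tuple[str, str]]:
    """Parse an assumed reply shape {"entities": [{"text": ..., "label": ...}]}.

    Returns (text, label) pairs and silently skips items that do not fit the shape.
    """
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError:
        return []  # malformed reply: degrade to "no entities" instead of crashing
    if not isinstance(payload, dict):
        return []
    items = payload.get("entities")
    if not isinstance(items, list):
        return []
    entities = []
    for item in items:
        if not isinstance(item, dict):
            continue
        text, label = item.get("text"), item.get("label")
        if isinstance(text, str) and isinstance(label, str):
            entities.append((text, label))
    return entities


reply = '{"entities": [{"text": "Tim Cook", "label": "PERSON"}, {"text": 42}]}'
print(parse_entities(reply))
```

Returning an empty list on bad input is one possible policy; raising an error or retrying the request may suit some pipelines better.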

API Reference

| Function | Returns |
| --- | --- |
| anynlp.summarize(text, max_sentences=3, backend="rules") | Summary text |
| anynlp.classify(text, labels, backend="rules") | ClassificationResult |
| anynlp.ner(text, backend="rules") | list[Entity] |
| anynlp.sentiment(text, backend="rules") | SentimentResult |
| anynlp.keywords(text, top_k=10) | list[(term, score)] |
| anynlp.similarity(a, b) | float between 0.0 and 1.0 |
| anynlp.chunk(text, method="recursive", size=500) | list[str] |
| anynlp.cluster(texts, k=5) | list[int] of cluster labels |
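
As a rough illustration of how a regex-based extractor can produce a list of dataclass results like the one ner returns, here is a minimal standalone sketch. The patterns, the regex_ner name, and the start/end fields are assumptions for this example; only the text and label attributes appear in the Quick Start, and anynlp's real patterns are likely more thorough.

```python
import re
from dataclasses import dataclass
from typing import List

# Simplified patterns chosen for illustration; real-world formats vary far more.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "URL": re.compile(r"https?://[^\s<>\"']+"),
    "DATE": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),  # ISO dates only
    "MONEY": re.compile(r"\$\d+(?:,\d{3})*(?:\.\d+)?"),
    "PERCENT": re.compile(r"\b\d+(?:\.\d+)?%"),
}


@dataclass
class Entity:
    text: str
    label: str
    start: int  # character offset where the match begins (hypothetical field)
    end: int    # character offset just past the match (hypothetical field)


def regex_ner(text: str) -> List[Entity]:
    """Run every pattern over the text and return matches in reading order."""
    found = [
        Entity(m.group(), label, m.start(), m.end())
        for label, pattern in PATTERNS.items()
        for m in pattern.finditer(text)
    ]
    return sorted(found, key=lambda e: e.start)


for e in regex_ner("Email bob@example.com by 2024-05-01 for 15% off $1,200.50"):
    print(e.text, e.label)
```

Pattern-only extraction of this kind finds well-formed tokens such as dates and emails; it has no way to recognize names of people or organizations.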

CLI Usage

anynlp summarize article.txt --sentences 5
anynlp sentiment "I loved the book"
anynlp ner "Tim Cook visited Paris on 2024-05-01"
anynlp classify "GPU crash" --labels "hardware,software,billing"
anynlp chunk document.txt --method recursive --size 500

Examples

RAG chunking + clustering

import anynlp

# Split documents for a RAG index
with open("book.txt") as f:
    chunks = anynlp.chunk(f.read(), method="recursive", size=500, overlap=50)

# Cluster the chunks to find topic groups (TF-IDF + KMeans)
labels = anynlp.cluster(chunks, k=8)
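
To make the "recursive" method and the overlap parameter more concrete, here is a minimal sketch of one way such a chunker can work: split on paragraphs first, fall back to sentences and then words for pieces that are still too long, and pack the pieces greedily while carrying a tail of each chunk into the next. This is an assumption-laden illustration, not anynlp's code; chunk_text and split_recursive are hypothetical names, and sizes are measured in characters here, which the source does not confirm.

```python
import re

# Separators from coarse to fine: paragraph break, sentence end, any whitespace.
SEPARATORS = [r"\n\n", r"(?<=[.!?])\s+", r"\s+"]


def split_recursive(text, limit, level=0):
    """Break text into pieces of at most `limit` characters, coarse separators first."""
    text = text.strip()
    if len(text) <= limit:
        return [text] if text else []
    if level >= len(SEPARATORS):
        # No separator left (one very long word): hard-cut it.
        return [text[i:i + limit] for i in range(0, len(text), limit)]
    pieces = []
    for part in re.split(SEPARATORS[level], text):
        pieces.extend(split_recursive(part, limit, level + 1))
    return pieces


def chunk_text(text, size=500, overlap=50):
    """Pack pieces into chunks of at most `size` chars; each new chunk repeats the
    last `overlap` characters of the previous one."""
    if not 0 <= overlap < size - 1:
        raise ValueError("overlap must be >= 0 and smaller than size - 1")
    # Reserve room for the overlap tail plus one joining space.
    pieces = split_recursive(text, size - overlap - 1)
    chunks, current = [], ""
    for piece in pieces:
        candidate = f"{current} {piece}" if current else piece
        if len(candidate) <= size:
            current = candidate
        else:
            chunks.append(current)
            tail = current[-overlap:] if overlap else ""
            current = f"{tail} {piece}" if tail else piece
    if current:
        chunks.append(current)
    return chunks


sample = "First paragraph. It has two sentences.\n\nSecond paragraph is here. " * 5
for c in chunk_text(sample, size=80, overlap=10):
    print(len(c), repr(c))
```

The overlap tail is cut by character count, so it can begin mid-word; a production chunker would probably align it to a word or sentence boundary.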

Upgrade to HuggingFace when quality matters

import anynlp

# Rule-based: fast, no downloads
s1 = anynlp.sentiment("complicated sentence with irony", backend="rules")

# HuggingFace: slower first call, better accuracy
s2 = anynlp.sentiment("complicated sentence with irony", backend="hf")

# LLM: highest quality, requires anyllm + a provider
s3 = anynlp.sentiment("complicated sentence with irony",
                      backend="llm", model="llama3.1:8b")
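
The source does not say what happens when a requested backend's optional extra is missing, so a caller who wants explicit graceful degradation may prefer to handle it directly. The sketch below shows a generic "try backends in order" helper. The first_available function and the two stand-in backends are hypothetical; in real use each callable could wrap a call such as anynlp.sentiment(text, backend="hf").

```python
from typing import Callable, Sequence, Tuple


def first_available(text: str, backends: Sequence[Tuple[str, Callable[[str], object]]]):
    """Try each (name, function) pair in order; return the first result that succeeds."""
    errors = []
    for name, fn in backends:
        try:
            return name, fn(text)
        except Exception as exc:  # broad on purpose: the real error types are unknown here
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all backends failed: " + "; ".join(errors))


# Stand-in backends for demonstration only.
def hf_backend(text):
    raise ImportError("transformers is not installed")


def rules_backend(text):
    return "positive" if "love" in text.lower() else "neutral"


name, result = first_available(
    "I love this",
    [("hf", hf_backend), ("rules", rules_backend)],
)
print(name, result)
```

Returning the backend name alongside the result makes it easy to log which tier actually answered.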

Zero-shot text classification with a local LLM

import anynlp

result = anynlp.classify(
    "My order is 3 weeks late",
    labels=["billing", "shipping", "product", "other"],
    backend="llm",
    model="llama3.1:8b",
)
print(result.label, result.score)

License

MIT (c) Viet-Anh Nguyen
