
One-liner NLP utilities -- summarize, classify, extract entities, analyze sentiment -- with rule-based fallbacks and HuggingFace backends.

anynlp

NLP utilities that work out of the box — rule-based by default, HuggingFace/LLM-boosted on demand.

anynlp is a graceful-degradation NLP toolkit. Every task — summarization, classification, NER, sentiment, keyword extraction, similarity, chunking, clustering — has a pure-Python rule-based implementation that runs with zero dependencies. Install the optional transformers extra for pretrained HuggingFace models, or the llm extra to route any task through anyllm for state-of-the-art quality. Perfect for RAG pipelines, content analysis, and lightweight text processing.
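The graceful-degradation idea can be sketched in a few lines: check which optional packages are importable and fall back to the rule-based tier otherwise. This is an illustration only, not anynlp's actual code; `BACKEND_MODULES`, `available_backends`, and `pick_backend` are hypothetical names, and the automatic fallback is an assumption about how such a toolkit might behave.

```python
import importlib.util

# Hypothetical mapping from backend name to the optional package it needs.
BACKEND_MODULES = {"llm": "anyllm", "hf": "transformers"}


def available_backends():
    """Return the backends whose optional dependency can be imported."""
    found = [
        name
        for name, module in BACKEND_MODULES.items()
        if importlib.util.find_spec(module) is not None
    ]
    return found + ["rules"]  # the rule-based tier needs nothing extra


def pick_backend(requested="rules"):
    """Use the requested backend if it is installed, otherwise fall back to rules."""
    return requested if requested in available_backends() else "rules"


print(available_backends())
print(pick_backend("hf"))
```

On a machine without the extras installed, both calls report only the rule-based tier.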

Built by Viet-Anh Nguyen at NRL.ai.

Why anynlp?

  • One-liner API: anynlp.sentiment("I love this") works immediately, with no model downloads
  • Plugin architecture: every task has rules / hf / llm backends, selectable per call
  • Local-first: rule-based backends run entirely in pure Python with zero dependencies
  • Minimal core deps: none; heavy backends are all optional extras
  • Production-ready: dataclass results, streaming, batch processing

Installation

pip install anynlp

For stronger backends:

pip install anynlp[transformers]    # HuggingFace pretrained models
pip install anynlp[llm]             # route through anyllm (Ollama, OpenAI, ...)
pip install anynlp[sklearn]         # KMeans clustering + TF-IDF speedups
pip install anynlp[all]             # everything

Python 3.8+ supported (tested on 3.8, 3.9, 3.10, 3.11, 3.12, 3.13)

Quick Start

import anynlp

# 1. Extractive summarization (pure Python, zero deps)
summary = anynlp.summarize("Long article text...", max_sentences=3)

# 2. Sentiment analysis (AFINN lexicon with negation handling)
s = anynlp.sentiment("I absolutely do not love this product")
print(s.label, s.score)             # "negative" -0.4

# 3. Named-entity recognition (regex patterns by default)
ents = anynlp.ner("Apple was founded by Steve Jobs in Cupertino in 1976.")
for e in ents:
    print(e.text, e.label)          # "Apple" ORG, "Steve Jobs" PERSON, ...

# 4. Zero-shot classification (keyword overlap by default)
result = anynlp.classify(
    "The GPU crashed during training",
    labels=["hardware", "software", "billing"],
)
print(result.label)                 # "hardware"

# 5. RAG-style chunking (long_text stands for any string you want to split)
long_text = "Long article text... " * 200
chunks = anynlp.chunk(long_text, method="recursive", size=500, overlap=50)

Models & Methods

anynlp has 3 backend tiers — pick based on your needs:

1. Rule-based backend (default, zero dependencies)

  • Summarization: Extractive — sentences scored by position, keyword frequency, and length; top-N selected to meet target ratio
  • Classification: Keyword overlap with class label tokens
  • NER: Regex patterns for emails, URLs, phone numbers, dates, money, percent
  • Sentiment: AFINN-style word lexicon with negation handling and intensifiers
  • Keywords: TF-IDF-like scoring relative to a built-in stopword list
  • Similarity: Jaccard similarity on word sets
  • Chunking: Recursive (paragraph→sentence→word), fixed-size, or sentence-boundary methods
  • Clustering: TF-IDF + KMeans
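To make two of the rule-based techniques above concrete, here is a minimal sketch of lexicon sentiment with negation and intensifiers, plus Jaccard similarity on word sets. It is not anynlp's implementation: the tiny `LEXICON`, the word lists, the multiplier values, and the function names are all assumptions chosen for illustration.

```python
import re

# Tiny illustrative lexicon; real AFINN-style lists hold thousands of scored words.
LEXICON = {"love": 3, "great": 3, "good": 2, "bad": -2, "hate": -3, "terrible": -3}
NEGATIONS = {"not", "no", "never"}
INTENSIFIERS = {"very": 1.5, "absolutely": 1.5, "really": 1.5}


def tokenize(text):
    """Lowercase the text and return its alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def sentiment_score(text):
    """Sum word scores; an earlier intensifier scales a word, a negation flips it."""
    total, negate, boost = 0.0, False, 1.0
    for tok in tokenize(text):
        if tok in NEGATIONS:
            negate = True
        elif tok in INTENSIFIERS:
            boost *= INTENSIFIERS[tok]
        elif tok in LEXICON:
            score = LEXICON[tok] * boost
            total += -score if negate else score
            negate, boost = False, 1.0  # modifiers apply to one scored word only
    return total


def jaccard(a, b):
    """Size of the shared word set divided by the size of the combined word set."""
    sa, sb = set(tokenize(a)), set(tokenize(b))
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


print(sentiment_score("I love this"))          # 3.0
print(sentiment_score("I do not love this"))   # -3.0
print(jaccard("the cat sat", "the cat ran"))   # 0.5
```

In this sketch a negation stays pending until the next scored word, which is a deliberate simplification; a production lexicon approach would usually limit negation to a short window.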

2. HuggingFace backend (optional via [transformers])

  • Uses transformers.pipeline() for SOTA model-based tasks
  • Models auto-download from HuggingFace Hub
  • Supports: summarization (BART), classification (zero-shot), NER (BERT), sentiment
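For orientation, this is roughly what calling transformers.pipeline() directly looks like for two of the supported tasks. It bypasses anynlp entirely and says nothing about how anynlp wires the backend internally; `hf_sentiment` and `hf_zero_shot` are hypothetical helper names. Running it assumes `transformers` and a deep-learning framework such as PyTorch are installed, and the first call downloads a default model.

```python
def hf_sentiment(text):
    """Classify sentiment with the default transformers sentiment pipeline."""
    from transformers import pipeline  # imported lazily: optional dependency

    classifier = pipeline("sentiment-analysis")  # downloads a default model on first use
    result = classifier(text)[0]  # a dict with "label" and "score" keys
    return result["label"], result["score"]


def hf_zero_shot(text, labels):
    """Pick the most likely label with the zero-shot classification pipeline."""
    from transformers import pipeline

    classifier = pipeline("zero-shot-classification")
    result = classifier(text, candidate_labels=labels)
    # "labels" and "scores" come back sorted from most to least likely
    return result["labels"][0], result["scores"][0]
```

The imports sit inside the functions so that merely defining the helpers does not require the optional dependency.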

3. LLM backend (optional via [llm], uses anyllm)

  • Sends task to any LLM (Ollama, OpenAI, Anthropic) for highest quality
  • Structured outputs via JSON mode for entity/keyword extraction
  • Best for: complex documents, multi-language, custom schemas
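JSON-mode output still has to be validated before it becomes a typed result. The sketch below shows one way to turn an LLM's JSON reply into dataclass instances. The `Entity` class here is a stand-in with the `text` and `label` fields used in the Quick Start, and the expected JSON shape (an array of objects) and `parse_entities` are assumptions, not anynlp's documented schema.

```python
import json
from dataclasses import dataclass


@dataclass
class Entity:
    """Stand-in result type with the two fields shown in the Quick Start."""
    text: str
    label: str


def parse_entities(raw):
    """Parse a JSON array of {"text", "label"} objects, skipping malformed items."""
    try:
        items = json.loads(raw)
    except json.JSONDecodeError:
        return []  # the model did not return valid JSON
    if not isinstance(items, list):
        return []
    entities = []
    for item in items:
        if (
            isinstance(item, dict)
            and isinstance(item.get("text"), str)
            and isinstance(item.get("label"), str)
        ):
            entities.append(Entity(item["text"], item["label"]))
    return entities


reply = '[{"text": "Paris", "label": "LOC"}, {"oops": 1}]'
print(parse_entities(reply))  # [Entity(text='Paris', label='LOC')]
```

Dropping malformed items instead of raising keeps a single bad entry from discarding the whole reply; stricter callers may prefer to raise.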

API Reference

Function                                                    Returns
anynlp.summarize(text, max_sentences=3, backend="rules")    Summary text
anynlp.classify(text, labels, backend="rules")              ClassificationResult
anynlp.ner(text, backend="rules")                           list[Entity]
anynlp.sentiment(text, backend="rules")                     SentimentResult
anynlp.keywords(text, top_k=10)                             list[(term, score)]
anynlp.similarity(a, b)                                     Float 0.0 - 1.0
anynlp.chunk(text, method="recursive", size=500)            list[str]
anynlp.cluster(texts, k=5)                                  list[int] cluster labels

CLI Usage

anynlp summarize article.txt --sentences 5
anynlp sentiment "I loved the book"
anynlp ner "Tim Cook visited Paris on 2024-05-01"
anynlp classify "GPU crash" --labels "hardware,software,billing"
anynlp chunk document.txt --method recursive --size 500

Examples

RAG chunking + clustering

import anynlp

# Split documents for a RAG index
with open("book.txt", encoding="utf-8") as f:
    chunks = anynlp.chunk(f.read(), method="recursive", size=500, overlap=50)

# Cluster the chunks to find topic groups (TF-IDF + KMeans)
labels = anynlp.cluster(chunks, k=8)
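For readers curious what "recursive" chunking means, here is a minimal sketch of the paragraph, then sentence, then word strategy. It is not anynlp's implementation: `recursive_chunk` and `SPLITTERS` are hypothetical names, `size` is assumed to be measured in characters, and overlap is left out to keep the example short.

```python
import re

# Separators tried in order: blank lines (paragraphs), sentence ends, whitespace.
SPLITTERS = [r"\n\s*\n", r"(?<=[.!?])\s+", r"\s+"]


def recursive_chunk(text, size=500, level=0):
    """Split text into chunks of at most `size` characters where possible."""
    text = text.strip()
    if not text:
        return []
    if len(text) <= size or level >= len(SPLITTERS):
        return [text]  # fits, or nothing finer to split on (one very long word)
    chunks, current = [], ""
    for piece in re.split(SPLITTERS[level], text):
        piece = piece.strip()
        if not piece:
            continue
        candidate = f"{current} {piece}" if current else piece
        if len(candidate) <= size:
            current = candidate  # keep packing pieces into the current chunk
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(piece) <= size:
            current = piece
        else:
            # The piece alone is too long: split it with the next, finer separator.
            chunks.extend(recursive_chunk(piece, size, level + 1))
    if current:
        chunks.append(current)
    return chunks


sample = "One two three. Four five six.\n\nSeven eight nine."
print(recursive_chunk(sample, size=20))
# ['One two three.', 'Four five six.', 'Seven eight nine.']
```

Pieces are rejoined with a single space, so paragraph breaks inside a chunk are not preserved in this sketch.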

Upgrade to HuggingFace when quality matters

import anynlp

# Rule-based: fast, no downloads
s1 = anynlp.sentiment("complicated sentence with irony", backend="rules")

# HuggingFace: slower first call, better accuracy
s2 = anynlp.sentiment("complicated sentence with irony", backend="hf")

# LLM: highest quality, requires anyllm + a provider
s3 = anynlp.sentiment("complicated sentence with irony",
                      backend="llm", model="llama3.1:8b")

Zero-shot text classification with a local LLM

import anynlp

result = anynlp.classify(
    "My order is 3 weeks late",
    labels=["billing", "shipping", "product", "other"],
    backend="llm",
    model="llama3.1:8b",
)
print(result.label, result.score)

License

MIT (c) Viet-Anh Nguyen
