Domain-agnostic text, image, PDF, and DOCX classification engine powered by LLMs
cat-stack
Domain-agnostic text, image, and PDF classification engine powered by LLMs.
cat-stack is the shared base package for the CatLLM ecosystem. It provides the core classification, extraction, exploration, and summarization engine that all domain-specific CatLLM packages build on.
Installation
pip install cat-stack
Optional extras:
pip install "cat-stack[pdf]" # PDF support (PyMuPDF)
pip install "cat-stack[embeddings]" # Embedding similarity scoring
pip install "cat-stack[formatter]" # JSON formatter fallback model
Ecosystem
cat-stack is independently useful for classifying any text column. Domain-specific packages extend it with tuned prompts and workflows:
| Package | Domain |
|---|---|
| cat-stack | General-purpose text, image, PDF classification (this package) |
| cat-survey | Survey response classification |
| cat-vader | Social media text (Reddit, Twitter/X) |
| cat-ademic | Academic papers, PDFs, citations |
| cat-cog | Cognitive assessment & visual scoring (CERAD) |
| cat-pol | Political text (manifestos, speeches, legislation) |
Installing cat-llm pulls in all of the above.
Quick Start
import cat_stack as cat

# df is a pandas DataFrame; OPENAI_KEY holds your OpenAI API key
# Classify text into predefined categories
result = cat.classify(
    input_data=df["text_column"],
    categories=["Positive", "Negative", "Neutral"],
    models=[("gpt-4o", "openai", OPENAI_KEY)],
    filename="classified.csv",
)
Core API
classify()
Assign predefined categories to text, images, or PDFs. Supports single-model and multi-model ensemble classification with consensus voting.
cat.classify(
    input_data=df["text"],
    categories=["Cat A", "Cat B", "Cat C"],
    models=[
        ("gpt-4o", "openai", key1),
        ("claude-sonnet-4-20250514", "anthropic", key2),
    ],
    filename="results.csv",
)
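To make the idea of consensus voting concrete, here is a minimal sketch of a single-label majority vote with an agreement score. It is not cat-stack's implementation: the function name `majority_vote`, the single-label assumption, and the first-seen tie rule are all illustrative choices, and the library's actual voting logic may differ.

```python
from collections import Counter


def majority_vote(labels):
    """Return (winning_label, agreement) for one item's per-model labels.

    agreement is the fraction of models that chose the winning label.
    Ties go to the label that appears first (an arbitrary rule for this sketch).
    """
    if not labels:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    winner, votes = counts.most_common(1)[0]
    return winner, votes / len(labels)


# Hypothetical outputs from three models for two input texts
votes_per_item = [
    ["Cat A", "Cat A", "Cat B"],
    ["Cat C", "Cat C", "Cat C"],
]
consensus = [majority_vote(v) for v in votes_per_item]
```

An agreement score below 1.0 flags items where the models disagreed, which are natural candidates for manual review.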
extract()
Discover categories from a corpus using LLM-driven exploration.
cat.extract(
    input_data=df["text"],
    survey_question="What is this text about?",
    models=[("gpt-4o", "openai", key)],
)
explore()
Raw category extraction for saturation analysis.
cat.explore(
    input_data=df["text"],
    description="Describe the main themes",
    models=[("gpt-4o", "openai", key)],
)
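Saturation analysis usually asks how quickly new categories stop appearing as more data is explored. The sketch below assumes you have already collected raw category labels in successive batches; the `batches` structure and the `saturation_curve` helper are hypothetical and say nothing about the actual return format of `explore()`.

```python
def saturation_curve(batches):
    """Cumulative number of distinct categories seen after each batch."""
    seen = set()
    curve = []
    for batch in batches:
        # Normalise case and whitespace so trivial variants are not counted as new
        seen.update(label.strip().lower() for label in batch)
        curve.append(len(seen))
    return curve


# Hypothetical raw labels from three exploration batches
batches = [
    ["Cost", "Quality"],
    ["quality", "Delivery"],
    ["Cost ", "delivery"],
]
curve = saturation_curve(batches)  # [2, 3, 3]
```

A curve that flattens, as in the last step here, suggests that further batches are unlikely to surface new categories.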
summarize()
Summarize text or PDF documents, with optional multi-model ensemble.
cat.summarize(
    input_data=df["text"],
    models=[("gpt-4o", "openai", key)],
    filename="summaries.csv",
)
Supported Providers
OpenAI, Anthropic, Google (Gemini), Mistral, Perplexity, xAI (Grok), HuggingFace, Ollama (local models).
All providers use the same (model_name, provider, api_key) tuple format. Provider is auto-detected from model name if omitted.
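As an illustration of how a model specification with an omitted provider might be expanded, the sketch below guesses the provider from a model-name prefix. Everything here is an assumption made for the example: the two-element `(model_name, api_key)` form, the prefix table, the provider strings other than "openai" and "anthropic", and the helper names. cat-stack's real detection rules are not documented in this section and may differ.

```python
# Illustrative prefix table; NOT cat-stack's actual detection rules
PROVIDER_PREFIXES = {
    "gpt-": "openai",
    "claude-": "anthropic",
    "gemini-": "google",
    "mistral-": "mistral",
    "grok-": "xai",
}


def guess_provider(model_name):
    """Guess a provider from the model name prefix (hypothetical helper)."""
    name = model_name.lower()
    for prefix, provider in PROVIDER_PREFIXES.items():
        if name.startswith(prefix):
            return provider
    raise ValueError(f"cannot infer provider for {model_name!r}")


def normalize_model_spec(spec):
    """Expand (model, api_key) to (model, provider, api_key); pass 3-tuples through."""
    if len(spec) == 3:
        return tuple(spec)
    if len(spec) == 2:
        model_name, api_key = spec
        return (model_name, guess_provider(model_name), api_key)
    raise ValueError("expected (model, provider, api_key) or (model, api_key)")


spec = normalize_model_spec(("gpt-4o", "sk-placeholder"))
```

Passing the provider explicitly remains the unambiguous option, especially for HuggingFace or Ollama models whose names follow no fixed prefix.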
Features
- Multi-model ensemble with consensus voting and agreement scores
- Batch API support for OpenAI, Anthropic, Google, Mistral, and xAI
- Prompt strategies: Chain-of-Thought, Chain-of-Verification, step-back prompting, few-shot examples
- Text, image, and PDF input auto-detection
- Embedding similarity tiebreaker for ensemble consensus ties
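The embedding tiebreaker listed above can be pictured as follows: when an ensemble vote ends in a tie, pick the tied category whose embedding is closest to the text's embedding. This is a sketch under stated assumptions, using cosine similarity on toy two-dimensional vectors in place of real embeddings; the names `cosine` and `break_tie` are hypothetical, and the library's actual scoring may work differently.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def break_tie(text_vec, tied_labels, label_vecs):
    """Pick the tied label whose embedding is most similar to the text embedding."""
    return max(tied_labels, key=lambda label: cosine(text_vec, label_vecs[label]))


# Toy 2-D vectors standing in for real embeddings
label_vecs = {"Positive": [1.0, 0.0], "Negative": [0.0, 1.0]}
choice = break_tie([0.9, 0.1], ["Positive", "Negative"], label_vecs)
```

In practice the vectors would come from an embedding model, presumably the one installed by the `embeddings` extra.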
License
GPL-3.0-or-later
File details
Details for the file cat_stack-0.3.0.tar.gz.
File metadata
- Download URL: cat_stack-0.3.0.tar.gz
- Upload date:
- Size: 463.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.11.14
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6274b512543a3fe677f376ac6f7517e23085f6c634c12b7bba68b7e628679ee0 |
| MD5 | 19bbb8ac3ce9c562840a6138c9424f16 |
| BLAKE2b-256 | d84bc97388366c5df099f492237172f96c25b25d4454b031bad27806b994770a |
File details
Details for the file cat_stack-0.3.0-py3-none-any.whl.
File metadata
- Download URL: cat_stack-0.3.0-py3-none-any.whl
- Upload date:
- Size: 487.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.11.14
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c7287b0e2f8d92e9d7b677a8696ebe32b81bd2c4aa90438fb37bc162f47d3a28 |
| MD5 | 14a3201c40c63fdaa3fd078b57451662 |
| BLAKE2b-256 | 0db9a2e2ea96ad99e784d7ba5f8428cc5fb1c08a4fb094294dc4e68f501081a2 |