The CatLLM ecosystem — LLM-powered classification for text, images, and PDFs across domains.

These details have not been verified by PyPI

Project links

Project description

catllm Logo

cat-llm

CatLLM: A Reproducible LLM Pipeline for Classifying Open-Ended Text Across Domains

The Problem

If you work with open-ended text data — survey responses, social media posts, academic papers, policy documents — you know the pain: hundreds or thousands of free-text entries that need to be categorized before you can do any quantitative analysis. The traditional approach is manual coding — either doing it yourself or hiring research assistants. It's slow, expensive, and doesn't scale.

The Solution

CatLLM is an ecosystem of Python packages that use LLMs to automate the categorization of open-ended text across domains. It handles:

Category Assignment: Classify responses into your predefined categories (multi-label supported)
Category Extraction: Automatically discover and extract categories from your data when you don't have a predefined scheme
Category Exploration: Analyze category stability and saturation through repeated raw extraction
Summarization: Generate concise summaries of text or PDF documents (paragraph, bullets, one-liner, structured, or full report)
Prompt Optimization: Automatically tune classification prompts using user feedback for higher accuracy

With leading models like GPT-5, Gemini, and Qwen 3, CatLLM achieves 98% accuracy compared to human consensus on classification tasks.

Try the web app: https://huggingface.co/spaces/CatLLM/survey-classifier

Ecosystem

cat-llm is a meta-package (like tidyverse for R) that installs the full family of domain-specific classification packages. Each package can also be installed individually for a lighter footprint.

Package	Domain	Install	Import
cat-llm	Everything (meta-package)	`pip install cat-llm`	`import catllm`
cat-stack	General-purpose base	`pip install cat-stack`	`import cat_stack`
cat-survey	Survey responses	`pip install cat-survey`	`import cat_survey`
cat-vader	Social media	`pip install cat-vader`	`import catvader`
cat-ademic	Academic papers	`pip install cat-ademic`	`import catademic`
cat-pol	Political text	`pip install cat-pol`	`import cat_pol`
cat-cog	Cognitive assessment	`pip install cat-cog`	`import cat_cog`
cat-web	Web content	`pip install cat-web`	`import catweb`

Dependency graph:

cat-stack                           ← general base + shared infra
    ↑
cat-survey  cat-vader  cat-ademic   ← domain packages (each depends on cat-stack)
cat-pol     cat-cog    cat-web
    ↑           ↑          ↑
             cat-llm                ← meta-package (depends on all of the above)

Every domain package exposes the same five core functions — classify(), extract(), explore(), summarize(), prompt_tune() — with domain-specific parameters added on top. Learn once, apply anywhere.

Installation
- R Package
Quick Start
Domain Packages
Best Practices for Classification
Configuration
Supported Models
API Reference
- classify() - Unified function for text, image, and PDF (auto-detects input type)
- prompt_tune() - Automatic prompt optimization via user feedback
- extract() - Unified function for category extraction
- explore() - Raw category extraction for saturation analysis
- summarize() - Unified function for text and PDF summarization
- image_score_drawing()
- image_features()
- cerad_drawn_score()
Deprecated Functions
Related Projects
Academic Research
Testing
Contributing & Support
License

Installation

Install the full ecosystem:

pip install cat-llm

Or install only the domain you need (lighter footprint):

pip install cat-survey    # survey responses — pulls in cat-stack automatically
pip install cat-vader     # social media
pip install cat-ademic    # academic papers
pip install cat-pol       # political text
pip install cat-cog       # cognitive assessment (CERAD scoring)
pip install cat-web       # web content classification
pip install cat-stack     # general-purpose base only, no domain framing

Optional extras (apply to both cat-llm and cat-stack):

pip install cat-llm[pdf]          # PDF support
pip install cat-llm[embeddings]   # Embedding-based similarity scores
pip install cat-llm[formatter]    # Local JSON formatter fallback

Dependencies

All dependencies are declared in pyproject.toml and installed automatically by pip. No manual dependency management is needed.

Core dependencies (installed with any pip install cat-llm or cat-stack):

Package	Purpose
`pandas`	Data manipulation and output DataFrames
`tqdm`	Progress bars during classification
`requests`	HTTP calls to LLM provider APIs
`regex`	JSON extraction from LLM responses

Optional dependencies (install with extras syntax):

Extra	Packages	Install
`pdf`	PyMuPDF	`pip install cat-llm[pdf]`
`docx`	python-docx	`pip install cat-llm[docx]`
`embeddings`	sentence-transformers	`pip install cat-llm[embeddings]`
`formatter`	torch, transformers, accelerate	`pip install cat-llm[formatter]`

No provider-specific SDKs are required at runtime. CatLLM communicates with all LLM providers (OpenAI, Anthropic, Google, Mistral, HuggingFace, Perplexity, xAI, Ollama) via their REST APIs using requests directly.

R Package

An R wrapper is available for users who prefer R over Python. It uses reticulate to call the Python package under the hood.

# Install from GitHub
devtools::install_github("chrissoria/cat-llm", subdir = "r-package/catllm")

# Install the Python backend (one-time setup)
catllm::install_catllm()

All three core functions — classify(), extract(), and explore() — are available with native R syntax. See the R package README for full documentation and examples.

Quick Start

This package is designed for building datasets at scale, not one-off queries. While you can categorize individual responses, its primary purpose is batch processing entire text columns, image collections, or PDF corpora into structured research datasets.

All outputs are formatted for immediate statistical analysis and can be exported directly to CSV.

Not to be confused with CAT-LLM for Chinese article‐style transfer (Tao et al. 2024).

Option A — via meta-package

Install cat-llm and access every domain through a single import:

import catllm

# Domain-neutral classification (from cat-stack)
results = catllm.classify(
    input_data=df['responses'],
    categories=["Positive", "Negative", "Neutral"],
    description="Customer feedback",
    api_key=api_key
)

# Survey classification — adds survey-tuned prompts
results = catllm.classify_survey(
    input_data=df['responses'],
    categories=["Job change", "Family reasons", "Cost of living"],
    survey_question="Why did you move to a new city?",
    api_key=api_key
)

# Academic paper classification — adds journal/field context
results = catllm.classify_academic(
    input_data=["paper1.pdf", "paper2.pdf"],
    categories=["Empirical", "Theoretical", "Review"],
    journal_issn="0894-4393",
    api_key=api_key
)

# Social media classification — adds platform context
results = catllm.classify_social(
    input_data=df['posts'],
    categories=["Misinformation", "Opinion", "News"],
    platform="Reddit",
    api_key=api_key
)

# Political text classification — with built-in data sources
results = catllm.classify_policy(
    source="city_san_diego",
    categories=["Housing", "Public Safety", "Finance"],
    doc_type="ordinance",
    since="2022-01-01",
    n=50,
    api_key=api_key
)

# Prompt optimization — correct a small sample, get optimized prompts
result = catllm.prompt_tune(
    input_data=df['responses'],
    categories=["Positive", "Negative", "Neutral"],
    api_key=api_key,
    sample_size=15,
)

# Cognitive assessment scoring
scores = catllm.cerad_drawn_score(
    shape="diamond",
    image_input=df['drawing_paths'],
    api_key=api_key
)

Option B — direct install (lighter footprint)

Install only the domain package you need:

# pip install cat-ademic
import catademic as cat

results = cat.classify(
    input_data=["paper1.pdf", "paper2.pdf"],
    categories=["Empirical", "Theoretical", "Review"],
    journal_issn="0894-4393",
    api_key=api_key
)

# pip install cat-vader
import catvader as cat

results = cat.classify(
    input_data=df['posts'],
    categories=["Misinformation", "Opinion", "News"],
    platform="Reddit",
    api_key=api_key
)

# pip install cat-stack (general-purpose, no domain framing)
import cat_stack as cat

results = cat.classify(
    input_data=df['text_column'],
    categories=["Category A", "Category B", "Category C"],
    description="My text data",
    api_key=api_key
)

Domain Packages

Each domain package wraps cat-stack's classification engine with domain-tuned prompts and domain-specific parameters. The base classify(), extract(), explore(), summarize(), and prompt_tune() parameters all work — domain packages add parameters on top.

cat-survey — Survey Responses

The survey package provides survey-tuned prompts, few-shot example support, and R/Stata wrappers. This was the original heart of cat-llm.

Key parameter: survey_question= — provides the survey question respondents were asked
Supports few-shot examples (example1–example6) for guiding classification
R and Stata wrappers available for multi-language workflows

import cat_survey as cat

results = cat.classify(
    input_data=df['responses'],
    categories=["Job change", "Family reasons", "Cost of living"],
    survey_question="Why did you move to a new city?",
    example1="I got a new job in Seattle|Job change",
    api_key=api_key
)

cat-vader — Social Media

Platform-aware classification with social media metadata injection (platform, handle, hashtags, engagement metrics).

Key parameter: platform= — injects platform-specific context (Reddit, Twitter/X, forums)
Handles nested comment structures and threaded conversations

import catvader as cat

results = cat.classify(
    input_data=df['posts'],
    categories=["Misinformation", "Opinion", "News sharing"],
    platform="Reddit",
    api_key=api_key
)

cat-ademic — Academic Papers

PDF-first classification for academic and long-form documents, with OpenAlex integration for fetching papers by journal, field, or topic.

Key parameter: journal_issn= — adds journal context for more accurate classification
find_journal() helper for looking up journal metadata via OpenAlex
Per-page and whole-document classification modes

import catademic as cat

results = cat.classify(
    input_data=["paper1.pdf", "paper2.pdf"],
    categories=["Empirical", "Theoretical", "Review"],
    journal_issn="0894-4393",
    api_key=api_key
)

cat-pol — Political Text

Domain-tuned prompts for political science — legislation, speeches, executive orders, social media. Includes 15 built-in political data sources on HuggingFace (municipal ordinances from 11 California cities, federal public laws, executive orders, presidential speeches, and Trump's Truth Social posts).

Key parameter: source= — pull data directly from built-in political datasets
Key parameter: document_context= — frames the political text type for better classification
Built-in sources updated weekly via automated scrapers
Designed for policy area coding, ideology classification, cross-city comparison

import cat_pol as pol

# Classify ordinances from a built-in source
results = pol.classify(
    source="city_san_diego",
    categories=["Housing", "Public Safety", "Infrastructure", "Finance"],
    doc_type="ordinance",
    since="2022-01-01",
    n=50,
    api_key=api_key,
)

# Optimize prompts with user feedback
result = pol.prompt_tune(
    source="city_san_diego",
    categories=["Pro-Business", "Pro-Regulation", "Tax Increase"],
    api_key=api_key,
    sample_size=15,
)

# Summarize executive orders as bullet points
summaries = pol.summarize(
    source="federal_executive_orders",
    n=10,
    format="bullets",
    api_key=api_key,
)

# Fetch raw data (no API key needed)
df = pol.fetch_source("social_trump_truth", since="2024-01-01")
pol.list_sources()  # see all 15 sources

cat-cog — Cognitive Assessment

LLM-powered evaluation of drawn images for neuropsychological testing, including CERAD scoring.

Key function: cerad_drawn_score() — scores drawings of circles, diamonds, rectangles, and cubes
Designed for clinical research and cognitive screening studies

import cat_cog

scores = cat_cog.cerad_drawn_score(
    shape="diamond",
    image_input=df['drawing_paths'],
    api_key=api_key
)

cat-web — Web Content

Web content classification and fact-checking. Thin wrapper on cat-stack for URL-based classification.

import catweb as cat

results = cat.classify(
    input_data=df['urls'],
    categories=["News", "Blog", "E-commerce", "Academic"],
    api_key=api_key
)

cat-stack — General-Purpose Base

The domain-neutral classification engine that all other packages build on. Use this directly when your text doesn't fit neatly into a specific domain.

import cat_stack as cat

results = cat.classify(
    input_data=df['text_column'],
    categories=["Category A", "Category B", "Category C"],
    description="My text data",
    api_key=api_key
)

Best Practices for Classification

These recommendations are based on empirical testing across 4 surveys, 4 models (7B to frontier-class), and 250-row subsamples compared against human-coded ground truth.

What works

Detailed category descriptions: The single biggest lever for accuracy. Instead of short labels like "Job change", use verbose descriptions like "The person had a job or school or career change, including transferred and retired." This consistently improves accuracy across all models by several percentage points.
Include an "Other" category: Adding a catch-all category like "Other: The response does not fit any of the above categories." prevents the model from forcing ambiguous responses into ill-fitting categories, improving precision.
Few-shot examples (example1–example6): Providing 2–4 labeled examples can help, particularly for weaker models. Effects are modest (+0–1 pp on average) and model-dependent.
Low temperature (creativity=0): For classification tasks, deterministic output is generally preferable. Higher temperatures introduce noise without improving accuracy.

What doesn't help (or hurts)

Chain of Thought (chain_of_thought): In our testing, enabling COT did not improve classification accuracy for any model and slightly degraded it for some. It is now off by default.
Chain of Verification (chain_of_verification): CoVE uses ~4x the API calls per response for a self-verification loop. Despite the added cost, it consistently reduced accuracy by 1–2 percentage points, primarily by retracting correct classifications during the verification step. Not recommended for classification tasks.
Step-back prompting (step_back_prompt): Results were inconsistent — slight gains for weaker models (~~+1.8 pp) but slight losses for stronger models (~~-0.5 pp), with high variance across surveys. Not recommended as a default strategy.
Context prompting (context_prompt): Adds generic expert context to the prompt. No consistent benefit observed.

Summary

The most effective approach is straightforward: write detailed category descriptions, include an "Other" category, and use a capable model at low temperature. Advanced prompting strategies add complexity and cost without reliable gains for classification tasks.

Configuration

Get Your API Key

Get an API key from your preferred provider:

OpenAI: platform.openai.com
Anthropic: console.anthropic.com
Google: aistudio.google.com
Huggingface: huggingface.co/settings/tokens
xAI: console.x.ai
Mistral: console.mistral.ai
Perplexity: perplexity.ai/settings/api

Most providers require adding a payment method and purchasing credits. Store your key securely and never share it publicly.

Supported Models

OpenAI: GPT-4o, GPT-4, GPT-5, etc.
Anthropic: Claude Sonnet 4, Claude 3.5 Sonnet, Claude Haiku, etc.
Google: Gemini 2.5 Flash, Gemini 2.5 Pro, etc.
Huggingface: Qwen, Llama 4, DeepSeek, and thousands of community models
xAI: Grok models
Mistral: Mistral Large, Pixtral, etc.
Perplexity: Sonar Large, Sonar Small, etc.

Fully Tested:

OpenAI (GPT-4, GPT-4o, GPT-5, etc.)
Anthropic (Claude Sonnet 4, Claude 3.5 Sonnet, Haiku)
Perplexity (Sonar models)
Google Gemini - Free tier has severe rate limits (5 RPM). Requires Google AI Studio billing account for large-scale use.
Huggingface - Access to Qwen, Llama 4, DeepSeek, and thousands of user-trained models for specific tasks. API routing can occasionally be unstable.
xAI (Grok models)
Mistral (Mistral Large, Pixtral, etc.)

Note: For best results, I recommend starting with OpenAI or Anthropic.

API Reference

Note: The functions documented below are the domain-neutral versions from cat-stack. They work with any text, image, or PDF data without domain-specific framing. Domain packages (cat-survey, cat-vader, cat-ademic, etc.) accept all the same parameters and add domain-specific ones on top (e.g., survey_question=, platform=, journal_issn=). See Domain Packages for details.

`classify()`

Unified classification function for text, image, and PDF inputs. Input type is auto-detected from your data—no need to specify whether you're classifying text, images, or PDFs.

Supports both single-model and multi-model ensemble classification for improved accuracy through consensus voting.

Parameters:

input_data: The data to classify. Can be:
- Text: list of strings or pandas Series
- Images: directory path, single file, or list of image paths
- PDFs: directory path, single file, or list of PDF paths
categories (list): List of category names for classification
api_key (str): API key for the LLM service (single-model mode)
description (str): Description of the input data context
user_model (str, default="gpt-4o"): Model to use
mode (str, default="image"): PDF processing mode - "image", "text", or "both"
creativity (float, optional): Temperature setting (0.0-1.0)
survey_question (str, default=""): The survey question respondents were asked. Provides important context for classification.
safety (bool, default=False): Save progress after each row. If the process fails midway, you won't lose your work. Requires filename.
chain_of_thought (bool, default=False): Enable step-by-step reasoning within a single prompt. Low cost increase.
context_prompt (bool, default=False): Add expert analyst role to the prompt. Minimal cost increase.
step_back_prompt (bool, default=False): Ask the model to consider broader conceptual background before classifying. Moderate cost increase.
chain_of_verification (bool, default=False): Multi-prompt verification loop where the model checks its own work. High cost increase (3-5x).
example1–example6 (str, optional): Few-shot examples to guide classification (up to 6).
filename (str, optional): Output filename for CSV
save_directory (str, optional): Directory to save results
model_source (str, default="auto"): Provider - "auto", "openai", "anthropic", "google", "mistral", "perplexity", "huggingface", "xai"
multi_label (bool, default=True): If True, multiple categories can be assigned per input (multi-label). If False, the model picks the single best category (single-label). Output format is unchanged—still one 0/1 column per category.
categories_per_call (int, default=None): Maximum number of categories per LLM call. When set, splits the category list into chunks, runs a separate call per chunk, and merges results. Each chunk automatically gets a temporary "Other" catch-all to improve accuracy. A unified "Other" column is added to the output when all real categories are 0 but at least one chunk's "Other" fired. Useful for large category sets (20+). Not supported with batch_mode=True.
models (list, optional): For multi-model ensemble, list of (model, provider, api_key) or (model, provider, api_key, config_dict) tuples
consensus_threshold (str or float, default="unanimous"): Agreement threshold for ensemble mode. Options: "unanimous" (100%, default — best accuracy), "majority" (50%), "two-thirds" (67%), or a custom float between 0 and 1.
parallel (bool, default=None): Controls concurrent vs sequential model execution in ensemble mode. None (default) auto-detects: sequential for local models (Ollama), parallel for cloud providers. Set True to force parallel or False to force sequential. Sequential mode is useful for resource-constrained environments or debugging.
batch_mode (bool, default=False): (Experimental) Submit the entire job as an async batch request instead of making synchronous calls. Supported providers: OpenAI, Anthropic, Google, Mistral, xAI. Reduces API costs by ~50%. Works with both single-model and multi-model ensemble (each model submits its own batch job concurrently; providers without batch API fall back to synchronous calls). Not compatible with PDF/image inputs. The function blocks until the batch completes.
batch_poll_interval (float, default=30.0): Seconds between status polls when batch_mode=True.
batch_timeout (float, default=86400.0): Maximum seconds to wait for a batch job before raising BatchJobExpiredError.
embeddings (bool, default=False): Add embedding-based similarity scores alongside binary 0/1 classifications. Adds category_N_similarity columns (0–1 float) using a local sentence-transformer model (BAAI/bge-small-en-v1.5, ~130MB). Text input only (skipped for PDF/image). Requires pip install cat-llm[embeddings].
category_descriptions (dict, optional): Richer text descriptions per category for embedding similarity (e.g., {"Past_Support": "References to help received from family"}). Only used when embeddings=True.
embedding_tiebreaker (bool, default=False): When ensemble consensus produces a tie (equal votes for 0 and 1), use embedding centroid similarity to break the tie. Requires pip install cat-llm[embeddings].
min_centroid_size (int, default=3): Minimum number of confirmed-positive responses needed to build a reliable centroid for embedding_tiebreaker. If fewer positives exist, falls back to raw similarity against the category text.
json_formatter (bool, default=False): Use a local fine-tuned model to fix malformed JSON output before marking responses as failed. The formatter runs only when extract_json() produces invalid output—zero cost on the happy path. On first use, the model (~1GB) is downloaded from HuggingFace Hub. Requires pip install cat-llm[formatter].
add_other (str or bool, default="prompt"): Controls auto-addition of an "Other" catch-all category. "prompt" asks the user, True adds silently, False never adds.
check_verbosity (bool, default=True): Check whether categories have descriptions and examples (1 API call). Set to False to skip.
use_json_schema (bool, default=True): Use structured JSON schema for LLM output. Set to False for providers that don't support it well.
max_categories (int, default=12): Maximum categories for auto-extraction when categories="auto".
categories_per_chunk (int, default=10): Categories to extract per chunk during auto-extraction.
divisions (int, default=10): Number of chunks to divide data into during auto-extraction.
research_question (str, optional): Research context to guide classification.
row_delay (float, default=0.0): Seconds to wait between processing each row. Useful for rate-limited APIs (e.g., Google free tier at 5 RPM).
max_retries (int, default=5): Maximum number of retries for failed API calls per row.
retry_delay (float, default=1.0): Base delay in seconds between retries (uses exponential backoff).
fail_strategy (str, default="partial"): How to handle rows that fail after all retries. "partial" returns results with failed rows marked; "strict" raises an error on any failure.
max_workers (int, default=None): Maximum parallel workers for API calls. None auto-selects.
auto_download (bool, default=False): Automatically download missing Ollama models without prompting.
progress_callback (callable, optional): Callback function for progress updates. Called as progress_callback(current_step, total_steps).
pdf_dpi (int, default=150): DPI resolution for rendering PDF pages as images. Higher values improve quality but increase processing time and cost.
thinking_budget (int, default=0): Token budget for model reasoning/thinking. Set to 0 to disable. Behavior varies by provider:

Provider	`thinking_budget=0`	`thinking_budget > 0` (e.g., 8192)
OpenAI	`reasoning_effort="minimal"`	`reasoning_effort="high"`
Anthropic	Thinking disabled	Extended thinking enabled (min 1024 tokens, forces temperature=1)
Google	Thinking disabled	`thinkingConfig: {thinkingBudget: N}` (min 128 tokens)
HuggingFace	Thinking disabled (or use non-thinking model variant)	Thinking enabled (or use thinking model variant)

Note: Mistral and xAI models do not have reasoning/thinking toggles — thinking_budget has no effect on these providers. For Qwen3 on HuggingFace, reasoning is controlled by choosing the model variant: use Qwen3-VL-235B-A22B-Thinking for reasoning or Qwen3-VL-235B-A22B-Instruct for standard mode with thinking_budget=0.

Returns:

pandas.DataFrame: Classification results with category columns

Examples:

import catllm as cat

# Text classification (auto-detected)
results = cat.classify(
    input_data=df['responses'],
    categories=["Positive feedback", "Negative feedback", "Neutral"],
    description="Customer satisfaction survey",
    api_key=api_key
)

# Image classification (auto-detected from file paths)
results = cat.classify(
    input_data="/path/to/images/",
    categories=["Contains person", "Outdoor scene", "Has text"],
    description="Product photos",
    api_key=api_key
)

# PDF classification (auto-detected, processes each page separately)
results = cat.classify(
    input_data="/path/to/reports/",
    categories=["Contains table", "Has chart", "Is summary page"],
    description="Financial reports",
    mode="both",  # Use both image and extracted text
    api_key=api_key
)

# Single-label classification (pick one best category per response)
results = cat.classify(
    input_data=df['responses'],
    categories=["Positive", "Negative", "Neutral"],
    multi_label=False,
    api_key=api_key
)

# Multi-model ensemble for higher accuracy
results = cat.classify(
    input_data=df['responses'],
    categories=["Positive", "Negative", "Neutral"],
    models=[
        ("gpt-4o", "openai", "sk-..."),
        ("claude-sonnet-4-20250514", "anthropic", "sk-ant-..."),
        ("gemini-2.5-flash", "google", "AIza..."),
    ],
    consensus_threshold="unanimous",
)

Multi-Model Ensemble:

When you provide the models parameter, CatLLM runs classification across multiple models and combines results using majority voting. This can significantly improve accuracy by reducing individual model biases. By default, cloud models run in parallel while local models (Ollama) run sequentially — controlled by the parallel parameter.

The output includes:

Individual model predictions (e.g., category_1_gpt_4o, category_1_claude)
Consensus columns (e.g., category_1_consensus)
Agreement scores showing how many models agreed

`extract()`

Unified category extraction function for text, image, and PDF inputs. Automatically discovers categories in your data when you don't have a predefined scheme.

Planned improvement: Allow specifying a separate, more powerful model for the semantic merge step (e.g., use GPT-4o-mini for bulk extraction, GPT-4o for the final consolidation). This "tiered" approach could improve merge quality without significantly increasing cost.

Parameters:

input_data: The data to explore (text list, image paths, or PDF paths)
api_key (str): API key for the LLM service
input_type (str, default="text"): Type of input - "text", "image", or "pdf"
survey_question (str, default=""): The survey question or description of the data. Provides context for category discovery.
description (str, optional): Deprecated alias for survey_question. Use survey_question instead.
max_categories (int, default=12): Maximum number of final categories to return
categories_per_chunk (int, default=10): Categories to extract per chunk
divisions (int, default=12): Number of chunks to divide data into
iterations (int, default=8): Number of extraction passes over the data
user_model (str, default="gpt-4o"): Model to use
model_source (str, default="auto"): Provider - "auto", "openai", "anthropic", "google", etc.
creativity (float, optional): Temperature setting (0.0-1.0). None uses model default.
specificity (str, default="broad"): "broad" or "specific" category granularity
research_question (str, optional): Research context to guide extraction
focus (str, optional): Focus instruction for category extraction (e.g., "emotional responses")
mode (str, default="text"): Processing mode for non-text inputs - "text", "image", or "both"
filename (str, optional): Output filename for CSV
random_state (int, optional): Random seed for reproducibility of chunk sampling
chunk_delay (float, default=0.0): Seconds to wait between processing each chunk. Useful for rate-limited APIs.
auto_download (bool, default=False): Automatically download missing Ollama models without prompting.
progress_callback (callable, optional): Callback function for progress updates.

Default parameter rationale: The defaults of divisions=12 and iterations=8 were determined through empirical analysis. We ran a 6x6 grid search over [1, 4, 8, 12, 16, 20] for both parameters, repeating each combination 10 times and measuring pairwise Jaro-Winkler consistency across runs. Consistency peaked at 12 divisions and 8 iterations, with values beyond this point offering no meaningful improvement.

Returns:

dict with keys:
- counts_df: DataFrame of categories with counts
- top_categories: List of top category names
- raw_top_text: Raw model output

Example:

import catllm as cat

# Extract categories from survey responses
results = cat.extract(
    input_data=df['responses'],
    survey_question="Why did you move?",
    api_key=api_key,
    max_categories=10,
    focus="decisions to relocate"  # Optional focus
)

print(results['top_categories'])
# ['Employment opportunity', 'Family reasons', 'Cost of living', ...]

`explore()`

Raw category extraction for frequency and saturation analysis. Unlike extract(), which normalizes, deduplicates, and semantically merges categories into a clean final set, explore() returns every category string from every chunk across every iteration — with duplicates intact.

This is useful for analyzing which categories are robust (consistently discovered across runs) versus which are noise (appearing only once or twice). By increasing iterations, you can build saturation curves showing when category discovery converges.

Parameters:

input_data: List of text responses or pandas Series
api_key (str): API key for the LLM service
description (str): The survey question or description of the data
max_categories (int, default=12): Maximum categories passed through to the extraction prompt.
categories_per_chunk (int, default=10): Categories to extract per chunk
divisions (int, default=12): Number of chunks to divide data into
user_model (str, default="gpt-4o"): Model to use
model_source (str, default="auto"): Provider - "auto", "openai", "anthropic", "google", etc.
creativity (float, optional): Temperature setting (0.0-1.0). None uses model default.
specificity (str, default="broad"): "broad" or "specific" category granularity
research_question (str, optional): Research context to guide extraction
focus (str, optional): Focus instruction (e.g., "decisions to relocate")
iterations (int, default=8): Number of passes over the data
random_state (int, optional): Random seed for reproducibility
filename (str, optional): Output CSV filename (one category per row)
chunk_delay (float, default=0.0): Seconds to wait between processing each chunk. Useful for rate-limited APIs.
auto_download (bool, default=False): Automatically download missing Ollama models without prompting.
progress_callback (callable, optional): Callback function for progress updates.

Returns:

list[str]: Every category extracted from every chunk across every iteration. Length ≈ iterations × divisions × categories_per_chunk.

Example:

import catllm as cat

# Run extraction with many iterations for saturation analysis
raw_categories = cat.explore(
    input_data=df['responses'],
    description="Why did you move?",
    api_key=api_key,
    iterations=20,
    divisions=5,
    categories_per_chunk=10,
)

# Count how often each category appears across runs
from collections import Counter
counts = Counter(raw_categories)
for category, freq in counts.most_common(15):
    print(f"{freq:3d}x  {category}")

`summarize()`

Unified summarization function for text and PDF inputs. Generates concise summaries of survey responses, documents, or any text data. Input type is auto-detected from your data.

Supports both single-model and multi-model ensemble summarization. In multi-model mode, summaries from all models are synthesized into a consensus summary.

Parameters:

input_data: The data to summarize. Can be:
- Text: list of strings, pandas Series, or single string
- PDF: directory path, single PDF path, or list of PDF paths
api_key (str): API key for the LLM service (single-model mode)
description (str): Description of what the content contains (provides context)
instructions (str): Specific summarization instructions (e.g., "bullet points")
max_length (int): Maximum summary length in words
focus (str): What to focus on (e.g., "main arguments", "emotional content")
user_model (str, default="gpt-4o"): Model to use
model_source (str, default="auto"): Provider - "auto", "openai", "anthropic", "google", etc.
creativity (float, optional): Temperature setting (0.0-1.0). None uses model default.
thinking_budget (int, default=0): Token budget for extended thinking/reasoning. See classify() for provider-specific behavior.
chain_of_thought (bool, default=True): Enable step-by-step reasoning. On by default for summarization.
context_prompt (bool, default=False): Add expert analyst role to the prompt.
step_back_prompt (bool, default=False): Ask the model to consider broader context before summarizing.
mode (str, default="image"): PDF processing mode:
- "image": Render pages as images (best for visual documents)
- "text": Extract text only (faster, good for text-heavy PDFs)
- "both": Send both image and extracted text (most comprehensive)
filename (str): Output CSV filename
save_directory (str): Directory to save results
pdf_dpi (int, default=150): DPI resolution for rendering PDF pages as images.
models (list): For multi-model mode, list of (model, provider, api_key) tuples
parallel (bool, default=None): Controls concurrent vs sequential model execution. None auto-detects (sequential for Ollama, parallel for cloud).
max_workers (int, default=None): Maximum parallel workers for API calls.
auto_download (bool, default=False): Automatically download missing Ollama models without prompting.
progress_callback (callable, optional): Callback function for progress updates.
safety (bool, default=False): If True, saves progress to CSV after each item. Requires filename.
max_retries (int, default=5): Max retries per API call.
batch_retries (int, default=2): Number of batch retry passes for failed items.
retry_delay (float, default=1.0): Delay between retries in seconds.
row_delay (float, default=0.0): Delay in seconds between processing each row. Useful to avoid rate limits.
fail_strategy (str, default="partial"): How to handle failures — "partial" keeps successful results, "strict" blanks the row if any model fails.
batch_mode (bool, default=False): If True, use async batch API (50% cost savings). Supported providers: openai, anthropic, google, mistral, xai. Not compatible with PDF input.
batch_poll_interval (float, default=30): Seconds between batch job status checks.
batch_timeout (float, default=86400): Max seconds to wait for batch completion (default 24h).

Returns:

pandas.DataFrame: Results with summary columns:
- survey_input: Original text or page label (for PDFs)
- summary: Generated summary (or consensus for multi-model)
- processing_status: "success", "error", "skipped"
- pdf_path: Path to source PDF (PDF mode only)
- page_index: Page number, 0-indexed (PDF mode only)

Examples:

import catllm as cat

# Single model text summarization
results = cat.summarize(
    input_data=df['responses'],
    description="Customer feedback",
    api_key=api_key
)

# PDF summarization (auto-detected from file paths)
results = cat.summarize(
    input_data="/path/to/pdfs/",
    description="Research papers",
    mode="image",
    api_key=api_key
)

# PDF summarization with specific files and focus
results = cat.summarize(
    input_data=["doc1.pdf", "doc2.pdf"],
    description="Financial reports",
    mode="both",
    focus="key metrics and trends",
    max_length=100,
    api_key=api_key
)

# With safety saves and row delay
results = cat.summarize(
    input_data=df['responses'],
    description="Customer feedback",
    api_key=api_key,
    safety=True,
    filename="results.csv",
    row_delay=1.0,
)

# Batch mode (50% cost savings)
results = cat.summarize(
    input_data=df['responses'],
    description="Customer feedback",
    api_key=api_key,
    batch_mode=True,
    filename="batch_results.csv",
)

# Multi-model with synthesis
results = cat.summarize(
    input_data=df['responses'],
    models=[
        ("gpt-4o", "openai", "sk-..."),
        ("claude-sonnet-4-20250514", "anthropic", "sk-ant-..."),
    ],
)

`image_score_drawing()`

Performs quality scoring of images against a reference description and optional reference image, returning structured results with optional CSV export.

Methodology: Processes each image individually, assigning a drawing quality score on a 5-point scale based on similarity to the expected description:

1: No meaningful similarity (fundamentally different)
2: Barely recognizable similarity (25% match)
3: Partial match (50% key features)
4: Strong alignment (75% features)
5: Near-perfect match (90%+ similarity)

Parameters:

reference_image_description (str): A description of what the model should expect to see
image_input (list): List of image file paths or folder path containing images
reference_image (str): A file path to the reference image
api_key (str): API key for the LLM service
user_model (str, default="gpt-4o"): Specific vision model to use
creativity (float, default=0): Temperature/randomness setting (0.0-1.0)
safety (bool, default=False): Enable safety checks and save results at each API call step
filename (str, default="image_scores.csv"): Filename for CSV output
save_directory (str, optional): Directory path to save the CSV file
model_source (str, default="OpenAI"): Model provider

Returns:

pandas.DataFrame: DataFrame with image paths, quality scores, and analysis details

Example:

import catllm as cat

image_scores = cat.image_score_drawing(
    reference_image_description='A hand-drawn circle',
    image_input=['image1.jpg', 'image2.jpg', 'image3.jpg'],
    user_model="gpt-4o",
    api_key="OPENAI_API_KEY"
)

`image_features()`

Extracts specific features and attributes from images, returning exact answers to user-defined questions (e.g., counts, colors, presence of objects).

Methodology: Processes each image individually using vision models to extract precise information about specified features. Unlike scoring and classification functions, this returns factual data such as object counts, color identification, or presence/absence of specific elements.

Parameters:

image_description (str): A description of what the model should expect to see
image_input (list): List of image file paths or folder path containing images
features_to_extract (list): List of specific features to extract (e.g., ["number of people", "primary color", "contains text"])
api_key (str): API key for the LLM service
user_model (str, default="gpt-4o"): Specific vision model to use
creativity (float, default=0): Temperature/randomness setting (0.0-1.0)
to_csv (bool, default=False): Whether to save the output to a CSV file
safety (bool, default=False): Enable safety checks and save results at each API call step
filename (str, default="categorized_data.csv"): Filename for CSV output
save_directory (str, optional): Directory path to save the CSV file
model_source (str, default="OpenAI"): Model provider

Returns:

pandas.DataFrame: DataFrame with image paths and extracted feature values

Example:

import catllm as cat

features = cat.image_features(
    image_description='Product photos from e-commerce site',
    features_to_extract=['number of items', 'primary color', 'has price tag'],
    image_input='/path/to/images/',
    user_model="gpt-4o",
    api_key="OPENAI_API_KEY"
)

`cerad_drawn_score()`

Automatically scores drawings of circles, diamonds, overlapping rectangles, and cubes according to the official Consortium to Establish a Registry for Alzheimer's Disease (CERAD) scoring system.

Methodology: Processes each image individually, evaluating the drawn shapes based on CERAD criteria. Works even with images that contain other drawings or writing.

Parameters:

shape (str): The type of shape to score ("circle", "diamond", "rectangles", "cube")
image_input (list): List of image file paths or folder path containing images
api_key (str): API key for the LLM service
user_model (str, default="gpt-4o"): Specific model to use
creativity (float, default=0): Temperature/randomness setting (0.0-1.0)
safety (bool, default=False): Enable safety checks and save results at each API call step
filename (str, optional): Filename for CSV output
model_source (str, default="auto"): Model provider

Returns:

pandas.DataFrame: DataFrame with image paths, CERAD scores, and analysis details

Example:

import catllm as cat

diamond_scores = cat.cerad_drawn_score(
    shape="diamond",
    image_input=df['diamond_pic_path'],
    api_key=api_key,
    safety=True,
    filename="diamond_scores.csv",
)

Deprecated Functions

The following functions are deprecated and will be removed in a future version. Please use classify() instead, which auto-detects input type and supports all the same features.

Deprecated Function	Replacement
`multi_class()`	`classify(input_data=texts, ...)`
`image_multi_class()`	`classify(input_data=images, ...)`
`pdf_multi_class()`	`classify(input_data=pdfs, ...)`
`explore_corpus()`	`extract(input_data=texts, ...)`
`explore_common_categories()`	`extract(input_data=texts, ...)`

These functions still work but will show deprecation warnings. Migration is straightforward—simply use classify() with your data and it will automatically detect whether you're passing text, images, or PDFs.

Related Projects

Looking for web research capabilities? Check out llm-web-research - a precision-focused LLM-powered web research tool that uses a novel Funnel of Verification (FoVe) methodology to reduce false positives. It's designed for use cases where accuracy matters more than completeness.

pip install llm-web-research

Academic Research

This package implements methodology from research on LLM performance in social science applications, including the UC Berkeley Social Networks Study. The package addresses reproducibility challenges in LLM-assisted research by providing standardized interfaces and consistent output formatting.

If you use this package for research, please cite:

Soria, C. (2025). CatLLM (0.1.0). Zenodo. https://doi.org/10.5281/zenodo.15532317

Testing

To verify that CatLLM is working correctly, you can run a local classification test using Ollama (free, no API keys required):

Install Ollama and pull a small model:

ollama pull llama3.1:8b

Run the local classification example notebook: examples/Classifying Text with Local Models (Ollama).ipynb

This notebook walks through text classification using a local model, verifying that the full pipeline — prompt construction, structured JSON output, and result parsing — works end-to-end without any cloud API calls.

For cloud provider testing, additional example notebooks are available in the examples/ directory covering ensemble classification, category extraction, and summarization.

Contributing & Support

Contributions are welcome! Please see CONTRIBUTING.md for detailed guidelines.

Report bugs or request features: Open a GitHub Issue
Ask questions or get help: GitHub Discussions or Issues
Contribute code: Fork the repo, create a branch, and submit a pull request — see CONTRIBUTING.md
Research collaboration: Email ChrisSoria@Berkeley.edu

License

cat-llm is distributed under the terms of the GNU license.

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

This version

3.1.1

May 12, 2026

3.0.0

Mar 19, 2026

2.7.0

Mar 7, 2026

2.5.1

Mar 1, 2026

2.5.0

Feb 26, 2026

2.4.0

Feb 11, 2026

2.3.4

Feb 11, 2026

2.3.3

Feb 11, 2026

2.3.2

Feb 11, 2026

2.3.1

Feb 10, 2026

2.3.0

Feb 9, 2026

2.2.0

Feb 8, 2026

2.1.0

Jan 30, 2026

2.0.0

Jan 17, 2026

0.1.15

Jan 11, 2026

0.1.14

Jan 11, 2026

0.1.13

Jan 9, 2026

0.1.12

Jan 9, 2026

0.1.11

Jan 9, 2026

0.1.10

Jan 9, 2026

0.1.9

Jan 9, 2026

0.1.8

Jan 7, 2026

0.1.7

Jan 7, 2026

0.1.6

Jan 5, 2026

0.1.4

Jan 3, 2026

0.1.3

Jan 3, 2026

0.1.2

Jan 3, 2026

0.1.1

Dec 30, 2025

0.0.103

Dec 10, 2025

0.0.102

Dec 10, 2025

0.0.101

Nov 5, 2025

0.0.100

Nov 4, 2025

0.0.99

Nov 4, 2025

0.0.98

Oct 29, 2025

0.0.97

Oct 26, 2025

0.0.96

Oct 26, 2025

0.0.95

Oct 25, 2025

0.0.94

Oct 25, 2025

0.0.93

Oct 25, 2025

0.0.92

Oct 25, 2025

0.0.91

Oct 25, 2025

0.0.90

Oct 25, 2025

0.0.89

Oct 25, 2025

0.0.88

Oct 25, 2025

0.0.87

Oct 25, 2025

0.0.85

Oct 24, 2025

0.0.84

Oct 24, 2025

0.0.83

Oct 24, 2025

0.0.82

Oct 23, 2025

0.0.81

Oct 23, 2025

0.0.80

Oct 23, 2025

0.0.79

Oct 23, 2025

0.0.78

Oct 23, 2025

0.0.77

Oct 23, 2025

0.0.76

Oct 23, 2025

0.0.75

Oct 23, 2025

0.0.74

Oct 21, 2025

0.0.73

Oct 21, 2025

0.0.72

Oct 21, 2025

0.0.71

Oct 21, 2025

0.0.70

Oct 21, 2025

0.0.69

Oct 21, 2025

0.0.68

Oct 21, 2025

0.0.67

Oct 20, 2025

0.0.66

Oct 20, 2025

0.0.65

Oct 13, 2025

0.0.64

Oct 13, 2025

0.0.63

Oct 8, 2025

0.0.62

Oct 8, 2025

0.0.61

Oct 7, 2025

0.0.60

Sep 29, 2025

0.0.59

Sep 19, 2025

0.0.58

Sep 19, 2025

0.0.57

Sep 19, 2025

0.0.56

Sep 19, 2025

0.0.55

Sep 19, 2025

0.0.54

Sep 19, 2025

0.0.53

Sep 18, 2025

0.0.52

Sep 18, 2025

0.0.51

Sep 18, 2025

0.0.50

Sep 18, 2025

0.0.43

Aug 8, 2025

0.0.42

Aug 8, 2025

0.0.41

Aug 8, 2025

0.0.40

Aug 8, 2025

0.0.39

Jul 23, 2025

0.0.38

Jun 7, 2025

0.0.37

Jun 7, 2025

0.0.36

Jun 7, 2025

0.0.35

Jun 7, 2025

0.0.34

Jun 7, 2025

0.0.33

Jun 5, 2025

0.0.32

Jun 5, 2025

0.0.31

Jun 5, 2025

0.0.30

Jun 5, 2025

0.0.29

Jun 5, 2025

0.0.28

Jun 5, 2025

0.0.27

Jun 5, 2025

0.0.26

Jun 4, 2025

0.0.25

Jun 1, 2025

0.0.24

Jun 1, 2025

0.0.23

Jun 1, 2025

0.0.22

Jun 1, 2025

0.0.21

Jun 1, 2025

0.0.20

Jun 1, 2025

0.0.19

May 30, 2025

0.0.18

May 30, 2025

0.0.17

May 30, 2025

0.0.16

May 30, 2025

0.0.15

May 30, 2025

0.0.14

May 30, 2025

0.0.13

May 29, 2025

0.0.12

May 29, 2025

0.0.11

May 28, 2025

0.0.10

May 28, 2025

0.0.9

May 28, 2025

0.0.8

May 28, 2025

0.0.7

May 28, 2025

0.0.6

May 27, 2025

0.0.5

May 27, 2025

0.0.4

May 21, 2025

0.0.3

May 21, 2025

0.0.2

May 12, 2025

0.0.1

May 12, 2025

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release.See tutorial on generating distribution archives.

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

cat_llm-3.1.1-py3-none-any.whl (30.7 kB view details)

Uploaded May 12, 2026 Python 3

File details

Details for the file cat_llm-3.1.1-py3-none-any.whl.

File metadata

Download URL: cat_llm-3.1.1-py3-none-any.whl
Upload date: May 12, 2026
Size: 30.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.14

File hashes

Hashes for cat_llm-3.1.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`3c832e06bd9a4c67c612570337979ef5b27e2ec8c5abe07e47ff82513d99376d`
MD5	`16593e4550f96b44d036c9f34941a618`
BLAKE2b-256	`ff92097d3154369ea5baa16c68ccbd1ddd28f3c8d8d6c3a26faae58d56896953`

See more details on using hashes here.

cat-llm 3.1.1

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

cat-llm

The Problem

The Solution

Ecosystem

Table of Contents

Installation

Dependencies

R Package

Quick Start

Option A — via meta-package

Option B — direct install (lighter footprint)

Domain Packages

cat-survey — Survey Responses

cat-vader — Social Media

cat-ademic — Academic Papers

cat-pol — Political Text

cat-cog — Cognitive Assessment

cat-web — Web Content

cat-stack — General-Purpose Base

Best Practices for Classification

What works

What doesn't help (or hurts)

Summary

Configuration

Get Your API Key

Supported Models

API Reference

classify()

extract()

explore()

summarize()

image_score_drawing()

image_features()

cerad_drawn_score()

Deprecated Functions

Related Projects

Academic Research

Testing

Contributing & Support

License

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distributions

Built Distribution

File details

File metadata

File hashes

`classify()`

`extract()`

`explore()`

`summarize()`

`image_score_drawing()`

`image_features()`

`cerad_drawn_score()`