
Project description

llm-classifier

A structured, LLM-based classification, clustering, and extraction framework that works with all major API providers (see full list).

Why use it

  • Return validated Pydantic models instead of free-form text
  • Add few-shot examples directly in each call (prebuilt prompts can be customized or replaced with your own)
  • Optionally collect reasoning and confidence to enhance results
  • Reduce variance with consensus voting
  • Run batched predictions with intermediate caches and per-item error capture
  • Multithreading support to speed up large batches

Installation

pip install llm-classifier

Quickstart

from typing import Literal
from pydantic import BaseModel
from llm_classifier import LLMClassifier


# Use any fields supported by the model's structured output capabilities.
# Here we just want a simple label.
class Sentiment(BaseModel):
    label: Literal["positive", "negative", "neutral"]


clf = LLMClassifier(model="openai/gpt-4.1-nano")

result = clf.predict(
    input="This movie was amazing!",
    output_schema=Sentiment,
    examples=[
        ("I hated it", Sentiment(label="negative")),
        ("It was okay", Sentiment(label="neutral")),
    ],
    reasoning=True,
    confidence=True,
)

print(result.value.label)   # "positive"
print(result.reasoning)     # Optional[str]
print(result.confidence)    # Optional[float]

How it works

flowchart LR
    A[Input text] --> B[Build prompts<br/>+ schema + examples]
    B --> C[LLM call]
    C --> D[Structured response<br/>+ Pydantic-validated]
    D --> E[Final Result object]
    E --> F[value]
    E --> G[reasoning optional]
    E --> H[confidence optional]

Core API

Single prediction with consensus features

result = clf.predict(
    input="This is somewhat good",
    output_schema=Sentiment,
    consensus=5,
    consensus_parallel=True,
    max_parallel=3,
)

print(result.value)
print(result.compliant_variants)     # Variants matching selected output
print(result.noncompliant_variants)  # Variants not matching selected output
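The consensus idea can be illustrated with a minimal stand-alone sketch. This is not the library's internal implementation; the `consensus_vote` function and its tie-breaking behavior are assumptions for illustration only:

```python
from collections import Counter

def consensus_vote(outputs):
    """Pick the most frequent structured output and split the rest into
    compliant (matching the winner) and noncompliant (differing) variants."""
    counts = Counter(outputs)
    winner, _ = counts.most_common(1)[0]
    compliant = [o for o in outputs if o == winner]
    noncompliant = [o for o in outputs if o != winner]
    return winner, compliant, noncompliant

# Five hypothetical model runs on the same input:
runs = ["positive", "positive", "neutral", "positive", "neutral"]
winner, ok, bad = consensus_vote(runs)
print(winner)             # "positive"
print(len(ok), len(bad))  # 3 2
```

Running the same prediction several times and taking the majority label trades extra API calls for lower output variance.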

Batch prediction (multiple rows of data)

batch = clf.batch_predict(
    inputs=["Great", "Bad", "Okay"],
    output_schema=Sentiment,
    parallel=True,
    max_parallel=5,
    cache_dir="./.llm_cache",
    cache_key="sentiment_run_2026_02_21",
)

print(batch.successes, batch.failures)
print(batch.values())   # [Sentiment | None, ...]
print(batch.errors)     # [(index, Exception), ...]
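The parallel-with-per-item-error-capture pattern can be sketched with the standard library. The function name and `fake_predict` callable below are illustrative assumptions, not part of the package's API:

```python
from concurrent.futures import ThreadPoolExecutor

def batch_with_error_capture(fn, inputs, max_parallel=5):
    """Run fn over inputs in parallel. Successes land positionally in
    `values` (None on failure); failures are captured as (index, exception)
    pairs, mirroring the values()/errors split described above."""
    values = [None] * len(inputs)
    errors = []

    def run(i):
        try:
            values[i] = fn(inputs[i])
        except Exception as e:
            errors.append((i, e))

    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        list(pool.map(run, range(len(inputs))))
    return values, errors

def fake_predict(text):  # stand-in for a real LLM call
    if text == "Bad":
        raise ValueError("model refused")
    return text.upper()

values, errors = batch_with_error_capture(fake_predict, ["Great", "Bad", "Okay"])
print(values)        # ['GREAT', None, 'OKAY']
print(errors[0][0])  # 1 (index of the failed item)
```

One failed item does not abort the batch; its slot stays None and the exception is kept alongside its index for later inspection.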

Resumable batch cache

When cache_dir is set, each processed index is appended to a cache log so reruns skip already successful items.

  • Cache file: <cache_dir>/<cache_key>.jsonl (defaults to input_cache.jsonl when cache_key is omitted)
  • cache_key requires cache_dir to also be set, otherwise a ValueError is raised

Each .jsonl line is one step record keyed by a SHA-256 hash of the full input configuration (model, text, schema, examples, prompts, settings). On rerun with the same cache_dir + cache_key, already-cached inputs are skipped.
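The hashing idea can be sketched as follows. The field names and exact serialization here are assumptions for illustration; the package's actual record format may differ:

```python
import hashlib
import json

def step_key(model, text, schema_name, settings):
    """Derive a stable SHA-256 key from the full input configuration,
    so an identical rerun maps to the same cache entry."""
    payload = json.dumps(
        {"model": model, "text": text, "schema": schema_name, "settings": settings},
        sort_keys=True,  # canonical ordering keeps the hash stable
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

k1 = step_key("openai/gpt-4.1-nano", "Great", "Sentiment", {"reasoning": True})
k2 = step_key("openai/gpt-4.1-nano", "Great", "Sentiment", {"reasoning": True})
print(k1 == k2)  # True: identical config -> same key, so a rerun can skip it
```

Because any change to the model, input text, schema, or settings changes the hash, edited items are re-run while untouched items are skipped.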

Note: It is your responsibility to track token usage and API costs; this package is simply a framework that makes repetitive tasks easier and more robust.

Clustering with LLMCluster

For bulk clustering of many items in a single LLM call, use LLMCluster. This is ideal when you have many rows (e.g., 100 survey responses) and want to group them into high-level clusters without making N separate calls. It is recommended for one-time analyses of large datasets, where a strong reasoning model (e.g., gpt-5) can produce a result in a single attempt. The result will likely be less reliable than the row-by-row LLMClassifier approach, but it is faster and potentially cheaper, depending on the models you compare against.

Basic usage

from pydantic import BaseModel
from llm_classifier import LLMCluster


class ClusterSchema(BaseModel):
    name: str
    summary: str


clusterer = LLMCluster(model="openai/gpt-4.1")

surveys = [
    "The product quality is excellent!",
    "Shipping was too slow",
    "Great customer service",
    "Product broke after one week",
    "Fast delivery, very happy",
    "Support team was unhelpful",
]

# cluster() requires (index, text) tuples — use enumerate to build them
indexed_surveys = list(enumerate(surveys, 1))

result = clusterer.cluster(
    inputs=indexed_surveys,
    cluster_schema=ClusterSchema,
)

for cluster in result.clusters:
    print(f"\n{cluster.cluster.name}: {cluster.cluster.summary}")
    for idx, text in cluster.references:
        print(f"  [{idx}] {text}")

How it works

flowchart LR
    A[N input items] --> B[Auto-assign fake IDs for the LLM<br/>1...N]
    B --> C[Single LLM call]
    C --> D[Structured clusters<br/>with reference_ids]
    D --> E[Post-hoc customizable validation]
    E -->|Pass| F[ClusterResult]
    F --> H[List of Cluster]
    F --> I[Validation retries used]
    F --> J[Original LLM response for debugging]
    H -->|Each cluster| K[Cluster fields as per schema + reference_ids]
    E -->|Fail, Retry with feedback| C

Cluster schema

Define a Pydantic model for per-cluster fields. A reference_ids: list[int] field is automatically injected at runtime to track which items belong to each cluster — you do not need to add it yourself:

class TopicCluster(BaseModel):
    name: str
    description: str
    sentiment: Literal["positive", "negative", "mixed"]
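The runtime injection can be illustrated with standard-library dataclasses (in the package itself, the Pydantic schema is extended instead; this sketch only shows the general idea of adding a field to a user-defined model at runtime):

```python
from dataclasses import dataclass, make_dataclass, field, fields

@dataclass
class TopicCluster:  # a user-defined per-cluster schema
    name: str
    description: str

# Inject a reference_ids field at runtime, analogous to what the
# clusterer does with your Pydantic model before calling the LLM:
RuntimeCluster = make_dataclass(
    "RuntimeCluster",
    [("reference_ids", list, field(default_factory=list))],
    bases=(TopicCluster,),
)

c = RuntimeCluster(name="Shipping", description="Delivery speed", reference_ids=[2, 5])
print([f.name for f in fields(c)])  # ['name', 'description', 'reference_ids']
```

The user's schema stays clean: bookkeeping fields exist only on the runtime model the LLM actually fills in.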

Validation and retries

The clusterer validates LLM responses and retries on failures:

Check                                 Behavior
Invalid ID (outside 1..N)             Always fails
Duplicate ID across clusters          Fails when allow_overlap=False (default)
Missing ID (item not in any cluster)  Fails when require_all=True (default)
Empty cluster                         Always fails

result = clusterer.cluster(
    inputs=list(enumerate(surveys, 1)),
    cluster_schema=ClusterSchema,
    allow_overlap=False,     # Each item in exactly one cluster
    require_all=True,        # Every item must be assigned
    max_retries=3,           # Instructor retries for malformed JSON
    validation_retries=2,    # Our retries for referential integrity errors
)

print(f"Validation retries used: {result.retries_used}")
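A minimal sketch of these referential-integrity checks (a simplification for illustration; the function name and return shape are assumptions, not the clusterer's internals):

```python
def check_assignments(cluster_ids, n_items, allow_overlap=False, require_all=True):
    """Validate per-cluster ID lists against the rules in the table above.
    Returns a list of human-readable error strings (empty means valid)."""
    errors = []
    seen = []
    for ids in cluster_ids:
        if not ids:
            errors.append("empty cluster")
        for i in ids:
            if not 1 <= i <= n_items:
                errors.append(f"invalid id {i}")
            seen.append(i)
    if not allow_overlap:
        dupes = {i for i in seen if seen.count(i) > 1}
        errors.extend(f"duplicate id {i}" for i in sorted(dupes))
    if require_all:
        missing = set(range(1, n_items + 1)) - set(seen)
        errors.extend(f"missing id {i}" for i in sorted(missing))
    return errors

print(check_assignments([[1, 2], [3]], n_items=3))  # [] -> valid
print(check_assignments([[1, 2], [2]], n_items=3))  # duplicate 2, missing 3
```

When such checks fail, the error list is what gets fed back to the model as feedback for the next validation retry.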

Cluster count hint

Let the LLM decide the number of clusters, or provide a hint:

# LLM decides
result = clusterer.cluster(inputs=list(enumerate(surveys, 1)), cluster_schema=ClusterSchema)

# Suggest 3 clusters
result = clusterer.cluster(inputs=list(enumerate(surveys, 1)), cluster_schema=ClusterSchema, n_clusters=3)

# Free text
result = clusterer.cluster(inputs=list(enumerate(surveys, 1)), cluster_schema=ClusterSchema, n_clusters="3-5")

Error handling

from llm_classifier import ClusterValidationError, ContextLengthError

try:
    result = clusterer.cluster(inputs=list(enumerate(huge_list, 1)), cluster_schema=ClusterSchema)
except ContextLengthError as e:
    print(f"Too many items for model context: {e}")
except ClusterValidationError as e:
    print(f"Validation failed after retries: {e.errors}")

Real-world examples

Runnable scripts are in the examples/ folder. Each includes inline data so you only need an API key to run them.

Key Note

The package is built on the exceptional ability of Large Language Models (LLMs) to understand text, context, and perform zero-shot tasks. The underlying models handle the heavy lifting of interpreting input and generating output, while this package provides a structured framework to interact with them in a more robust and reliable way.

As a result, the package inherits the same limitations as the underlying models. If the model struggles to understand a task, or if the input is ambiguous or overly complex, errors or unexpected results may occur. Always validate outputs and consider the model’s capabilities when designing your workflows.

Note that Large Language Models typically exhibit output variance, even with parameters such as temperature=0. Results are not guaranteed to be identical across runs. While features like consensus, examples, and custom prompts can help steer behavior toward more deterministic outcomes, LLMs are inherently non-deterministic systems. Keep this in mind when relying on generated outputs.

Model support

Use any provider/model string supported by instructor.from_provider(...), for example:

  • openai/gpt-4.1
  • anthropic/claude-3-5-sonnet-20241022
  • google/gemini-1.5-pro

License

Apache-2.0 License. See LICENSE for details.

Download files

Download the file for your platform.

Source Distribution

llm_classifier-0.1.2.tar.gz (35.8 kB)

Uploaded Source

Built Distribution


llm_classifier-0.1.2-py3-none-any.whl (26.1 kB)

Uploaded Python 3

File details

Details for the file llm_classifier-0.1.2.tar.gz.

File metadata

  • Download URL: llm_classifier-0.1.2.tar.gz
  • Upload date:
  • Size: 35.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for llm_classifier-0.1.2.tar.gz
Algorithm Hash digest
SHA256 ab6f1f740c5b61be0a1c0722d068775dc4490f134d760e9f2644c0408e452e5b
MD5 9bcd41a638194f60348dded3cc6c6a90
BLAKE2b-256 44439cac178df71cdf5b2da0b1fc80e9334c727fcd1e2cdede874810a458793b


Provenance

The following attestation bundles were made for llm_classifier-0.1.2.tar.gz:

Publisher: publish.yml on Fir121/llm-classifier

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file llm_classifier-0.1.2-py3-none-any.whl.

File metadata

  • Download URL: llm_classifier-0.1.2-py3-none-any.whl
  • Upload date:
  • Size: 26.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for llm_classifier-0.1.2-py3-none-any.whl
Algorithm Hash digest
SHA256 e54ccd77dcbeaae1861b69c9eca3598d6c3fafd30b7234defc796eae13d1cac1
MD5 6352ef4c3d0357a10e8b438be6b48a3c
BLAKE2b-256 554f17e7a4f7afa3e1c924217e59c8707bcfe2f815d3b757c81ea223afda8877


Provenance

The following attestation bundles were made for llm_classifier-0.1.2-py3-none-any.whl:

Publisher: publish.yml on Fir121/llm-classifier

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
