
ollama-classifier

A Python wrapper around the Ollama Python SDK for text classification with constrained output and confidence scoring. Now supports multiple inference backends: Ollama, vLLM, SGLang, and llama.cpp.

Features

  • Constrained Output: Uses JSON schema with enum constraints to ensure only valid choices are generated (see the sketch after this list)
  • Confidence Scoring: Multi-call evaluation with softmax for calibrated probabilities
  • Sync & Async: Full support for both synchronous and asynchronous operations
  • Batch Processing: Classify multiple texts efficiently
  • Flexible Choices: Support for simple labels or labels with descriptions
  • Custom Prompts: Override the default system prompt for specialized tasks
  • Multiple Backends: Use Ollama, vLLM, SGLang, or llama.cpp as your inference engine (local or remote)
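
A rough sketch of the kind of enum-constrained schema the first bullet refers to (the exact schema the library builds internally is an assumption here, not taken from the source):

# Hypothetical illustration only: a JSON schema whose enum restricts
# generation to the allowed labels
schema = {
    "type": "string",
    "enum": ["positive", "negative", "neutral"],
}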

Installation

Core (Ollama only)

pip install ollama-classifier

Or with uv:

uv add ollama-classifier

With additional backends (vLLM, SGLang, llama.cpp)

pip install "ollama-classifier[backends]"

Or with uv:

uv add "ollama-classifier[backends]"

Prerequisites

  • Ollama backend: Ollama installed and running, with a model pulled (e.g., ollama pull llama3.2)
  • vLLM backend: A running vLLM server
  • SGLang backend: A running SGLang server
  • llama.cpp backend: A running llama.cpp server

Quick Start

Ollama (original backend)

from ollama import Client
from ollama_classifier import OllamaClassifier

client = Client()
classifier = OllamaClassifier(client, model="llama3.2")

result = classifier.classify(
    text="I love this product!",
    choices=["positive", "negative", "neutral"]
)

print(f"Prediction: {result.prediction}")
print(f"Confidence: {result.confidence:.2%}")
print(f"Probabilities: {result.probabilities}")

vLLM

from ollama_classifier.backends import VLLMBackend
from ollama_classifier import LLMClassifier

backend = VLLMBackend(
    model="meta-llama/Llama-3.2-3B-Instruct",
    base_url="http://localhost:8000/v1",
)
classifier = LLMClassifier(backend)

result = classifier.classify(
    text="I love this product!",
    choices=["positive", "negative", "neutral"]
)

SGLang

from ollama_classifier.backends import SGLangBackend
from ollama_classifier import LLMClassifier

backend = SGLangBackend(
    model="meta-llama/Llama-3.2-3B-Instruct",
    base_url="http://localhost:30000/v1",
)
classifier = LLMClassifier(backend)

result = classifier.classify(
    text="I love this product!",
    choices=["positive", "negative", "neutral"]
)

llama.cpp

from ollama_classifier.backends import LlamaCppBackend
from ollama_classifier import LLMClassifier

backend = LlamaCppBackend(
    model="model",
    base_url="http://localhost:8080/v1",
)
classifier = LLMClassifier(backend)

result = classifier.classify(
    text="I love this product!",
    choices=["positive", "negative", "neutral"]
)

Usage

Basic Classification

from ollama import Client
from ollama_classifier import OllamaClassifier

client = Client()
classifier = OllamaClassifier(client, model="llama3.2")

result = classifier.classify(
    text="The goalkeeper made an incredible save!",
    choices=["sports", "politics", "technology", "entertainment"]
)

Classification with Label Descriptions

Providing descriptions helps the model understand each category better:

choices = {
    "positive": "Text expresses happiness, satisfaction, or approval",
    "negative": "Text expresses anger, disappointment, or disapproval",
    "mixed": "Text contains both positive and negative sentiments",
    "neutral": "Text is factual without strong emotional content",
}

result = classifier.classify(
    text="The food was amazing but the service was terrible.",
    choices=choices
)

Custom System Prompt

result = classifier.classify(
    text="The quarterly earnings exceeded analyst expectations.",
    choices=["bullish", "bearish", "neutral"],
    system_prompt="You are a financial sentiment analyzer. "
                  "Classify financial news based on market sentiment."
)

Scoring (Multi-Call with Softmax)

Get a calibrated probability distribution over all choices; this makes N API calls for N choices:

result = classifier.score(
    text="The movie was fantastic!",
    choices=["positive", "negative", "neutral"]
)
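
The softmax step itself is standard. A self-contained sketch, assuming one raw score per choice has already been collected from the N calls (the scores below are made up):

import math

def softmax(scores: dict[str, float]) -> dict[str, float]:
    # Subtract the max score for numerical stability, then normalize
    m = max(scores.values())
    exps = {label: math.exp(s - m) for label, s in scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}

# Hypothetical per-choice scores from three separate API calls
raw = {"positive": -0.2, "negative": -3.1, "neutral": -1.8}
print(softmax(raw))  # approx {'positive': 0.80, 'negative': 0.04, 'neutral': 0.16}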

Generate Only (Fastest)

When you only need the prediction without confidence scores:

prediction = classifier.generate(
    text="The team won the championship!",
    choices=["sports", "finance", "politics"]
)
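
Judging by this example, generate returns the predicted label itself rather than a ClassificationResult, so it can be used directly:

print(prediction)  # e.g. "sports"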

Batch Classification

texts = [
    "The goalkeeper made an incredible save!",
    "The central bank raised interest rates.",
    "The new smartphone features a revolutionary camera.",
]

results = classifier.batch_classify(
    texts=texts,
    choices=["sports", "finance", "technology"]
)

for text, result in zip(texts, results):
    print(f"{text} -> {result.prediction} ({result.confidence:.2%})")

Async Usage

import asyncio
from ollama import AsyncClient
from ollama_classifier import OllamaClassifier

async def main():
    client = AsyncClient()
    classifier = OllamaClassifier(client, model="llama3.2")
    
    # Single classification
    result = await classifier.aclassify(
        text="The concert was amazing!",
        choices=["positive", "negative", "neutral"]
    )
    
    # Batch classification (concurrent)
    results = await classifier.abatch_classify(
        texts=["Text 1", "Text 2", "Text 3"],
        choices=["positive", "negative", "neutral"]
    )

asyncio.run(main())

Inference Backends

Ollama (default)

The original backend, using the Ollama Python SDK. Requires Ollama installed and running locally.

from ollama import Client
from ollama_classifier import OllamaClassifier

classifier = OllamaClassifier(Client(), model="llama3.2")

vLLM

High-throughput serving engine. Supports local and remote servers.

Local server:

python -m vllm.entrypoints.openai.api_server \
    --model meta-llama/Llama-3.2-3B-Instruct \
    --host 0.0.0.0 --port 8000

Connect:

from ollama_classifier.backends import VLLMBackend
from ollama_classifier import LLMClassifier

backend = VLLMBackend(
    model="meta-llama/Llama-3.2-3B-Instruct",
    base_url="http://localhost:8000/v1",
)
classifier = LLMClassifier(backend)

Remote server:

backend = VLLMBackend(
    model="your-model",
    base_url="https://your-vllm-server.com/v1",
    api_key="your-api-key",  # if authentication is required
)

SGLang

Fast serving system for large language models. Supports local and remote servers.

Local server:

python -m sglang.launch_server \
    --model-path meta-llama/Llama-3.2-3B-Instruct \
    --host 0.0.0.0 --port 30000

Connect:

from ollama_classifier.backends import SGLangBackend
from ollama_classifier import LLMClassifier

backend = SGLangBackend(
    model="meta-llama/Llama-3.2-3B-Instruct",
    base_url="http://localhost:30000/v1",
)
classifier = LLMClassifier(backend)

llama.cpp

Lightweight inference via llama-server. Ideal for CPU or mixed CPU/GPU environments.

Local server:

./llama-server -m model.gguf --host 0.0.0.0 --port 8080 -c 4096

Connect:

from ollama_classifier.backends import LlamaCppBackend
from ollama_classifier import LLMClassifier

backend = LlamaCppBackend(
    model="model",
    base_url="http://localhost:8080/v1",
)
classifier = LLMClassifier(backend)

Note: JSON schema constraints and logprobs support depend on your llama.cpp version; use a recent llama-server build, since older builds may not expose grammar-based JSON schema enforcement or logprobs through the OpenAI-compatible API.

Backend Configuration

All backends share common configuration options:

Parameter    Default            Description
model        (required)         Model identifier
base_url     engine-specific    Base URL of the inference server
api_key      "not-needed"       API key for authentication
timeout      120.0              Request timeout in seconds
max_tokens   256                Maximum tokens to generate
extra_body   {}                 Extra parameters merged into every request
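
For example, assuming these names map one-to-one onto the backend constructors' keyword arguments (the temperature override is only an illustration of extra_body):

from ollama_classifier.backends import VLLMBackend

backend = VLLMBackend(
    model="meta-llama/Llama-3.2-3B-Instruct",
    base_url="http://localhost:8000/v1",
    api_key="not-needed",             # default; set a real key for authenticated servers
    timeout=120.0,                    # request timeout in seconds
    max_tokens=256,                    # cap on generated tokens
    extra_body={"temperature": 0.0},  # merged into every request body
)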

API Reference

ClassificationResult

from dataclasses import dataclass
from typing import Dict

@dataclass
class ClassificationResult:
    prediction: str                  # The predicted choice label
    confidence: float                # Confidence score (0.0 to 1.0)
    probabilities: Dict[str, float]  # Probability distribution over all choices
    raw_response: Dict               # Raw response for debugging
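
All four fields are plain attributes, so the probability assigned to any single label can be read straight off the probabilities dict:

result = classifier.classify(
    text="Great value for the price.",
    choices=["positive", "negative", "neutral"],
)
print(result.probabilities["positive"])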

OllamaClassifier Methods

Method                                          Async variant      Description
generate(text, choices, system_prompt)          agenerate          Constrained output only (fastest)
score(text, choices, system_prompt)             ascore             Multi-call evaluation with softmax
classify(text, choices, system_prompt)          aclassify          Full classification with confidence scores
batch_generate(texts, choices, system_prompt)   abatch_generate    Batch constrained output
batch_score(texts, choices, system_prompt)      abatch_score       Batch scoring
batch_classify(texts, choices, system_prompt)   abatch_classify    Batch classification

LLMClassifier Methods

LLMClassifier exposes the same API as OllamaClassifier, summarized in the table above, but accepts any LLMBackend.

Parameters

  • text (str): The text to classify
  • texts (List[str]): List of texts to classify (batch methods)
  • choices (Union[List[str], Dict[str, str]]): Either a list of choice labels, or a dict mapping labels to descriptions
  • system_prompt (str | None): Optional custom system prompt

Choosing a Method

Use case                                    Recommended method
Speed is critical, no confidence needed     generate
Accurate confidence scores                  classify / score
Batch processing                            batch_classify or batch_score
Concurrent processing                       Async variants (aclassify, etc.)

License

MIT License

Development

This project has just started! Suggestions, issues, and pull requests are all welcome.
