
LLM call optimization across various providers (online and local models)


Valtron Core

A Python framework for evaluating and optimizing LLM calls across any provider.

Proudly built and backed by InferLink.

Website | Documentation | Quick Start | Examples | Contributing



Valtron lets you run the same task across multiple LLMs simultaneously, then compare them on accuracy, cost, and speed. Define a prompt, a labeled dataset, and the models you want to test — Valtron evaluates each one, scores the results, and produces an interactive HTML/PDF report with an AI recommendation.

Features

  • Multi-model comparison — run GPT-4o, Claude, Gemini, Llama, and any LiteLLM-compatible model side-by-side on the same dataset
  • Detailed evaluation — label classification (string match) and structured extraction (nested JSON with field-level precision/recall/F1)
  • Prompt optimization — seven built-in strategies (few-shot generation, chain-of-thought, decomposition, hallucination filtering, and more) applied per model
  • Cost and latency tracking — per-document and aggregate cost, response time, and accuracy metrics
  • HTML and PDF reports — interactive charts, per-document breakdowns, and an AI-generated recommendation
  • Transformer training — train a local DistilBERT classifier from your evaluation data for zero-cost inference
  • Self-hosted model support — Ollama, vLLM, LM Studio, and more for free, private inference

Quick Start

Prerequisites

  • Python 3.12+
  • Poetry for dependency management
  • Docker (optional)

Install

pip install valtron-core

Or, from a clone of the repository, with Poetry:

poetry install

Copy .env.example to .env and add at least one provider key:

OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
GOOGLE_API_KEY=...

Run an evaluation

from valtron_core.recipes import ModelEval

data = [
    {"id": "1", "content": "Fast shipping, exactly what I ordered.", "label": "positive"},
    {"id": "2", "content": "Wrong item sent. Refund process was painful.", "label": "negative"},
    {"id": "3", "content": "Average experience, nothing special.", "label": "neutral"},
]

config = {
    "models": [
        {"name": "gpt-4o-mini"},
        {"name": "claude-haiku-4-5-20251001"},
    ],
    "prompt": "Classify the sentiment of the following review as positive, negative, or neutral.\n\nReview: {document}\n\nSentiment:",
    "output_dir": "./results",
}

experiment = ModelEval(config=config, data=data)
report_path = experiment.run()
print(f"Report: {report_path}")

Open ./results/evaluation_report.html to see accuracy, cost, latency, and a recommendation.

Configuration

The config accepts a dict, a JSON file path, or a typed ModelEvalConfig object. Key fields:

  • models (required) — models to evaluate; each entry has a name and an optional prompt_manipulation list
  • prompt (required) — prompt template containing {document}
  • output_dir (optional) — where to write results (can also be passed to run())
  • few_shot (optional) — settings for few-shot example generation
  • output_formats (optional) — ["html"] by default; add "pdf" for a PDF report
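As a sketch, the key fields above can be combined into a single config and, because a JSON file path is also accepted, written to disk (the file name here is illustrative):

```python
import json

# Minimal config using the documented fields. "models" and "prompt" are
# required; "output_dir" and "output_formats" are optional.
config = {
    "models": [{"name": "gpt-4o-mini"}],
    "prompt": (
        "Classify the sentiment of the following review as positive, "
        "negative, or neutral.\n\nReview: {document}\n\nSentiment:"
    ),
    "output_dir": "./results",
    "output_formats": ["html", "pdf"],  # default is ["html"]
}

# Since config also accepts a JSON file path, the same dict can be saved
# and later passed as ModelEval(config="config.json", data=data).
with open("config.json", "w") as f:
    json.dump(config, f, indent=2)
```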

Prompt manipulation options (set per model via "prompt_manipulation": [...]):

  • "explanation" — add chain-of-thought reasoning before the answer
  • "few_shot" — inject generated few-shot examples into the prompt
  • "decompose" — split the prompt into sequential sub-calls (structured extraction only)
  • "hallucination_filter" — drop extracted values not grounded in the source text (structured extraction only)
  • "multi_pass" — call the model twice and merge results (structured extraction only)
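Manipulations are listed per model, so an optimized variant can be compared against an untouched baseline. A sketch for the Quick Start classification task (the structured-extraction-only options are omitted here):

```python
# Per-model prompt manipulations, using the option names listed above.
models = [
    # optimized variant: chain-of-thought plus generated few-shot examples
    {"name": "gpt-4o-mini", "prompt_manipulation": ["explanation", "few_shot"]},
    # baseline with the raw prompt
    {"name": "gpt-4o-mini"},
]
```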

See Config Format for the full reference, including field-level metrics, structured extraction, and few-shot settings.

Prefer a guided setup? The Configuration Wizard is a browser-based UI that builds your config file step by step:

poetry run python -m valtron_core.utilities.config_wizard
# or with Docker
docker compose run --rm -p 5000:5000 server poetry run python -m valtron_core.utilities.config_wizard

Open http://localhost:5000 in your browser.

Examples

  • examples/sentiment_classification.py — label classification across two models
  • examples/affiliation_extraction.py — structured extraction with field-level metrics
  • examples/transformer_comparison.py — train a local transformer and compare against cloud LLMs
  • examples/multimodal_molecules.py — multimodal classification with image attachments
  • examples/incremental_evaluation.py — load a prior run and add new models without re-evaluating

# Run any example
poetry run python examples/sentiment_classification.py

# With Docker
docker compose run --rm server poetry run python examples/sentiment_classification.py

See Examples for detailed walkthroughs.

Docker

# Build the image (first run is slow; installs system dependencies including LaTeX)
docker compose build server

# Run an example
docker compose run --rm server poetry run python examples/sentiment_classification.py

# Interactive shell
docker compose run --rm --service-ports server bash

# Run the test suite
docker compose run --rm pipeline-test

Using Ollama with Docker? Set OLLAMA_API_BASE=http://host.docker.internal:11434 in .env so the container can reach Ollama running on your host.

Development

# Install dependencies
poetry install

# Run tests
poetry run pytest

# Run with coverage
poetry run pytest --cov=src/valtron_core --cov-report=html

# Format and lint
poetry run black src/ tests/
poetry run ruff check src/ tests/
poetry run mypy src/

The test suite covers the evaluation pipeline, report generation, prompt optimizers, few-shot generation, transformer training, and the ModelEval recipe. The Docker Compose pipeline-test service runs the full suite with coverage and JUnit XML output.

Architecture

Valtron is built on LiteLLM, which provides a unified interface to 100+ LLM providers. On top of that, Valtron adds the evaluation pipeline, prompt optimization strategies, report generation, and transformer training:

Input data (documents + expected labels)
    ↓
Config (models, prompt, optimizations)
    ↓
ModelEval.run()
    ├── Generate few-shot examples (optional)
    ├── Prepare per-model prompts (apply manipulations)
    ├── Evaluate all models concurrently
    ├── Compute metrics (accuracy, cost, time, field scores)
    └── Generate report (HTML + optional PDF)
    ↓
evaluation_report.html  ·  models/*.json  ·  metadata.json
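As an illustrative helper (not part of the library), the output files from a run can be gathered from the output directory. The file layout follows the diagram above; the inner JSON schema is not documented here:

```python
import json
from pathlib import Path

def collect_results(output_dir):
    """Return the per-model result files and run metadata for a run.

    Expects the layout from the diagram above: models/*.json plus
    metadata.json under the output directory.
    """
    out = Path(output_dir)
    model_files = sorted(out.glob("models/*.json"))
    metadata_path = out / "metadata.json"
    metadata = json.loads(metadata_path.read_text()) if metadata_path.exists() else {}
    return model_files, metadata
```

With the Quick Start config, collect_results("./results") would return one JSON file per evaluated model alongside the run metadata.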

Full documentation lives in docs/valtron/. To run the docs site locally:

cd docs/valtron && docker compose up

Then open http://localhost:3000.

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes and run the test suite
  4. Submit a pull request

License

Apache 2.0
