A CLI tool for comparing LLM outputs — semantically, visually, and at scale

llm-diff

A CLI tool and Python library for comparing LLM outputs — semantically, visually, and at scale.

llm-diff calls two LLMs in parallel, diffs their responses word by word, scores them semantically, and renders the results in the terminal or as a self-contained HTML report. It scales to batch workloads, caches API responses, and gates CI pipelines via `--fail-under`.
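As a rough illustration of the parallel-call step, the sketch below fans one prompt out to two models with `asyncio.gather`. The `call_model` coroutine is a hypothetical stand-in that simulates latency and returns a canned string; it is not llm-diff's API, and a real provider call would take its place.

```python
import asyncio


async def call_model(model: str, prompt: str) -> str:
    """Hypothetical stand-in for a provider API call (not llm-diff's API)."""
    await asyncio.sleep(0.01)  # simulate network latency
    return f"[{model}] response to: {prompt}"


async def compare(prompt: str, model_a: str, model_b: str) -> tuple[str, str]:
    # gather() runs both coroutines concurrently and preserves argument order.
    response_a, response_b = await asyncio.gather(
        call_model(model_a, prompt),
        call_model(model_b, prompt),
    )
    return response_a, response_b


if __name__ == "__main__":
    a, b = asyncio.run(compare("Explain recursion.", "gpt-4o", "gpt-4o-mini"))
    print(a)
    print(b)
```

Because both calls are awaited together, the total wait is roughly that of the slower model rather than the sum of the two.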

What is llm-diff?

LLMs do not produce deterministic output. Evaluating models, iterating on prompts, and assessing the impact of a model upgrade all require comparing responses, and doing that by hand does not scale.

llm-diff automates the entire workflow: it calls both models concurrently, produces a word-level diff, optionally scores semantic similarity via sentence embeddings, and outputs results to the terminal or as a shareable HTML report. It supports batch workloads from a YAML file, caches API responses so that re-running with a different threshold incurs no additional API cost, and exits with code 1 when similarity falls below a threshold, which lets it gate CI/CD pipelines.
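To make "word-level diff" concrete, here is a minimal sketch built on the standard library's `difflib.SequenceMatcher`. It is an assumption about how such a diff can be computed, not a description of llm-diff's internals; the `word_diff` helper and its output format are hypothetical.

```python
import difflib


def word_diff(text_a: str, text_b: str) -> list[tuple[str, str]]:
    """Return (tag, words) pairs; tag is 'equal', 'delete', or 'insert'."""
    words_a, words_b = text_a.split(), text_b.split()
    matcher = difflib.SequenceMatcher(a=words_a, b=words_b, autojunk=False)
    ops = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("equal", "delete"):
            ops.append((tag, " ".join(words_a[i1:i2])))
        elif tag == "insert":
            ops.append((tag, " ".join(words_b[j1:j2])))
        else:  # "replace" is split into a delete followed by an insert
            ops.append(("delete", " ".join(words_a[i1:i2])))
            ops.append(("insert", " ".join(words_b[j1:j2])))
    return ops


for tag, words in word_diff("the cat sat", "the dog sat"):
    print(f"{tag:>6}: {words}")
```

Splitting on whitespace keeps the example short; a real tool would likely need to handle punctuation and formatting more carefully.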

Version 1.2 adds LLM-as-a-Judge scoring, per-call USD cost tracking, multi-model (3–4 model) comparison, and structured JSON diff.

Documentation

Guide	Description
Getting Started	Installation, API keys, first diff
CLI Reference	All flags, option groups, exit codes, YAML format
Python API	All public functions, dataclasses, and field descriptions
Configuration	`.llmdiff` TOML schema, env vars, config priority
Provider Setup	OpenAI, Groq, Mistral, Ollama, LM Studio, Anthropic
HTML Reports	Report anatomy, batch reports, judge card, cost table
CI / CD Integration	GitHub Actions examples, threshold recommendations

Quick Start

# Install with semantic scoring support
pip install "llm-diff[semantic]"

# Set an API key
export OPENAI_API_KEY="sk-..."

# Compare two models on the same prompt
llm-diff "Explain recursion in one sentence." -a gpt-4o -b gpt-4o-mini --semantic

# Save a self-contained HTML report
llm-diff "Explain recursion." -a gpt-4o -b gpt-4o-mini --semantic --out report.html

# Run a batch from a YAML prompt file and gate on similarity
llm-diff --batch prompts.yml -a gpt-4o -b gpt-4o-mini --semantic --fail-under 0.85

See Getting Started for more examples including prompt-diff mode, BLEU/ROUGE metrics, LLM-as-a-Judge, cost tracking, and multi-model comparison.
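The gating behaviour behind `--fail-under` can be pictured with the sketch below: score each response pair by cosine similarity and return a non-zero exit code when a score is below the threshold. The toy vectors stand in for sentence embeddings, and the "fail if any score is below the threshold" policy is an assumption; llm-diff's actual aggregation rule is not stated here.

```python
import math


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def exit_code(scores: list[float], fail_under: float) -> int:
    """Return 1 if any score is below the threshold, else 0 (assumed policy)."""
    return 1 if any(score < fail_under for score in scores) else 0


# Toy 2-D vectors standing in for sentence embeddings of two response pairs.
scores = [
    cosine_similarity([1.0, 0.0], [1.0, 0.0]),
    cosine_similarity([1.0, 0.0], [0.6, 0.8]),
]
print(exit_code(scores, fail_under=0.85))
```

In a CI script, the returned value would be passed to `sys.exit` so that the pipeline step fails when the similarity check does.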

Getting Help


Bug reports	Open an issue
Feature requests	Open a feature request
Questions & discussion	GitHub Discussions
Open issues	github.com/sriramrathinavelu/llm-diff/issues
Roadmap	IMPLEMENTATION_PLAN.md
Changelog	CHANGELOG.md

When filing a bug, please include the output of `llm-diff --version`, your OS, Python version, the full command you ran, and the complete error output.

Contributing

See CONTRIBUTING.md for development setup, running the test suite, code style guidelines, and pull request instructions.

License

llm-diff is distributed under the MIT License.

Release history

Version	Released
1.3.1	Mar 6, 2026
1.3.0	Mar 1, 2026
1.2.3	Mar 1, 2026
1.2.2	Mar 1, 2026
1.2.1	Feb 28, 2026
1.2.0 (this version)	Feb 28, 2026
1.1.0	Feb 28, 2026
1.0.0	Feb 28, 2026

Download files

Source Distribution

llm_diff-1.2.0.tar.gz (51.6 kB), uploaded Feb 28, 2026, Source

Built Distribution

llm_diff-1.2.0-py3-none-any.whl (60.3 kB), uploaded Feb 28, 2026, Python 3

File details

Details for the file llm_diff-1.2.0.tar.gz.

File metadata

Download URL: llm_diff-1.2.0.tar.gz
Upload date: Feb 28, 2026
Size: 51.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.9

File hashes

Hashes for llm_diff-1.2.0.tar.gz
Algorithm	Hash digest
SHA256	`bdeeb1a4d1df8c5b8c5a1fd9edb7717699f08a33ada7ca7035f3bee1ea77a68e`
MD5	`1fab97147d535a0923ddb22cb66abd4f`
BLAKE2b-256	`45513df02659cd0e86e34521b1624f9d531a12eeeadec9e75b35c59f1773fe4b`

File details

Details for the file llm_diff-1.2.0-py3-none-any.whl.

File metadata

Download URL: llm_diff-1.2.0-py3-none-any.whl
Upload date: Feb 28, 2026
Size: 60.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.9

File hashes

Hashes for llm_diff-1.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`3e6042d4138b199c6cd043544cca92c82f15422beeecddc84757eac976390c7f`
MD5	`ab09a1bab75eac40248e5ad8bdb9aef5`
BLAKE2b-256	`5da293d627353892b696fce1f146babfa5430e0ac2cac44ff3b77ceb16595f7c`
