
The most comprehensive LLM testing and evaluation framework for Python.


checkllm

The pytest of LLM testing.


pip install checkllm

def test_my_llm(check):
    output = my_llm("What is Python?")
    check.contains(output, "programming language")
    check.no_pii(output)
    check.hallucination(output, context="Python is a programming language created by Guido van Rossum.")

That's it. No setup, no boilerplate. The check fixture works in any pytest test.

Why checkllm?

  • Zero learning curve — if you know pytest, you know checkllm. Just add a check parameter.
  • 33 free checks run instantly with zero API calls. No API key needed to start.
  • 24 LLM-as-judge metrics — hallucination, relevance, faithfulness, bias, toxicity, and more.
  • Same checks everywhere — use them in tests, CI, and production guardrails.

Quickstart

Install

pip install checkllm
checkllm init --use-case rag  # generates a tailored test file

1. Deterministic checks (free, instant)

def test_basic_quality(check):
    output = my_llm("Summarize this article.")

    check.contains(output, "key finding")
    check.max_tokens(output, limit=200)
    check.no_pii(output)
    check.is_json(output)  # if expecting structured output
    check.regex(output, pattern=r"\d+ results found")
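Conceptually, each deterministic check is a cheap assertion over the output string. A rough standalone sketch of what `max_tokens`, `regex`, and `is_json` verify (the helper names, the naive whitespace token count, and the pass/fail rules here are illustrative assumptions, not checkllm's implementation):

```python
import json
import re

def looks_like_max_tokens(output: str, limit: int) -> bool:
    # Naive proxy: count whitespace-separated tokens (real tokenizers differ).
    return len(output.split()) <= limit

def looks_like_regex(output: str, pattern: str) -> bool:
    # Passes if the pattern matches anywhere in the output.
    return re.search(pattern, output) is not None

def looks_like_is_json(output: str) -> bool:
    # Passes only if the entire output parses as JSON.
    try:
        json.loads(output)
        return True
    except ValueError:
        return False
```

Because checks like these need no model calls, they run in microseconds and cost nothing, which is why they are the recommended first layer.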

2. LLM-as-judge (deeper evaluation)

def test_rag_quality(check):
    output = my_rag("What causes climate change?")
    context = retrieve_context("climate change")

    check.hallucination(output, context=context)
    check.faithfulness(output, context=context)
    check.relevance(output, query="What causes climate change?")
    check.toxicity(output)

3. Fluent chaining

def test_with_chaining(check):
    output = my_llm("Explain quantum physics simply.")

    check.that(output) \
        .contains("quantum") \
        .max_tokens(200) \
        .has_no_pii() \
        .scores_above("relevance", 0.8, query="quantum physics")

4. Production guardrails

from checkllm import Guard, CheckSpec

guard = Guard(checks=[
    CheckSpec(check_type="no_pii"),
    CheckSpec(check_type="max_tokens", params={"limit": 500}),
    CheckSpec(check_type="toxicity"),
])

result = guard.validate(llm_output)
if not result.valid:
    result.raise_on_failure()
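Conceptually, a guard runs an ordered list of checks against one output and aggregates pass/fail. A minimal standalone sketch of that pattern (the `MiniGuard`/`MiniResult` names and (name, predicate) representation are hypothetical, not checkllm's actual classes):

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class MiniResult:
    valid: bool
    failures: list = field(default_factory=list)

    def raise_on_failure(self):
        # Raise only when at least one check failed.
        if not self.valid:
            raise ValueError(f"guard failed: {self.failures}")

class MiniGuard:
    def __init__(self, checks: list[tuple[str, Callable[[str], bool]]]):
        self.checks = checks  # (name, predicate) pairs

    def validate(self, output: str) -> MiniResult:
        # Collect the names of every check whose predicate rejects the output.
        failures = [name for name, pred in self.checks if not pred(output)]
        return MiniResult(valid=not failures, failures=failures)
```

Usage mirrors the snippet above: build the guard once at startup, then call `validate` on every model response before it reaches the user.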

How checkllm compares

Feature                    checkllm              DeepEval         Ragas            promptfoo
pytest native              Yes                   Yes              No               No
Free deterministic checks  33                    Limited          No               Yes
LLM-as-judge metrics       24                    14+              8+               Custom
Multi-provider judges      7 backends            OpenAI only      OpenAI only      Multiple
Consensus judging          7 strategies          No               No               No
Production guardrails      Built-in              No               No               No
Cost estimation            Built-in              No               No               No
Runtime overhead           Zero (pytest plugin)  Separate runner  Separate runner  CLI only
Fluent chaining            check.that()          No               No               No

Features by use case

RAG Applications

hallucination · faithfulness · context_relevance · answer_completeness · groundedness · contextual_precision · contextual_recall

Chatbots & Assistants

relevance · toxicity · fluency · coherence · sentiment · role_adherence · instruction_following

AI Agents

tool_accuracy · task_completion · knowledge_retention · conversation_completeness

Safety & Compliance

no_pii · toxicity · bias · language

Quality & Structure

is_json · json_schema · regex · readability · similarity · bleu · rouge_l
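As an illustration of what a deterministic text-overlap metric like rouge_l computes, here is a minimal LCS-based sketch of the standard ROUGE-L F-score (independent of checkllm; real implementations add stemming and other normalization):

```python
def lcs_len(a: list[str], b: list[str]) -> int:
    # Longest common subsequence length via dynamic programming.
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            dp[i + 1][j + 1] = dp[i][j] + 1 if a[i] == b[j] else max(dp[i][j + 1], dp[i + 1][j])
    return dp[m][n]

def rouge_l(candidate: str, reference: str) -> float:
    # F-score over LCS length: harmonic mean of precision and recall.
    c, r = candidate.split(), reference.split()
    lcs = lcs_len(c, r)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(c), lcs / len(r)
    return 2 * precision * recall / (precision + recall)
```

An identical candidate and reference score 1.0; disjoint texts score 0.0, with partial overlap falling in between.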

Multi-provider judges

from checkllm import create_judge

judge = create_judge("openai", model="gpt-4o")           # OpenAI
judge = create_judge("anthropic", model="claude-sonnet-4-6")  # Anthropic
judge = create_judge("gemini", model="gemini-2.0-flash")  # Google
judge = create_judge("ollama", model="llama3.1")          # Free, local
judge = create_judge("litellm", model="any-model")        # 100+ models

Auto-detection: if you set OPENAI_API_KEY, ANTHROPIC_API_KEY, or have Ollama running, checkllm picks the best judge automatically. Zero config needed.

Cost control

checkllm estimate tests/              # See costs before running
checkllm run tests/ --budget 5.0      # Cap spend at $5
checkllm run tests/ --dry-run         # Estimate without executing

Configuration

# pyproject.toml
[tool.checkllm]
judge_backend = "auto"       # auto-detects from environment
judge_model = "gpt-4o"
default_threshold = 0.8
budget = 10.0
cache_enabled = true
engine = "auto"

CLI

Command                Description
checkllm init          Scaffold a project (--use-case, --ci)
checkllm run           Run tests (--budget, --dry-run, --snapshot)
checkllm estimate      Estimate costs before running
checkllm watch         Re-run on file changes
checkllm report        Generate HTML report
checkllm snapshot      Save baseline for regression detection
checkllm diff          Compare snapshots
checkllm history       View run history and trends
checkllm list-metrics  Show all available checks and metrics
checkllm cache         Manage judge response cache
checkllm dashboard     Launch web dashboard

Custom metrics

from checkllm import metric, CheckResult

@metric("brevity")
def brevity_check(output: str, max_words: int = 50, **kwargs) -> CheckResult:
    words = len(output.split())
    return CheckResult(
        passed=words <= max_words,
        score=min(1.0, max_words / max(words, 1)),
        reasoning=f"{words} words (limit: {max_words})",
        cost=0.0, latency_ms=0, metric_name="brevity",
    )
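Called outside the framework, the metric above is just a function of the output string. A standalone re-creation of its scoring logic, with checkllm's CheckResult replaced by a hypothetical plain dataclass so it runs without the library:

```python
from dataclasses import dataclass

@dataclass
class Result:
    passed: bool
    score: float
    reasoning: str

def brevity(output: str, max_words: int = 50) -> Result:
    # Same scoring rule as the decorated metric: pass when within the word
    # budget, and scale the score down proportionally when over it.
    words = len(output.split())
    return Result(
        passed=words <= max_words,
        score=min(1.0, max_words / max(words, 1)),
        reasoning=f"{words} words (limit: {max_words})",
    )
```

For example, a three-word answer against a two-word limit fails with a score of 2/3, while any answer at or under the limit scores 1.0.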

License

MIT
