# checkllm

The pytest of LLM testing: the most comprehensive LLM testing and evaluation framework for Python.

```bash
pip install checkllm
```
```python
def test_my_llm(check):
    output = my_llm("What is Python?")
    check.contains(output, "programming language")
    check.no_pii(output)
    check.hallucination(output, context="Python is a programming language created by Guido van Rossum.")
```

That's it. No setup, no boilerplate. The `check` fixture works in any pytest test.
## Why checkllm?

- **Zero learning curve.** If you know pytest, you know checkllm. Just add a `check` parameter.
- **33 free deterministic checks** run instantly with zero API calls. No API key needed to start.
- **24 LLM-as-judge metrics**: hallucination, relevance, faithfulness, bias, toxicity, and more.
- **Same checks everywhere**: use them in tests, CI, and production guardrails.
## Quickstart

### Install

```bash
pip install checkllm
checkllm init --use-case rag   # generates a tailored test file
```
### 1. Deterministic checks (free, instant)

```python
def test_basic_quality(check):
    output = my_llm("Summarize this article.")
    check.contains(output, "key finding")
    check.max_tokens(output, limit=200)
    check.no_pii(output)
    check.is_json(output)  # if expecting structured output
    check.regex(output, pattern=r"\d+ results found")
```
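The README doesn't show checkllm's internals, but a deterministic check is conceptually just a pure function over the output string. A minimal sketch of what checks like these could look like (the token count below is a crude whitespace approximation, not checkllm's actual tokenizer, and these helpers are illustrative stand-ins, not the library's code):

```python
import json
import re

# Illustrative stand-ins for deterministic checks (not checkllm's real implementation).
def contains(output: str, needle: str) -> bool:
    return needle in output

def max_tokens(output: str, limit: int) -> bool:
    # Rough approximation: whitespace-split words, not model tokens.
    return len(output.split()) <= limit

def is_json(output: str) -> bool:
    try:
        json.loads(output)
        return True
    except ValueError:
        return False

def matches(output: str, pattern: str) -> bool:
    return re.search(pattern, output) is not None
```

Because these checks are pure string functions, they cost nothing per call, which is why the free tier can run instantly without an API key.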
### 2. LLM-as-judge (deeper evaluation)

```python
def test_rag_quality(check):
    output = my_rag("What causes climate change?")
    context = retrieve_context("climate change")
    check.hallucination(output, context=context)
    check.faithfulness(output, context=context)
    check.relevance(output, query="What causes climate change?")
    check.toxicity(output)
```
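Under the hood, LLM-as-judge metrics generally prompt a second model to grade the output against the context and parse out a score. The prompt wording, JSON response format, and threshold below are illustrative assumptions, not checkllm's actual implementation; `call_judge` is a placeholder for any chat-completion call:

```python
import json

def judge_hallucination(output: str, context: str, call_judge, threshold: float = 0.8) -> bool:
    """Ask a judge model whether `output` is grounded in `context`.

    `call_judge` is any callable mapping a prompt string to a response string.
    The JSON reply format is an assumption for illustration.
    """
    prompt = (
        "Rate from 0.0 to 1.0 how well the ANSWER is supported by the CONTEXT.\n"
        f"CONTEXT: {context}\nANSWER: {output}\n"
        'Reply as JSON: {"score": <float>, "reasoning": "<why>"}'
    )
    verdict = json.loads(call_judge(prompt))
    return verdict["score"] >= threshold
```

Keeping the judge behind a callable like this is also what makes multi-provider backends and caching straightforward: the metric logic never needs to know which model produced the verdict.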
### 3. Fluent chaining

```python
def test_with_chaining(check):
    output = my_llm("Explain quantum physics simply.")
    check.that(output) \
        .contains("quantum") \
        .max_tokens(200) \
        .has_no_pii() \
        .scores_above("relevance", 0.8, query="quantum physics")
```
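The fluent API works because each assertion method returns the checker object itself, so calls can be stacked. A stripped-down sketch of the pattern (the class and methods below are my illustration of the idiom, not checkllm's actual `that()` implementation):

```python
class That:
    """Minimal fluent-assertion sketch: each method asserts, then returns self."""

    def __init__(self, output: str):
        self.output = output

    def contains(self, needle: str) -> "That":
        assert needle in self.output, f"output missing {needle!r}"
        return self

    def max_tokens(self, limit: int) -> "That":
        # Whitespace word count as a rough token proxy.
        assert len(self.output.split()) <= limit, "output too long"
        return self

# Chaining works because every call hands back the same object:
That("quantum physics is strange").contains("quantum").max_tokens(10)
```

The first failing assertion raises immediately, so a chain reads top to bottom as a list of requirements on one output.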
### 4. Production guardrails

```python
from checkllm import Guard, CheckSpec

guard = Guard(checks=[
    CheckSpec(check_type="no_pii"),
    CheckSpec(check_type="max_tokens", params={"limit": 500}),
    CheckSpec(check_type="toxicity"),
])

result = guard.validate(llm_output)
if not result.valid:
    result.raise_on_failure()
```
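A guard is essentially the same checks run outside pytest: apply each one to an output, collect failures, and decide whether to block. A minimal sketch of that pipeline (the names `GuardResult` and `run_guard` and the predicates are illustrative, not checkllm's classes):

```python
from dataclasses import dataclass, field

@dataclass
class GuardResult:
    """Outcome of running a list of checks against one output."""
    valid: bool
    failures: list = field(default_factory=list)

def run_guard(output: str, checks) -> GuardResult:
    """`checks` is a list of (name, predicate) pairs over the output string."""
    failures = [name for name, predicate in checks if not predicate(output)]
    return GuardResult(valid=not failures, failures=failures)

# Toy checks standing in for no_pii and max_tokens:
checks = [
    ("no_ssn", lambda s: "SSN" not in s),
    ("max_len", lambda s: len(s.split()) <= 500),
]
result = run_guard("The forecast is sunny.", checks)
```

Collecting all failures rather than stopping at the first one lets a production handler log every violated policy before deciding how to respond.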
## How checkllm compares

| Feature | checkllm | DeepEval | Ragas | promptfoo |
|---|---|---|---|---|
| pytest native | Yes | Yes | No | No |
| Free deterministic checks | 33 | Limited | No | Yes |
| LLM-as-judge metrics | 24 | 14+ | 8+ | Custom |
| Multi-provider judges | 7 backends | OpenAI only | OpenAI only | Multiple |
| Consensus judging | 7 strategies | No | No | No |
| Production guardrails | Built-in | No | No | No |
| Cost estimation | Built-in | No | No | No |
| Runtime overhead | Zero (pytest plugin) | Separate runner | Separate runner | CLI only |
| Fluent chaining | `check.that()` | No | No | No |
## Features by use case

### RAG Applications
`hallucination` · `faithfulness` · `context_relevance` · `answer_completeness` · `groundedness` · `contextual_precision` · `contextual_recall`

### Chatbots & Assistants
`relevance` · `toxicity` · `fluency` · `coherence` · `sentiment` · `role_adherence` · `instruction_following`

### AI Agents
`tool_accuracy` · `task_completion` · `knowledge_retention` · `conversation_completeness`

### Safety & Compliance
`no_pii` · `toxicity` · `bias` · `language`

### Quality & Structure
`is_json` · `json_schema` · `regex` · `readability` · `similarity` · `bleu` · `rouge_l`
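Of these, `rouge_l` is worth unpacking: ROUGE-L scores overlap via the longest common subsequence rather than exact n-grams, so word order matters but gaps are tolerated. A reference implementation of the standard LCS-based F-measure (the textbook algorithm, not checkllm's code):

```python
def rouge_l_f1(candidate: str, reference: str) -> float:
    """ROUGE-L F1: harmonic mean of LCS-based precision and recall."""
    c, r = candidate.split(), reference.split()
    # Dynamic-programming table for longest common subsequence length.
    dp = [[0] * (len(r) + 1) for _ in range(len(c) + 1)]
    for i, cw in enumerate(c):
        for j, rw in enumerate(r):
            dp[i + 1][j + 1] = dp[i][j] + 1 if cw == rw else max(dp[i][j + 1], dp[i + 1][j])
    lcs = dp[len(c)][len(r)]
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(c), lcs / len(r)
    return 2 * precision * recall / (precision + recall)
```

For example, "the cat sat on the mat" vs. "the cat lay on the mat" shares the subsequence "the cat on the mat" (5 of 6 words), giving an F1 of about 0.83.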
## Multi-provider judges

```python
from checkllm import create_judge

judge = create_judge("openai", model="gpt-4o")                # OpenAI
judge = create_judge("anthropic", model="claude-sonnet-4-6")  # Anthropic
judge = create_judge("gemini", model="gemini-2.0-flash")      # Google
judge = create_judge("ollama", model="llama3.1")              # free, local
judge = create_judge("litellm", model="any-model")            # 100+ models
```

Auto-detection: if `OPENAI_API_KEY` or `ANTHROPIC_API_KEY` is set, or Ollama is running, checkllm picks the best available judge automatically. Zero config needed.
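Auto-detection like this plausibly boils down to checking credentials in priority order. A sketch of the idea (the priority order and fallback are my assumptions, not documented checkllm behavior):

```python
import os

def pick_backend(env=None) -> str:
    """Return a judge backend name based on available credentials.

    Priority order here is an illustrative guess; pass a dict as `env`
    to override os.environ (handy for testing).
    """
    env = os.environ if env is None else env
    if env.get("OPENAI_API_KEY"):
        return "openai"
    if env.get("ANTHROPIC_API_KEY"):
        return "anthropic"
    return "ollama"  # fall back to a local judge
```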
## Cost control

```bash
checkllm estimate tests/            # see costs before running
checkllm run tests/ --budget 5.0    # cap spend at $5
checkllm run tests/ --dry-run       # estimate without executing
```
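Pre-run estimation is possible because judge cost is a function of prompt size and per-token pricing, both known before any call is made. A back-of-the-envelope sketch (the ~4 characters/token heuristic and the price are illustrative, not checkllm's numbers):

```python
def estimate_cost(prompts, price_per_1k_tokens: float = 0.005) -> float:
    """Rough pre-run cost estimate using the common ~4 chars/token heuristic."""
    total_tokens = sum(len(p) / 4 for p in prompts)
    return total_tokens / 1000 * price_per_1k_tokens

def within_budget(prompts, budget: float) -> bool:
    """Budget gate: refuse to run when the estimate exceeds the cap."""
    return estimate_cost(prompts) <= budget
```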
## Configuration

```toml
# pyproject.toml
[tool.checkllm]
judge_backend = "auto"    # auto-detects from environment
judge_model = "gpt-4o"
default_threshold = 0.8
budget = 10.0
cache_enabled = true
engine = "auto"
```
## CLI

| Command | Description |
|---|---|
| `checkllm init` | Scaffold a project (`--use-case`, `--ci`) |
| `checkllm run` | Run tests (`--budget`, `--dry-run`, `--snapshot`) |
| `checkllm estimate` | Estimate costs before running |
| `checkllm watch` | Re-run on file changes |
| `checkllm report` | Generate an HTML report |
| `checkllm snapshot` | Save a baseline for regression detection |
| `checkllm diff` | Compare snapshots |
| `checkllm history` | View run history and trends |
| `checkllm list-metrics` | Show all available checks and metrics |
| `checkllm cache` | Manage the judge response cache |
| `checkllm dashboard` | Launch the web dashboard |
## Custom metrics

```python
from checkllm import metric, CheckResult

@metric("brevity")
def brevity_check(output: str, max_words: int = 50, **kwargs) -> CheckResult:
    words = len(output.split())
    return CheckResult(
        passed=words <= max_words,
        score=min(1.0, max_words / max(words, 1)),
        reasoning=f"{words} words (limit: {max_words})",
        cost=0.0,
        latency_ms=0,
        metric_name="brevity",
    )
```
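To exercise the scoring logic without installing checkllm, the decorator can be dropped and `CheckResult` replaced with a local stand-in that mirrors the fields used above (this stand-in is mine; only the field names come from the example):

```python
from dataclasses import dataclass

@dataclass
class CheckResult:
    """Local stand-in mirroring the fields shown in the example above."""
    passed: bool
    score: float
    reasoning: str
    cost: float = 0.0
    latency_ms: int = 0
    metric_name: str = ""

def brevity_check(output: str, max_words: int = 50, **kwargs) -> CheckResult:
    words = len(output.split())
    return CheckResult(
        passed=words <= max_words,
        # Score degrades smoothly past the limit: 50-word limit / 100 words = 0.5.
        score=min(1.0, max_words / max(words, 1)),
        reasoning=f"{words} words (limit: {max_words})",
        metric_name="brevity",
    )
```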
## License

MIT