# Chitragupta

Pytest for your prompts — test, version, and audit LLM outputs in pure Python.
> वाक्येषु दोषान् गणयन् सत्यं परीक्षणमेव च । चित्रगुप्तः सदा रक्षेत् बुद्धिवाक्यप्रमाणतः ॥
>
> "The one who counts errors in expressions and verifies truth through testing — Chitragupta always protects correctness through reasoning and validation."
Chitragupta, the divine record-keeper of truth and action, is reimagined for the age of AI.
This library evaluates LLM outputs with precision—enforcing correctness, structure, and reliability through programmable assertions, just like unit tests for prompts.
## Features

Test, validate, and catch breaking changes in your LLM outputs before they reach users.

Install from PyPI:

```bash
pip install chitragupta
```
## Stop guessing if your prompt changes broke something
Every developer who builds with LLMs faces this: you change one word in your prompt, manually test a few inputs, and ship it. Two days later a user reports wrong output. You don't know which change caused it, when it broke, or how to reproduce it.
Chitragupta gives you a safety net. Add a decorator to your function, define your rules, and run `chitragupta run`. You know immediately — before shipping — whether your LLM still behaves the way you expect.
## Quick start

```python
from chitragupta import prompttest, contains, max_length

@prompttest(
    inputs=["What is 2+2?"],
    asserts=[contains("4"), max_length(200)]
)
def my_bot(question):
    return "The answer is 4."

if __name__ == "__main__":
    print(my_bot("What is 2+2?"))
```

Running `chitragupta run` reports:

```
chitragupta v0.1.0 · 1 file scanned · 1 prompt function found

● my_bot 'What is 2+2?'
    contains("4")     PASS
    max_length(200)   PASS
────────────────────────────────────────────────────
2 passed · 1 input · 2 assertions total · 1 prompt function
```
## How it works

1. Wrap any Python function that calls an LLM with the `@prompttest` decorator
2. Define test inputs and rules the output must satisfy
3. Run all prompt tests with `chitragupta run`
4. Get clear pass/fail results for every rule

No cloud, no YAML, no Node.js, no external dependencies.
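Conceptually, the runner's job is small: collect decorated functions, call each one on its test inputs, and apply every assertion to the output. A toy sketch of that loop (illustrative only, not the library's actual code):

```python
# Toy illustration of the loop behind `chitragupta run`
# (not the library's actual implementation).
registry = []

def prompttest(inputs, asserts):
    def wrap(fn):
        # Registering at decoration time is what lets a CLI
        # discover tests just by importing the module.
        registry.append((fn, inputs, asserts))
        return fn
    return wrap

def contains(sub):
    return lambda text: sub in text

@prompttest(inputs=["What is 2+2?"], asserts=[contains("4")])
def my_bot(question):
    return "The answer is 4."

def run_all():
    results = []
    for fn, inputs, asserts in registry:
        for inp in inputs:
            output = fn(inp)
            for check in asserts:
                results.append((fn.__name__, inp, check(output)))
    return results

print(run_all())  # [('my_bot', 'What is 2+2?', True)]
```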
## Minimal example

```python
from chitragupta import prompttest, contains

@prompttest(inputs=["2+2"], asserts=[contains("4")])
def bot(q):
    return "4"
```
## Why not just pytest?
You can test LLM outputs with pytest, but it quickly becomes repetitive:
- You have to manually call functions with test inputs
- Assertions are not reusable
- No standard way to define prompt rules
- No CLI to scan and run all prompt tests automatically
Chitragupta solves this by:
- Attaching tests directly to your functions
- Providing reusable assertions
- Running everything with a single command
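For comparison, here is what the quick-start checks look like as hand-written pytest tests (a sketch; `my_bot` is the same stub as above):

```python
# test_my_bot.py — the quick-start checks written as plain pytest tests.
# Every rule is re-implemented by hand instead of declared once and reused.

def my_bot(question):
    # Stub standing in for a real LLM call.
    return "The answer is 4."

def test_contains_4():
    assert "4" in my_bot("What is 2+2?")

def test_max_length_200():
    assert len(my_bot("What is 2+2?")) <= 200
```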
## Why Chitragupta?
Most LLM testing tools add complexity. Chitragupta removes it.
- No cloud - everything runs locally
- No YAML - define tests directly in Python
- No Node.js - pure Python, zero ecosystem friction
- No external dependencies - lightweight and fast
## Who is this for?
- Developers building LLM applications
- Teams that want to catch prompt regressions early
- Anyone who needs to validate LLM outputs consistently
- Anyone tired of manually testing prompt changes
## Real-world use cases

### Customer support chatbot

A developer builds a support bot. After tweaking the prompt for tone, the bot starts leaking internal pricing info. With Chitragupta, `not_contains("internal")` would have caught it in seconds, before any user saw it.

```python
from chitragupta import prompttest, not_contains, max_length

@prompttest(
    inputs=["What are your pricing plans?"],
    asserts=[not_contains("internal"), max_length(300)]
)
def support_bot(query):
    # Your LLM call here
    return "Our pricing starts at $10/month for the basic plan."
```
### JSON output validation

A code review tool expects the LLM to always return valid JSON with specific keys. After a model upgrade, the output schema silently changed. `valid_json()` would have caught it before deploy.

```python
from chitragupta import prompttest, valid_json

@prompttest(
    inputs=["Review this Python function"],
    asserts=[valid_json()]
)
def code_reviewer(code):
    # Your LLM call here
    return '{"suggestions": ["Add docstring"], "score": 8}'
```
### Content policy enforcement

An HR tool screening resumes must never mention age, gender, or race in its output, for legal reasons. A custom assertion like `no_bias_words()` catches any prompt change that accidentally enables biased output.

```python
import re

from chitragupta import prompttest

def no_bias_words(text):
    bias_terms = ["age", "gender", "race", "young", "old", "male", "female"]
    # \b word boundaries avoid false positives such as "manage" or "bold".
    pattern = r"\b(" + "|".join(bias_terms) + r")\b"
    return re.search(pattern, text.lower()) is None

@prompttest(
    inputs=["Screen this resume for senior developer role"],
    asserts=[no_bias_words]
)
def hr_screening(resume_text):
    # Your LLM call here
    return "Candidate has strong technical skills and relevant experience."
```
### Product description generator

An e-commerce platform generates descriptions that must be 80-150 words for SEO, always mention the product name, and never include competitor names. `min_length()`, `max_length()`, and `not_contains()` enforce all of this automatically.

```python
from chitragupta import prompttest, min_length, max_length, not_contains

@prompttest(
    inputs=["Wireless headphones"],
    asserts=[min_length(80), max_length(150), not_contains("Sony"), not_contains("Bose")]
)
def description_generator(product):
    # Your LLM call here
    return "Experience premium sound with our wireless headphones. Features include noise cancellation and 24-hour battery life."
```
### Safety-critical apps

A health app must never give dosage advice or sound like a diagnosis. Custom `no_dosage()` and `no_diagnosis()` assertions block any prompt version that produces medical advice from being deployed.

```python
import re

from chitragupta import prompttest

def _has_term(text, terms):
    # Whole-word match so e.g. "mistake" doesn't trip on "take".
    return re.search(r"\b(" + "|".join(terms) + r")\b", text.lower()) is not None

def no_dosage(text):
    return not _has_term(text, ["mg", "dose", "dosage", "pill", "tablet", "take"])

def no_diagnosis(text):
    return not _has_term(text, ["diagnosis", "condition", "disease", "illness", "symptoms"])

@prompttest(
    inputs=["I have a headache, what should I do?"],
    asserts=[no_dosage, no_diagnosis]
)
def health_advisor(query):
    # Your LLM call here
    return "For health concerns, please consult with a qualified healthcare professional."
```
## Built-in assertions

| Assertion | Description | Example |
|---|---|---|
| `contains()` | Text must contain substring | `contains("hello")` |
| `not_contains()` | Text must not contain substring | `not_contains("error")` |
| `max_length()` | Text length must be ≤ value | `max_length(100)` |
| `min_length()` | Text length must be ≥ value | `min_length(10)` |
| `valid_json()` | Text must be valid JSON | `valid_json()` |
| `matches_regex()` | Text must match regex pattern | `matches_regex(r"\d{4}")` |
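Each built-in assertion is a factory that returns a plain text-to-bool callable. The sketch below shows the pattern with illustrative re-implementations (not the library's actual code):

```python
import re

# Illustrative re-implementations of three assertion factories:
# each call returns a callable that takes the output text and
# yields True (pass) or False (fail).
def contains(sub):
    return lambda text: sub in text

def max_length(n):
    return lambda text: len(text) <= n

def matches_regex(pattern):
    return lambda text: re.search(pattern, text) is not None

check_year = matches_regex(r"\d{4}")
print(check_year("Released in 2024"))              # True
print(max_length(10)("this string is too long"))   # False
```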
## Custom assertions

Any Python function that takes the output text and returns `True`/`False` works as a custom assertion:

```python
from chitragupta import prompttest

def no_emojis(text):
    return not any(char in text for char in ["😀", "😢", "🎉"])

@prompttest(
    inputs=["Generate a response"],
    asserts=[no_emojis]
)
def formal_response(query):
    return "This is a formal response without emojis."
```
## CI/CD integration

```yaml
name: Test Prompts
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - run: pip install chitragupta
      - run: chitragupta run
```
## Works with any LLM
Chitragupta is LLM-agnostic. It works with OpenAI, Anthropic, Groq, Gemini, local models, or any other LLM you can call from Python. Just wrap your LLM function with the decorator and test away.
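One way to keep things provider-agnostic: hide the client call behind a thin function and decorate whatever calls it. In this sketch, `call_llm` is a placeholder; swap its body for a real OpenAI, Anthropic, Groq, or local-model call.

```python
def call_llm(prompt: str) -> str:
    # Placeholder — replace with an actual client call, e.g.
    # openai.chat.completions.create(...).choices[0].message.content
    return f"(model response to: {prompt})"

def summarize(text: str) -> str:
    # This is the function you would wrap with @prompttest.
    return call_llm(f"Summarize in one sentence: {text}")

print(summarize("Chitragupta tests LLM outputs."))
```

From there, `summarize` can be decorated with `@prompttest` exactly like the earlier examples; the library never needs to know which provider sits behind `call_llm`.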
## Roadmap

- v0.2 — run history saved locally, `chitragupta history` command
- v1.0 — `@promptversion` decorator, `chitragupta diff v1 v2`, pytest plugin
- v1.1 — `llm_judge()` assertion, async support
- v2.0 — HTML reports, production monitoring, plugin ecosystem
## About the name
Named after Chitragupta — the divine record-keeper who tracks every action and evaluates it with precision. This library does the same for your LLM outputs.
## License
MIT License © 2026 Rohan Khairnar
## Support
If you find this useful, consider giving it a ⭐ on GitHub.