Pytest for your prompts — test, version and audit LLM outputs in pure Python

Chitragupta

वाक्येषु दोषान् गणयन् सत्यं परीक्षणमेव च । चित्रगुप्तः सदा रक्षेत् बुद्धिवाक्यप्रमाणतः ॥

“The one who counts errors in expressions and verifies truth through testing — Chitragupta always protects correctness through reasoning and validation.”

Chitragupta, the divine record-keeper of truth and action, is reimagined for the age of AI.

This library evaluates LLM outputs with precision—enforcing correctness, structure, and reliability through programmable assertions, just like unit tests for prompts.

Features

Pytest for your prompts

Test, validate, and catch breaking changes in your LLM outputs before they reach users.

pip install chitragupta

Stop guessing if your prompt changes broke something.

Every developer who builds with LLMs faces this: you change one word in your prompt, manually test a few inputs, and ship it. Two days later a user reports wrong output. You don't know which change caused it, when it broke, or how to reproduce it.

Chitragupta gives you a safety net. Add a decorator to your function, define your rules, run chitragupta run. You know immediately — before shipping — whether your LLM still behaves the way you expect.

Quick start

from chitragupta import prompttest, contains, max_length

@prompttest(
    inputs=["What is 2+2?"],
    asserts=[contains("4"), max_length(200)]
)
def my_bot(question):
    return "The answer is 4."

if __name__ == "__main__":
    print(my_bot("What is 2+2?"))
Running chitragupta run produces:

chitragupta  v0.1.0  ·  1 file scanned  ·  1 prompt function found
●  my_bot  'What is 2+2?'
contains("4")              PASS
max_length(200)            PASS
────────────────────────────────────────────────────
2 passed  ·  1 input  ·  2 assertions total  ·  1 prompt function

How it works

  • Wrap any Python function that calls an LLM with the @prompttest decorator
  • Define test inputs and the rules the output must satisfy
  • Run all prompt tests with chitragupta run
  • Get clear pass/fail results for every rule
  • No cloud, no YAML, no Node.js, no external dependencies
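
The pattern behind this is a decorator that attaches a test spec to the function, plus a runner that discovers and evaluates it. The sketch below is an illustration of that pattern, not chitragupta's actual internals:

```python
# Sketch of the decorator-plus-runner pattern; NOT the library's real code.
def prompttest(inputs, asserts):
    def wrap(fn):
        # Attach the test spec to the function; a runner can find it later.
        fn._prompt_tests = (inputs, asserts)
        return fn
    return wrap

def run_prompt_tests(fn):
    """Call fn on each input and evaluate every assertion on the output."""
    inputs, asserts = fn._prompt_tests
    results = []
    for inp in inputs:
        output = fn(inp)
        for check in asserts:
            name = getattr(check, "__name__", "assert")
            results.append((inp, name, bool(check(output))))
    return results

@prompttest(inputs=["What is 2+2?"], asserts=[lambda out: "4" in out])
def my_bot(question):
    return "The answer is 4."
```

Calling run_prompt_tests(my_bot) yields one (input, assertion name, passed) tuple per input-assertion pair, which is essentially what the CLI prints.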

Minimal example

from chitragupta import prompttest, contains

@prompttest(inputs=["2+2"], asserts=[contains("4")])
def bot(q):
    return "4"

Why not just pytest?

You can test LLM outputs with pytest, but it quickly becomes repetitive:

  • You have to manually call functions with test inputs
  • Assertions are not reusable
  • No standard way to define prompt rules
  • No CLI to scan and run all prompt tests automatically
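
Written by hand in pytest, even the two-rule quick-start example turns into this kind of boilerplate, with each rule re-implemented inline:

```python
# The same two rules from the quick start, hand-written as pytest tests.
def my_bot(question):
    return "The answer is 4."

def test_my_bot_contains_4():
    # The "contains" rule, re-implemented inline; not reusable elsewhere.
    assert "4" in my_bot("What is 2+2?")

def test_my_bot_max_length():
    # The "max length" rule, duplicated for every function that needs it.
    assert len(my_bot("What is 2+2?")) <= 200
```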

Chitragupta solves this by:

  • Attaching tests directly to your functions
  • Providing reusable assertions
  • Running everything with a single command

Why Chitragupta?

Most LLM testing tools add complexity. Chitragupta removes it.

  • No cloud - everything runs locally
  • No YAML - define tests directly in Python
  • No Node.js - pure Python, zero ecosystem friction
  • No external dependencies - lightweight and fast

Who is this for?

  • Developers building LLM applications
  • Teams that want to catch prompt regressions early
  • Anyone who needs to validate LLM outputs consistently
  • Anyone tired of manually testing prompt changes

Real world use cases

Customer support chatbot

A developer builds a support bot. After tweaking the prompt for tone, the bot starts leaking internal pricing info. With Chitragupta, a not_contains("internal") assertion would have caught it in seconds, before any user saw it.

from chitragupta import prompttest, not_contains, max_length

@prompttest(
    inputs=["What are your pricing plans?"],
    asserts=[not_contains("internal"), max_length(300)]
)
def support_bot(query):
    # Your LLM call here
    return "Our pricing starts at $10/month for the basic plan."

JSON output validation

A code review tool expects the LLM to always return valid JSON. After a model upgrade, the output silently stopped parsing. valid_json() would have caught it before deploy; a custom assertion can additionally pin down required keys.

from chitragupta import prompttest, valid_json

@prompttest(
    inputs=["Review this Python function"],
    asserts=[valid_json()]
)
def code_reviewer(code):
    # Your LLM call here
    return '{"suggestions": ["Add docstring"], "score": 8}'
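
valid_json() only guarantees the output parses. To also pin down the schema, a small custom assertion can check required keys. has_keys below is a hypothetical helper written for illustration, not a chitragupta built-in:

```python
import json

def has_keys(*keys):
    # Hypothetical helper (NOT a chitragupta built-in): parses the output
    # as JSON and verifies every required top-level key is present.
    def check(text):
        try:
            data = json.loads(text)
        except ValueError:
            return False
        return isinstance(data, dict) and all(k in data for k in keys)
    return check
```

With it, asserts=[valid_json(), has_keys("suggestions", "score")] fails the moment a model upgrade drops or renames a key.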

Content policy enforcement

An HR tool screening resumes must never mention age, gender, or race in its output for legal reasons. A custom assertion function no_bias_words() catches any prompt change that accidentally enables biased output.

import re

from chitragupta import prompttest

def no_bias_words(text):
    # Match whole words so e.g. "manage" doesn't trip on "age"
    bias_terms = ["age", "gender", "race", "young", "old", "male", "female"]
    return not any(re.search(rf"\b{term}\b", text.lower()) for term in bias_terms)

@prompttest(
    inputs=["Screen this resume for senior developer role"],
    asserts=[no_bias_words]
)
def hr_screening(resume_text):
    # Your LLM call here
    return "Candidate has strong technical skills and relevant experience."

Product description generator

An e-commerce platform generates descriptions that must stay between 80 and 150 characters for SEO snippets, always mention the product name, and never include competitor names. min_length(), max_length(), contains(), and not_contains() enforce all of this automatically.

from chitragupta import prompttest, contains, min_length, max_length, not_contains

@prompttest(
    inputs=["Wireless headphones"],
    asserts=[
        min_length(80),
        max_length(150),
        contains("wireless headphones"),
        not_contains("Sony"),
        not_contains("Bose"),
    ]
)
def description_generator(product):
    # Your LLM call here
    return "Experience premium sound with our wireless headphones. Features include noise cancellation and 24-hour battery life."

Safety-critical apps

A health app must never give dosage advice or sound like a diagnosis. Custom no_dosage() and no_diagnosis() assertions block any prompt version that slips into medical advice before it ships.

import re

from chitragupta import prompttest

def no_dosage(text):
    # Whole-word match so "intake" or "mistake" don't trip on "take"
    dosage_terms = ["mg", "dose", "dosage", "pill", "tablet", "take"]
    return not any(re.search(rf"\b{term}\b", text.lower()) for term in dosage_terms)

def no_diagnosis(text):
    diagnosis_terms = ["diagnosis", "condition", "disease", "illness", "symptoms"]
    return not any(re.search(rf"\b{term}\b", text.lower()) for term in diagnosis_terms)

@prompttest(
    inputs=["I have a headache, what should I do?"],
    asserts=[no_dosage, no_diagnosis]
)
def health_advisor(query):
    # Your LLM call here
    return "For health concerns, please consult with a qualified healthcare professional."

Built-in assertions

Assertion         Description                       Example
----------------  --------------------------------  ------------------------
contains()        Text must contain substring       contains("hello")
not_contains()    Text must not contain substring   not_contains("error")
max_length()      Text length must be ≤ value       max_length(100)
min_length()      Text length must be ≥ value       min_length(10)
valid_json()      Text must be valid JSON           valid_json()
matches_regex()   Text must match regex pattern     matches_regex(r"\d{4}")
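
Each built-in is a small factory that returns a predicate on the output text. The sketch below shows one plausible way such factories can be written; the names match the table, but the bodies are assumptions, and the real implementations may differ (richer failure messages, for instance):

```python
import json
import re

# Plausible factory-style sketches of the built-ins, NOT the library's code.
def contains(sub):
    return lambda text: sub in text

def not_contains(sub):
    return lambda text: sub not in text

def max_length(n):
    return lambda text: len(text) <= n

def min_length(n):
    return lambda text: len(text) >= n

def valid_json():
    def check(text):
        try:
            json.loads(text)
            return True
        except ValueError:  # json.JSONDecodeError subclasses ValueError
            return False
    return check

def matches_regex(pattern):
    return lambda text: re.search(pattern, text) is not None
```

The factory shape is what makes assertions reusable: contains("4") is built once and attached to any number of prompt functions.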

Custom assertions

Any Python function that returns True/False works as a custom assertion:

def no_emojis(text):
    # Reject anything in the main emoji codepoint blocks (U+1F300 to U+1FAFF)
    return not any(0x1F300 <= ord(ch) <= 0x1FAFF for ch in text)

@prompttest(
    inputs=["Generate a response"],
    asserts=[no_emojis]
)
def formal_response(query):
    return "This is a formal response without emojis."

CI/CD integration

name: Test Prompts
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install chitragupta
      - run: chitragupta run

Works with any LLM

Chitragupta is LLM-agnostic. It works with OpenAI, Anthropic, Groq, Gemini, local models, or any other LLM you can call from Python. Just wrap your LLM function with the decorator and test away.
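
The only contract is "a Python function that takes input and returns text", so swapping providers means changing one call site. The sketch below uses stub backends (the fake_* functions stand in for real SDK calls) to show the shape:

```python
# Provider-agnostic pattern: keep the model call behind one function so the
# same prompt tests apply to any backend. Both fake_* functions are stubs
# standing in for real client calls (OpenAI, Anthropic, a local model, ...).
def fake_openai_backend(prompt):
    return f"openai says: the answer is 4 ({prompt})"

def fake_local_backend(prompt):
    return f"llama says: the answer is 4 ({prompt})"

backend = fake_openai_backend  # swap to fake_local_backend; tests stay the same

def my_bot(question):
    return backend(question)
```

Decorating my_bot with @prompttest exactly as in the quick start then works unchanged whichever backend is active.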

Roadmap

  • v0.2 — run history saved locally, chitragupta history command
  • v1.0 — @promptversion decorator, chitragupta diff v1 v2, pytest plugin
  • v1.1 — llm_judge() assertion, async support
  • v2.0 — HTML reports, production monitoring, plugin ecosystem

About the name

Named after Chitragupta — the divine record-keeper who tracks every action and evaluates it with precision. This library does the same for your LLM outputs.

License

MIT License © 2026 Rohan Khairnar

Support

If you find this useful, consider giving it a ⭐ on GitHub.
