Pytest for your prompts — test, version and audit LLM outputs in pure Python

Chitragupta

वाक्येषु दोषान् गणयन् सत्यं परीक्षणमेव च । चित्रगुप्तः सदा रक्षेत् बुद्धिवाक्यप्रमाणतः ॥

“The one who counts errors in expressions and verifies truth through testing — Chitragupta always protects correctness through reasoning and validation.”

Chitragupta, the divine record-keeper of truth and action, is reimagined for the age of AI.

This library evaluates LLM outputs with precision—enforcing correctness, structure, and reliability through programmable assertions, just like unit tests for prompts.

Features

Pytest for your prompts

Test, validate, and catch breaking changes in your LLM outputs before they reach users.

pip install chitragupta

Stop guessing if your prompt changes broke something.

Every developer who builds with LLMs faces this: you change one word in your prompt, manually test a few inputs, and ship it. Two days later a user reports wrong output. You don't know which change caused it, when it broke, or how to reproduce it.

Chitragupta gives you a safety net. Add a decorator to your function, define your rules, run chitragupta run. You know immediately — before shipping — whether your LLM still behaves the way you expect.

Quick start

from chitragupta import prompttest, contains, max_length

@prompttest(
    inputs=["What is 2+2?"],
    asserts=[contains("4"), max_length(200)]
)
def my_bot(question):
    return "The answer is 4."

if __name__ == "__main__":
    print(my_bot("What is 2+2?"))
Running chitragupta run produces:

chitragupta  v0.1.0  ·  1 file scanned  ·  1 prompt function found
●  my_bot  'What is 2+2?'
contains("4")              PASS
max_length(200)            PASS
────────────────────────────────────────────────────
2 passed  ·  1 input  ·  2 assertions total  ·  1 prompt function

How it works

  • Wrap any Python function that calls an LLM with the @prompttest decorator
  • Define test inputs and the rules the output must satisfy
  • Run all prompt tests with chitragupta run
  • Get clear pass/fail results for every rule
  • No cloud, no YAML, no Node.js, no external dependencies
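
The pattern behind this is a decorator that attaches a test spec to the function, plus a runner that discovers and evaluates it. The sketch below is an illustration of that pattern, not chitragupta's actual internals:

```python
# Sketch of the decorator-plus-runner pattern; NOT the library's real code.
def prompttest(inputs, asserts):
    def wrap(fn):
        # Attach the test spec to the function; a runner can find it later.
        fn._prompt_tests = (inputs, asserts)
        return fn
    return wrap

def run_prompt_tests(fn):
    """Call fn on each input and evaluate every assertion on the output."""
    inputs, asserts = fn._prompt_tests
    results = []
    for inp in inputs:
        output = fn(inp)
        for check in asserts:
            name = getattr(check, "__name__", "assert")
            results.append((inp, name, bool(check(output))))
    return results

@prompttest(inputs=["What is 2+2?"], asserts=[lambda out: "4" in out])
def my_bot(question):
    return "The answer is 4."
```

Calling run_prompt_tests(my_bot) yields one (input, assertion name, passed) tuple per input-assertion pair, which is essentially what the CLI prints.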

Minimal example

from chitragupta import prompttest, contains

@prompttest(inputs=["2+2"], asserts=[contains("4")])
def bot(q):
    return "4"

Why not just pytest?

You can test LLM outputs with pytest, but it quickly becomes repetitive:

  • You have to manually call functions with test inputs
  • Assertions are not reusable
  • No standard way to define prompt rules
  • No CLI to scan and run all prompt tests automatically
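
Written by hand in pytest, even the two-rule quick-start example turns into this kind of boilerplate, with each rule re-implemented inline:

```python
# The same two rules from the quick start, hand-written as pytest tests.
def my_bot(question):
    return "The answer is 4."

def test_my_bot_contains_4():
    # The "contains" rule, re-implemented inline; not reusable elsewhere.
    assert "4" in my_bot("What is 2+2?")

def test_my_bot_max_length():
    # The "max length" rule, duplicated for every function that needs it.
    assert len(my_bot("What is 2+2?")) <= 200
```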

Chitragupta solves this by:

  • Attaching tests directly to your functions
  • Providing reusable assertions
  • Running everything with a single command

Why Chitragupta?

Most LLM testing tools add complexity. Chitragupta removes it.

  • No cloud - everything runs locally
  • No YAML - define tests directly in Python
  • No Node.js - pure Python, zero ecosystem friction
  • No external dependencies - lightweight and fast

Who is this for?

  • Developers building LLM applications
  • Teams that want to catch prompt regressions early
  • Anyone who needs to validate LLM outputs consistently
  • Anyone tired of manually testing prompt changes

Real world use cases

Customer support chatbot

A developer builds a support bot. After tweaking the prompt for tone, the bot starts leaking internal pricing info. With Chitragupta, a not_contains("internal") assertion would have caught it in seconds, before any user saw it.

from chitragupta import prompttest, not_contains, max_length

@prompttest(
    inputs=["What are your pricing plans?"],
    asserts=[not_contains("internal"), max_length(300)]
)
def support_bot(query):
    # Your LLM call here
    return "Our pricing starts at $10/month for the basic plan."

JSON output validation

A code review tool expects the LLM to always return valid JSON. After a model upgrade, the output silently stopped parsing. valid_json() would have caught it before deploy; a custom assertion can additionally pin down required keys.

from chitragupta import prompttest, valid_json

@prompttest(
    inputs=["Review this Python function"],
    asserts=[valid_json()]
)
def code_reviewer(code):
    # Your LLM call here
    return '{"suggestions": ["Add docstring"], "score": 8}'
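
valid_json() only guarantees the output parses. To also pin down the schema, a small custom assertion can check required keys. has_keys below is a hypothetical helper written for illustration, not a chitragupta built-in:

```python
import json

def has_keys(*keys):
    # Hypothetical helper (NOT a chitragupta built-in): parses the output
    # as JSON and verifies every required top-level key is present.
    def check(text):
        try:
            data = json.loads(text)
        except ValueError:
            return False
        return isinstance(data, dict) and all(k in data for k in keys)
    return check
```

With it, asserts=[valid_json(), has_keys("suggestions", "score")] fails the moment a model upgrade drops or renames a key.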

Content policy enforcement

An HR tool screening resumes must never mention age, gender, or race in its output for legal reasons. A custom assertion function no_bias_words() catches any prompt change that accidentally enables biased output.

import re

from chitragupta import prompttest

def no_bias_words(text):
    # Match whole words so e.g. "manage" doesn't trip on "age"
    bias_terms = ["age", "gender", "race", "young", "old", "male", "female"]
    return not any(re.search(rf"\b{term}\b", text.lower()) for term in bias_terms)

@prompttest(
    inputs=["Screen this resume for senior developer role"],
    asserts=[no_bias_words]
)
def hr_screening(resume_text):
    # Your LLM call here
    return "Candidate has strong technical skills and relevant experience."

Product description generator

An e-commerce platform generates descriptions that must stay between 80 and 150 characters for SEO snippets, always mention the product name, and never include competitor names. min_length(), max_length(), contains(), and not_contains() enforce all of this automatically.

from chitragupta import prompttest, contains, min_length, max_length, not_contains

@prompttest(
    inputs=["Wireless headphones"],
    asserts=[
        min_length(80),
        max_length(150),
        contains("wireless headphones"),
        not_contains("Sony"),
        not_contains("Bose"),
    ]
)
def description_generator(product):
    # Your LLM call here
    return "Experience premium sound with our wireless headphones. Features include noise cancellation and 24-hour battery life."

Safety-critical apps

A health app must never give dosage advice or sound like a diagnosis. Custom no_dosage() and no_diagnosis() assertions block any prompt version that slips into medical advice before it ships.

import re

from chitragupta import prompttest

def no_dosage(text):
    # Whole-word match so "intake" or "mistake" don't trip on "take"
    dosage_terms = ["mg", "dose", "dosage", "pill", "tablet", "take"]
    return not any(re.search(rf"\b{term}\b", text.lower()) for term in dosage_terms)

def no_diagnosis(text):
    diagnosis_terms = ["diagnosis", "condition", "disease", "illness", "symptoms"]
    return not any(re.search(rf"\b{term}\b", text.lower()) for term in diagnosis_terms)

@prompttest(
    inputs=["I have a headache, what should I do?"],
    asserts=[no_dosage, no_diagnosis]
)
def health_advisor(query):
    # Your LLM call here
    return "For health concerns, please consult with a qualified healthcare professional."

Built-in assertions

Assertion         Description                       Example
----------------  --------------------------------  ------------------------
contains()        Text must contain substring       contains("hello")
not_contains()    Text must not contain substring   not_contains("error")
max_length()      Text length must be ≤ value       max_length(100)
min_length()      Text length must be ≥ value       min_length(10)
valid_json()      Text must be valid JSON           valid_json()
matches_regex()   Text must match regex pattern     matches_regex(r"\d{4}")
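
Each built-in is a small factory that returns a predicate on the output text. The sketch below shows one plausible way such factories can be written; the names match the table, but the bodies are assumptions, and the real implementations may differ (richer failure messages, for instance):

```python
import json
import re

# Plausible factory-style sketches of the built-ins, NOT the library's code.
def contains(sub):
    return lambda text: sub in text

def not_contains(sub):
    return lambda text: sub not in text

def max_length(n):
    return lambda text: len(text) <= n

def min_length(n):
    return lambda text: len(text) >= n

def valid_json():
    def check(text):
        try:
            json.loads(text)
            return True
        except ValueError:  # json.JSONDecodeError subclasses ValueError
            return False
    return check

def matches_regex(pattern):
    return lambda text: re.search(pattern, text) is not None
```

The factory shape is what makes assertions reusable: contains("4") is built once and attached to any number of prompt functions.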

Custom assertions

Any Python function that returns True/False works as a custom assertion:

def no_emojis(text):
    # Reject anything in the main emoji codepoint blocks (U+1F300 to U+1FAFF)
    return not any(0x1F300 <= ord(ch) <= 0x1FAFF for ch in text)

@prompttest(
    inputs=["Generate a response"],
    asserts=[no_emojis]
)
def formal_response(query):
    return "This is a formal response without emojis."

CI/CD integration

name: Test Prompts
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install chitragupta
      - run: chitragupta run

Works with any LLM

Chitragupta is LLM-agnostic. It works with OpenAI, Anthropic, Groq, Gemini, local models, or any other LLM you can call from Python. Just wrap your LLM function with the decorator and test away.
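
The only contract is "a Python function that takes input and returns text", so swapping providers means changing one call site. The sketch below uses stub backends (the fake_* functions stand in for real SDK calls) to show the shape:

```python
# Provider-agnostic pattern: keep the model call behind one function so the
# same prompt tests apply to any backend. Both fake_* functions are stubs
# standing in for real client calls (OpenAI, Anthropic, a local model, ...).
def fake_openai_backend(prompt):
    return f"openai says: the answer is 4 ({prompt})"

def fake_local_backend(prompt):
    return f"llama says: the answer is 4 ({prompt})"

backend = fake_openai_backend  # swap to fake_local_backend; tests stay the same

def my_bot(question):
    return backend(question)
```

Decorating my_bot with @prompttest exactly as in the quick start then works unchanged whichever backend is active.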

Roadmap

  • v0.2 — run history saved locally, chitragupta history command
  • v1.0 — @promptversion decorator, chitragupta diff v1 v2, pytest plugin
  • v1.1 — llm_judge() assertion, async support
  • v2.0 — HTML reports, production monitoring, plugin ecosystem

About the name

Named after Chitragupta — the divine record-keeper who tracks every action and evaluates it with precision. This library does the same for your LLM outputs.

License

MIT License © 2026 Rohan Khairnar

Support

If you find this useful, consider giving it a ⭐ on GitHub.
