
Pytest for your prompts — test, version and audit LLM outputs in pure Python


Chitragupta

वाक्येषु दोषान् गणयन् सत्यं परीक्षणमेव च । चित्रगुप्तः सदा रक्षेत् बुद्धिवाक्यप्रमाणतः ॥

“The one who counts errors in expressions and verifies truth through testing — Chitragupta always protects correctness through reasoning and validation.”

Chitragupta, the divine record-keeper of truth and action, is reimagined for the age of AI.

This library evaluates LLM outputs with precision, enforcing correctness, structure, and reliability through programmable assertions, just like unit tests for prompts.

Features

Pytest for your prompts

Test, validate, and catch breaking changes in your LLM outputs before they reach users.

pip install chitragupta

Stop guessing if your prompt changes broke something.

Every developer who builds with LLMs faces this: you change one word in your prompt, manually test a few inputs, and ship it. Two days later a user reports wrong output. You don't know which change caused it, when it broke, or how to reproduce it.

Chitragupta gives you a safety net. Add a decorator to your function, define your rules, run chitragupta run. You know immediately — before shipping — whether your LLM still behaves the way you expect.

Quick start

from chitragupta import prompttest, contains, max_length

@prompttest(
    inputs=["What is 2+2?"],
    asserts=[contains("4"), max_length(200)]
)
def my_bot(question):
    return "The answer is 4."

if __name__ == "__main__":
    print(my_bot("What is 2+2?"))
Run chitragupta run:

chitragupta  v0.1.0  ·  1 file scanned  ·  1 prompt function found
●  my_bot  'What is 2+2?'
contains("4")              PASS
max_length(200)            PASS
────────────────────────────────────────────────────
2 passed  ·  1 input  ·  2 assertions total  ·  1 prompt function

How it works

  • Wrap any Python function that calls an LLM with the @prompttest decorator
  • Define test inputs and rules the output must satisfy
  • Run all prompt tests with chitragupta run
  • Get clear pass/fail results for every rule
  • No cloud, no YAML, no Node.js, no external dependencies
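The mechanics above follow a common Python decorator pattern: the decorator attaches the test inputs and rules to the function as metadata, and a runner later reads that metadata and executes each rule. Here is a rough sketch of that idea in plain Python (this is illustrative, not Chitragupta's actual implementation; the run_tests helper is hypothetical):

```python
# Sketch of the decorator pattern behind something like @prompttest.
# The decorator stores metadata on the function; a runner reads it back.
def prompttest(inputs, asserts):
    def decorator(fn):
        fn._prompt_tests = {"inputs": inputs, "asserts": asserts}
        return fn  # the function itself is unchanged and still callable
    return decorator

def run_tests(fn):
    """Run every assertion against the output for each test input."""
    results = []
    for inp in fn._prompt_tests["inputs"]:
        output = fn(inp)
        for check in fn._prompt_tests["asserts"]:
            results.append((inp, check.__name__, bool(check(output))))
    return results

@prompttest(inputs=["What is 2+2?"], asserts=[lambda out: "4" in out])
def my_bot(question):
    return "The answer is 4."

print(run_tests(my_bot))  # [('What is 2+2?', '<lambda>', True)]
```

Because the decorator returns the original function, decorated code keeps working normally in production; the test metadata is only consulted when the runner scans for it.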

Minimal example

from chitragupta import prompttest, contains

@prompttest(inputs=["2+2"], asserts=[contains("4")])
def bot(q):
    return "4"

Why not just pytest?

You can test LLM outputs with pytest, but it quickly becomes repetitive:

  • You have to manually call functions with test inputs
  • Assertions are not reusable
  • No standard way to define prompt rules
  • No CLI to scan and run all prompt tests automatically

Chitragupta solves this by:

  • Attaching tests directly to your functions
  • Providing reusable assertions
  • Running everything with a single command

Why Chitragupta?

Most LLM testing tools add complexity. Chitragupta removes it.

  • No cloud - everything runs locally
  • No YAML - define tests directly in Python
  • No Node.js - pure Python, zero ecosystem friction
  • No external dependencies - lightweight and fast

Who is this for?

  • Developers building LLM applications
  • Teams that want to catch prompt regressions early
  • Anyone who needs to validate LLM outputs consistently
  • Anyone tired of manually testing prompt changes

Real world use cases

Customer support chatbot

A developer builds a support bot. After tweaking the prompt for tone, the bot starts leaking internal pricing info. With Chitragupta they would have caught it in seconds using not_contains("internal") before any user saw it.

from chitragupta import prompttest, not_contains, max_length

@prompttest(
    inputs=["What are your pricing plans?"],
    asserts=[not_contains("internal"), max_length(300)]
)
def support_bot(query):
    # Your LLM call here
    return "Our pricing starts at $10/month for the basic plan."

JSON output validation

A code review tool expects the LLM to always return valid JSON with specific keys. After a model upgrade, the output schema silently changed. valid_json() would have caught it before deploy.

from chitragupta import prompttest, valid_json

@prompttest(
    inputs=["Review this Python function"],
    asserts=[valid_json()]
)
def code_reviewer(code):
    # Your LLM call here
    return '{"suggestions": ["Add docstring"], "score": 8}'

Content policy enforcement

An HR tool screening resumes must never mention age, gender, or race in its output for legal reasons. A custom assertion function no_bias_words() catches any prompt change that accidentally enables biased output.

from chitragupta import prompttest

def no_bias_words(text):
    bias_terms = ["age", "gender", "race", "young", "old", "male", "female"]
    return not any(term in text.lower() for term in bias_terms)

@prompttest(
    inputs=["Screen this resume for senior developer role"],
    asserts=[no_bias_words]
)
def hr_screening(resume_text):
    # Your LLM call here
    return "Candidate has strong technical skills and relevant experience."

Product description generator

An e-commerce platform generates descriptions that must be 80-150 characters for SEO, always mention the product name, and never include competitor names. min_length(), max_length(), and not_contains() enforce the length and competitor rules automatically.

from chitragupta import prompttest, min_length, max_length, not_contains

@prompttest(
    inputs=["Wireless headphones"],
    asserts=[min_length(80), max_length(150), not_contains("Sony"), not_contains("Bose")]
)
def description_generator(product):
    # Your LLM call here
    return "Experience premium sound with our wireless headphones. Features include noise cancellation and 24-hour battery life."

Safety-critical apps

A health app must never give dosage advice or sound like a diagnosis. Custom assertions no_dosage() and no_diagnosis() block any prompt version that enables medical advice before it is deployed.

from chitragupta import prompttest

def no_dosage(text):
    dosage_terms = ["mg", "dose", "dosage", "pill", "tablet", "take"]
    return not any(term in text.lower() for term in dosage_terms)

def no_diagnosis(text):
    diagnosis_terms = ["diagnosis", "condition", "disease", "illness", "symptoms"]
    return not any(term in text.lower() for term in diagnosis_terms)

@prompttest(
    inputs=["I have a headache, what should I do?"],
    asserts=[no_dosage, no_diagnosis]
)
def health_advisor(query):
    # Your LLM call here
    return "For health concerns, please consult with a qualified healthcare professional."

Built-in assertions

Assertion Description Example
contains() Text must contain substring contains("hello")
not_contains() Text must not contain substring not_contains("error")
max_length() Text length must be ≤ value max_length(100)
min_length() Text length must be ≥ value min_length(10)
valid_json() Text must be valid JSON valid_json()
matches_regex() Text must match regex pattern matches_regex(r"\d{4}")
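Built-ins like contains("4") are factories: calling them returns a check function that takes the output text and returns True or False. Here is a plain-Python sketch of that style, written for illustration only (not the library's actual source):

```python
import json
import re

# Illustrative sketches of assertion factories in the style of the
# built-ins above. Each call returns a text -> bool check function.
def contains(sub):
    return lambda text: sub in text

def max_length(n):
    return lambda text: len(text) <= n

def valid_json():
    def check(text):
        try:
            json.loads(text)
            return True
        except ValueError:
            return False
    return check

def matches_regex(pattern):
    return lambda text: re.search(pattern, text) is not None

print(contains("4")("The answer is 4."))      # True
print(valid_json()('{"score": 8}'))           # True
print(matches_regex(r"\d{4}")("Year: 2026"))  # True
```

The factory style is what lets the same assertion be reused across many prompt functions with different parameters.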

Custom assertions

Any Python function that returns True/False works as a custom assertion:

def no_emojis(text):
    return not any(char in text for char in ["😀", "😢", "🎉"])

@prompttest(
    inputs=["Generate a response"],
    asserts=[no_emojis]
)
def formal_response(query):
    return "This is a formal response without emojis."

CI/CD integration

Run your prompt tests on every push with a GitHub Actions workflow:

name: Test Prompts
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - run: pip install chitragupta
      - run: chitragupta run

Works with any LLM

Chitragupta is LLM-agnostic. It works with OpenAI, Anthropic, Groq, Gemini, local models, or any other LLM you can call from Python. Just wrap your LLM function with the decorator and test away.
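Concretely, the decorated function only needs to accept a string and return a string, so swapping providers means changing one line inside it. A minimal sketch with a stubbed client (call_llm and geo_bot are placeholder names; replace the stub body with your provider's SDK call):

```python
def call_llm(prompt):
    # Placeholder for a real client call, e.g. an OpenAI, Anthropic,
    # or local-model request. The test harness never sees this.
    return "The capital of France is Paris."

def geo_bot(question):
    # This is the function you would wrap with @prompttest.
    # Chitragupta only checks the returned string, never the client.
    return call_llm(question)

print(geo_bot("What is the capital of France?"))
```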

Roadmap

  • v0.2 — run history saved locally, chitragupta history command
  • v1.0 — @promptversion decorator, chitragupta diff v1 v2, pytest plugin
  • v1.1 — llm_judge() assertion, async support
  • v2.0 — HTML reports, production monitoring, plugin ecosystem

About the name

Named after Chitragupta — the divine record-keeper who tracks every action and evaluates it with precision. This library does the same for your LLM outputs.

License

MIT License © 2026 Rohan Khairnar

Support

If you find this useful, consider giving it a ⭐ on GitHub.
