# pramana · प्रमाण

Track whether LLM API outputs stay consistent over time.

Crowdsourced drift detection for LLM APIs: run reproducible evals, compare results over time, and catch silent model changes.
## The Problem

When you call `gpt-5` or `claude-sonnet-4-6` today, you might get different behavior than yesterday. Providers update, fine-tune, and swap models behind stable identifiers, and those changes are invisible to callers. There's no standard way to notice these changes, let alone measure them.
## The Fix

```bash
git clone https://github.com/syd-ppt/pramana && cd pramana
uv pip install -e ".[dev]"
```

```console
$ pramana run --tier cheap --model gpt-5.2
Running cheap suite against gpt-5.2...
✓ 10/10 passed
Pass rate: 100.0%
```
Same prompts. Same parameters. Deterministic where the provider allows it. Compare across runs and users.
## Usage

```bash
# See all supported models
pramana models

# Run evals (auto-detects provider from model name)
export OPENAI_API_KEY=sk-...
pramana run --tier cheap --model gpt-4o

# Aliases work too
pramana run --tier cheap --model opus

# Submit to the community dashboard
pramana submit results.json
```
Tiers:

| Tier | Tests | Purpose |
|---|---|---|
| `cheap` | 10 | Smoke test, CI gates |
| `moderate` | 25 | Regular monitoring |
| `comprehensive` | 75 | Full evaluation |
All tiers cover 6 categories: reasoning, factual, instruction following, coding, safety, creative.
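Suites live in JSONL files (one test case per line; see Contributing below). The record schema is not documented in this README, so the field names below (`id`, `category`, `prompt`) are assumptions for illustration. A minimal sketch of loading a suite and counting tests per category:

```python
import json

# Hypothetical records; the real field names in suites/v1.0/*.jsonl may differ.
suite_jsonl = """\
{"id": "reasoning-001", "category": "reasoning", "prompt": "If A implies B and B implies C, does A imply C?"}
{"id": "coding-001", "category": "coding", "prompt": "Write a function that reverses a string."}
"""

def load_suite(text: str) -> list[dict]:
    """Parse one JSON object per non-empty line (the JSONL convention)."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]

cases = load_suite(suite_jsonl)
by_category: dict[str, int] = {}
for case in cases:
    by_category[case["category"]] = by_category.get(case["category"], 0) + 1

print(by_category)  # test counts per category
```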
## Providers

| Provider | Temperature | Seed | Reproducibility |
|---|---|---|---|
| OpenAI | ✅ Enforced | ✅ Enforced | High |
| Anthropic | ✅ Enforced | ❌ Ignored | Low |
|  | ✅ Enforced | ✅ Enforced | Medium |
For scientific drift detection, use the OpenAI API with explicit keys. See REPRODUCIBILITY.md.
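The point of the table above is that a drift eval is only meaningful if two runs send byte-identical request parameters. A minimal sketch, where `build_request` is a hypothetical helper (pramana's own request construction may differ):

```python
# Hypothetical helper: pins every parameter that affects sampling,
# so two runs of the same prompt send identical requests.
def build_request(model: str, prompt: str) -> dict:
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.0,  # deterministic decoding where supported
        "seed": 42,          # honored by OpenAI; ignored by Anthropic (see table)
    }

a = build_request("gpt-4o", "What is 2 + 2?")
b = build_request("gpt-4o", "What is 2 + 2?")
assert a == b  # identical request parameters across runs
```

With parameters pinned, any change in outputs between runs points at the provider, not at the client.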
## How It Works

```text
You run pramana ──────────► Fixed prompts hit the API ──► Results hashed & stored
                                                                   │
Other users run pramana ──► Same prompts ─────────────► Results compared
                                                                   │
                                                     Drift detected via
                                                output consistency tracking
```
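The comparison step can be sketched in a few lines. The run format and the `flag_drift` helper below are assumptions for illustration, not pramana's actual internals:

```python
def flag_drift(runs: dict[str, dict[str, str]]) -> set[str]:
    """Given {run_id: {prompt_id: output_hash}}, return the prompt_ids
    whose output hash is not identical across all runs that include them."""
    drifted = set()
    prompt_ids = set().union(*(r.keys() for r in runs.values()))
    for pid in prompt_ids:
        hashes = {r[pid] for r in runs.values() if pid in r}
        if len(hashes) > 1:
            drifted.add(pid)
    return drifted

runs = {
    "monday":  {"q1": "abc123", "q2": "def456"},
    "tuesday": {"q1": "abc123", "q2": "zzz999"},  # q2's output changed
}
print(flag_drift(runs))  # {'q2'}
```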
- **Content-addressable hashing**: SHA-256 of (model, prompt, output) for deduplication
- **Deterministic parameters**: `temperature=0.0`, `seed=42` enforced by default
- **No normalization layer**: raw API responses, not filtered through LiteLLM
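The hashing scheme above can be sketched as follows; the field separator and canonical encoding here are illustrative, and pramana's actual encoding may differ:

```python
import hashlib

def result_hash(model: str, prompt: str, output: str) -> str:
    """SHA-256 over the (model, prompt, output) triple.
    Uses an ASCII unit separator between fields; illustrative only."""
    payload = "\x1f".join([model, prompt, output]).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

# Identical triples hash identically, so storage deduplicates naturally.
seen = set()
for output in ["4", "4", "four"]:  # two identical outputs, one different
    seen.add(result_hash("gpt-4o", "What is 2 + 2?", output))

print(len(seen))  # 2
```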
## Authentication (Optional)

```bash
pramana login   # GitHub/Google OAuth
pramana whoami  # Check status
pramana delete  # GDPR: delete all your data
```
No login required to run evals or submit results. Auth enables personalized tracking.
## Development

```bash
git clone https://github.com/syd-ppt/pramana && cd pramana
uv pip install -e ".[dev]"
pytest tests/
```

Backend: pramana-api · Dashboard: pramana.pages.dev
## Contributing

- **Add test cases**: append to `suites/v1.0/{tier}.jsonl`
- **Add providers**: subclass `BaseProvider` in `src/pramana/providers/`
- **Improve assertions**: new types in `assertions.py`
See CONTRIBUTING.md.
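As an illustration of what a new assertion type might look like, here is a minimal sketch. The actual base class and method names in `assertions.py` are not shown in this README, so this interface is an assumption:

```python
import re

# Hypothetical interface; the real base class in assertions.py may differ.
class RegexAssertion:
    """Passes when the model output matches a regular expression."""

    def __init__(self, pattern: str):
        self.pattern = re.compile(pattern)

    def check(self, output: str) -> bool:
        return bool(self.pattern.search(output))

assertion = RegexAssertion(r"\b4\b")
print(assertion.check("The answer is 4."))   # True
print(assertion.check("The answer is 42."))  # False
```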
## File details

Details for the file `pramana_ai-0.1.0.tar.gz` (source distribution).

- Size: 165.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.6.14

| Algorithm | Hash digest |
|---|---|
| SHA256 | `81376dad3979d8c3667ae1b3560d3a3170017462ba32f7a17cea67562d9a9f31` |
| MD5 | `1fa05465b348bfd035432f0f58bd8fda` |
| BLAKE2b-256 | `11b4b9905dd16ce35dc03301599d5468f3516788ebb009589e8cc17f71ea2432` |
Details for the file `pramana_ai-0.1.0-py3-none-any.whl` (built distribution).

- Size: 35.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.6.14

| Algorithm | Hash digest |
|---|---|
| SHA256 | `d28f46e00ca1ee5d7c2349792c0ab47a2cfaf059c74f77394f999e5b73af0018` |
| MD5 | `4a9143b0ba551d84fd9b285e81224a14` |
| BLAKE2b-256 | `32dfa5245888f1f5f2af7fa74d85462a7ed0e3da3eb37d9bac20d0cd4f3cc2a5` |