CLI for Pruvium, the governance-grade LLM quality engineering platform.
Pruvium CLI
Measure AI quality in business terms. Enforce policy. Export evidence.
The Pruvium CLI is the command-line entry point to the Pruvium cloud engine: a governance-grade AI quality, observability, and evidence layer for LLM applications.
Use it to:
- run evaluations from your terminal
- analyze precomputed responses
- surface business-facing KPI signals
- enforce policy gates in CI/CD
- export evidence bundles and audit PDFs
- inspect runs in the dashboard with diagnostics, traces, and sensitivity context
If your team uses LLMs and needs more than “looks good to me,” Pruvium gives you a structured way to measure, review, and prove output quality.
Get started at app.pruvium.com: create an account, go to Settings, create a key for CLI use, then submit an API key for generation (not required when analyzing responses created elsewhere).
Why Pruvium
Most prompt tools help you test outputs.
Pruvium is built to help you measure, govern, and defend them.
It combines:
- business-facing KPIs for Helpfulness, Reliability, and Efficiency
- technical quality metrics underneath the KPI layer
- policy-aware verdicts with explicit decision states and advisories
- audit-ready artifacts with evidence exports and integrity metadata
- dashboard inspection for diagnostics, traces, and investigation
- a thin-client architecture so proprietary evaluation logic stays in the hosted engine
That makes the CLI useful across the org:
- Developers: run evals, compare outputs, automate checks
- AI platform teams: investigate failures, inspect traces, monitor release quality
- Risk / compliance / QA: export evidence bundles and audit PDFs
- Leaders: turn “we tested it” into a measurable quality and governance story
Start with the KPI layer
Pruvium translates technical evaluation into business-friendly signals:
- Helpfulness — did the system accomplish the task?
- Reliability — did it clear safety, policy, and stability checks?
- Efficiency — did it deliver value at a reasonable cost?
And for teams managing many runs:
- Reliability Pass Rate — how often runs clear the reliability bar over time
These KPIs are designed for fast operational reading. The deeper technical metrics remain available when you need them.
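As a rough illustration of how a pass rate over many runs can be read, the sketch below computes the share of runs that clear a reliability bar. The 0-to-1 score scale, the `0.8` bar, and the function name are assumptions made for this example; the actual KPI is computed in the hosted engine and may be defined differently.

```python
from typing import Iterable


def reliability_pass_rate(scores: Iterable[float], bar: float = 0.8) -> float:
    """Return the fraction of runs whose reliability score meets the bar.

    Both the score scale and the default bar are hypothetical values
    chosen for illustration only.
    """
    scores = list(scores)
    if not scores:
        return 0.0  # no runs yet, so nothing has passed
    passed = sum(1 for score in scores if score >= bar)
    return passed / len(scores)


# Four example runs: three clear the 0.8 bar, one does not.
example_scores = [0.92, 0.75, 0.88, 0.81]
rate = reliability_pass_rate(example_scores)
print(f"Reliability Pass Rate: {rate:.0%}")
```

Tracking this number per release or per week gives a trend line rather than a single pass/fail reading.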
The model underneath
Behind the KPI layer, Pruvium scores outputs across four core dimensions:
| Metric | Signal | What it measures |
|---|---|---|
| $\alpha$ (alpha) | TaskScore | Did the output do the job it was asked to do? |
| $\beta$ (beta) | Coherence | Did it stay logically and semantically on track? |
| $\gamma$ (gamma) | Order / Entropy | Was the variability appropriate for the task and domain? |
| $\delta$ (delta) | Efficiency | How much useful result was delivered for the cost in tokens, time, and steps? |
The KPI layer makes the system legible to operators and decision-makers. The four-metric layer preserves technical depth for engineers, evaluators, and governance teams.
What the CLI does
1) Run evaluations
Submit prompts to Pruvium and get back a scored run with a response, KPI summary, technical metrics, and policy outcome.
2) Analyze existing outputs
Pruvium can evaluate responses you already have. That means it can act as a validator and governance layer even when generation happens elsewhere.
If you provide --response or --response-file, the CLI automatically switches to analyze mode unless you explicitly set --exec.
This is especially useful for:
- grading outputs from external LLM systems
- auditing responses captured from production
- evaluating structured multi-step or agent-like outputs in the analyze layer
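The mode-selection rule described above (a supplied response implies analyze mode unless --exec is set explicitly) can be sketched in a few lines. This is an illustration of the rule, not the CLI's source code; the function name and the default mode name `"generate"` are assumptions.

```python
from typing import Optional


def resolve_exec_mode(
    exec_flag: Optional[str],
    response: Optional[str],
    response_file: Optional[str],
    default_mode: str = "generate",  # hypothetical name for the non-analyze mode
) -> str:
    """Pick an execution mode from the flags a user passed."""
    # An explicit --exec value always takes precedence.
    if exec_flag is not None:
        return exec_flag
    # A supplied response (inline or from a file) implies analyze mode.
    if response is not None or response_file is not None:
        return "analyze"
    # Otherwise fall back to the default mode.
    return default_mode


print(resolve_exec_mode(None, "Thanks for your patience.", None))
```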
3) Score simulated agent behavior in analyze mode
For imported multi-step or role-simulated outputs, Pruvium can score agent-like behavior without becoming your production agent runtime.
Advanced analyze-mode hints include:
- --structure-mode single
- --structure-mode agent
- --structure-mode sim_agent
- --micro-alpha for per-virtual-step micro-alpha scoring in SIM-AGENT workflows
Use this when you want to inspect:
- handoff quality across steps
- stability across multi-step reasoning or workflow chains
- whether an imported structured run should pass review thresholds before broader rollout
4) Enforce policy in CI/CD
Use --strict to return exit code 1 when a run fails policy gates, so a CI/CD job fails whenever a prompt does not clear them.
# Example CI/CD gate
- name: Certify Prompt Quality
  run: |
    pruvium run -f prompts/onboarding.txt --strict
    # Fails the build if the new prompt violates ISO 42001 or internal thresholds
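Where a pipeline is driven from a script rather than a YAML step, the same gate can be expressed by checking the process exit code. The sketch below is a minimal wrapper under that assumption; `passes_gate` and `main` are hypothetical helper names, and any non-zero exit code is treated as a failure.

```python
import subprocess
import sys
from typing import Sequence

# The command from the CI example above, split into arguments.
PRUVIUM_GATE = ["pruvium", "run", "-f", "prompts/onboarding.txt", "--strict"]


def passes_gate(command: Sequence[str]) -> bool:
    """Run a command and report whether it exited with code 0."""
    completed = subprocess.run(command, check=False)
    return completed.returncode == 0


def main() -> None:
    """Propagate the gate result as this script's own exit code."""
    sys.exit(0 if passes_gate(PRUVIUM_GATE) else 1)
```

Calling `main()` from a pipeline script would stop the build on a failed gate, mirroring the YAML step.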
5) Export evidence
Generate:
- JSON evidence bundles for automation and downstream systems (outputs a cryptographically hashed JSON receipt containing policy gates, model parameters, and metrics)
- PDF audit reports for incident review, governance, and stakeholder communication
The CLI supports both inline save-on-run and standalone export.
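To show what a hashed receipt makes possible downstream, the sketch below recomputes a SHA-256 digest over a canonical JSON encoding and compares it with a stored value. The field names `payload` and `sha256`, and the canonicalization scheme, are assumptions for illustration; the real evidence bundle schema may differ, so check an exported bundle before adapting this.

```python
import hashlib
import json


def canonical_digest(payload: dict) -> str:
    """SHA-256 over a deterministic JSON encoding (sorted keys, no extra spaces)."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def verify_receipt(receipt: dict) -> bool:
    """Check a receipt against its stored digest.

    "payload" and "sha256" are hypothetical field names, not the
    documented bundle format.
    """
    return canonical_digest(receipt["payload"]) == receipt["sha256"]


# Build an example receipt, then confirm that tampering is detected.
payload = {"policy": {"passed": True}, "metrics": {"alpha": 0.9}}
receipt = {"payload": payload, "sha256": canonical_digest(payload)}
tampered = {"payload": {**payload, "metrics": {"alpha": 1.0}}, "sha256": receipt["sha256"]}

print(verify_receipt(receipt), verify_receipt(tampered))
```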
6) Investigate in the dashboard
Move from terminal to the full run view to inspect:
- KPI breakdowns
- raw $\alpha$ / $\beta$ / $\gamma$ / $\delta$
- policy decisions and failed gates
- advisories
- traces and operator spans
- Sensitivity Analysis (Wind Tunnel), which shows how fragile a prompt's scores are under minor structural changes
Governance and evidence
A run can produce more than a score. Pruvium also returns:
- policy gate results
- explicit decision states
- advisories with remediation-oriented guidance
- evidence artifacts for export and review
- integrity metadata for verification workflows
This is the difference between a testing utility and a quality/evidence layer.
Architecture
Pruvium uses a thin-client architecture.
[CLI] -> [API] -> [Pruvium evaluation engine] -> [runs, traces, evidence, audit artifacts]
  ^                                                          |
  |----------------------------------------------------------|
                 summarized results + exports
What runs locally:
- authentication
- file input/output
- request submission
- terminal rendering
- export handling
What runs in the cloud engine:
- scoring
- KPI computation
- policy evaluation
- diagnostics
- evidence generation
- audit artifact generation
- trace and sensitivity processing
Installation
pip install git+https://github.com/Pruvium/pruvium-cli.git@v0.1.8
Requires Python 3.9+.
Quickstart
Create a CLI secret key at app.pruvium.com/settings, then submit an API key for generation (not required when analyzing responses created elsewhere).
Authenticate
pruvium login --key pru_sk_your_secret_key_here
Or in CI:
export PRUVIUM_API_KEY="pru_sk_..."
Run an evaluation
pruvium run "Summarize the main findings of this abstract in 5 bullets." --domain scientific
Analyze an existing output
pruvium run "Review this customer support response." \
--response "Thanks for your patience. Your order has shipped..."
Analyze from file
pruvium run "Review this structured support workflow output." \
--response-file workflow_output.json
Save the evidence bundle during a run
pruvium run "Extract all dates from this contract." \
--task extract \
--domain legal \
--output evidence.json
Export a JSON evidence bundle
pruvium export run_a1b2c3d4 --format json --output evidence.json
Or pipe JSON to stdout:
pruvium export run_a1b2c3d4 --format json
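Because the JSON export can be written to stdout, a downstream script can capture and parse it directly. The helper below is a minimal sketch; `read_json_stdout` is a hypothetical name, and the example assumes the command prints a single JSON document and nothing else on stdout.

```python
import json
import subprocess
from typing import Any, Sequence


def read_json_stdout(command: Sequence[str]) -> Any:
    """Run a command, capture its stdout, and parse it as JSON.

    Raises subprocess.CalledProcessError on a non-zero exit code and
    json.JSONDecodeError if stdout is not valid JSON.
    """
    completed = subprocess.run(command, capture_output=True, text=True, check=True)
    return json.loads(completed.stdout)


# Arguments matching the export command shown above.
EXPORT_COMMAND = ["pruvium", "export", "run_a1b2c3d4", "--format", "json"]
```

With the CLI installed and authenticated, `read_json_stdout(EXPORT_COMMAND)` would return the bundle as a Python object.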
Export a PDF audit report
pruvium export run_a1b2c3d4 --format pdf --output report.pdf
Fail CI on policy failure
pruvium run -f prompt.txt --strict --json
Common workflows
Evaluate against a reference
pruvium run "Explain quantum entanglement." \
--reference "Quantum entanglement is a phenomenon where..."
Use a reference file
pruvium run "Compare this answer to the approved response." \
--reference-file approved_answer.txt
Be explicit about analyze mode
pruvium run "Review this structured support workflow output." \
--exec analyze \
--response-file workflow_output.json
Score a SIM-AGENT-style imported workflow
pruvium run "Review this structured multi-step workflow output." \
--exec analyze \
--response-file workflow_output.json \
--structure-mode sim_agent \
--micro-alpha
Tune generation
pruvium run "Write a creative poem" --temp 0.9
Who this is for
Developers
Use Pruvium to test prompts, score outputs, automate checks, and wire evaluations into delivery workflows.
AI platform teams
Use it to centralize run evaluation, compare outputs, inspect traces, and review imported outputs from other systems.
Compliance, QA, and risk
Use evidence bundles and audit PDFs to support review, signoff, and governance processes.
Founders and executives
Use Pruvium when AI quality needs to be measurable, reviewable, and explainable to customers, partners, or internal stakeholders.
Observability
Pruvium can integrate with OpenTelemetry-style workflows so traces and metrics can sit alongside broader system telemetry.
Security and data handling
Pruvium is designed for auditability without forcing unnecessary retention. Key ideas include:
- BYOK-oriented generation workflows
- configurable prompt/response retention behavior
- evidence artifacts that preserve integrity metadata even when raw text handling is minimized
Troubleshooting
Command not found after install
python -m pruvium --version
python -m pruvium login --key pru_sk_your_secret_key_here
python -m pruvium run "Test prompt"
Not logged in
pruvium login --key pru_sk_your_secret_key_here
Evidence not ready yet
If a run completes before evidence is available, retry the export command after a short delay.
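The retry advice above can be automated with a small loop. This sketch assumes the export command exits with a non-zero code while evidence is not yet available, which is not confirmed here; the helper name, attempt count, and delay are illustrative.

```python
import subprocess
import time
from typing import Sequence


def retry_command(
    command: Sequence[str],
    attempts: int = 5,
    delay_seconds: float = 2.0,
) -> bool:
    """Re-run a command until it exits with code 0 or attempts run out."""
    for attempt in range(1, attempts + 1):
        if subprocess.run(command, check=False).returncode == 0:
            return True
        # Wait before the next try, but not after the final attempt.
        if attempt < attempts:
            time.sleep(delay_seconds)
    return False


# Arguments matching the JSON export command from the Quickstart.
EXPORT_WITH_RETRY = [
    "pruvium", "export", "run_a1b2c3d4",
    "--format", "json", "--output", "evidence.json",
]
```

A fixed delay keeps the example short; a longer or increasing delay may suit slower runs.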
PDF export requires an output file
pruvium export run_a1b2c3d4 --format pdf --output report.pdf
License
MIT