Skip to main content

CLI for Pruvium, the Governance-grade LLM Quality Engineering platform.

Project description

Pruvium CLI

Measure AI quality in business terms. Enforce policy. Export evidence.

The Pruvium CLI is the command-line entry point to the Pruvium cloud engine: a governance-grade AI quality, observability, and evidence layer for LLM applications.

Use it to:

  • run evaluations from your terminal
  • analyze precomputed responses
  • surface business-facing KPI signals
  • enforce policy gates in CI/CD
  • export evidence bundles and audit PDFs
  • inspect runs in the dashboard with diagnostics, traces, and sensitivity context

If your team uses LLMs and needs more than “looks good to me,” Pruvium gives you a structured way to measure, review, and prove output quality.

Get started at app.pruvium.com Create an account-> go to settings -> create a key for CLI use -> submit API key for generation (not required for analysis of responses created elsewhere)


Why Pruvium

Most prompt tools help you test outputs.

Pruvium is built to help you measure, govern, and defend them.

It combines:

  • business-facing KPIs for Helpfulness, Reliability, and Efficiency
  • technical quality metrics underneath the KPI layer
  • policy-aware verdicts with explicit decision states and advisories
  • audit-ready artifacts with evidence exports and integrity metadata
  • dashboard inspection for diagnostics, traces, and investigation
  • a thin-client architecture so proprietary evaluation logic stays in the hosted engine

That makes the CLI useful across the org:

  • Developers: run evals, compare outputs, automate checks
  • AI platform teams: investigate failures, inspect traces, monitor release quality
  • Risk / compliance / QA: export evidence bundles and audit PDFs
  • Leaders: turn “we tested it” into a measurable quality and governance story

Start with the KPI layer

Pruvium translates technical evaluation into business-friendly signals:

  • Helpfulness — did the system accomplish the task?
  • Reliability — did it clear safety, policy, and stability checks?
  • Efficiency — did it deliver value at a reasonable cost?

And for teams managing many runs:

  • Reliability Pass Rate — how often runs clear the reliability bar over time

These KPIs are designed for fast operational reading. The deeper technical metrics remain available when you need them.


The model underneath

Behind the KPI layer, Pruvium scores outputs across four core dimensions:

Metric Signal What it measures
$\alpha$ (alpha) TaskScore Did the output do the job it was asked to do?
$\beta$ (beta) Coherence Did it stay logically and semantically on track?
$\gamma$ (gamma) Order / Entropy Was the variability appropriate for the task and domain?
$\delta$ (delta) Efficiency How much useful result was delivered for the cost in tokens, time, and steps?

The KPI layer makes the system legible to operators and decision-makers. The four-metric layer preserves technical depth for engineers, evaluators, and governance teams.


What the CLI does

1) Run evaluations

Submit prompts to Pruvium and get back a scored run with a response, KPI summary, technical metrics, and policy outcome.

2) Analyze existing outputs

Pruvium can evaluate responses you already have. That means it can act as a validator and governance layer even when generation happens elsewhere.

If you provide --response or --response-file, the CLI automatically switches to analyze mode unless you explicitly set --exec.

This is especially useful for:

  • grading outputs from external LLM systems
  • auditing responses captured from production
  • evaluating structured multi-step or agent-like outputs in the analyze layer

3) Score simulated agent behavior in analyze mode

For imported multi-step or role-simulated outputs, Pruvium can score agent-like behavior without becoming your production agent runtime.

Advanced analyze-mode hints include:

  • --structure-mode single
  • --structure-mode agent
  • --structure-mode sim_agent
  • --micro-alpha for per-virtual-step micro-alpha scoring in SIM-AGENT workflows

Use this when you want to inspect:

  • handoff quality across steps
  • stability across multi-step reasoning or workflow chains
  • whether an imported structured run should pass review thresholds before broader rollout

4) Enforce policy in CI/CD

Use --strict to return exit code 1 when a run fails policy gates. This acts as a powerful CI/CD gatekeeper for your prompts.

# Example CI/CD gate
- name: Certify Prompt Quality
  run: |
    pruvium run -f prompts/onboarding.txt --strict
    # Fails the build if the new prompt violates ISO 42001 or internal thresholds

5) Export evidence

Generate:

  • JSON evidence bundles for automation and downstream systems (outputs a cryptographically hashed JSON receipt containing policy gates, model parameters, and metrics)
  • PDF audit reports for incident review, governance, and stakeholder communication

The CLI supports both inline save-on-run and standalone export.

6) Investigate in the dashboard

Move from terminal to the full run view to inspect:

  • KPI breakdowns
  • raw $\alpha$ / $\beta$ / $\gamma$ / $\delta$
  • policy decisions and failed gates
  • advisories
  • traces and operator spans
  • Sensitivity Analysis (Wind Tunnel) to see how mathematically fragile a prompt is to minor structural changes

Governance and evidence

A run can produce more than a score. Pruvium also returns:

  • policy gate results
  • explicit decision states
  • advisories with remediation-oriented guidance
  • evidence artifacts for export and review
  • integrity metadata for verification workflows

This is the difference between a testing utility and a quality/evidence layer.


Architecture

Pruvium uses a thin-client architecture.

[CLI] -> [API] -> [Pruvium evaluation engine] ->[runs, traces, evidence, audit artifacts]
   ^                                                          |
   |----------------------------------------------------------|
                      summarized results + exports

What runs locally:

  • authentication
  • file input/output
  • request submission
  • terminal rendering
  • export handling

What runs in the cloud engine:

  • scoring
  • KPI computation
  • policy evaluation
  • diagnostics
  • evidence generation
  • audit artifact generation
  • trace and sensitivity processing

Installation

pip install git+https://github.com/Pruvium/pruvium-cli.git@v0.1.8

Requires Python 3.9+.


Quickstart

Create a CLI secret key at app.pruvium.com/settings -> submit API key for generation (not required for analysis of responses created elsewhere)

Authenticate

pruvium login --key pru_sk_your_secret_key_here

Or in CI:

export PRUVIUM_API_KEY="pru_sk_..."

Run an evaluation

pruvium run "Summarize the main findings of this abstract in 5 bullets." --domain scientific

Analyze an existing output

pruvium run "Review this customer support response." \
  --response "Thanks for your patience. Your order has shipped..."

Analyze from file

pruvium run "Review this structured support workflow output." \
  --response-file workflow_output.json

Save the evidence bundle during a run

pruvium run "Extract all dates from this contract." \
  --task extract \
  --domain legal \
  --output evidence.json

Export a JSON evidence bundle

pruvium export run_a1b2c3d4 --format json --output evidence.json

Or pipe JSON to stdout:

pruvium export run_a1b2c3d4 --format json

Export a PDF audit report

pruvium export run_a1b2c3d4 --format pdf --output report.pdf

Fail CI on policy failure

pruvium run -f prompt.txt --strict --json

Common workflows

Evaluate against a reference

pruvium run "Explain quantum entanglement." \
  --reference "Quantum entanglement is a phenomenon where..."

Use a reference file

pruvium run "Compare this answer to the approved response." \
  --reference-file approved_answer.txt

Be explicit about analyze mode

pruvium run "Review this structured support workflow output." \
  --exec analyze \
  --response-file workflow_output.json

Score a SIM-AGENT-style imported workflow

pruvium run "Review this structured multi-step workflow output." \
  --exec analyze \
  --response-file workflow_output.json \
  --structure-mode sim_agent \
  --micro-alpha

Tune generation

pruvium run "Write a creative poem" --temp 0.9

Who this is for

Developers

Use Pruvium to test prompts, score outputs, automate checks, and wire evaluations into delivery workflows.

AI platform teams

Use it to centralize run evaluation, compare outputs, inspect traces, and review imported outputs from other systems.

Compliance, QA, and risk

Use evidence bundles and audit PDFs to support review, signoff, and governance processes.

Founders and executives

Use Pruvium when AI quality needs to be measurable, reviewable, and explainable to customers, partners, or internal stakeholders.


Observability

Pruvium can integrate with OpenTelemetry-style workflows so traces and metrics can sit alongside broader system telemetry.


Security and data handling

Pruvium is designed for auditability without forcing unnecessary retention. Key ideas include:

  • BYOK-oriented generation workflows
  • configurable prompt/response retention behavior
  • evidence artifacts that preserve integrity metadata even when raw text handling is minimized

Troubleshooting

Command not found after install

python -m pruvium --version
python -m pruvium login --key pru_sk_your_secret_key_here
python -m pruvium run "Test prompt"

Not logged in

pruvium login --key pru_sk_your_secret_key_here

Evidence not ready yet

If a run completes before evidence is available, retry the export command after a short delay.

PDF export requires an output file

pruvium export run_a1b2c3d4 --format pdf --output report.pdf

License

MIT

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pruvium-0.1.9.tar.gz (14.8 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

pruvium-0.1.9-py3-none-any.whl (11.8 kB view details)

Uploaded Python 3

File details

Details for the file pruvium-0.1.9.tar.gz.

File metadata

  • Download URL: pruvium-0.1.9.tar.gz
  • Upload date:
  • Size: 14.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.10

File hashes

Hashes for pruvium-0.1.9.tar.gz
Algorithm Hash digest
SHA256 94023420133c42588ba70c338a3ad18d3de2a13bda3d71d961b82ec9ac76beaf
MD5 db9e9371732956b1bca107955795f9d2
BLAKE2b-256 741199896e73350eeaedd78f6dcb58efd046a00fae84e50f0c5e2ec94f163d4f

See more details on using hashes here.

File details

Details for the file pruvium-0.1.9-py3-none-any.whl.

File metadata

  • Download URL: pruvium-0.1.9-py3-none-any.whl
  • Upload date:
  • Size: 11.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.10

File hashes

Hashes for pruvium-0.1.9-py3-none-any.whl
Algorithm Hash digest
SHA256 a2b825e938bd6ab7a9c8cd3a4886d93db4828e4c080c1252182e4fdc1e2bf457
MD5 a10f1075d79d2fe695d1f9fc27048e1e
BLAKE2b-256 a692750dbab1d4bf50d851fc1bedcf9ae1a6bebe27968686fbfb38b416263bde

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page