Skip to main content

CLI for Invarum, the Governance-grade LLM Quality Engineering platform.

Project description

Invarum CLI

Measure AI quality in business terms. Enforce policy. Export evidence.

The Invarum CLI is the command-line entry point to the Invarum cloud engine: a governance-grade AI quality, observability, and evidence layer for LLM applications.

Use it to:

  • run evaluations from your terminal
  • analyze precomputed responses
  • surface business-facing KPI signals
  • enforce policy gates in CI/CD
  • export evidence bundles and audit PDFs
  • inspect runs in the dashboard with diagnostics, traces, and sensitivity context

If your team uses LLMs and needs more than “looks good to me,” Invarum gives you a structured way to measure, review, and prove output quality.

Get started at app.invarum.com Create an account-> go to settings -> create a key for CLI use -> submit API key for generation (not required for analysis of responses created elsewhere)


Why Invarum

Most prompt tools help you test outputs.

Invarum is built to help you measure, govern, and defend them.

It combines:

  • business-facing KPIs for Helpfulness, Reliability, and Efficiency
  • technical quality metrics underneath the KPI layer
  • policy-aware verdicts with explicit decision states and advisories
  • audit-ready artifacts with evidence exports and integrity metadata
  • dashboard inspection for diagnostics, traces, and investigation
  • a thin-client architecture so proprietary evaluation logic stays in the hosted engine

That makes the CLI useful across the org:

  • Developers: run evals, compare outputs, automate checks
  • AI platform teams: investigate failures, inspect traces, monitor release quality
  • Risk / compliance / QA: export evidence bundles and audit PDFs
  • Leaders: turn “we tested it” into a measurable quality and governance story

Start with the KPI layer

Invarum translates technical evaluation into business-friendly signals:

  • Helpfulness — did the system accomplish the task?
  • Reliability — did it clear safety, policy, and stability checks?
  • Efficiency — did it deliver value at a reasonable cost?

And for teams managing many runs:

  • Reliability Pass Rate — how often runs clear the reliability bar over time

These KPIs are designed for fast operational reading. The deeper technical metrics remain available when you need them.


The model underneath

Behind the KPI layer, Invarum scores outputs across four core dimensions:

Metric Signal What it measures
$\alpha$ (alpha) TaskScore Did the output do the job it was asked to do?
$\beta$ (beta) Coherence Did it stay logically and semantically on track?
$\gamma$ (gamma) Order / Entropy Was the variability appropriate for the task and domain?
$\delta$ (delta) Efficiency How much useful result was delivered for the cost in tokens, time, and steps?

The KPI layer makes the system legible to operators and decision-makers. The four-metric layer preserves technical depth for engineers, evaluators, and governance teams.


What the CLI does

1) Run evaluations

Submit prompts to Invarum and get back a scored run with a response, KPI summary, technical metrics, and policy outcome.

2) Analyze existing outputs

Invarum can evaluate responses you already have. That means it can act as a validator and governance layer even when generation happens elsewhere.

If you provide --response or --response-file, the CLI automatically switches to analyze mode unless you explicitly set --exec.

This is especially useful for:

  • grading outputs from external LLM systems
  • auditing responses captured from production
  • evaluating structured multi-step or agent-like outputs in the analyze layer

3) Score simulated agent behavior in analyze mode

For imported multi-step or role-simulated outputs, Invarum can score agent-like behavior without becoming your production agent runtime.

Advanced analyze-mode hints include:

  • --structure-mode single
  • --structure-mode agent
  • --structure-mode sim_agent
  • --micro-alpha for per-virtual-step micro-alpha scoring in SIM-AGENT workflows

Use this when you want to inspect:

  • handoff quality across steps
  • stability across multi-step reasoning or workflow chains
  • whether an imported structured run should pass review thresholds before broader rollout

4) Enforce policy in CI/CD

Use --strict to return exit code 1 when a run fails policy gates. This acts as a powerful CI/CD gatekeeper for your prompts.

# Example CI/CD gate
- name: Certify Prompt Quality
  run: |
    invarum run -f prompts/onboarding.txt --strict
    # Fails the build if the new prompt violates ISO 42001 or internal thresholds

5) Export evidence

Generate:

  • JSON evidence bundles for automation and downstream systems (outputs a cryptographically hashed JSON receipt containing policy gates, model parameters, and metrics)
  • PDF audit reports for incident review, governance, and stakeholder communication

The CLI supports both inline save-on-run and standalone export.

6) Investigate in the dashboard

Move from terminal to the full run view to inspect:

  • KPI breakdowns
  • raw $\alpha$ / $\beta$ / $\gamma$ / $\delta$
  • policy decisions and failed gates
  • advisories
  • traces and operator spans
  • Sensitivity Analysis (Wind Tunnel) to see how mathematically fragile a prompt is to minor structural changes

Governance and evidence

A run can produce more than a score. Invarum also returns:

  • policy gate results
  • explicit decision states
  • advisories with remediation-oriented guidance
  • evidence artifacts for export and review
  • integrity metadata for verification workflows

This is the difference between a testing utility and a quality/evidence layer.


Architecture

Invarum uses a thin-client architecture.

[CLI] -> [API] -> [Invarum evaluation engine] ->[runs, traces, evidence, audit artifacts]
   ^                                                          |
   |----------------------------------------------------------|
                      summarized results + exports

What runs locally:

  • authentication
  • file input/output
  • request submission
  • terminal rendering
  • export handling

What runs in the cloud engine:

  • scoring
  • KPI computation
  • policy evaluation
  • diagnostics
  • evidence generation
  • audit artifact generation
  • trace and sensitivity processing

Installation

pip install git+https://github.com/Invarum/invarum-cli.git@v0.1.8

Requires Python 3.9+.


Quickstart

Create a CLI secret key at app.invarum.com/settings -> submit API key for generation (not required for analysis of responses created elsewhere)

Authenticate

invarum login --key inv_sk_your_secret_key_here

Or in CI:

export INVARUM_API_KEY="inv_sk_..."

Run an evaluation

invarum run "Summarize the main findings of this abstract in 5 bullets." --domain scientific

Analyze an existing output

invarum run "Review this customer support response." \
  --response "Thanks for your patience. Your order has shipped..."

Analyze from file

invarum run "Review this structured support workflow output." \
  --response-file workflow_output.json

Save the evidence bundle during a run

invarum run "Extract all dates from this contract." \
  --task extract \
  --domain legal \
  --output evidence.json

Export a JSON evidence bundle

invarum export run_a1b2c3d4 --format json --output evidence.json

Or pipe JSON to stdout:

invarum export run_a1b2c3d4 --format json

Export a PDF audit report

invarum export run_a1b2c3d4 --format pdf --output report.pdf

Fail CI on policy failure

invarum run -f prompt.txt --strict --json

Common workflows

Evaluate against a reference

invarum run "Explain quantum entanglement." \
  --reference "Quantum entanglement is a phenomenon where..."

Use a reference file

invarum run "Compare this answer to the approved response." \
  --reference-file approved_answer.txt

Be explicit about analyze mode

invarum run "Review this structured support workflow output." \
  --exec analyze \
  --response-file workflow_output.json

Score a SIM-AGENT-style imported workflow

invarum run "Review this structured multi-step workflow output." \
  --exec analyze \
  --response-file workflow_output.json \
  --structure-mode sim_agent \
  --micro-alpha

Tune generation

invarum run "Write a creative poem" --temp 0.9

Who this is for

Developers

Use Invarum to test prompts, score outputs, automate checks, and wire evaluations into delivery workflows.

AI platform teams

Use it to centralize run evaluation, compare outputs, inspect traces, and review imported outputs from other systems.

Compliance, QA, and risk

Use evidence bundles and audit PDFs to support review, signoff, and governance processes.

Founders and executives

Use Invarum when AI quality needs to be measurable, reviewable, and explainable to customers, partners, or internal stakeholders.


Observability

Invarum can integrate with OpenTelemetry-style workflows so traces and metrics can sit alongside broader system telemetry.


Security and data handling

Invarum is designed for auditability without forcing unnecessary retention. Key ideas include:

  • BYOK-oriented generation workflows
  • configurable prompt/response retention behavior
  • evidence artifacts that preserve integrity metadata even when raw text handling is minimized

Troubleshooting

Command not found after install

python -m invarum --version
python -m invarum login --key inv_sk_your_secret_key_here
python -m invarum run "Test prompt"

Not logged in

invarum login --key inv_sk_your_secret_key_here

Evidence not ready yet

If a run completes before evidence is available, retry the export command after a short delay.

PDF export requires an output file

invarum export run_a1b2c3d4 --format pdf --output report.pdf

License

MIT

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

invarum-0.1.9.tar.gz (14.8 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

invarum-0.1.9-py3-none-any.whl (11.8 kB view details)

Uploaded Python 3

File details

Details for the file invarum-0.1.9.tar.gz.

File metadata

  • Download URL: invarum-0.1.9.tar.gz
  • Upload date:
  • Size: 14.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.10

File hashes

Hashes for invarum-0.1.9.tar.gz
Algorithm Hash digest
SHA256 76aa57370688a91383c59beb8161b32cf8d2b882727dedd5ccc554792183a7a1
MD5 b926be75d7a2876d58fbf4bdf0024b45
BLAKE2b-256 da5e379a9a828ecb60509fc7a93a99962f194c50735c60913aabfa1d2eaf39ef

See more details on using hashes here.

File details

Details for the file invarum-0.1.9-py3-none-any.whl.

File metadata

  • Download URL: invarum-0.1.9-py3-none-any.whl
  • Upload date:
  • Size: 11.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.10

File hashes

Hashes for invarum-0.1.9-py3-none-any.whl
Algorithm Hash digest
SHA256 f9186a5b5d8b29302d2b0ab3b41322fe484726d0d5f6845a3a493710f4f57481
MD5 9844f11b669d5c942fe97be25ffa68aa
BLAKE2b-256 e8305de121c6215df937d283c1145e8b0164abe5251ab229859478d1eae6b878

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page