CLI for Invarum, the governance-grade LLM Quality Engineering platform.

Intended Audience
- Developers
License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3
Topic
- Software Development :: Quality Assurance

Project description

Invarum CLI

Measure AI quality in business terms. Enforce policy. Export evidence.

The Invarum CLI is the command-line entry point to the Invarum cloud engine: a governance-grade AI quality, observability, and evidence layer for LLM applications.

Use it to:

run evaluations from your terminal
analyze precomputed responses
surface business-facing KPI signals
enforce policy gates in CI/CD
export evidence bundles and audit PDFs
inspect runs in the dashboard with diagnostics, traces, and sensitivity context

If your team uses LLMs and needs more than “looks good to me,” Invarum gives you a structured way to measure, review, and prove output quality.

Get started at app.invarum.com: create an account, go to Settings, create a key for CLI use, and submit an API key for generation (not required for analyzing responses created elsewhere).

Why Invarum

Most prompt tools help you test outputs.

Invarum is built to help you measure, govern, and defend them.

It combines:

business-facing KPIs for Helpfulness, Reliability, and Efficiency
technical quality metrics underneath the KPI layer
policy-aware verdicts with explicit decision states and advisories
audit-ready artifacts with evidence exports and integrity metadata
dashboard inspection for diagnostics, traces, and investigation
a thin-client architecture so proprietary evaluation logic stays in the hosted engine

That makes the CLI useful across the org:

Developers: run evals, compare outputs, automate checks
AI platform teams: investigate failures, inspect traces, monitor release quality
Risk / compliance / QA: export evidence bundles and audit PDFs
Leaders: turn “we tested it” into a measurable quality and governance story

Start with the KPI layer

Invarum translates technical evaluation into business-friendly signals:

Helpfulness — did the system accomplish the task?
Reliability — did it clear safety, policy, and stability checks?
Efficiency — did it deliver value at a reasonable cost?

And for teams managing many runs:

Reliability Pass Rate — how often runs clear the reliability bar over time

These KPIs are designed for fast operational reading. The deeper technical metrics remain available when you need them.
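As a rough illustration of how a pass rate of this kind can be read, the sketch below computes the share of runs that cleared a reliability bar. The `reliability_passed` field and the sample records are hypothetical; the actual KPI computation happens in the hosted engine and is not documented here.

```python
from typing import Iterable, Mapping


def reliability_pass_rate(runs: Iterable[Mapping[str, bool]]) -> float:
    """Return the share of runs whose (hypothetical) reliability flag is True."""
    runs = list(runs)
    if not runs:
        return 0.0  # no runs yet: report 0 rather than dividing by zero
    passed = sum(1 for run in runs if run["reliability_passed"])
    return passed / len(runs)


# Illustrative records only; this is not real Invarum output.
sample_runs = [
    {"reliability_passed": True},
    {"reliability_passed": True},
    {"reliability_passed": False},
    {"reliability_passed": True},
]
rate = reliability_pass_rate(sample_runs)
print(f"{rate:.0%}")  # prints 75%
```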

The model underneath

Behind the KPI layer, Invarum scores outputs across four core dimensions:

Metric	Signal	What it measures
$\alpha$ (alpha)	TaskScore	Did the output do the job it was asked to do?
$\beta$ (beta)	Coherence	Did it stay logically and semantically on track?
$\gamma$ (gamma)	Order / Entropy	Was the variability appropriate for the task and domain?
$\delta$ (delta)	Efficiency	How much useful result was delivered for the cost in tokens, time, and steps?

The KPI layer makes the system legible to operators and decision-makers. The four-metric layer preserves technical depth for engineers, evaluators, and governance teams.

What the CLI does

1) Run evaluations

Submit prompts to Invarum and get back a scored run with a response, KPI summary, technical metrics, and policy outcome.

2) Analyze existing outputs

Invarum can evaluate responses you already have. That means it can act as a validator and governance layer even when generation happens elsewhere.

If you provide --response or --response-file, the CLI automatically switches to analyze mode unless you explicitly set --exec.

This is especially useful for:

grading outputs from external LLM systems
auditing responses captured from production
evaluating structured multi-step or agent-like outputs in the analyze layer
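The mode-selection rule described above can be sketched as a small function. This is only a model of the documented behavior, not the CLI's source: the function name is made up, and `"generate"` is a placeholder for whatever the non-analyze mode is actually called.

```python
from typing import Optional


def resolve_exec_mode(
    exec_flag: Optional[str] = None,
    response: Optional[str] = None,
    response_file: Optional[str] = None,
    default_mode: str = "generate",  # hypothetical name for the non-analyze mode
) -> str:
    """An explicit --exec value wins; otherwise a supplied response implies analyze mode."""
    if exec_flag is not None:
        return exec_flag
    if response is not None or response_file is not None:
        return "analyze"
    return default_mode
```

Under this reading, passing only `--response-file workflow_output.json` resolves to `analyze`, while adding an explicit `--exec` value overrides the automatic switch.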

3) Score simulated agent behavior in analyze mode

For imported multi-step or role-simulated outputs, Invarum can score agent-like behavior without becoming your production agent runtime.

Advanced analyze-mode hints include:

--structure-mode single
--structure-mode agent
--structure-mode sim_agent
--micro-alpha for per-virtual-step micro-alpha scoring in SIM-AGENT workflows

Use this when you want to inspect:

handoff quality across steps
stability across multi-step reasoning or workflow chains
whether an imported structured run should pass review thresholds before broader rollout

4) Enforce policy in CI/CD

Use --strict to return exit code 1 when a run fails policy gates, so a CI/CD pipeline can block a prompt change that does not pass.

# Example CI/CD gate
- name: Certify Prompt Quality
  run: |
    invarum run -f prompts/onboarding.txt --strict
    # Fails the build if the new prompt violates ISO 42001 or internal thresholds
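For pipelines driven from Python rather than a shell step, the same gate can be approximated by checking the process exit code. The sketch below is an assumption-laden illustration: `passes_gate` is a hypothetical helper, and it treats any non-zero exit code as a failure, although only exit code 1 for policy failures is documented here.

```python
import subprocess
import sys
from typing import Sequence


def passes_gate(command: Sequence[str]) -> bool:
    """Run a command and report whether it exited with code 0."""
    completed = subprocess.run(command)
    return completed.returncode == 0


# Stand-in commands so the sketch runs without the CLI installed.
# In a real pipeline the command would be something like:
#   ["invarum", "run", "-f", "prompts/onboarding.txt", "--strict"]
simulated_pass = passes_gate([sys.executable, "-c", "raise SystemExit(0)"])
simulated_fail = passes_gate([sys.executable, "-c", "raise SystemExit(1)"])
```

A wrapper script would typically end with `sys.exit(0 if ok else 1)` so the CI job fails when the gate does.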

5) Export evidence

Generate:

JSON evidence bundles for automation and downstream systems: a cryptographically hashed JSON receipt containing policy gates, model parameters, and metrics
PDF audit reports for incident review, governance, and stakeholder communication

The CLI supports both inline save-on-run and standalone export.
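If you archive exported bundles, it can help to record a digest of each saved file next to it. The sketch below recomputes a SHA-256 digest for a file such as evidence.json. It does not read or validate the hash fields inside the bundle, because their layout is not described here; `sha256_of_file` is a hypothetical helper name.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 65536) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes at EOF
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

For example, `sha256_of_file(Path("evidence.json"))` gives a value you can store in a release log and compare against later copies of the same file.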

6) Investigate in the dashboard

Move from terminal to the full run view to inspect:

KPI breakdowns
raw $\alpha$ / $\beta$ / $\gamma$ / $\delta$
policy decisions and failed gates
advisories
traces and operator spans
Sensitivity Analysis (Wind Tunnel), which shows how fragile a prompt is under minor structural changes

Governance and evidence

A run can produce more than a score. Invarum also returns:

policy gate results
explicit decision states
advisories with remediation-oriented guidance
evidence artifacts for export and review
integrity metadata for verification workflows

This is the difference between a testing utility and a quality/evidence layer.

Architecture

Invarum uses a thin-client architecture.

[CLI] -> [API] -> [Invarum evaluation engine] -> [runs, traces, evidence, audit artifacts]
   ^                                                          |
   |----------------------------------------------------------|
                      summarized results + exports

What runs locally:

authentication
file input/output
request submission
terminal rendering
export handling

What runs in the cloud engine:

scoring
KPI computation
policy evaluation
diagnostics
evidence generation
audit artifact generation
trace and sensitivity processing

Installation

pip install git+https://github.com/Invarum/invarum-cli.git@v0.1.8

Requires Python 3.9+.

Quickstart

Create a CLI secret key at app.invarum.com/settings, then submit an API key for generation (not required for analyzing responses created elsewhere).

Authenticate

invarum login --key inv_sk_your_secret_key_here

Or in CI:

export INVARUM_API_KEY="inv_sk_..."

Run an evaluation

invarum run "Summarize the main findings of this abstract in 5 bullets." --domain scientific

Analyze an existing output

invarum run "Review this customer support response." \
  --response "Thanks for your patience. Your order has shipped..."

Analyze from file

invarum run "Review this structured support workflow output." \
  --response-file workflow_output.json

Save the evidence bundle during a run

invarum run "Extract all dates from this contract." \
  --task extract \
  --domain legal \
  --output evidence.json

Export a JSON evidence bundle

invarum export run_a1b2c3d4 --format json --output evidence.json

Or pipe JSON to stdout:

invarum export run_a1b2c3d4 --format json

Export a PDF audit report

invarum export run_a1b2c3d4 --format pdf --output report.pdf

Fail CI on policy failure

invarum run -f prompt.txt --strict --json

Common workflows

Evaluate against a reference

invarum run "Explain quantum entanglement." \
  --reference "Quantum entanglement is a phenomenon where..."

Use a reference file

invarum run "Compare this answer to the approved response." \
  --reference-file approved_answer.txt

Be explicit about analyze mode

invarum run "Review this structured support workflow output." \
  --exec analyze \
  --response-file workflow_output.json

Score a SIM-AGENT-style imported workflow

invarum run "Review this structured multi-step workflow output." \
  --exec analyze \
  --response-file workflow_output.json \
  --structure-mode sim_agent \
  --micro-alpha

Tune generation

invarum run "Write a creative poem" --temp 0.9

Who this is for

Developers

Use Invarum to test prompts, score outputs, automate checks, and wire evaluations into delivery workflows.

AI platform teams

Use it to centralize run evaluation, compare outputs, inspect traces, and review imported outputs from other systems.

Compliance, QA, and risk

Use evidence bundles and audit PDFs to support review, signoff, and governance processes.

Founders and executives

Use Invarum when AI quality needs to be measurable, reviewable, and explainable to customers, partners, or internal stakeholders.

Observability

Invarum can integrate with OpenTelemetry-style workflows so traces and metrics can sit alongside broader system telemetry.

Security and data handling

Invarum is designed for auditability without forcing unnecessary retention. Key ideas include:

BYOK-oriented generation workflows
configurable prompt/response retention behavior
evidence artifacts that preserve integrity metadata even when raw text handling is minimized

Troubleshooting

Command not found after install

python -m invarum --version
python -m invarum login --key inv_sk_your_secret_key_here
python -m invarum run "Test prompt"
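Scripts that call the CLI can apply the same fallback automatically. The sketch below picks the `invarum` script when it is on PATH and otherwise falls back to the module form shown above; `invarum_command` is a hypothetical helper, not part of the package.

```python
import shutil
import sys
from typing import List


def invarum_command(executable: str = "invarum") -> List[str]:
    """Return the argv prefix for the CLI: the script if found on PATH, else `python -m invarum`."""
    if shutil.which(executable):
        return [executable]
    # Use the current interpreter so the module resolves in the same environment.
    return [sys.executable, "-m", "invarum"]
```

The returned list can be extended with arguments, for example `invarum_command() + ["--version"]`, and passed to `subprocess.run`.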

Not logged in

invarum login --key inv_sk_your_secret_key_here

Evidence not ready yet

If a run completes before evidence is available, retry the export command after a short delay.
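In automation, the retry can be scripted. The sketch below re-runs a command a few times with a pause between attempts. It assumes the export command exits non-zero while evidence is not ready, which is not confirmed here; `retry_export`, the attempt count, and the delay are all illustrative choices.

```python
import subprocess
import time
from typing import Callable, Sequence


def retry_export(
    command: Sequence[str],
    attempts: int = 5,
    delay_seconds: float = 3.0,
    run: Callable[[Sequence[str]], int] = lambda cmd: subprocess.run(cmd).returncode,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Run `command` until it exits 0 or the attempts are used up."""
    for attempt in range(1, attempts + 1):
        if run(command) == 0:
            return True
        if attempt < attempts:
            sleep(delay_seconds)  # wait before the next try, but not after the last one
    return False
```

A call might look like `retry_export(["invarum", "export", "run_a1b2c3d4", "--format", "json", "--output", "evidence.json"])`. The `run` and `sleep` parameters exist only so the loop can be exercised without the CLI or real delays.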

PDF export requires an output file

invarum export run_a1b2c3d4 --format pdf --output report.pdf

License

MIT

Release history

0.1.9 (this version): Apr 2, 2026
0.1.8: Apr 2, 2026
0.1.7: Feb 18, 2026
0.1.6: Jan 19, 2026
0.1.5: Jan 15, 2026

Download files

Source Distribution

invarum-0.1.9.tar.gz (14.8 kB)

Uploaded Apr 2, 2026 Source

Built Distribution

invarum-0.1.9-py3-none-any.whl (11.8 kB)

Uploaded Apr 2, 2026 Python 3

File details

Details for the file invarum-0.1.9.tar.gz.

File metadata

Download URL: invarum-0.1.9.tar.gz
Upload date: Apr 2, 2026
Size: 14.8 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.10

File hashes

Hashes for invarum-0.1.9.tar.gz
Algorithm	Hash digest
SHA256	`76aa57370688a91383c59beb8161b32cf8d2b882727dedd5ccc554792183a7a1`
MD5	`b926be75d7a2876d58fbf4bdf0024b45`
BLAKE2b-256	`da5e379a9a828ecb60509fc7a93a99962f194c50735c60913aabfa1d2eaf39ef`

File details

Details for the file invarum-0.1.9-py3-none-any.whl.

File metadata

Download URL: invarum-0.1.9-py3-none-any.whl
Upload date: Apr 2, 2026
Size: 11.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.10

File hashes

Hashes for invarum-0.1.9-py3-none-any.whl
Algorithm	Hash digest
SHA256	`f9186a5b5d8b29302d2b0ab3b41322fe484726d0d5f6845a3a493710f4f57481`
MD5	`9844f11b669d5c942fe97be25ffa68aa`
BLAKE2b-256	`e8305de121c6215df937d283c1145e8b0164abe5251ab229859478d1eae6b878`
