
Edit‑agnostic robustness evaluation reports for weight edits (InvarLock framework)


InvarLock

Edit‑agnostic robustness reports for weight edits

CI PyPI Docs License: Apache-2.0 Python 3.12+

Catch silent quality regressions from quantization, pruning, and weight edits before they ship.

Quantizing, pruning, or otherwise editing a model’s weights can silently degrade quality. InvarLock compares an edited subject checkpoint against a fixed baseline with paired evaluation windows, enforces the canonical guard chain (invariants → spectral → RMT → variance → invariants), and produces a machine-readable evaluation report you can gate in CI.

Why InvarLock?

  • Quality gates for edited checkpoints: catch regressions before deployment.
  • Paired statistical evidence: primary metrics with confidence intervals.
  • Auditable evidence: deterministic pairing metadata + policy digests in evaluation.report.json.
  • CI/CD-friendly: stable exit codes, --json outputs, and portable “evidence packs”.
  • Offline-first: network is disabled by default; enable downloads per command.
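As an illustration of the CI/CD point above, here is a minimal sketch of a gate that runs a verification command and passes or fails on its exit status. The helper name `gate` is hypothetical, and the sketch assumes that exit status 0 means the report passed; the README promises stable exit codes but does not list them here, so check the documentation before relying on specific values.

```python
import subprocess
import sys
from typing import Sequence


def gate(command: Sequence[str]) -> bool:
    """Run a verification command and report whether it exited with status 0.

    Assumption: status 0 means "passed". This is a common CLI convention,
    not something this README states explicitly.
    """
    completed = subprocess.run(command, capture_output=True, text=True)
    if completed.returncode != 0:
        # Echo the tool's own output so the CI log explains the failure.
        sys.stderr.write(completed.stdout)
        sys.stderr.write(completed.stderr)
    return completed.returncode == 0


# Hypothetical CI usage, with the command taken from the quick start:
#   ok = gate(["invarlock", "verify", "--json",
#              "reports/eval/evaluation.report.json"])
#   sys.exit(0 if ok else 1)
```

Taking the command as a parameter keeps the helper independent of any particular InvarLock subcommand, so the same gate could wrap `invarlock advanced runtime-verify` as well.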

Who is this for?

  • ML engineers shipping edited model checkpoints, including quantized, pruned, fine-tuned, or otherwise weight-modified variants.
  • MLOps and platform teams building CI gates, runtime-provenance verification, and reviewable evaluation artifacts.
  • Researchers validating weight-edit, compression, and model-comparison methods with reproducible paired evaluation across text and image-text workflows supported here.

How it works

┌───────────────────────┐     ┌────────────────────────────────────────────┐
│ Baseline (checkpoint) │────►│                                            │
└───────────────────────┘     │  invarlock evaluate                        │
                              │  ├─► Paired windows (deterministic)        │
┌───────────────────────┐     │  ├─► GuardChain pipeline                   │
│ Subject  (checkpoint) │────►│  │   └─► invariants → spectral → RMT → VE  │
└───────────────────────┘     │  └─► Emit: evaluation.report.json          │
                              │                                            │
                              └────────────────────────────────────────────┘
                                                     │
                                     ┌───────────────┴───────────────┐
                                     ▼                               ▼
                                 ✅ PASS                          ❌ FAIL
                                 (ship)                          (rollback)
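
To make the idea of paired windows concrete, the sketch below compares per-window negative log-likelihoods from a baseline and a subject on the same windows and bootstraps a confidence interval for the perplexity ratio. It is an illustration only, not InvarLock's estimator: the function name, the bootstrap procedure, and the assumption that every window holds the same number of tokens are all choices made for this example.

```python
import math
import random
from typing import List, Tuple


def paired_ratio_ci(
    baseline_nll: List[float],
    subject_nll: List[float],
    n_resamples: int = 2000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float, float]:
    """Return (ratio, lower, upper) for the subject/baseline perplexity ratio.

    Inputs are mean negative log-likelihoods per window, in the same window
    order for both models (assumed: equal token counts per window).
    """
    if len(baseline_nll) != len(subject_nll) or not baseline_nll:
        raise ValueError("need the same non-empty set of windows for both models")

    # Pairing: window i of the subject is compared with window i of the baseline.
    deltas = [s - b for s, b in zip(subject_nll, baseline_nll)]
    n = len(deltas)
    point = math.exp(sum(deltas) / n)

    rng = random.Random(seed)  # a fixed seed keeps the interval reproducible
    stats = []
    for _ in range(n_resamples):
        sample = [deltas[rng.randrange(n)] for _ in range(n)]
        stats.append(math.exp(sum(sample) / n))
    stats.sort()

    # Percentile bootstrap bounds.
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))
    return point, stats[lo_idx], stats[hi_idx]
```

A ratio above 1 means the subject has higher perplexity than the baseline on the shared windows. Because differences are taken window by window, variation in window difficulty cancels out, which is why a paired design usually gives a tighter interval than comparing two independent samples.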

Quick start

Colab (CPU-friendly): Open in Colab

The public front door is evaluate -> verify -> report html, but the repo now splits onboarding by user type:

  • Wheel user / reviewer: install invarlock, inspect an existing evaluation.report.json, and render HTML without cloning the repository.
  • Evaluator: install invarlock[hf] when you want evaluate to load Hugging Face models and emit a fresh evaluation bundle.
  • Repo maintainer: clone the repo and build the local runtime image when you need maintainer smokes, repo presets, or local container-image iteration.

The default evaluate path runs model-loading commands inside the runtime container and expects an OCI engine such as podman or docker. Host-side workflows can opt into --execution-mode host, but the default verification path below expects a container-backed report with sibling runtime provenance.

# Evaluator path: create a fresh bundle
pip install "invarlock[hf]"

invarlock --version

# Compare baseline vs subject (downloads require explicit network enable)
invarlock evaluate --allow-network \
  --baseline gpt2 \
  --subject  distilgpt2 \
  --adapter auto \
  --profile ci \
  --report-out reports/eval \
  --quiet

# Validate the container-backed evaluation report
test -f reports/eval/runtime.manifest.json
invarlock verify --json reports/eval/evaluation.report.json

# Render HTML for sharing
invarlock report html -i reports/eval/evaluation.report.json -o reports/eval/evaluation.html

Wheel-only review path: pip install invarlock, invarlock doctor, invarlock verify /path/to/evaluation.report.json, and invarlock report html -i /path/to/evaluation.report.json -o /path/to/evaluation.html.

Repo maintainers can build the local runtime image once with make runtime-image; InvarLock automatically prefers invarlock-runtime:local when it is present.

Artifact model:

| Artifact | Produced by | Primary consumers |
|---|---|---|
| evaluation.report.json | invarlock evaluate, invarlock report generate --format report | invarlock verify, invarlock report html, invarlock report validate, invarlock report explain --evaluation-report, invarlock advanced runtime-verify |
| report.json | Baseline/subject run directories under runs/... | invarlock report generate, invarlock report explain --subject-report ... --baseline-report ... |

Example output (abridged; counts vary by profile/config):

INVARLOCK v<version> · EVALUATE
Baseline: gpt2 -> Subject: gpt2 · Profile: dev
Status: PASS · Gates: <passed>/<total> passed
Primary metric ratio: <ratio>
Output: reports/eval/evaluation.report.json
Runtime provenance: reports/eval/runtime.manifest.json

Command Surface

  • First touch in a fresh install: invarlock --help, invarlock --version, invarlock report --help, and invarlock advanced --help.
  • Core workflow: invarlock evaluate -> invarlock verify -> invarlock report html.
  • Follow-on report analysis after the core loop: invarlock report generate, invarlock report explain, and invarlock report validate.
  • Environment and release checks: invarlock doctor plus the JSON surfaces emitted by doctor --json and advanced plugins ... --json.
  • Runtime-manifest verifier: invarlock advanced runtime-verify --report <evaluation.report.json> --manifest <runtime.manifest.json>.
  • The public contract catalog exposed by those JSON surfaces includes validation_keys, console_labels, and metric_kinds.
  • Advanced workflows: invarlock advanced evidence-pack, invarlock advanced policy, invarlock advanced plugins, invarlock advanced calibrate, and invarlock advanced runtime-verify.
  • Host execution for the core evaluate path uses --execution-mode host.
  • Optional adapter/backend installs use normal Python extras such as pip install "invarlock[hf]" rather than CLI install commands.

Evidence packs (portable evidence bundles)

Evidence packs bundle reports + verification metadata into a distributable artifact.

Note: configs/ and most scripts/ remain repo resources and are not included in wheels. Installed wheels do include the public contracts and the invarlock advanced evidence-pack verify command, so an installed package can check bundles without cloning the repository.

Installation

# Minimal CLI (no torch/transformers)
pip install invarlock

# HF workflows (torch/transformers)
pip install "invarlock[hf]"

Optional extras: invarlock[probes], invarlock[gpu], invarlock[awq,gptq]. On Python 3.13+ stacks, gptq may still require a vendor wheel or a supported older interpreter because upstream auto-gptq packaging is narrower than the core InvarLock support matrix. Full setup: https://invarlock.github.io/invarlock/0.8.0/user-guide/getting-started/.

The minimal install covers the core verification and reporting flows. Add invarlock[hf] only for model-loading evaluate runs, and use the installed wheel's evidence-pack verifier when you need to inspect a bundle without cloning the repository.
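
A script that may run under either install can check up front whether the Hugging Face stack is importable before attempting a model-loading evaluate run. The sketch below is an assumption-laden illustration: `missing_modules` and `HF_MODULES` are hypothetical names, and the module list simply mirrors the "torch/transformers" note above rather than the package's actual dependency metadata.

```python
from importlib.util import find_spec


def missing_modules(names: list[str]) -> list[str]:
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if find_spec(name) is None]


# Assumption: these are the modules pulled in by invarlock[hf], per the
# "HF workflows (torch/transformers)" comment in the install commands.
HF_MODULES = ["torch", "transformers"]

# Hypothetical usage:
#   missing = missing_modules(HF_MODULES)
#   if missing:
#       print('pip install "invarlock[hf]"  # missing:', ", ".join(missing))
```

`find_spec` only locates a module without importing it, so the check stays fast even when torch is installed.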

Documentation

Community

Citation

If you use InvarLock in scientific work, please cite it (canonical metadata is in CITATION.cff):

@software{invarlock,
  title  = {InvarLock: Edit-agnostic robustness evaluation reports for weight edits},
  author = {{InvarLock}},
  url    = {https://github.com/invarlock/invarlock},
}

Limitations

  • InvarLock evaluates an edited model relative to a baseline under a specific configuration; results are not “global” guarantees.
  • Not a content-safety/alignment tool.
  • Native Windows is not supported (use WSL2 or Linux).

Support matrix

| Platform | Status | Notes |
|---|---|---|
| Python 3.12+ | ✅ Required | |
| Linux | ✅ Full | Primary dev target |
| macOS (Intel/M-series) | ✅ Full | MPS supported (default on Apple Silicon) |
| Windows | ❌ Not supported | Use WSL2 or a Linux container if required |
| CUDA | ✅ Recommended | For larger models |
| CPU | ✅ Fallback | Slower but functional |

Project status

InvarLock is pre‑1.0. Until 1.0, minor releases may include breaking changes. See CHANGELOG.md.

For guidance on where to ask questions, how to report bugs, and what to expect in terms of response times, see SUPPORT.md.

Contributing

License

Apache-2.0 — see LICENSE.

Download files


Source Distribution

invarlock-0.8.0.tar.gz (616.7 kB view details)

Uploaded Source

Built Distribution


invarlock-0.8.0-py3-none-any.whl (775.4 kB view details)

Uploaded Python 3

File details

Details for the file invarlock-0.8.0.tar.gz.

File metadata

  • Download URL: invarlock-0.8.0.tar.gz
  • Upload date:
  • Size: 616.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.13

File hashes

Hashes for invarlock-0.8.0.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7d6b09ea0d15059cfa3503e3489e90e9bb258424a95a89cee373e89f6d21ce1a |
| MD5 | 5bcfa3652e085b8ff3c5a190a0e5515c |
| BLAKE2b-256 | 42a8a61bef95b706f39c2ea4b91a9579c7f0ae2e4ce47e69cb5c7cc51d148f51 |
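
A downloaded file can be checked against the published SHA256 digest with the standard library alone. This is a minimal sketch using `hashlib`; the function names are illustrative, and the local path you pass in depends on where you saved the download.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path: str, published_hex: str) -> bool:
    """Compare a file's SHA256 with a published hex digest, ignoring case and whitespace."""
    return sha256_of(path) == published_hex.strip().lower()
```

For example, `matches_published("invarlock-0.8.0.tar.gz", "<SHA256 from the table above>")` should return True for an intact download of the source distribution.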


Provenance

The following attestation bundles were made for invarlock-0.8.0.tar.gz:

Publisher: release.yml on invarlock/invarlock

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file invarlock-0.8.0-py3-none-any.whl.

File metadata

  • Download URL: invarlock-0.8.0-py3-none-any.whl
  • Upload date:
  • Size: 775.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.13

File hashes

Hashes for invarlock-0.8.0-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | 321bba29b86d7a65cb312f47d760a14609ebbb9ae39a403e1830e1a0e12ad628 |
| MD5 | c6a396bddea8ddec594bdcee694da77b |
| BLAKE2b-256 | 5e9fb6c94d4569230ac71f2d2ec6a496502e987c0e0bc25e667f2279c42fbb81 |


Provenance

The following attestation bundles were made for invarlock-0.8.0-py3-none-any.whl:

Publisher: release.yml on invarlock/invarlock

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
