ar.io MLflow plugin — verifiable provenance for the ML lifecycle

ar-io-mlflow

Verifiable provenance for the MLflow lifecycle — training, registration, promotion, inference. Signed cryptographic proofs are anchored to ar.io, so an auditor can verify a model or decision long after your MLflow server is gone.

Status. Alpha. The cryptography, packaging, and verification flow are stable; default behaviors prioritize frictionless evaluation over production hardening. See docs/plugin-production.md for deployment guidance and CHANGELOG.md for what's shipped.

Install

# From source — PyPI publish is on the roadmap
git clone https://github.com/ar-io/ar-io-mlflow.git
cd ar-io-mlflow
pip install -e .

Python 3.10+. Pulls in MLflow, PyNaCl, the ar.io Turbo SDK, and cryptography.

MLflow version compatibility

Tested against MLflow 2.14 through 3.x. The plugin's prediction-side verify_source_of_truth reads trace tags directly via the lighter _tracing_client.get_trace_info API, sidestepping MLflow 3.x's stricter mlflow.artifactLocation requirement on client.get_trace(). Training, registration, and prediction verification all work on either major version.

Quickstart

import mlflow
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris
import ario_mlflow

# Point MLflow at a tracking store. Skip if MLFLOW_TRACKING_URI is
# already set in your env, or if you're happy with the cwd's ./mlruns.
mlflow.set_tracking_uri("file:///tmp/mlruns")

X, y = load_iris(return_X_y=True)

with mlflow.start_run():
    model = LogisticRegression(max_iter=200).fit(X, y)
    mlflow.log_metric("accuracy", model.score(X, y))
    mlflow.sklearn.log_model(model, name="model")

    # Signs a proof, hashes the logged artifacts, writes ario.* tags,
    # and uploads ~500 bytes to Arweave via Turbo (free for small payloads).
    # allow_empty_dataset_inputs=True opts out of dataset anchoring; see
    # "Dataset anchoring" below for the recommended pattern.
    result = ario_mlflow.anchor(allow_empty_dataset_inputs=True)
    print(result["tags"]["ario.training_tx"])

No wallet configured? The plugin auto-generates one on first run and persists it to ~/.ario-mlflow/wallet.json so your signing address stays stable across sessions. Set ARIO_MLFLOW_ARWEAVE_WALLET=/path/to/wallet.json to use your own. The auto-generated wallet starts unfunded — that's fine for typical usage because Turbo's free tier covers small uploads (see "Wallet & cost" below).

A full runnable example lives in examples/sklearn-quickstart/.

The three integration points

1. ario_mlflow.anchor() — training provenance

Call inside an active mlflow.start_run() after logging your model. The plugin auto-resolves the logged model's artifact_path from MLflow's log-model history, so you rarely need to pass it explicitly.

Returns a dict with envelope, payload, payload_bytes, payload_hash, previous_hash, anchor_result, tags, artifact_path, artifact_status ("hashed" / "no_artifacts" / "hash_failed"), and artifact_error.

Failure modes. anchor() is synchronous and runs to completion before the with block exits.

  • Arweave upload fails (gateway down, network): the envelope is still signed locally and ario.verify_status is set to signed; ario.training_tx is absent. Your MLflow run still succeeds; re-run later to retry. If you pass an explicit arweave= instance, its last_error attribute carries the cause so you can inspect the failure.
  • Artifact hashing fails (artifacts not yet logged, store unreachable): raises ario_mlflow.anchoring.ArtifactAccessError. Wrap the call if you want to log-and-continue.
  • Caller-supplied wallet missing or malformed (when constructing your own ArweaveAnchor(wallet_path=...) and passing it as arweave=): raises ario_mlflow.WalletLoadError from the constructor — operator intent must not be silently overridden by an auto-generated wallet under a different on-chain identity. Pass wallet_path=None (or omit the arg) to use the auto-generated default.
  • No active run: raises RuntimeError. The function requires an active mlflow.start_run() block.
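The log-and-continue wrapper suggested for artifact-hashing failures can be sketched generically. safe_anchor is an illustrative helper name, and the callable is passed in so the sketch stays self-contained — in real use you'd pass ario_mlflow.anchor and narrow the except clause to ario_mlflow.anchoring.ArtifactAccessError:

```python
import logging

def safe_anchor(anchor_fn, **kwargs):
    """Guard an anchoring call so a failure doesn't abort the training run.

    anchor_fn is the callable to guard (e.g. ario_mlflow.anchor). In real
    use, narrow the except clause to ArtifactAccessError rather than
    catching everything.
    """
    try:
        return anchor_fn(**kwargs)
    except Exception as exc:
        logging.warning("anchoring skipped: %s", exc)
        return None
```

Inside the run, result = safe_anchor(ario_mlflow.anchor) followed by a None check keeps the MLflow run green while still surfacing the failure in your logs.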

2. ario_mlflow.ArioMlflowClient — registration + promotion

A drop-in replacement for mlflow.tracking.MlflowClient. Registration and stage promotions are anchored automatically in a background thread. Query the outcome via the client:

from ario_mlflow import ArioMlflowClient

client = ArioMlflowClient()
mv = client.create_model_version("credit-scorer", "runs:/<run_id>/model")

# Block until the async anchor finishes (optional):
client.wait_for_anchor("registration", "credit-scorer", mv.version, timeout=30)

status = client.anchor_status("registration", "credit-scorer", mv.version)
# {"status": "anchored", "tx_id": "...", "error": None, "done": True}

Failure modes. Registration and promotion both return their MLflow ModelVersion immediately; anchoring runs in a daemon thread.

  • The MLflow operation always succeeds independently — anchoring failures never break create_model_version() or transition_model_version_stage().
  • anchor_status() returns {"status": ...} where status is one of anchoring (in flight), anchored (Arweave upload succeeded), signed (envelope signed but Arweave upload failed), failed (anchoring crashed — see error), or unknown (no anchor was ever queued for this key).
  • wait_for_anchor() returns False on timeout. Process exit before the daemon completes is fine — the daemon is non-blocking by design.

3. ario_mlflow.VerifiedModel — inference

Wraps a registered model with an integrity check that runs before the underlying pyfunc model is loaded (so a tampered artifact never gets a chance to execute user code):

from ario_mlflow import VerifiedModel

vm = VerifiedModel("models:/credit-scorer/1")  # raises IntegrityError on hash mismatch
# Features, in order: annual_income, credit_utilization, debt_to_income_ratio,
# months_employed, credit_score.
result = vm.predict([78000, 0.18, 0.22, 72, 745])
print(result.decision_id, result.proof_status)  # "anchoring" → "anchored"

# Wait for the background anchor if you want the TX synchronously:
result.wait_for_anchor(timeout=10)
print(result.tx_id, result.anchor_error)

Failure modes.

  • Tampered model artifact — VerifiedModel(model_uri) raises ario_mlflow.IntegrityError before the underlying pyfunc model is loaded, so a swapped model never gets the chance to execute user code. Catch this exception to alert your security operations rather than silently failing open.
  • predict() always returns the model's output even if anchoring later fails. Inspect result.proof_status: anchoring (in flight), anchored (Arweave upload succeeded), failed (see result.anchor_error), or disabled (no wallet / Turbo unavailable).
  • No registered model TX yet — predictions chain to the model version's ario.registration_tx. If ArioMlflowClient's registration daemon hasn't finished by the time the model loads, the first few predictions chain to GENESIS instead (the registration TX is read once at model init and never re-read per prediction — this avoids races).

Dataset anchoring

Each MLflow dataset can have its own signed Arweave proof, independent of any specific training run. Useful for:

  • Auditors who need to prove "this dataset existed at time T, signed by X" without depending on a particular model run.
  • Dataset publishers who anchor once and hand the TX to downstream model trainers.
  • Compliance (e.g. EU AI Act Article 53 GPAI training-data summaries) that expects dataset-level artifacts, not fragments inside a model proof.

Two ways to use it:

import mlflow
import ario_mlflow

ds = mlflow.data.from_pandas(df, source="s3://bucket/train_q1.parquet", name="train_q1")

# A) Implicit — auto-anchored inside training (recommended for typical use)
with mlflow.start_run():
    mlflow.log_input(ds, context="training")
    model.fit(...)
    mlflow.sklearn.log_model(model, "model")
    ario_mlflow.anchor()
    # Each logged dataset gets its own Arweave TX automatically;
    # the training proof references each by TX.

# B) Explicit — publisher pattern, no MLflow run needed
result = ario_mlflow.anchor(dataset=ds)
print(result["tx_id"])  # standalone dataset proof, hand off to downstream

The standalone-dataset envelope commits to the dataset's name, source URI, digest, and schema hash — not to its rows. Datasets stay private; the commitment is portable.
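The "commits to metadata, not rows" idea can be illustrated with a standalone sketch. The field names below are assumptions chosen for illustration, not the plugin's actual envelope schema:

```python
import hashlib
import json

def dataset_commitment(name: str, source: str, digest: str, schema: dict) -> str:
    # Hypothetical commitment: binds the dataset's name, source URI, MLflow
    # digest, and a schema hash -- never the rows themselves, so the data
    # stays private while the commitment remains portable.
    schema_hash = hashlib.sha256(
        json.dumps(schema, sort_keys=True).encode()
    ).hexdigest()
    body = {"name": name, "source": source, "digest": digest,
            "schema_hash": schema_hash}
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(canonical).hexdigest()

c = dataset_commitment("train_q1", "s3://bucket/train_q1.parquet",
                       "abc123", {"cols": ["income", "score"]})
```

Because the commitment is deterministic, any downstream party holding the same metadata can recompute it and compare, without ever seeing the rows.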

Wallet & cost

Each anchored event is a ~500-byte signed commitment (bounded 400–700 bytes by the plugin's smoke test). Turbo's free tier covers uploads under 105 KiB, so typical usage is free — the auto-generated wallet works out of the box with zero balance, and most teams never need to fund it.

You'd only need to fund the wallet if you're hitting Turbo's per-account free-tier limits or anchoring larger payloads. To top up:

  • Visit console.ar.io — credit-card or crypto top-up for the wallet address logged by the plugin on first use (wallet: <address>, mode=persistent).

For production deployments, generate a dedicated wallet (don't rely on the auto-generated one), set ARIO_MLFLOW_ARWEAVE_WALLET=/path/to/your/wallet.json, and treat the wallet like any other production secret. Source data (params, metrics, artifact bytes) always stays in MLflow — nothing else goes on chain — so costs are flat regardless of how big your training run was.

At scale. Each event is one upload, so cost grows linearly with anchor volume, not with model size. A high-throughput inference service anchoring every prediction sees one ~500-byte upload per call — well under the per-file free threshold. Account-level limits and any paid-tier rates are documented at console.ar.io and the ardrive.io Turbo docs; model your projected volume against current rates before going live. See also docs/plugin-production.md for wallet ops, monitoring, and balance alerting.

Network requirements

If your environment restricts outbound traffic, allowlist:

  • turbo-gateway.com — uploads (Turbo bundler) and proof fetches
  • arweave.net or other ar.io gateways — proof fetches (fallback)
  • turbo.ardrive.io — TX bundler-status checks
  • your configured ARIO_MLFLOW_ARIO_VERIFY_URL — optional ar.io Verify attestations

Override the upload/fetch host with ARIO_MLFLOW_GATEWAY_HOST if you want to route through a specific gateway operator.

Performance

What blocks vs what runs in the background:

  • anchor() — synchronous. Hashes artifacts, signs, and uploads to Turbo before returning. Typically a few seconds end-to-end; longer when there's a lot of artifact data to hash.
  • ArioMlflowClient.create_model_version() / transition_model_version_stage() — return immediately and anchor in a daemon thread. Use wait_for_anchor() if you need the TX before continuing.
  • VerifiedModel.__init__ — synchronous. Re-hashes artifacts, compares to ario.artifact_hash, and raises IntegrityError on mismatch. One-time cost per model load.
  • VerifiedModel.predict() — returns immediately with the prediction; the anchor runs in a daemon thread, adding no per-prediction latency.

For high-throughput inference, the predict path is the hot one — predictions return as soon as the model produces an output. The Arweave upload happens asynchronously and writes back to the trace tags when it completes.

Resilience

The plugin's HTTP layer is built to absorb transient ar.io gateway weather without bubbling up as user-visible failures.

  • Retries on transient failures. All upload, fetch, and ar.io Verify requests share a requests.Session with a urllib3 Retry adapter: HTTP 5xx and 429 responses are retried with exponential backoff (default: 2 retries, 0.5s/1.0s waits, Retry-After honored). 4xx responses other than 429 are not retried — they're hard failures. Tunable via max_retries and retry_backoff_factor constructor kwargs on ArweaveAnchor and ArioVerifyClient.
  • Multi-gateway fetch fallback. ArweaveAnchor.fetch_proof() walks self.gateways in order: on a transient failure for one, the next is tried automatically. Default list is ["turbo-gateway.com", "ardrive.net"]; override via the gateways= kwarg or the ARIO_MLFLOW_GATEWAYS env var. A single flaky gateway no longer surfaces as a hard "Proof Found" FAIL in any verifier UI.
  • Failure introspection via last_error. When upload_proof(), fetch_proof(), or ArioVerifyClient.submit_verification() returns None, the instance's last_error attribute carries a string describing the cause — gateway down, retries exhausted, response body unparseable, etc. Distinguish "anchor disabled" from "everything we tried failed" without parsing logs.
  • Attestation-level polling. ArioVerifyClient.poll_attestation(tx_id, target_level=2, timeout=120, interval=5) repeatedly submits the verification request until the desired attestation level is reached or the timeout expires. Returns the latest result either way (so callers can render "level 1, still propagating" status rather than nothing). Useful when you want to wait for full maturity before surfacing a Verified badge.
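The retry policy described above can be reproduced with a plain requests session. This is an illustrative sketch of the stated defaults (2 retries, exponential backoff, 429/5xx only, Retry-After honored), not the plugin's internal constructor:

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

def make_retry_session(max_retries: int = 2, backoff: float = 0.5) -> requests.Session:
    # Retry only 429 and 5xx; other 4xx stay hard failures, mirroring the
    # behavior described above.
    retry = Retry(
        total=max_retries,
        backoff_factor=backoff,
        status_forcelist=[429, 500, 502, 503, 504],
        respect_retry_after_header=True,
        allowed_methods=None,  # retry all verbs, including POST uploads
    )
    session = requests.Session()
    adapter = HTTPAdapter(max_retries=retry)
    session.mount("https://", adapter)
    session.mount("http://", adapter)
    return session
```

Mounting one adapter on both schemes keeps every request through the session under the same retry budget, which is the property that lets a flaky gateway degrade to a slow success instead of a user-visible failure.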

Environment variables

  • ARIO_MLFLOW_ARWEAVE_WALLET — path to an Arweave JWK wallet file. Default: auto-generates and persists at ~/.ario-mlflow/wallet.json.
  • ARIO_MLFLOW_GATEWAY_HOST — primary ar.io gateway used in returned URLs. Default: turbo-gateway.com.
  • ARIO_MLFLOW_GATEWAYS — comma-separated list of ar.io gateways tried in order on fetch failures (e.g. g1.com,g2.com). Default: primary + ardrive.net fallback.
  • ARIO_MLFLOW_SIGNING_KEY — base64-encoded Ed25519 seed. Default: auto-generates at ~/.ario-mlflow/keys/.
  • ARIO_MLFLOW_ARIO_VERIFY_URL — ar.io Verify REST API base URL, e.g. https://perma.online/local/verify (an ar.io operator's Verify endpoint). Default: unset; ar.io attestation disabled.
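A locked-down deployment might pin all of these explicitly. The values below are illustrative, reusing the defaults and the example endpoint from the variable descriptions; the wallet path is a placeholder:

```shell
# Illustrative production env config (paths and URLs are examples)
export ARIO_MLFLOW_ARWEAVE_WALLET=/etc/ar-io-mlflow/wallet.json
export ARIO_MLFLOW_GATEWAY_HOST=turbo-gateway.com
export ARIO_MLFLOW_GATEWAYS=turbo-gateway.com,ardrive.net
export ARIO_MLFLOW_ARIO_VERIFY_URL=https://perma.online/local/verify
```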

Tags the plugin writes

On the training run (anchor()):

  • ario.enabled, ario.version — via the registered RunContextProvider
  • ario.public_key, ario.verify_status, ario.artifact_hash
  • ario.payload_hash — SHA-256 of the canonical payload bytes (the same hash committed in the envelope)
  • ario.training_tx, ario.arweave_url — when the Arweave upload succeeded
  • ario.wallet_mode — user-configured / persistent / ephemeral

On the registered model (chain head, written by anchor()):

  • ario.last_training_hash — pointer to the most recent training proof for this registered model; the next training reads it to set its previous_hash

On model versions (ArioMlflowClient):

  • ario.artifact_verified — true / false from re-hashing at registration
  • ario.registration_tx, ario.promotion_tx, ario.arweave_url

After running ar-io-mlflow verify … (training run or model version):

  • ario.verify_status — verified
  • ario.attestation_level — 1, 2, or 3 (see "What the ar.io attestation means" below)
  • ario.report_url — link to the ar.io Verify dashboard for this proof
  • ario.attested_by, ario.attested_at — gateway operator and timestamp, only present when the operator has configured a signing wallet

On @mlflow.trace spans emitted by VerifiedModel.predict():

  • ario.payload_json — the full canonical payload (mirror of the ario/predictions/<id>/payload.json artifact). Read by verify_source_of_truth as the second MLflow surface for prediction check 3.
  • ario.decision_id, ario.model_name, ario.model_version
  • ario.input_hash, ario.output_hash, ario.payload_hash
  • ario.proof_status, ario.prediction_tx, ario.arweave_url
  • ario.artifact_verified (when known)

CLI

ar-io-mlflow verify run <run_id>                  # verify training proof
ar-io-mlflow verify model <name>/<version>        # verify registration proof
ar-io-mlflow verify trace <trace_id>              # verify an inference proof
ar-io-mlflow audit <name>/<version>               # full model-lineage audit

The CLI reads MLFLOW_TRACKING_URI (default ./mlruns) — export it to point at the same store you used at training time, otherwise the run lookup will fail with Run '<id>' not found. Set ARIO_MLFLOW_ARIO_VERIFY_URL to enable the optional ar.io attestation row.

All verify commands run the same three-row verify flow plus the optional ar.io attestation:

  1. Proof Found — fetch the pure-commitment envelope from ar.io for the recorded TX ID.
  2. Decision / Training / Registration Record Matches — download ario/payload.json from MLflow, re-hash it, and compare to the envelope's payload_hash; then re-derive canonical bytes from a separate live MLflow surface and compare those to the anchored payload. This catches MLflow tampering — if either surface was modified after anchoring, the two won't agree.
    • verify run (Training Record Matches) re-fetches run.data.params/metrics/artifact_checksums.
    • verify model (Registration Record Matches) re-derives the artifact-verified state from the source run.
    • verify trace (Decision Record Matches) re-fetches the ario.payload_json trace tag (mirrored by VerifiedModel.predict at write time) and compares to the artifact.
  3. Signature Confirmed — the signature on the envelope verifies against the embedded public key.

Plus an Attested by line — independent third-party check by an ar.io gateway operator (when ARIO_MLFLOW_ARIO_VERIFY_URL is configured).

Results are written back to the MLflow tags and the HTML report is regenerated.

If an MLflow retention policy has pruned a prediction's trace, row 2 returns reason=live_refetch_incomplete rather than silently passing — the proof itself (signature + anchored bytes + ar.io) is on permanent storage and remains verifiable.

What the ar.io attestation means

ar-io-mlflow verify reports the ar.io attestation as Verified or Pending verification. A proof reads Verified once an ar.io gateway has:

  1. Found it permanently stored on Arweave.
  2. Re-downloaded the bytes and matched the SHA-256 against the gateway's own digest.
  3. Verified the signature against the original signer's public key.

For programmatic callers, ario.attestation_level exposes the same status as an integer (1, 2, or 3) — useful when you want to distinguish "still propagating" from "fully verified."

Operator attestation. When the ar.io gateway operator has configured a signing wallet, the verification result is itself signed and ario.attested_by / ario.attested_at get written back to your MLflow tags. That's an independent statement from a known operator, verifiable by any third party against their public key (standard RSA-PSS SHA-256).

These attestations cover integrity and authenticity of the anchored record. Semantic verification (whether this model produced this output on this input) is a separate problem and on the roadmap, not in v0.1.

Verifying without Python

The proof envelope spec is language-neutral: an Ed25519 signature over an RFC-8785 (JCS) canonicalized JSON object, with a SHA-256 commitment to the canonical payload bytes that live as an MLflow artifact. Any RFC-8785 + Ed25519 + SHA-256 implementation in any language can verify a proof — no ar-io-mlflow install needed.

The auditor recipe:

  1. Fetch the envelope from any ar.io gateway: GET https://<gateway>/raw/<tx_id>.
  2. Verify the signature. Strip the signature field from the envelope, JCS-canonicalize the rest (RFC 8785), then verify the original signature (hex) against the embedded public_key (hex) using Ed25519.
  3. Re-hash the canonical payload. Download ario/payload.json from the MLflow run's artifacts. Compute SHA-256 of the raw bytes. Compare to the envelope's payload_hash.
  4. Walk the chain (optional). Each envelope's previous_hash is the prior anchor's payload_hash for that event type, or "GENESIS". Fetch the predecessor by its TX (recorded in the relevant tag, e.g. ario.last_training_hash) and recurse.

JCS implementations exist for Python (jcs), JavaScript (canonicalize), Go (gowebpki/jcs), Java, Rust, and others — interoperable with the same ecosystem as Notary and Sigstore.

The Python plugin is a convenience wrapper around this recipe; the proof itself doesn't depend on the plugin's continued existence.

Tests

pytest tests/test_plugin_smoke.py tests/test_plugin_verify.py tests/test_input_anchoring.py

No network or MLflow server required.
