An open specification for versioning agent runtimes and keeping datasets valid.

These details have not been verified by PyPI

Project links

Project description

AgentVersion

Your agent changed. Is your saved data still valid?

When you ship a new version of an agent, everything you collected against the old one — production traces, eval datasets, SFT (supervised fine-tuning) examples — quietly drifts out of date. There's no package.json to pin an agent's contract, and no git diff to tell you what changed.

agentversion is that missing format. Three steps, one per noun:

manifest   →   diff   →   compatibility decision
(what an       (what         (what to do with the data
 agent          changed,      you already collected:
 version is)    per surface)   keep / repair / replay / drop)

A surface is one independently-versioned part of the agent — its prompts, its tools, its model, its graph, its output format — each hashed on its own, so any change can be pinned to exactly one of them. A diff classifies each changed surface as breaking or non-breaking; a compatibility decision turns that into a per-data verdict.

It's a dependency-light Python package with a CLI — and an open spec any tool can implement.

See it in action

Two production manifests of the same finance-agent, v1 and v2. One command:

$ agentversion diff finance-agent-v1.json finance-agent-v2.json --compat

                                     Manifest Diff
┏━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Surface         ┃ Change Type  ┃ Details                                             ┃
┡━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ environment     │ non_breaking │ deployment_id: None → 'prod-east-1'                 │
│                 │              │ region: None → 'us-east-1'                          │
│                 │              │ infra_image_hash: None →                            │
│                 │              │ 'sha256:img2img2img2img2img2img2img2img2img2img2im… │
│                 │              │ runtime_versions added=app-runtime,python           │
│                 │              │ external_service_pins changed                       │
│                 │              │ resource_limits changed                             │
│ model_runtime   │ breaking     │ provider: 'google' → 'openai'                       │
│                 │              │ runtime_version: 'app-runtime@1.5.0' →              │
│                 │              │ 'app-runtime@1.8.2'                                 │
│                 │              │ envelope changed                                    │
│ output_contract │ breaking     │ format: 'text' → 'json'                             │
│                 │              │ output schema changed                               │
│                 │              │ strict: False → True                                │
│ prompt_stack    │ non_breaking │ system_prompt hash changed                          │
│                 │              │ developer_prompt hash changed                       │
│ subagents       │ breaking     │ subagents added: ['finance_subagent',               │
│                 │              │ 'spreadsheet_subagent']                             │
│ tool_registry   │ breaking     │ search_population removed                           │
│                 │              │ get_population added                                │
│                 │              │ write_spreadsheet_cell added                        │
│                 │              │ get_market_cap modified (non-schema)                │
│ workflow        │ breaking     │ graph topology changed                              │
│                 │              │ routing_policy_version: '2' → '4'                   │
│                 │              │ graph_version: '3' → '6'                            │
│                 │              │ graph_name: 'finance-simple-graph' →                │
│                 │              │ 'finance-router-graph'                              │
└─────────────────┴──────────────┴─────────────────────────────────────────────────────┘

  Breaking: 5  Non-breaking: 2

  Recommendation: replay
  Breaking changes in model_runtime, output_contract, subagents, tool_registry,
  workflow — existing data should be replayed against the new agent version.

Between v1 and v2 the team swapped the model provider (Google → OpenAI), renamed a tool (search_population became get_population), added two subagents, moved to a new routing graph, and switched to strict JSON output. agentversion caught all five breaking surfaces and concluded that the old traces need a replay. That is not a guess; it is a classification you can gate CI on.

The recommendation is one of four verdicts — what to do with each piece of data you collected against the old version:

Verdict	What it means	Typical trigger
`keep`	Still valid as-is.	Only non-breaking surfaces changed.
`repair`	Salvageable with a transform — patch it, don't re-run the agent.	A recoverable output-contract change (the bundled default rules emit `repair` only for output-contract-only breaks).
`replay`	Re-run it through the new version for fresh outputs.	A breaking surface (tool, model, workflow) makes old outputs untrustworthy but the inputs still apply.
`drop`	No longer usable — discard it.	The inputs themselves no longer apply. (`drop` comes from a custom policy, not the default `diff --compat` rules.)

In the demo above, five breaking surfaces (model swap, tool rename, new subagents, strict-JSON output, new graph) make the old outputs stale — but the old inputs still apply — so the verdict is replay.
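The default mapping in the table can be sketched in a few lines of plain Python. This is a simplified illustration, not the package's classify_compatibility: it assumes every output-contract-only break is recoverable (the bundled rules may look closer), and the custom_policy hook and the drop_on_tool_break policy are hypothetical stand-ins for the custom policies that produce drop.

```python
from typing import Callable, Optional

VERDICTS = ("keep", "repair", "replay", "drop")


def default_verdict(breaking: set[str]) -> str:
    """Simplified version of the default rules described in the table above."""
    if not breaking:
        return "keep"      # only non-breaking surfaces changed
    if breaking == {"output_contract"}:
        return "repair"    # assumed recoverable: patch outputs, don't re-run
    return "replay"        # any other break: re-run the old inputs


def decide(
    breaking: set[str],
    custom_policy: Optional[Callable[[set[str]], str]] = None,
) -> str:
    """Apply a custom policy if given (the only source of "drop"), else the defaults."""
    verdict = custom_policy(breaking) if custom_policy else default_verdict(breaking)
    if verdict not in VERDICTS:
        raise ValueError(f"unknown verdict: {verdict!r}")
    return verdict


# Hypothetical policy: treat a tool_registry break as "the old inputs no longer apply".
def drop_on_tool_break(breaking: set[str]) -> str:
    return "drop" if "tool_registry" in breaking else default_verdict(breaking)


demo = {"model_runtime", "output_contract", "subagents", "tool_registry", "workflow"}
print(decide(demo))                      # replay
print(decide({"output_contract"}))       # repair
print(decide(demo, drop_on_tool_break))  # drop
```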

What a manifest looks like

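A manifest is plain JSON. The top says which version this is; contract holds one entry per surface, and the surfaces that changed are the rows you saw in the diff above. The excerpt below is abridged: the // and /* */ comments are annotations and the "...": "..." entries stand for elided fields, so a real manifest contains neither.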

{
  "agent_name": "finance-agent",
  "version_label": "2026-03-01.prod.1",
  "identity": {
    "overall_hash": "sha256:47301b25...",   // stable id for this whole version
    "hash_algorithm": "jcs-sha256"
  },
  "contract": {
    "prompt_stack":    { "system_prompt": { "version": "8", "hash": "sha256:aaa1..." }, "...": "..." },
    "model_runtime":   { "provider": "google", "model": "gemini-2.0-flash", "...": "..." },
    "tool_registry":   { "registry_version": "5", "tools": [ /* get_market_cap, search_population */ ] },
    "workflow":        { "graph_name": "finance-simple-graph", "graph_version": "3", "...": "..." },
    "subagents":       [],
    "output_contract": { "format": "text", "strict": false, "...": "..." },
    "guardrails":      { "bundle_version": "3", "...": "..." },
    "context_config":  { "retrieval_config_version": "5", "...": "..." }
  }
}

Each surface is hashed on its own, so the diff can say "tool_registry changed, prompt_stack didn't" instead of just "the manifest changed."
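A minimal sketch of why per-surface hashing makes the diff precise: hash each surface separately and compare hashes surface by surface. The canonicalization here (sorted keys, compact separators, SHA-256) is a simplification of JCS-SHA256, and the toy contracts are trimmed, hypothetical versions of the excerpt above; this is not the package's hash_surface or diff_manifests.

```python
import hashlib
import json

# The ten surface names listed in this README; a manifest may omit some.
SURFACES = (
    "prompt_stack", "model_runtime", "tool_registry", "skill_registry",
    "workflow", "subagents", "output_contract", "guardrails",
    "context_config", "environment",
)


def surface_hash(value) -> str:
    """Hash one surface from its canonical JSON (sorted keys, no extra whitespace)."""
    canonical = json.dumps(value, sort_keys=True, separators=(",", ":"))
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def changed_surfaces(old_contract: dict, new_contract: dict) -> list[str]:
    """Surfaces whose hash differs, or that exist in only one of the two contracts."""
    changed = []
    for name in SURFACES:
        old_value, new_value = old_contract.get(name), new_contract.get(name)
        if old_value is None and new_value is None:
            continue  # surface absent from both versions
        if old_value is None or new_value is None:
            changed.append(name)
        elif surface_hash(old_value) != surface_hash(new_value):
            changed.append(name)
    return changed


# Hypothetical, trimmed contracts (not the real example manifests).
v1 = {
    "prompt_stack": {"system_prompt": {"version": "8"}},
    "subagents": [],
    "output_contract": {"format": "text", "strict": False},
}
v2 = {
    "prompt_stack": {"system_prompt": {"version": "8"}},
    "subagents": ["finance_subagent"],
    "output_contract": {"strict": True, "format": "json"},
}
print(changed_surfaces(v1, v2))  # ['subagents', 'output_contract']
```

Detecting that a surface changed is only half of a diff: deciding breaking versus non-breaking is a per-surface rule (in the demo, a prompt_stack hash change was non-breaking), which this sketch does not attempt.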

Try it yourself: both example manifests, examples/manifest/finance-agent-v1.json and finance-agent-v2.json, ship inside the agentversion wheel.

Why an agent needs a version contract

You probably already have observability and a trace store. None of them answer "what is this agent version, and is my old data still compatible with the new one?"

You already have	What it gives you	What it doesn't
OpenTelemetry / LangSmith / Langfuse	rich execution traces	a versioned contract for the agent that produced them
A2A / ACP agent cards	runtime discovery + I/O types	version identity or data-compatibility
OpenAI JSONL / SFT files	a training format	provenance — which agent version produced each row

Isn't this A2A? No — and they compose. A2A and ACP (the Agent-to-Agent and Agent Communication protocols) answer "how does Agent A discover and talk to Agent B?". agentversion answers "what changed in this agent, and what does that mean for my data?". An A2A Agent Card can carry an agentversion manifest hash so you know both at once.

Install

pip install agentversion

Apache-2.0, no config — just needs Python 3.10+.

There are two version numbers, deliberately different:

the wire spec is frozen at v1.0 (stable format + conformance suite — safe to build against);
this Python package is 0.2.1: pre-1.0, so its API may still shift.


Quickstart

First five minutes: init → hash → validate → diff → gate in CI.

1. Scaffold a manifest for your agent (interactive):

agentversion init

2. Get its stable id and check it's valid:

agentversion hash manifest.json       # a content hash that ignores key order and
                                      # whitespace, so the same agent always hashes to
                                      # the same id (JCS-SHA256 = JSON Canonicalization
                                      # Scheme + SHA-256)
agentversion validate manifest.json   # check it against the spec
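To see why key order and whitespace do not affect the id, here is a minimal standard-library sketch of JCS-style hashing. It only approximates RFC 8785 (real JCS also pins number formatting and sorts keys by UTF-16 code units), and it is not the package's hash_manifest, which may canonicalize or exclude fields differently, so its output should not be expected to match agentversion hash. The "sha256:" prefix mirrors the manifest examples.

```python
import hashlib
import json


def canonical_json(obj) -> str:
    # Approximation of JCS: sorted keys, no insignificant whitespace, UTF-8 text.
    return json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False)


def content_hash(obj) -> str:
    digest = hashlib.sha256(canonical_json(obj).encode("utf-8")).hexdigest()
    return f"sha256:{digest}"


a = {"agent_name": "finance-agent", "version_label": "2026-03-01.prod.1"}
# Same content, different key order and whitespace.
b = json.loads('{ "version_label": "2026-03-01.prod.1",\n  "agent_name": "finance-agent" }')
print(content_hash(a) == content_hash(b))  # True
```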

3. Diff two versions — runnable right now against the bundled examples (--compat adds the keep/repair/replay/drop recommendation; --json for machine output):

agentversion diff examples/manifest/finance-agent-v1.json \
                  examples/manifest/finance-agent-v2.json --compat

4. Gate breaking changes in CI — --fail-on-breaking exits non-zero when any surface is breaking:

# .github/workflows/agent.yml (steps inside a job)
- name: Install agentversion
  run: pip install agentversion
- name: Block breaking agent changes
  run: agentversion diff baseline-manifest.json current-manifest.json --fail-on-breaking

Use it from Python — every line below is exercised by the test suite:

import json
from agentversion import AgentManifest, validate_manifest_file, hash_manifest
from agentversion.diff import diff_manifests
from agentversion.compatibility import classify_compatibility

old = json.load(open("finance-agent-v1.json"))
new = json.load(open("finance-agent-v2.json"))

# Validate + identify a version
assert validate_manifest_file("finance-agent-v2.json").valid
m = AgentManifest.model_validate(new)
print(m.agent_name, m.identity.overall_hash)   # finance-agent  sha256:767ebff1...

# Diff, then ask what to do with old data
result = diff_manifests(old, new)
print(result.summary.breaking_surfaces)                       # 5
print(classify_compatibility(result).recommended_decision)   # replay

What's in the box

A typed reference implementation: Pydantic models, canonical hashing, the diff/compatibility algorithms, and a CLI.

CLI

Command	What it does
`agentversion diff A B`	Classify changes by surface (`--json`, `--compat`, `--fail-on-breaking`)
`agentversion validate M`	Validate a manifest against the spec
`agentversion hash M`	Compute the canonical JCS-SHA256 hash
`agentversion init`	Scaffold a new manifest interactively
`agentversion upgrade M --to X`	Bump a manifest to a newer spec version
`agentversion {decision,replay,dataset} validate`	Validate the other spec objects

Library — top-level agentversion exports AgentManifest, validate_manifest / validate_manifest_file, hash_manifest / hash_surface, and SPEC_VERSION. The algorithms live in agentversion.diff and agentversion.compatibility; the other spec models live in agentversion.dataset, agentversion.replay, and agentversion.decision.

The manifest is organized as a contract surface per component — prompt_stack, model_runtime, tool_registry, skill_registry, workflow, subagents, output_contract, guardrails, context_config, environment — each independently hashed so the diff is surface-level and precise.

Use it anywhere — no platform required

The protocol is fully useful standalone:

Track versions locally — init to scaffold, hash for a stable id, diff between any two. No account, fully offline.
Gate CI/CD — diff --fail-on-breaking stops a breaking agent change from reaching production.
Annotate traces — stamp identity.overall_hash onto your OpenTelemetry spans as agentversion.manifest_hash for version-scoped filtering. See examples/integrations/otel_mapping.md, bundled in the package.
Classify data compatibility — diff --compat (or decision generate) gives a per-episode keep / repair / replay / drop verdict you can act on.
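A minimal sketch of the Annotate traces item, standard library only: build the attribute once from identity.overall_hash, then filter exported spans by it. The dict shape of the exported spans and the placeholder hashes are assumptions for illustration; span.set_attributes in the comment is the OpenTelemetry Python API's method for setting several attributes at once, and the bundled examples/integrations/otel_mapping.md remains the authority on the mapping.

```python
MANIFEST_HASH_ATTR = "agentversion.manifest_hash"


def version_attributes(manifest: dict) -> dict[str, str]:
    """Span attributes that tag telemetry with the agent version that produced it."""
    return {MANIFEST_HASH_ATTR: manifest["identity"]["overall_hash"]}


def spans_for_version(spans: list[dict], overall_hash: str) -> list[dict]:
    """Version-scoped filtering over exported spans, modelled here as plain dicts."""
    return [
        span for span in spans
        if span.get("attributes", {}).get(MANIFEST_HASH_ATTR) == overall_hash
    ]


# Placeholder hashes; a real manifest carries a full "sha256:<hex>" value.
manifest_v1 = {"identity": {"overall_hash": "sha256:v1-placeholder"}}
attrs = version_attributes(manifest_v1)
# With the OpenTelemetry Python API you would attach these to a live span:
#   span.set_attributes(attrs)

exported = [
    {"name": "plan", "attributes": dict(attrs)},
    {"name": "answer", "attributes": {MANIFEST_HASH_ATTR: "sha256:v2-placeholder"}},
    {"name": "untagged"},
]
print([s["name"] for s in spans_for_version(exported, "sha256:v1-placeholder")])  # ['plan']
```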

It interoperates with LangSmith, Langfuse, Phoenix, and W&B — annotate their traces/datasets with a manifest hash, or read/write compatibility decisions alongside your eval pipeline.

The spec & conformance

agentversion is an open spec so any tool, in any language, can produce interoperable manifests and diffs:

spec/manifest.md — the agent manifest
spec/diff.md — surface diffs, breaking vs non-breaking
spec/compatibility-decision.md — keep / repair / replay / drop
spec/replay.md · spec/dataset.md — replay jobs and dataset objects with provenance
spec/reference.md — full schemas and validation rules · schemas/ — JSON Schemas

The full spec and JSON Schemas ship inside the agentversion wheel. CONFORMANCE.md + compatibility-tests/ are golden in/out pairs that any implementation must reproduce to claim conformance.

Pairs with skillevaluation

A manifest can carry the eval results that gated its release in evaluation.gates[]:

{
  "evaluation": {
    "gates": [
      { "name": "regression-suite", "threshold": 0.95, "actual_score": 0.972,
        "passed": true, "ran_at": "2026-03-05T14:00:00Z" }
    ]
  }
}

Those scores come from skillevaluation, the sibling open spec for A/B benchmarking skills. agentversion records what an agent version is; skillevaluation measures whether it's better.
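Before promoting a version, a release script might refuse manifests whose recorded gates failed. A minimal sketch, assuming that a gate passes when actual_score is at least threshold (the spec may define this differently) and treating any disagreement between that rule and the recorded passed flag as a failure; failing_gates is a hypothetical helper, not part of the package.

```python
import json

# The evaluation block from the example above.
manifest = json.loads("""
{
  "evaluation": {
    "gates": [
      { "name": "regression-suite", "threshold": 0.95, "actual_score": 0.972,
        "passed": true, "ran_at": "2026-03-05T14:00:00Z" }
    ]
  }
}
""")


def failing_gates(manifest: dict) -> list[str]:
    """Names of gates that did not pass, or whose passed flag disagrees with the score."""
    failing = []
    for gate in manifest.get("evaluation", {}).get("gates", []):
        # Assumption: a gate passes when actual_score >= threshold.
        meets_threshold = gate["actual_score"] >= gate["threshold"]
        if not (gate["passed"] and meets_threshold):
            failing.append(gate["name"])
    return failing


print(failing_gates(manifest))  # []
```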

The decimalai Python SDK builds on agentversion to add framework adapters (capture a manifest straight from your LangGraph/CrewAI app), trace capture, and managed replay — but you never need it to use the spec.

From the DecimalAI SDK

If you use the decimalai SDK you don't hand-write manifests — it captures one straight from your running agent, and export_manifest hands it to the OSS tooling here:

import json

import decimalai
from decimalai.schema.manifest import extract_from_config
from agentversion.diff import diff_manifests
from agentversion.compatibility import classify_compatibility

# Capture a manifest from your agent's config (or a framework adapter)…
snap = extract_from_config(
    agent_name="support-agent",
    prompts={"system": "You are a helpful support assistant."},
    models={"default": {"provider": "openai", "model": "gpt-4o"}},
)
manifest = decimalai.export_manifest(snap)        # → an agentversion manifest dict

# …then this package takes over: diff vs your last prod manifest, gate in CI.
with open("last-prod-manifest.json") as f:        # hypothetical path to your baseline manifest
    last_prod_manifest = json.load(f)
diff = diff_manifests(last_prod_manifest, manifest)
print(classify_compatibility(diff).recommended_decision)

This is the seam that makes agentversion the open core of the paid platform: the manifest the SDK captures is the format agentversion diff consumes, so you can reproduce the platform's diffs and verdicts entirely outside DecimalAI. A runnable version is in examples/integrations/decimalai_bridge.py.

Project

The spec is frozen at v1.0; the package is pre-1.0 (see Install). Design decisions are logged in adrs/, releases in CHANGELOG.md. Contributions — especially new conformance cases — are genuinely welcome; see CONTRIBUTING.md:

pip install agentversion
agentversion --help
# run the conformance + unit suite from a clone:
git clone https://github.com/decimal-labs/agentversion && cd agentversion
pip install -e ".[dev]" && pytest

Licensed under Apache 2.0.

Project details


Release history

0.2.1 (this version): Jun 24, 2026
0.2.0: Jun 24, 2026
0.1.0: May 30, 2026

Download files


Source Distribution

agentversion-0.2.1.tar.gz (156.9 kB)

Uploaded Jun 24, 2026 Source

Built Distribution


agentversion-0.2.1-py3-none-any.whl (143.3 kB)

Uploaded Jun 24, 2026 Python 3

File details

Details for the file agentversion-0.2.1.tar.gz.

File metadata

Download URL: agentversion-0.2.1.tar.gz
Upload date: Jun 24, 2026
Size: 156.9 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.13

File hashes

Hashes for agentversion-0.2.1.tar.gz
Algorithm	Hash digest
SHA256	`82465223db8adb5a7b0ba580ba9a7ce40adb228b468ef9548a3ab2fc18554856`
MD5	`63a1f9cb3aabc452464eecdd89076aa1`
BLAKE2b-256	`4159e4f24305eac212b735e06b385b69c9232684c996391807e5416881fe940d`


File details

Details for the file agentversion-0.2.1-py3-none-any.whl.

File metadata

Download URL: agentversion-0.2.1-py3-none-any.whl
Upload date: Jun 24, 2026
Size: 143.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.13

File hashes

Hashes for agentversion-0.2.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`6c620b64ccfc81d26faad8103ebb594fa719184a4fb5b8efc643181ae580531e`
MD5	`f31b5ce772ecc3847055ef2a4860efcc`
BLAKE2b-256	`249c87ec53fe238372a094b93cd622b59ccf8b406883b1fe010e9a0b622ed3b2`

