An open specification for versioning agent runtimes and keeping datasets valid.
AgentVersion
Your agent changed. Is your saved data still valid?
When you ship a new version of an agent, everything you collected against the old one — production traces, eval datasets, SFT (supervised fine-tuning) examples — quietly drifts out of date. There's no package.json to pin an agent's contract, and no git diff to tell you what changed.
agentversion is that missing format. Three steps, one per noun:
manifest → diff → compatibility decision
- manifest: what an agent version is
- diff: what changed, per surface
- compatibility decision: what to do with the data you already collected (keep / repair / replay / drop)
A surface is one independently-versioned part of the agent — its prompts, its tools, its model, its graph, its output format — each hashed on its own, so any change can be pinned to exactly one of them. A diff classifies each changed surface as breaking or non-breaking; a compatibility decision turns that into a per-data verdict.
It's a dependency-light Python package with a CLI — and an open spec any tool can implement.
See it in action
Two production manifests of the same finance-agent, v1 and v2. One command:
$ agentversion diff finance-agent-v1.json finance-agent-v2.json --compat
Manifest Diff
┏━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Surface ┃ Change Type ┃ Details ┃
┡━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ environment │ non_breaking │ deployment_id: None → 'prod-east-1' │
│ │ │ region: None → 'us-east-1' │
│ │ │ infra_image_hash: None → │
│ │ │ 'sha256:img2img2img2img2img2img2img2img2img2img2im… │
│ │ │ runtime_versions added=app-runtime,python │
│ │ │ external_service_pins changed │
│ │ │ resource_limits changed │
│ model_runtime │ breaking │ provider: 'google' → 'openai' │
│ │ │ runtime_version: 'app-runtime@1.5.0' → │
│ │ │ 'app-runtime@1.8.2' │
│ │ │ envelope changed │
│ output_contract │ breaking │ format: 'text' → 'json' │
│ │ │ output schema changed │
│ │ │ strict: False → True │
│ prompt_stack │ non_breaking │ system_prompt hash changed │
│ │ │ developer_prompt hash changed │
│ subagents │ breaking │ subagents added: ['finance_subagent', │
│ │ │ 'spreadsheet_subagent'] │
│ tool_registry │ breaking │ search_population removed │
│ │ │ get_population added │
│ │ │ write_spreadsheet_cell added │
│ │ │ get_market_cap modified (non-schema) │
│ workflow │ breaking │ graph topology changed │
│ │ │ routing_policy_version: '2' → '4' │
│ │ │ graph_version: '3' → '6' │
│ │ │ graph_name: 'finance-simple-graph' → │
│ │ │ 'finance-router-graph' │
└─────────────────┴──────────────┴─────────────────────────────────────────────────────┘
Breaking: 5 Non-breaking: 2
Recommendation: replay
Breaking changes in model_runtime, output_contract, subagents, tool_registry,
workflow — existing data should be replayed against the new agent version.
Between v1 and v2 the team swapped the model (Google → OpenAI), renamed a tool, added two subagents, and switched to strict JSON output. agentversion caught all five breaking surfaces and told you the old traces need a replay — not a guess, a classification you can gate CI on.
The recommendation is one of four verdicts — what to do with each piece of data you collected against the old version:
| Verdict | What it means | Typical trigger |
|---|---|---|
| keep | Still valid as-is. | Only non-breaking surfaces changed. |
| repair | Salvageable with a transform: patch it, don't re-run the agent. | A recoverable output-contract change (the bundled default rules emit repair only for output-contract-only breaks). |
| replay | Re-run it through the new version for fresh outputs. | A breaking surface (tool, model, workflow) makes old outputs untrustworthy but the inputs still apply. |
| drop | No longer usable; discard it. | The inputs themselves no longer apply. (drop comes from a custom policy, not the default diff --compat rules.) |
In the demo above, five breaking surfaces (model swap, tool rename, new subagents, strict-JSON output, new graph) make the old outputs stale — but the old inputs still apply — so the verdict is replay.
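The trigger column above can be read as a small decision function. What follows is a minimal sketch of that reading, not the package's actual rule engine: `recommend` is a hypothetical helper, and it encodes only what the table states (keep when nothing is breaking, repair for an output-contract-only break, replay for any other breaking surface). drop is left out because it comes from a custom policy.

```python
def recommend(breaking_surfaces: set[str]) -> str:
    """Map the set of breaking surfaces to a verdict (illustrative only)."""
    if not breaking_surfaces:
        return "keep"      # only non-breaking surfaces changed
    if breaking_surfaces == {"output_contract"}:
        return "repair"    # output-contract-only break: patch, don't re-run
    return "replay"        # any other breaking surface: re-run the old inputs


# The five breaking surfaces from the finance-agent demo.
demo = {"model_runtime", "output_contract", "subagents", "tool_registry", "workflow"}
print(recommend(demo))  # replay
```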
What a manifest looks like
A manifest is plain JSON. The top says which version this is; contract holds one entry per surface — exactly the rows you saw in the diff above:
{
"agent_name": "finance-agent",
"version_label": "2026-03-01.prod.1",
"identity": {
"overall_hash": "sha256:47301b25...", // stable id for this whole version
"hash_algorithm": "jcs-sha256"
},
"contract": {
"prompt_stack": { "system_prompt": { "version": "8", "hash": "sha256:aaa1..." }, "...": "..." },
"model_runtime": { "provider": "google", "model": "gemini-2.0-flash", "...": "..." },
"tool_registry": { "registry_version": "5", "tools": [ /* get_market_cap, search_population */ ] },
"workflow": { "graph_name": "finance-simple-graph", "graph_version": "3", "...": "..." },
"subagents": [],
"output_contract": { "format": "text", "strict": false, "...": "..." },
"guardrails": { "bundle_version": "3", "...": "..." },
"context_config": { "retrieval_config_version": "5", "...": "..." }
}
}
Each surface is hashed on its own, so the diff can say "tool_registry changed, prompt_stack didn't" instead of just "the manifest changed."
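To make the per-surface idea concrete, here is a rough sketch using only the standard library. It is an approximation under stated assumptions: sorted keys plus compact separators stand in for JCS (real JCS, RFC 8785, also pins number and string serialization), `surface_hash` and `changed_surfaces` are hypothetical names rather than the package's API, and the two contracts are toy data.

```python
import hashlib
import json


def surface_hash(surface) -> str:
    """Hash one surface from a key-order- and whitespace-independent encoding."""
    canonical = json.dumps(
        surface, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def changed_surfaces(old_contract: dict, new_contract: dict) -> list[str]:
    """Return the names of surfaces whose hashes differ between two contracts."""
    names = sorted(set(old_contract) | set(new_contract))
    return [
        name
        for name in names
        if surface_hash(old_contract.get(name)) != surface_hash(new_contract.get(name))
    ]


# Toy contracts: only the tool registry differs.
old = {
    "prompt_stack": {"system_prompt": {"version": "8"}},
    "tool_registry": {"tools": ["get_market_cap", "search_population"]},
}
new = {
    "prompt_stack": {"system_prompt": {"version": "8"}},
    "tool_registry": {"tools": ["get_market_cap", "get_population"]},
}
print(changed_surfaces(old, new))  # ['tool_registry']
```

Because each surface is encoded canonically before hashing, reordering keys inside a surface does not register as a change; only a change in content does.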
Try it yourself: both examples/manifest/ manifests ship inside the agentversion wheel.
Why an agent needs a version contract
You probably already have observability and a trace store. None of them answer "what is this agent version, and is my old data still compatible with the new one?"
| You already have | What it gives you | What it doesn't |
|---|---|---|
| OpenTelemetry / LangSmith / Langfuse | rich execution traces | a versioned contract for the agent that produced them |
| A2A / ACP agent cards | runtime discovery + I/O types | version identity or data-compatibility |
| OpenAI JSONL / SFT files | a training format | provenance — which agent version produced each row |
Isn't this A2A? No — and they compose. A2A and ACP (the Agent-to-Agent and Agent Communication protocols) answer "how does Agent A discover and talk to Agent B?". agentversion answers "what changed in this agent, and what does that mean for my data?". An A2A Agent Card can carry an agentversion manifest hash so you know both at once.
Install
pip install agentversion
Apache-2.0, no config — just needs Python 3.10+.
There are two version numbers, deliberately different:
- the wire spec is frozen at v1.0 (stable format + conformance suite — safe to build against);
- this Python package is 0.1.0 — pre-1.0, so its API may still shift.
(The spec-v1.0 and PyPI badges above show each one.)
Quickstart
First five minutes: init → hash → validate → diff → gate in CI.
1. Scaffold a manifest for your agent (interactive):
agentversion init
2. Get its stable id and check it's valid:
agentversion hash manifest.json # a content hash that ignores key order and
# whitespace, so the same agent always hashes the
# same id (JCS-SHA256 = JSON Canonicalization
# Scheme + SHA-256)
agentversion validate manifest.json # check it against the spec
3. Diff two versions — runnable right now against the bundled examples (--compat adds the keep/repair/replay/drop recommendation; --json for machine output):
agentversion diff examples/manifest/finance-agent-v1.json \
examples/manifest/finance-agent-v2.json --compat
4. Gate breaking changes in CI — --fail-on-breaking exits non-zero when any surface is breaking:
# .github/workflows/agent.yml
- name: Block breaking agent changes
run: agentversion diff baseline-manifest.json current-manifest.json --fail-on-breaking
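If your pipeline is driven from Python rather than YAML, the same gate can be wrapped in a few lines. This is a sketch under assumptions: `has_breaking_changes` is a hypothetical helper, it relies only on the documented behaviour that --fail-on-breaking exits non-zero, and the `run` parameter exists purely so the function can be exercised without the CLI installed.

```python
import subprocess


def has_breaking_changes(baseline: str, current: str, run=subprocess.run) -> bool:
    """Return True when `agentversion diff --fail-on-breaking` exits non-zero.

    Caveat: a non-zero exit could also come from an unrelated failure such as
    a missing file, so every non-zero exit is treated as "do not ship".
    """
    cmd = ["agentversion", "diff", baseline, current, "--fail-on-breaking"]
    return run(cmd).returncode != 0


# In a deploy script (same file names as the workflow step above):
# if has_breaking_changes("baseline-manifest.json", "current-manifest.json"):
#     raise SystemExit("breaking agent change: review old data before shipping")
```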
Use it from Python — every line below is exercised by the test suite:
import json
from agentversion import AgentManifest, validate_manifest_file, hash_manifest
from agentversion.diff import diff_manifests
from agentversion.compatibility import classify_compatibility
old = json.load(open("finance-agent-v1.json"))
new = json.load(open("finance-agent-v2.json"))
# Validate + identify a version
assert validate_manifest_file("finance-agent-v2.json").valid
m = AgentManifest.model_validate(new)
print(m.agent_name, m.identity.overall_hash) # finance-agent sha256:767ebff1...
# Diff, then ask what to do with old data
result = diff_manifests(old, new)
print(result.summary.breaking_surfaces) # 5
print(classify_compatibility(result).recommended_decision) # replay
What's in the box
A typed reference implementation: Pydantic models, canonical hashing, the diff/compatibility algorithms, and a CLI.
CLI
| Command | What it does |
|---|---|
| agentversion diff A B | Classify changes by surface (--json, --compat, --fail-on-breaking) |
| agentversion validate M | Validate a manifest against the spec |
| agentversion hash M | Compute the canonical JCS-SHA256 hash |
| agentversion init | Scaffold a new manifest interactively |
| agentversion upgrade M --to X | Bump a manifest to a newer spec version |
| agentversion {decision,replay,dataset} validate | Validate the other spec objects |
Library — top-level agentversion exports AgentManifest, validate_manifest / validate_manifest_file, hash_manifest / hash_surface, and SPEC_VERSION. The algorithms live in agentversion.diff and agentversion.compatibility; the other spec models live in agentversion.dataset, agentversion.replay, and agentversion.decision.
The manifest is organized as a contract surface per component — prompt_stack, model_runtime, tool_registry, skill_registry, workflow, subagents, output_contract, guardrails, context_config, environment — each independently hashed so the diff is surface-level and precise.
Use it anywhere — no platform required
The protocol is fully useful standalone:
- Track versions locally: init to scaffold, hash for a stable id, diff between any two. No account, fully offline.
- Gate CI/CD: diff --fail-on-breaking stops a breaking agent change from reaching production.
- Annotate traces: stamp identity.overall_hash onto your OpenTelemetry spans as agentversion.manifest_hash for version-scoped filtering. See examples/integrations/otel_mapping.md, bundled in the package.
- Classify data compatibility: diff --compat (or decision generate) gives a per-episode keep / repair / replay / drop verdict you can act on.
It interoperates with LangSmith, Langfuse, Phoenix, and W&B — annotate their traces/datasets with a manifest hash, or read/write compatibility decisions alongside your eval pipeline.
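Trace annotation can be as small as one attribute write. The sketch below assumes only that the span object exposes `set_attribute(key, value)`, as OpenTelemetry spans do; `stamp_manifest_hash` and `RecordingSpan` are hypothetical names, and the stand-in span lets the example run without the OpenTelemetry SDK. The bundled otel_mapping.md remains the reference for the actual mapping.

```python
def stamp_manifest_hash(span, manifest: dict) -> None:
    """Copy identity.overall_hash onto a span as agentversion.manifest_hash."""
    span.set_attribute(
        "agentversion.manifest_hash", manifest["identity"]["overall_hash"]
    )


class RecordingSpan:
    """Stand-in for an OpenTelemetry span that just records attributes."""

    def __init__(self) -> None:
        self.attributes = {}

    def set_attribute(self, key, value) -> None:
        self.attributes[key] = value


span = RecordingSpan()
stamp_manifest_hash(span, {"identity": {"overall_hash": "sha256:47301b25..."}})
print(span.attributes)
```

With the hash on every span, filtering traces by agent version becomes an ordinary attribute query in whichever backend stores them.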
The spec & conformance
agentversion is an open spec so any tool, in any language, can produce interoperable manifests and diffs:
- spec/manifest.md: the agent manifest
- spec/diff.md: surface diffs, breaking vs non-breaking
- spec/compatibility-decision.md: keep / repair / replay / drop
- spec/replay.md · spec/dataset.md: replay jobs and dataset objects with provenance
- spec/reference.md: full schemas and validation rules
- schemas/: JSON Schemas
The full spec and JSON Schemas ship inside the agentversion wheel. CONFORMANCE.md + compatibility-tests/ are golden in/out pairs that any implementation must reproduce to claim conformance.
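For anyone writing an implementation in another language, a golden-pair check boils down to "feed the input, compare the output". The harness below is a generic sketch, not the project's conformance runner: `failing_cases` is a hypothetical helper, and the cases and toy implementation are made up for illustration rather than taken from compatibility-tests/, whose on-disk layout is not described here.

```python
from typing import Any, Callable, Iterable


def failing_cases(
    cases: Iterable[tuple[str, Any, Any]],
    implementation: Callable[[Any], Any],
) -> list[str]:
    """Run each (name, given, expected) case and return the names that mismatch."""
    return [
        name for name, given, expected in cases if implementation(given) != expected
    ]


# Toy cases: input is a list of breaking surfaces, output is a verdict.
cases = [
    ("no-breaking-surfaces", [], "keep"),
    ("tool-registry-break", ["tool_registry"], "replay"),
]


def toy_implementation(breaking: list) -> str:
    return "replay" if breaking else "keep"


print(failing_cases(cases, toy_implementation))  # []
```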
Pairs with skillevaluation
A manifest can carry the eval results that gated its release in evaluation.gates[]:
{
"evaluation": {
"gates": [
{ "name": "regression-suite", "threshold": 0.95, "actual_score": 0.972,
"passed": true, "ran_at": "2026-03-05T14:00:00Z" }
]
}
}
Those scores come from skillevaluation, the sibling open spec for A/B benchmarking skills. agentversion records what an agent version is; skillevaluation measures whether it's better.
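A release script could read those gates back before promoting a version. The following is a small sketch that uses only the field names shown in the JSON above; `failed_gates` is a hypothetical helper, and it assumes higher scores are better, which the spec may or may not require.

```python
def failed_gates(manifest: dict) -> list[str]:
    """Return names of gates that did not pass or scored below their threshold."""
    gates = manifest.get("evaluation", {}).get("gates", [])
    return [
        gate["name"]
        for gate in gates
        if not gate["passed"] or gate["actual_score"] < gate["threshold"]
    ]


manifest = {
    "evaluation": {
        "gates": [
            {"name": "regression-suite", "threshold": 0.95, "actual_score": 0.972,
             "passed": True, "ran_at": "2026-03-05T14:00:00Z"}
        ]
    }
}
print(failed_gates(manifest))  # []
```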
The decimalai Python SDK builds on agentversion to add framework adapters (capture a manifest straight from your LangGraph/CrewAI app), trace capture, and managed replay — but you never need it to use the spec.
From the DecimalAI SDK
If you use the decimalai SDK you don't hand-write manifests — it captures one straight from your running agent, and export_manifest hands it to the OSS tooling here:
import decimalai
from decimalai.schema.manifest import extract_from_config
from agentversion.diff import diff_manifests
from agentversion.compatibility import classify_compatibility
# Capture a manifest from your agent's config (or a framework adapter)…
snap = extract_from_config(
agent_name="support-agent",
prompts={"system": "You are a helpful support assistant."},
models={"default": {"provider": "openai", "model": "gpt-4o"}},
)
manifest = decimalai.export_manifest(snap) # → an agentversion manifest dict
# …then this package takes over: diff vs your last prod manifest, gate in CI.
diff = diff_manifests(last_prod_manifest, manifest)
print(classify_compatibility(diff).recommended_decision)
This is the seam that makes agentversion the open core of the paid platform: the manifest the SDK captures is the format agentversion diff consumes, so you can reproduce the platform's diffs and verdicts entirely outside DecimalAI. A runnable version is in examples/integrations/decimalai_bridge.py.
Project
The spec is frozen at v1.0; the package is pre-1.0 (see Install). Design decisions are logged in adrs/, releases in CHANGELOG.md. Contributions — especially new conformance cases — are genuinely welcome; see CONTRIBUTING.md:
pip install agentversion
agentversion --help
# run the conformance + unit suite from a clone:
git clone https://github.com/decimal-labs/agentversion && cd agentversion
pip install -e ".[dev]" && pytest
Licensed under Apache 2.0.