An open specification for versioning agent runtimes and keeping datasets valid.
AgentVersion
Your agent changed. Is your saved data still valid?
agentversion turns an agent version into a diffable, hashable contract — so when prompts, tools, models, or graphs change, you know exactly what broke and which traces, eval sets, and training data survived.
When you ship a new version of an agent, everything you collected against the old one — production traces, eval datasets, SFT examples — quietly drifts out of date. There's no package.json to pin an agent's contract, and no git diff to tell you what changed. agentversion is that missing format: a JSON manifest describing an agent version, a diff that classifies every change as breaking or non-breaking, and a compatibility decision that tells you whether to keep, repair, replay, or drop your old data.
It's a dependency-light Python package with a CLI — and an open spec any tool can implement.
See it in action
Two production manifests of the same finance-agent, v1 and v2. One command:
$ agentversion diff finance-agent-v1.json finance-agent-v2.json --compat
Manifest Diff
┏━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Surface ┃ Change Type ┃ Details ┃
┡━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ environment │ non_breaking │ environment added │
│ model_runtime │ breaking │ provider: 'google' → 'openai' │
│ │ │ runtime_version: 'app-runtime@1.5.0' → │
│ │ │ 'app-runtime@1.8.2' │
│ │ │ envelope changed │
│ output_contract │ breaking │ format: 'text' → 'json' │
│ │ │ strict: False → True │
│ │ │ output schema changed │
│ prompt_stack │ non_breaking │ system_prompt hash changed │
│ │ │ developer_prompt hash changed │
│ subagents │ breaking │ subagents added: ['finance_subagent', │
│ │ │ 'spreadsheet_subagent'] │
│ tool_registry │ breaking │ search_population removed │
│ │ │ get_population added │
│ │ │ write_spreadsheet_cell added │
│ │ │ get_market_cap modified (non-schema) │
│ workflow │ breaking │ graph topology changed │
│ │ │ routing_policy_version: '2' → '4' │
│ │ │ graph_version: '3' → '6' │
│ │ │ graph_name: 'finance-simple-graph' → │
│ │ │ 'finance-router-graph' │
└─────────────────┴──────────────┴─────────────────────────────────────────────────────┘
Breaking: 5 Non-breaking: 2
Recommendation: replay
Breaking changes in model_runtime, output_contract, subagents, tool_registry,
workflow — existing data should be replayed against the new agent version.
Between v1 and v2 the team swapped the model provider (Google → OpenAI), replaced search_population with get_population, added two subagents, reworked the workflow graph, and switched to strict JSON output. agentversion flagged all five breaking surfaces and recommended replaying the old traces: not a guess, but a classification you can gate CI on.
Try it yourself: both manifests live in examples/manifest/.
Why an agent needs a version contract
You probably already have observability tooling and a trace store. Neither answers "what is this agent version, and is my old data still compatible with the new one?"
| You already have | What it gives you | What it doesn't |
|---|---|---|
| OpenTelemetry / LangSmith / Langfuse | rich execution traces | a versioned contract for the agent that produced them |
| A2A / ACP agent cards | runtime discovery + I/O types | version identity or data-compatibility |
| OpenAI JSONL / SFT files | a training format | provenance — which agent version produced each row |
Isn't this A2A? No — and they compose. A2A and ACP answer "how does Agent A discover and talk to Agent B?". agentversion answers "what changed in this agent, and what does that mean for my data?". An A2A Agent Card can carry an agentversion manifest hash so you know both at once.
Install
pip install agentversion
Apache-2.0, no config — just needs Python 3.10+. It implements the frozen v1.0 spec, but the Python package itself is early: 0.1.0, pre-1.0, with the API still settling.
Quickstart
Diff two versions (table by default; add --json for machine output, --compat for a keep/repair/replay/drop recommendation):
agentversion diff old-manifest.json new-manifest.json --compat
Gate breaking changes in CI — --fail-on-breaking exits non-zero when any surface is breaking:
# .github/workflows/agent.yml
- name: Block breaking agent changes
run: agentversion diff baseline-manifest.json current-manifest.json --fail-on-breaking
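The gating rule itself is small enough to sketch in a few lines. The snippet below is a minimal illustration, not the package's implementation: the `gate` helper and the plain-dict input are hypothetical, and the classifications are copied from the example diff table earlier in this README.

```python
import sys

# Per-surface classification, copied from the example diff table in this README.
surfaces = {
    "environment": "non_breaking",
    "model_runtime": "breaking",
    "output_contract": "breaking",
    "prompt_stack": "non_breaking",
    "subagents": "breaking",
    "tool_registry": "breaking",
    "workflow": "breaking",
}


def gate(classified: dict) -> int:
    """Return a process exit code: 1 if any surface is breaking, else 0.

    Hypothetical helper for illustration; the real CLI may do more.
    """
    breaking = sorted(name for name, kind in classified.items() if kind == "breaking")
    if breaking:
        print("Breaking surfaces: " + ", ".join(breaking), file=sys.stderr)
        return 1
    return 0


exit_code = gate(surfaces)
print(exit_code)  # 1
```

In a real CI script the return value would be passed to `sys.exit`, so the job fails whenever at least one surface is breaking.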
Scaffold, hash, and validate a manifest:
agentversion init # interactively create a manifest
agentversion hash manifest.json # canonical JCS-SHA256 identity hash
agentversion validate manifest.json # check it against the spec
Use it from Python — every line below is exercised by the test suite:
import json
from agentversion import AgentManifest, validate_manifest_file, hash_manifest
from agentversion.diff import diff_manifests
from agentversion.compatibility import classify_compatibility
old = json.load(open("finance-agent-v1.json"))
new = json.load(open("finance-agent-v2.json"))
# Validate + identify a version
assert validate_manifest_file("finance-agent-v2.json").valid
m = AgentManifest.model_validate(new)
print(m.agent_name, m.identity.overall_hash) # finance-agent sha256:767ebff1...
# Diff, then ask what to do with old data
result = diff_manifests(old, new)
print(result.summary.breaking_surfaces) # 5
print(classify_compatibility(result).recommended_decision) # replay
What's in the box
A typed reference implementation: Pydantic models, canonical hashing, the diff/compatibility algorithms, and a CLI.
CLI
| Command | What it does |
|---|---|
| agentversion diff A B | Classify changes by surface (--json, --compat, --fail-on-breaking) |
| agentversion validate M | Validate a manifest against the spec |
| agentversion hash M | Compute the canonical JCS-SHA256 hash |
| agentversion init | Scaffold a new manifest interactively |
| agentversion upgrade M --to X | Bump a manifest to a newer spec version |
| agentversion {decision,replay,dataset} validate | Validate the other spec objects |
Library — top-level agentversion exports AgentManifest, validate_manifest / validate_manifest_file, hash_manifest / hash_surface, and SPEC_VERSION. The algorithms live in agentversion.diff and agentversion.compatibility; the other spec models live in agentversion.dataset, agentversion.replay, and agentversion.decision.
The manifest is organized as a contract surface per component — prompt_stack, model_runtime, tool_registry, skill_registry, workflow, subagents, output_contract, guardrails, context_config, environment — each independently hashed so the diff is surface-level and precise.
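To see why independent per-surface hashes make the diff precise, here is a rough sketch using only the standard library. It rests on several assumptions: that each surface is a top-level key of the manifest, that sorted-key compact JSON is an acceptable stand-in for JCS (it does not implement the full RFC 8785 rules, number formatting for example), and that the field values shown are illustrative. The helper names are hypothetical and are not the package's `hash_surface`.

```python
import hashlib
import json

# Surface names as listed in this README.
SURFACES = [
    "prompt_stack", "model_runtime", "tool_registry", "skill_registry",
    "workflow", "subagents", "output_contract", "guardrails",
    "context_config", "environment",
]


def canonical_hash(value) -> str:
    """SHA-256 over a deterministic JSON encoding (an approximation of JCS)."""
    encoded = json.dumps(value, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return "sha256:" + hashlib.sha256(encoded.encode("utf-8")).hexdigest()


def surface_hashes(manifest: dict) -> dict:
    """Hash every surface present in the manifest, independently of the others."""
    return {name: canonical_hash(manifest[name]) for name in SURFACES if name in manifest}


def changed_surfaces(old: dict, new: dict) -> list:
    """Names of surfaces whose hash differs, or that exist on only one side."""
    old_h, new_h = surface_hashes(old), surface_hashes(new)
    return sorted(n for n in old_h.keys() | new_h.keys() if old_h.get(n) != new_h.get(n))


# Illustrative manifests; the layout is assumed, not taken from the spec.
old = {
    "model_runtime": {"provider": "google"},
    "output_contract": {"format": "text", "strict": False},
    "guardrails": {"enabled": True},
}
new = {
    "model_runtime": {"provider": "openai"},
    "output_contract": {"strict": True, "format": "json"},
    "guardrails": {"enabled": True},
}

print(changed_surfaces(old, new))  # ['model_runtime', 'output_contract']
```

Because each surface is hashed on its own, the unchanged guardrails surface drops out of the comparison without any field-by-field inspection, and key order inside a surface does not affect its hash.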
Use it anywhere — no platform required
The protocol is fully useful standalone:
- Track versions locally: init to scaffold, hash for a stable id, diff between any two. No account, fully offline.
- Gate CI/CD: diff --fail-on-breaking stops a breaking agent change from reaching production.
- Annotate traces: stamp identity.overall_hash onto your OpenTelemetry spans as agentversion.manifest_hash for version-scoped filtering. See examples/integrations/otel_mapping.md.
- Classify data compatibility: diff --compat (or decision generate) gives a per-episode keep / repair / replay / drop verdict you can act on.
It interoperates with LangSmith, Langfuse, Phoenix, and W&B — annotate their traces/datasets with a manifest hash, or read/write compatibility decisions alongside your eval pipeline.
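The trace-annotation step from the list above can be sketched as follows. OpenTelemetry's Python spans expose a `set_attribute(key, value)` method, so the helper only relies on that; `RecordingSpan` is a hypothetical stand-in that lets the example run without the OpenTelemetry SDK. The manifest dict layout is assumed to mirror `identity.overall_hash`, and the hash values are placeholders.

```python
ATTRIBUTE_KEY = "agentversion.manifest_hash"


class RecordingSpan:
    """Hypothetical stand-in for a tracing span; it only records attributes."""

    def __init__(self, name: str):
        self.name = name
        self.attributes = {}

    def set_attribute(self, key, value):
        self.attributes[key] = value


def stamp_manifest_hash(span, manifest: dict) -> None:
    """Attach the manifest's overall hash to any span-like object."""
    span.set_attribute(ATTRIBUTE_KEY, manifest["identity"]["overall_hash"])


def spans_for_version(spans, manifest_hash: str) -> list:
    """Version-scoped filtering: keep spans produced under one manifest hash."""
    return [s for s in spans if s.attributes.get(ATTRIBUTE_KEY) == manifest_hash]


# Placeholder hashes; real values come from the manifest itself.
v1 = {"identity": {"overall_hash": "sha256:placeholder-v1"}}
v2 = {"identity": {"overall_hash": "sha256:placeholder-v2"}}

spans = [RecordingSpan("run-1"), RecordingSpan("run-2"), RecordingSpan("run-3")]
stamp_manifest_hash(spans[0], v1)
stamp_manifest_hash(spans[1], v2)
stamp_manifest_hash(spans[2], v2)

v2_runs = [s.name for s in spans_for_version(spans, v2["identity"]["overall_hash"])]
print(v2_runs)  # ['run-2', 'run-3']
```

With a real tracer, the same `stamp_manifest_hash` call would be made on the active span, and the filter would become a query on the attribute in your trace backend.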
The spec & conformance
agentversion is an open spec so any tool, in any language, can produce interoperable manifests and diffs:
- spec/manifest.md: the agent manifest
- spec/diff.md: surface diffs, breaking vs non-breaking
- spec/compatibility-decision.md: keep / repair / replay / drop
- spec/replay.md · spec/dataset.md: replay jobs and dataset objects with provenance
- spec/reference.md: full schemas and validation rules · schemas/: JSON Schemas
CONFORMANCE.md and compatibility-tests/ together define conformance: the tests are golden in/out pairs that any implementation must reproduce to claim it.
Pairs with skillevaluation
A manifest can carry the eval results that gated its release in evaluation.gates[]:
{
"evaluation": {
"gates": [
{ "name": "regression-suite", "threshold": 0.95, "actual_score": 0.972, "passed": true }
]
}
}
Those scores come from skillevaluation, the sibling open spec for A/B benchmarking skills. agentversion records what an agent version is; skillevaluation measures whether it's better.
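A consumer of the manifest might re-check the recorded gates before trusting a release. The sketch below reuses the JSON fragment shown above and assumes that a gate passes when its score is at or above the threshold (higher is better); that rule and the `failed_gates` helper are assumptions for illustration, not part of either spec as quoted here.

```python
import json

# The evaluation fragment from this README, parsed as-is.
manifest = json.loads("""
{
  "evaluation": {
    "gates": [
      {"name": "regression-suite", "threshold": 0.95, "actual_score": 0.972, "passed": true}
    ]
  }
}
""")


def failed_gates(manifest: dict) -> list:
    """Names of gates that are marked failed or whose score is below threshold.

    Assumes higher scores are better; a missing evaluation block yields no failures.
    """
    gates = manifest.get("evaluation", {}).get("gates", [])
    return [
        g["name"]
        for g in gates
        if not g["passed"] or g["actual_score"] < g["threshold"]
    ]


print(failed_gates(manifest))  # []
```

Checking both the `passed` flag and the raw numbers also catches a manifest whose flag disagrees with its recorded score.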
The decimalai Python SDK builds on agentversion to add framework adapters (capture a manifest straight from your LangGraph/CrewAI app), trace capture, and managed replay — but you never need it to use the spec.
Project
The spec is stable at v1.0, with a frozen wire format and conformance suite. The package is 0.1.0: pre-1.0 under semantic versioning, so the Python API may still change before it reaches 1.0. Design decisions are logged in adrs/, releases in CHANGELOG.md. Contributions, especially new conformance cases, are welcome; see CONTRIBUTING.md:
git clone https://github.com/decimal-labs/agentversion
cd agentversion
pip install -e ".[dev]"
pytest
Licensed under Apache 2.0.