
An open specification for versioning agent runtimes and keeping datasets valid.


AgentVersion

Your agent changed. Is your saved data still valid?

agentversion turns an agent version into a diffable, hashable contract — so when prompts, tools, models, or graphs change, you know exactly what broke and which traces, eval sets, and training data survived.


When you ship a new version of an agent, everything you collected against the old one — production traces, eval datasets, SFT examples — quietly drifts out of date. There's no package.json to pin an agent's contract, and no git diff to tell you what changed. agentversion is that missing format: a JSON manifest describing an agent version, a diff that classifies every change as breaking or non-breaking, and a compatibility decision that tells you whether to keep, repair, replay, or drop your old data.

It's a dependency-light Python package with a CLI — and an open spec any tool can implement.


See it in action

Two production manifests of the same finance-agent, v1 and v2. One command:

$ agentversion diff finance-agent-v1.json finance-agent-v2.json --compat
                                     Manifest Diff
┏━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Surface         ┃ Change Type  ┃ Details                                             ┃
┡━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ environment     │ non_breaking │ environment added                                   │
│ model_runtime   │ breaking     │ provider: 'google' → 'openai'                       │
│                 │              │ runtime_version: 'app-runtime@1.5.0' →              │
│                 │              │ 'app-runtime@1.8.2'                                 │
│                 │              │ envelope changed                                    │
│ output_contract │ breaking     │ format: 'text' → 'json'                             │
│                 │              │ strict: False → True                                │
│                 │              │ output schema changed                               │
│ prompt_stack    │ non_breaking │ system_prompt hash changed                          │
│                 │              │ developer_prompt hash changed                       │
│ subagents       │ breaking     │ subagents added: ['finance_subagent',               │
│                 │              │ 'spreadsheet_subagent']                             │
│ tool_registry   │ breaking     │ search_population removed                           │
│                 │              │ get_population added                                │
│                 │              │ write_spreadsheet_cell added                        │
│                 │              │ get_market_cap modified (non-schema)                │
│ workflow        │ breaking     │ graph topology changed                              │
│                 │              │ routing_policy_version: '2' → '4'                   │
│                 │              │ graph_version: '3' → '6'                            │
│                 │              │ graph_name: 'finance-simple-graph' →                │
│                 │              │ 'finance-router-graph'                              │
└─────────────────┴──────────────┴─────────────────────────────────────────────────────┘

  Breaking: 5  Non-breaking: 2

  Recommendation: replay
  Breaking changes in model_runtime, output_contract, subagents, tool_registry,
  workflow — existing data should be replayed against the new agent version.

Between v1 and v2 the team swapped the model (Google → OpenAI), renamed a tool, added two subagents, and switched to strict JSON output. agentversion caught all five breaking surfaces and told you the old traces need a replay — not a guess, a classification you can gate CI on.

Try it yourself — both manifests live in examples/manifest/.


Why an agent needs a version contract

You probably already have observability tooling and a trace store. Neither answers the question "what is this agent version, and is my old data still compatible with the new one?"

| You already have | What it gives you | What it doesn't |
| --- | --- | --- |
| OpenTelemetry / LangSmith / Langfuse | rich execution traces | a versioned contract for the agent that produced them |
| A2A / ACP agent cards | runtime discovery + I/O types | version identity or data-compatibility |
| OpenAI JSONL / SFT files | a training format | provenance: which agent version produced each row |

Isn't this A2A? No — and they compose. A2A and ACP answer "how does Agent A discover and talk to Agent B?". agentversion answers "what changed in this agent, and what does that mean for my data?". An A2A Agent Card can carry an agentversion manifest hash so you know both at once.


Install

pip install agentversion

Apache-2.0, no config — just needs Python 3.10+. It implements the frozen v1.0 spec, but the Python package itself is early: 0.1.0, pre-1.0, with the API still settling.

Quickstart

Diff two versions (table by default; add --json for machine output, --compat for a keep/repair/replay/drop recommendation):

agentversion diff old-manifest.json new-manifest.json --compat

Gate breaking changes in CI. The --fail-on-breaking flag exits non-zero when any surface is breaking:

# .github/workflows/agent.yml
- name: Block breaking agent changes
  run: agentversion diff baseline-manifest.json current-manifest.json --fail-on-breaking
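The gating rule itself is small enough to sketch in plain Python. This is an illustration of the stated behaviour (non-zero exit when any surface is breaking), not the package's implementation; the `ROWS` list and both helper functions are hypothetical and simply mirror the surface classifications from the finance-agent diff shown earlier.

```python
# Hypothetical (surface, change_type) rows copied from the example diff table.
ROWS = [
    ("environment", "non_breaking"),
    ("model_runtime", "breaking"),
    ("output_contract", "breaking"),
    ("prompt_stack", "non_breaking"),
    ("subagents", "breaking"),
    ("tool_registry", "breaking"),
    ("workflow", "breaking"),
]


def breaking_surfaces(rows):
    """Return the names of surfaces classified as breaking, in input order."""
    return [surface for surface, change_type in rows if change_type == "breaking"]


def exit_code(rows):
    """Assumed gate: 1 when at least one surface is breaking, otherwise 0."""
    return 1 if breaking_surfaces(rows) else 0


broken = breaking_surfaces(ROWS)
print(f"Breaking: {len(broken)}  Non-breaking: {len(ROWS) - len(broken)}")
print("exit code:", exit_code(ROWS))
```

A CI step only needs the exit code, which is why a single breaking surface is enough to fail the job.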

Scaffold, hash, and validate a manifest:

agentversion init                     # interactively create a manifest
agentversion hash manifest.json       # canonical JCS-SHA256 identity hash
agentversion validate manifest.json   # check it against the spec
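To see why a canonical hash gives a stable identity, here is a minimal standard-library sketch. It only approximates JCS (RFC 8785): sorted keys and compact separators are enough for the plain ASCII strings used here, but real JCS also fixes number serialization and key ordering rules that `json.dumps` does not guarantee. The function name `canonical_hash` is hypothetical, not part of the agentversion API; the `sha256:` prefix follows the hash format shown in this document.

```python
import hashlib
import json


def canonical_hash(obj):
    """SHA-256 over a canonical-style JSON encoding (approximation of JCS)."""
    canonical = json.dumps(
        obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Same content, different key order: the hashes should match.
a = {"provider": "openai", "runtime_version": "app-runtime@1.8.2"}
b = {"runtime_version": "app-runtime@1.8.2", "provider": "openai"}
print(canonical_hash(a) == canonical_hash(b))

# Any change in content should produce a different hash.
c = {"provider": "google", "runtime_version": "app-runtime@1.5.0"}
print(canonical_hash(a) == canonical_hash(c))
```

For real manifests, prefer `agentversion hash`, since only the reference implementation is guaranteed to match the spec's canonicalization.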

Use it from Python — every line below is exercised by the test suite:

import json
from agentversion import AgentManifest, validate_manifest_file, hash_manifest
from agentversion.diff import diff_manifests
from agentversion.compatibility import classify_compatibility

# Load both manifests, closing each file handle once it has been read
with open("finance-agent-v1.json") as f:
    old = json.load(f)
with open("finance-agent-v2.json") as f:
    new = json.load(f)

# Validate + identify a version
assert validate_manifest_file("finance-agent-v2.json").valid
m = AgentManifest.model_validate(new)
print(m.agent_name, m.identity.overall_hash)   # finance-agent  sha256:767ebff1...

# Diff, then ask what to do with old data
result = diff_manifests(old, new)
print(result.summary.breaking_surfaces)                       # 5
print(classify_compatibility(result).recommended_decision)   # replay

What's in the box

A typed reference implementation: Pydantic models, canonical hashing, the diff/compatibility algorithms, and a CLI.

CLI

| Command | What it does |
| --- | --- |
| agentversion diff A B | Classify changes by surface (--json, --compat, --fail-on-breaking) |
| agentversion validate M | Validate a manifest against the spec |
| agentversion hash M | Compute the canonical JCS-SHA256 hash |
| agentversion init | Scaffold a new manifest interactively |
| agentversion upgrade M --to X | Bump a manifest to a newer spec version |
| agentversion {decision,replay,dataset} validate | Validate the other spec objects |

Library — top-level agentversion exports AgentManifest, validate_manifest / validate_manifest_file, hash_manifest / hash_surface, and SPEC_VERSION. The algorithms live in agentversion.diff and agentversion.compatibility; the other spec models live in agentversion.dataset, agentversion.replay, and agentversion.decision.

The manifest is organized as a contract surface per component — prompt_stack, model_runtime, tool_registry, skill_registry, workflow, subagents, output_contract, guardrails, context_config, environment — each independently hashed so the diff is surface-level and precise.
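Per-surface hashing is what lets a diff point at the exact component that moved. The sketch below illustrates the idea with toy manifests and the standard library only. It is not the agentversion diff algorithm: the helper names are hypothetical, the hashing is a simplified stand-in for the canonical hash, and it reports only added, removed, or changed, without the breaking / non-breaking classification.

```python
import hashlib
import json


def surface_hash(value):
    """Hash one surface on its own (simplified stand-in for the canonical hash)."""
    encoded = json.dumps(value, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(encoded.encode("utf-8")).hexdigest()


def changed_surfaces(old, new):
    """Compare two manifests surface by surface using their hashes."""
    changes = {}
    for name in sorted(set(old) | set(new)):
        if name not in old:
            changes[name] = "added"
        elif name not in new:
            changes[name] = "removed"
        elif surface_hash(old[name]) != surface_hash(new[name]):
            changes[name] = "changed"
    return changes


# Toy manifests: only the surfaces that differ should be reported.
old = {
    "model_runtime": {"provider": "google"},
    "output_contract": {"format": "text", "strict": False},
    "guardrails": {"pii_filter": True},
}
new = {
    "model_runtime": {"provider": "openai"},
    "output_contract": {"format": "json", "strict": True},
    "guardrails": {"pii_filter": True},
    "environment": {"region": "eu"},
}
print(changed_surfaces(old, new))
```

Because `guardrails` hashes identically on both sides, it never appears in the result, which is the property that keeps the diff surface-level.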


Use it anywhere — no platform required

The protocol is fully useful standalone:

  1. Track versions locally: init to scaffold, hash for a stable id, diff between any two. No account, fully offline.
  2. Gate CI/CD: diff --fail-on-breaking stops a breaking agent change from reaching production.
  3. Annotate traces: stamp identity.overall_hash onto your OpenTelemetry spans as agentversion.manifest_hash for version-scoped filtering. See examples/integrations/otel_mapping.md.
  4. Classify data compatibility: diff --compat (or decision generate) gives a per-episode keep / repair / replay / drop verdict you can act on.
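Version-scoped filtering, as in step 3, can be pictured with plain dictionaries standing in for span attributes. This is a sketch under assumptions: only the attribute key `agentversion.manifest_hash` comes from the text above, while `stamp`, `spans_for_version`, and the shortened hash values are hypothetical placeholders.

```python
MANIFEST_HASH_KEY = "agentversion.manifest_hash"


def stamp(attributes, manifest_hash):
    """Return a copy of the span attributes with the manifest hash attached."""
    return {**attributes, MANIFEST_HASH_KEY: manifest_hash}


def spans_for_version(spans, manifest_hash):
    """Keep only the spans produced by the given agent version."""
    return [s for s in spans if s.get(MANIFEST_HASH_KEY) == manifest_hash]


# Placeholder hashes; real values would come from identity.overall_hash.
V1, V2 = "sha256:aaa", "sha256:bbb"
spans = [
    stamp({"name": "plan"}, V1),
    stamp({"name": "call_tool"}, V2),
    stamp({"name": "respond"}, V1),
]
v1_spans = spans_for_version(spans, V1)
print([s["name"] for s in v1_spans])
```

With the OpenTelemetry Python API, the same key and value would typically be attached through a span's `set_attribute` call; the mapping the project recommends is the one in examples/integrations/otel_mapping.md.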

It interoperates with LangSmith, Langfuse, Phoenix, and W&B — annotate their traces/datasets with a manifest hash, or read/write compatibility decisions alongside your eval pipeline.


The spec & conformance

agentversion is an open spec so any tool, in any language, can produce interoperable manifests and diffs.

CONFORMANCE.md and compatibility-tests/ define golden input/output pairs that any implementation must reproduce to claim conformance.
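A golden-pair check boils down to running an implementation on each input and comparing the result with the recorded output. The runner below is a generic sketch of that pattern. The case structure, the `run_conformance` helper, and the toy implementation are all hypothetical; the actual layout of compatibility-tests/ is defined by the repository, not by this example.

```python
def run_conformance(cases, implementation):
    """Return the names of cases whose actual output differs from the golden one."""
    failures = []
    for case in cases:
        actual = implementation(case["input"])
        if actual != case["expected"]:
            failures.append(case["name"])
    return failures


def toy_implementation(pair):
    """Stand-in for a real diff: list the top-level keys that differ."""
    old, new = pair["old"], pair["new"]
    changed = sorted(k for k in set(old) | set(new) if old.get(k) != new.get(k))
    return {"changed": changed}


# Hypothetical golden cases held in memory instead of on disk.
CASES = [
    {
        "name": "identical",
        "input": {"old": {"a": 1}, "new": {"a": 1}},
        "expected": {"changed": []},
    },
    {
        "name": "one-changed-one-added",
        "input": {"old": {"a": 1}, "new": {"a": 2, "b": 3}},
        "expected": {"changed": ["a", "b"]},
    },
]
print(run_conformance(CASES, toy_implementation))
```

An implementation in any language can follow the same loop: an empty failure list is the condition for claiming conformance against the cases it was run on.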


Pairs with skillevaluation

A manifest can carry the eval results that gated its release in evaluation.gates[]:

{
  "evaluation": {
    "gates": [
      { "name": "regression-suite", "threshold": 0.95, "actual_score": 0.972, "passed": true }
    ]
  }
}

Those scores come from skillevaluation, the sibling open spec for A/B benchmarking skills. agentversion records what an agent version is; skillevaluation measures whether it's better.
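Reading the gates back out of a manifest is plain JSON handling. The sketch below uses only the fields shown in the fragment above; the helper names are hypothetical, and the rule that a gate passes when actual_score is at least threshold is an assumption made for illustration, not something the spec text here confirms.

```python
import json

MANIFEST_FRAGMENT = """
{
  "evaluation": {
    "gates": [
      { "name": "regression-suite", "threshold": 0.95, "actual_score": 0.972, "passed": true }
    ]
  }
}
"""


def failed_gates(manifest):
    """Names of gates whose recorded 'passed' flag is false or missing."""
    gates = manifest.get("evaluation", {}).get("gates", [])
    return [g["name"] for g in gates if not g.get("passed", False)]


def inconsistent_gates(manifest):
    """Gates whose flag disagrees with the assumed rule: actual_score >= threshold."""
    gates = manifest.get("evaluation", {}).get("gates", [])
    return [
        g["name"]
        for g in gates
        if (g["actual_score"] >= g["threshold"]) != g["passed"]
    ]


manifest = json.loads(MANIFEST_FRAGMENT)
print("failed:", failed_gates(manifest))
print("inconsistent:", inconsistent_gates(manifest))
```

A release script could refuse to publish a manifest when either list is non-empty.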

The decimalai Python SDK builds on agentversion to add framework adapters (capture a manifest straight from your LangGraph/CrewAI app), trace capture, and managed replay — but you never need it to use the spec.


Project

The spec is stable at v1.0, with a frozen wire format and conformance suite. The package is 0.1.0: pre-1.0 under semantic versioning, so the Python API may still change before it reaches 1.0. Design decisions are logged in adrs/, releases in CHANGELOG.md. Contributions, especially new conformance cases, are welcome; see CONTRIBUTING.md. To set up a development checkout:

git clone https://github.com/decimal-labs/agentversion
cd agentversion
pip install -e ".[dev]"
pytest

Licensed under Apache 2.0.
