
Agent Vector Protocol (AVP) — KV-Cache Transfer for Multi-Agent LLMs


Multi-agent text handoffs discard KV-cache and attention state. AVP transfers that state directly — 46-78% fewer tokens, 2-4x faster, across models and families. Built on LatentMAS (2025).

pip install avp

Self-hosted models on GPUs only. AVP needs access to model internals (KV-cache, hidden states) that cloud APIs don't expose. If you use OpenAI, Anthropic, or Google APIs — AVP can't help you. Good fit: multi-agent pipelines on vLLM or HuggingFace Transformers with datacenter or same-machine connectivity.

Quick Start

from avp import HuggingFaceConnector

connector = HuggingFaceConnector.from_pretrained("Qwen/Qwen2.5-7B-Instruct")
prompt = "Analyze this math problem: 24 * 17 + 3"

# Agent A: latent reasoning (no text output, builds KV-cache)
context = connector.think(prompt, steps=10)

# Agent B: generate with Agent A's context
answer = connector.generate(prompt, context=context)

Results

Benchmark                     Direct   Latent (AVP)   Text
HumanEval (Qwen 7B, n=164)    58.5%    67.1%          53.0%
GSM8K (Qwen 7B, n=200)        91.0%    90.5%          87.0%
DebugBench (Qwen 7B, n=100)   50.0%    51.0%          49.0%
GSM8K (Llama 3B, n=200)       75.0%    78.0%          75.5%

+8.6pp on code generation (p=0.029). 46-78% fewer tokens. 2-4x faster. Tested on NVIDIA A100.

Cross-model (zero training, 6 KB wire):

Source → Target       GSM8K   HumanEval
Qwen 7B → Llama 3B    74.5%   47.0%
Llama 3B → Qwen 7B    90.0%   79.3%

Full results: Benchmarks — 8 benchmarks, 5 models, 2 families.

How It Works

graph LR
    subgraph text["Text Chain (today)"]
        direction LR
        A1["Agent A<br/>generates text"] -->|"serialize to text<br/>re-tokenize everything"| B1["Agent B<br/>re-processes from scratch"]
    end

    subgraph avp["AVP Latent Transfer"]
        direction LR
        A2["Agent A<br/>generates KV-cache"] -->|"binary transfer<br/>28-130 MB"| B2["Agent B<br/>picks up where A left off"]
    end

    style text fill:#fff3f3,stroke:#d44,stroke-width:2px
    style avp fill:#f3fff3,stroke:#4a4,stroke-width:2px
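The 28-130 MB payload in the diagram is the KV-cache itself, so its size follows directly from model geometry. A back-of-envelope sketch, using Qwen2.5-7B's published config (28 layers, 4 KV heads under grouped-query attention, head dim 128) and assuming fp16:

```python
def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, dtype_bytes: int = 2) -> int:
    """Two tensors (K and V) per layer, each [num_kv_heads, seq_len, head_dim]."""
    return 2 * num_layers * num_kv_heads * seq_len * head_dim * dtype_bytes

# Qwen2.5-7B geometry, 2048 tokens of context, fp16:
print(kv_cache_bytes(28, 4, 128, seq_len=2048) / 1e6)  # 117.440512 (MB)
```

Shorter contexts and smaller models land at the low end of the quoted range; longer contexts at the high end.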

AVP transfers the KV-cache (computed attention states) directly between agents. The receiving agent reads prior reasoning from attention states instead of re-computing it from text. Three modes, auto-negotiated:

Mode            When                       What happens
Latent          Same model                 KV-cache transfer, zero re-processing
Cross-model     Same or different family   Vocabulary-mediated projection, zero training
JSON fallback   No compatible path         Standard text, auto-negotiated
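The negotiation in the table can be pictured as a capability check over model descriptors. A hypothetical sketch (the `ModelInfo` type, its fields, and `negotiate_mode` are illustrative names, not AVP's actual API):

```python
from dataclasses import dataclass

@dataclass
class ModelInfo:
    """Hypothetical descriptor, for illustration only."""
    model_id: str
    family: str  # empty string if internals are not accessible

def negotiate_mode(source: ModelInfo, target: ModelInfo) -> str:
    """Pick the richest transfer mode both ends support."""
    if source.model_id == target.model_id:
        return "latent"        # identical model: raw KV-cache transfer
    if source.family and target.family:
        return "cross-model"   # vocabulary-mediated projection
    return "json"              # no compatible path: plain text fallback

qwen = ModelInfo("Qwen/Qwen2.5-7B-Instruct", "qwen2")
llama = ModelInfo("meta-llama/Llama-3.2-3B-Instruct", "llama")
print(negotiate_mode(qwen, qwen))   # latent
print(negotiate_mode(qwen, llama))  # cross-model
```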
Cross-model transfer
from avp import HuggingFaceConnector

researcher = HuggingFaceConnector.from_pretrained("Qwen/Qwen2.5-7B-Instruct")
solver = HuggingFaceConnector.from_pretrained("meta-llama/Llama-3.2-3B-Instruct")

prompt = "Solve step by step: 24 * 17 + 3"
context = researcher.think(prompt, steps=10)
answer = solver.generate(prompt, context=context, source=researcher)

Cross-model calibration runs once per model pair (~0.5-2 s) and is cached to ~/.avp/maps/.
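A deterministic per-pair key is one way to implement such a cache. A hypothetical sketch (the hashing scheme and .map extension are assumptions, not AVP's actual on-disk layout):

```python
import hashlib
from pathlib import Path

def calibration_cache_path(source_id: str, target_id: str,
                           root: Path = Path.home() / ".avp" / "maps") -> Path:
    """One cached projection map per ordered (source, target) model pair.

    The pair is ordered because calibration is directional: projecting
    Qwen -> Llama is a different map than Llama -> Qwen.
    """
    key = hashlib.sha256(f"{source_id}->{target_id}".encode()).hexdigest()[:16]
    return root / f"{key}.map"

path = calibration_cache_path("Qwen/Qwen2.5-7B-Instruct",
                              "meta-llama/Llama-3.2-3B-Instruct")
```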

Easy API (convenience wrappers)
import avp

# One-liner: think + generate
answer = avp.generate("Solve: 24 * 17 + 3", model="Qwen/Qwen2.5-7B-Instruct")

# Cross-model
answer = avp.generate("Solve: 24 * 17 + 3",
                       model="meta-llama/Llama-3.2-3B-Instruct",
                       source_model="Qwen/Qwen2.5-7B-Instruct")
vLLM integration (experimental)

Status: Experimental. VLLMConnector works for text generation and identity extraction. The KV connector plugin (AVPKVConnectorV1Dynamic) for latent KV-cache transfer between vLLM instances has not been validated end-to-end and has known issues with PagedAttention format conversion. Use HuggingFaceConnector for production latent transfer. See CHANGELOG for details.

from avp import VLLMConnector

connector = VLLMConnector(model_id="Qwen/Qwen2.5-7B-Instruct")
answer = connector.generate("Analyze and solve: 24 * 17 + 3")
Cross-process transfer
# Process A: serialize context
wire_bytes = context.to_bytes(session_id="s1", source_agent_id="agent-a")

# Process B: restore and generate
from avp import AVPContext, HuggingFaceConnector
connector = HuggingFaceConnector.from_pretrained("Qwen/Qwen2.5-7B-Instruct")
restored = AVPContext.from_bytes(wire_bytes, device="cuda")
answer = connector.generate(prompt, context=restored)
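One common way to frame such payloads on the wire is a length-prefixed envelope: a fixed-size header length, a JSON header carrying the session metadata, then the raw tensor bytes. This sketch is illustrative only; the header fields and framing here are assumptions, not AVP's actual wire format:

```python
import json
import struct

def pack_envelope(session_id: str, source_agent_id: str, payload: bytes) -> bytes:
    """[4-byte big-endian header length][JSON header][raw payload]."""
    header = json.dumps({"session_id": session_id,
                         "source_agent_id": source_agent_id,
                         "payload_len": len(payload)}).encode()
    return struct.pack(">I", len(header)) + header + payload

def unpack_envelope(wire: bytes) -> tuple[dict, bytes]:
    """Invert pack_envelope: read the header, then slice out the payload."""
    (hlen,) = struct.unpack(">I", wire[:4])
    header = json.loads(wire[4:4 + hlen])
    payload = wire[4 + hlen:4 + hlen + header["payload_len"]]
    return header, payload

wire = pack_envelope("s1", "agent-a", b"\x00" * 16)
header, payload = unpack_envelope(wire)
```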

Works With

AVP works with your orchestration framework, not instead of it. Replace llm.invoke() with avp.generate() — your framework sees text in, text out.
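Because the boundary is plain text in, text out, an adapter for any of the frameworks below reduces to wrapping one callable. A generic sketch, with a stand-in function in place of a live model so it runs anywhere:

```python
from typing import Callable

def make_llm_node(generate: Callable[[str], str]) -> Callable[[dict], dict]:
    """Wrap a text-in/text-out generate() as a graph-node-style callable."""
    def node(state: dict) -> dict:
        return {**state, "answer": generate(state["prompt"])}
    return node

# Stand-in for avp.generate(prompt, model=...); swap in the real call on a GPU box.
fake_generate = lambda prompt: f"echo: {prompt}"
node = make_llm_node(fake_generate)
print(node({"prompt": "24 * 17 + 3"})["answer"])  # echo: 24 * 17 + 3
```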

Framework     Integration
LangGraph     Graph node: avp.generate() replaces the LLM call
CrewAI        BaseLLM.call() override
PydanticAI    FunctionModel callback
LlamaIndex    CustomLLM.complete() override
vLLM          KVConnectorBase_V1 plugin (experimental: text generation works, latent transfer in progress)
HuggingFace   Full hidden-state and KV-cache access
A2A / MCP     Complementary: AVP handles tensor transfer

See Framework Integration Guide for examples.

Roadmap

  • Bidirectional latent communication (A→B + B→A latent)
  • vLLM serving throughput benchmarks
  • CacheGen-style compression (3-4x KV-cache size reduction)

License

Apache 2.0 — see LICENSE
