# AVP – Agents Share Thoughts, Not Text

Multi-agent text handoffs discard KV-cache and attention state. AVP transfers that state directly — 51-78% fewer tokens, 1.5-5x faster.
When LLM agents hand off work as text, the next agent re-processes everything from scratch. AVP (Agent Vector Protocol) transfers the actual computation (KV-cache, hidden states, attention) so the receiving agent picks up where the sender left off. Zero tokens between agents, 2-3x faster pipelines, same or better accuracy. Built on LatentMAS, extended with cross-model vocabulary-mediated projection. Zero training, works across model families.
```bash
pip install avp[hf]
```

Requires self-hosted models on GPUs. AVP accesses model internals (KV-cache, hidden states) that cloud APIs don't expose. Other engines: `avp[ollama]`, `avp[llamacpp]`, `avp[vllm]` – see Works With.
## Quick Start

Same model – two agents share a KV-cache:

```python
from avp import HuggingFaceConnector

connector = HuggingFaceConnector.from_pretrained("Qwen/Qwen2.5-7B-Instruct")

# Agent A thinks (builds KV-cache, no text output)
context = connector.think("Analyze this math problem: 24 * 17 + 3", steps=20)

# Agent B generates using Agent A's KV-cache
answer = connector.generate("Solve step by step: 24 * 17 + 3", context=context)
```
Cross-model – different architectures, zero training:

```python
researcher = HuggingFaceConnector.from_pretrained("Qwen/Qwen2.5-7B-Instruct")
solver = HuggingFaceConnector.from_pretrained("meta-llama/Llama-3.2-3B-Instruct")

context = researcher.think("Analyze this problem", steps=20)
answer = solver.generate("Solve it", context=context, source=researcher, cross_model=True)
```
Cross-process – serialize context over any transport:

```python
from avp import AVPContext

# Process A
wire_bytes = context.to_bytes(session_id="s1", source_agent_id="agent-a")

# Process B
restored = AVPContext.from_bytes(wire_bytes, device="cuda")
answer = connector.generate(prompt, context=restored)
```
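Since the wire format is plain bytes, "any transport" really is any byte channel. A minimal sketch, pushing a placeholder payload through a local socket pair; the length-prefix framing here is ours, not part of AVP:

```python
import socket
import struct

# Stand-in for the output of context.to_bytes(); in real use this
# would be the serialized AVP context.
wire_bytes = b"\x00avp-context-placeholder"

# Two connected endpoints standing in for Process A and Process B.
a, b = socket.socketpair()

# Process A: send a 4-byte big-endian length, then the payload.
a.sendall(struct.pack("!I", len(wire_bytes)) + wire_bytes)

# Process B: read the length, then loop until the full payload arrives.
(length,) = struct.unpack("!I", b.recv(4))
received = b""
while len(received) < length:
    received += b.recv(length - len(received))

a.close()
b.close()
print(received == wire_bytes)  # True
```

The same framing works unchanged over TCP, a pipe, or a message queue; only the endpoint objects differ.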
You don't choose the transfer mode. The handshake auto-negotiates based on model compatibility: same model → full KV-cache, different models → vocabulary-mediated projection (~6 KB), incompatible models → JSON text fallback.
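That negotiation order can be sketched as a cascade. This is a simplified illustration, collapsing the handshake to two signals (model hash and vocabulary overlap); the function and its inputs are hypothetical, not the AVP API:

```python
def negotiate_mode(src, dst):
    """Pick a transfer mode from coarse model-compatibility signals."""
    if src["model_hash"] == dst["model_hash"]:
        return "latent"              # same weights: ship the full KV-cache
    shared = src["vocab"] & dst["vocab"]
    if len(shared) >= 100:           # enough shared BPE tokens to project through
        return "cross-model"         # vocabulary-mediated projection (~6 KB)
    return "json"                    # no projection path: plain-text fallback

# Toy agent descriptors with made-up hashes and vocabularies.
qwen7b = {"model_hash": "a1", "vocab": set(range(150_000))}
qwen3b = {"model_hash": "b2", "vocab": set(range(150_000))}
other  = {"model_hash": "c3", "vocab": set(range(150_000, 150_050))}

print(negotiate_mode(qwen7b, qwen7b))  # latent
print(negotiate_mode(qwen7b, qwen3b))  # cross-model
print(negotiate_mode(qwen7b, other))   # json
```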
## Results

Direct = single model, no pipeline. Latent = AVP transfer. Text Chain = standard text handoff between agents.

| Benchmark | Direct | Latent (AVP) | Text Chain |
|---|---|---|---|
| HumanEval (Qwen 7B, n=164) | 58.5% | 67.1% | 53.0% |
| GSM8K (Qwen 7B, n=200) | 91.0% | 90.5% | 87.0% |
| DebugBench (Qwen 7B, n=100) | 50.0% | 51.0% | 49.0% |
| GSM8K (Llama 3B, n=200) | 74.5% | 76.0% | 79.0% |
HumanEval: +12.4pp vs text across 4 seeds (p=0.004). GSM8K and DebugBench: neutral across all modes, but the pipeline runs 3x faster (7.6s vs 22.8s end-to-end on DebugBench). Llama 3B: text wins on GSM8K; latent overhead has more impact on smaller models. All benchmarks used steps=20 on NVIDIA A100.
Trade-off: 20 latent steps cost ~0.9s on A100. If Agent A would normally generate 22+ tokens of text, latent is faster.
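The break-even point is just the fixed latent cost divided by per-token decode latency. A quick sketch; the latencies below are illustrative assumptions, not measurements:

```python
# Latent transfer has a fixed cost (~0.9 s for 20 steps on an A100),
# so it wins whenever the sender would otherwise decode more than
# 0.9 / t tokens of text, for per-token latency t.
LATENT_COST_S = 0.9

def break_even_tokens(per_token_s):
    """Text-token count at which latent transfer becomes cheaper."""
    return LATENT_COST_S / per_token_s

for ms in (30, 41, 60):
    print(f"{ms} ms/token -> break-even at {break_even_tokens(ms / 1000):.0f} tokens")
```

At roughly 41 ms/token the break-even lands near the 22 tokens quoted above; faster decoding pushes it higher, slower decoding lower.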
Cross-model (zero training):
| Source → Target | GSM8K (Rosetta / Text) | HumanEval (Rosetta / Text) |
|---|---|---|
| Qwen 7B → Qwen 3B | 82.5% / 88.5% | 66.5% / 62.2% |
| Qwen 7B → Llama 3B | 77.0% / 86.5% | 47.0% / 57.9% |
| Llama 3B → Qwen 7B | 90.0% / 82.0% | 79.3% / 61.6% |
Target solo baselines (GSM8K / HumanEval): Qwen 3B = 82.5% / 61.0%, Llama 3B = 76.0% / 50.6%, Qwen 7B = 91.0% / 58.5%.
Full results: Benchmarks – 7 benchmarks, 5 models, 2 families, reproducible.
## How It Works

AVP auto-negotiates the transfer mode via a handshake at connection time. You write the same `think()` / `generate()` code regardless of which mode is selected:
| Mode | When | What transfers | Size |
|---|---|---|---|
| Latent | Same model | Full KV-cache | ~390 MB for 7B |
| Cross-model | Different model or family | Projected hidden state via shared vocabulary | ~6 KB |
| JSON fallback | No compatible projection path | Plain text | Varies |
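The "~390 MB" entry is roughly what a KV-cache costs at a few thousand tokens of context. A back-of-envelope sizing, assuming Qwen2.5-7B-style GQA geometry (28 layers, 4 KV heads, head_dim 128, fp16); these architecture numbers are our assumption, not taken from AVP's docs:

```python
def kv_cache_bytes(tokens, layers=28, kv_heads=4, head_dim=128, dtype_bytes=2):
    """Bytes of KV-cache: keys + values, across all layers and KV heads."""
    return 2 * layers * kv_heads * head_dim * dtype_bytes * tokens

per_token = kv_cache_bytes(1)
print(per_token)                       # 57344 bytes, ~56 KB per token
print(kv_cache_bytes(7000) / 1e6)      # ~401 MB at ~7k tokens of context
```

Under these assumptions the full-cache mode is roughly five orders of magnitude larger than the ~6 KB projected state, which is why cross-model transfer is cheap on the wire.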
The handshake checks model hash → structural match → shared tokenizer → vocabulary overlap (≥100 BPE tokens) → JSON. You never configure this manually.
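For intuition only, here is a toy of what "projection via shared vocabulary" can mean; the actual AVP projection may differ. The source expresses its hidden state as a distribution over tokens both models share, and the target rebuilds a state in its own embedding space from that distribution; all shapes and matrices below are made up:

```python
import numpy as np

rng = np.random.default_rng(0)
shared_vocab = 128                     # tokens both tokenizers share
d_src, d_tgt = 16, 8                   # different hidden sizes per model

E_src = rng.standard_normal((shared_vocab, d_src))  # source unembedding rows
E_tgt = rng.standard_normal((shared_vocab, d_tgt))  # target embedding rows

h_src = rng.standard_normal(d_src)     # sender's final hidden state

logits = E_src @ h_src                 # score each shared token
p = np.exp(logits - logits.max())
p /= p.sum()                           # softmax over the shared vocab

h_tgt = p @ E_tgt                      # target-space state, shape (d_tgt,)
print(h_tgt.shape)                     # (8,)
```

Only the distribution over shared tokens crosses the wire, which is why the payload stays small regardless of either model's hidden size.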
## Works With

### Engines

| Engine | Latent Pipeline | Cross-model |
|---|---|---|
| HuggingFace `avp[hf]` | Full think/generate | Yes |
| Ollama `avp[ollama]` | Full think/generate, auto-resolves GGUF | Yes |
| llama.cpp `avp[llamacpp]` | Full think/generate on GGUF | Yes |
| vLLM `avp[vllm]` | KV connector + model plugin | Yes |
### Frameworks

| Framework | Integration | Extra |
|---|---|---|
| LangChain | `ChatAVP` (BaseChatModel) | `avp[langchain]` |
| CrewAI | `AVPLLM` (BaseLLM) | `avp[crewai]` |
| AutoGen | `AVPChatCompletionClient` | `avp[autogen]` |
| A2A / MCP | Complementary: AVP handles tensor transfer, they handle routing | – |
See Framework Integration Guide for per-engine code examples.
## Roadmap
- Bidirectional latent communication (both agents share thinking, not just one)
- CacheGen-style KV-cache compression (3-4x reduction)
## Documentation
- AVP Specification – binary format, handshake, transport
- Benchmarks – 7 benchmarks, 5 models, 2 families
- Framework Integration – engines, frameworks, per-engine examples
- Examples – quickstart, cross-model, and agent demos
- CHANGELOG
## License
Apache 2.0 – see LICENSE