acervo

Context proxy for AI agents. Enriches LLM context from a knowledge graph, extracts knowledge from responses.

Acervo
Semantic compression layer for AI agents.
Your agent's context window is finite. Acervo makes it infinite.

The problem

Every chat application sends the entire conversation history to the LLM on every turn. Turn 1 costs 200 tokens. Turn 50 costs 9,000. Turn 100 hits the context window limit and starts losing information.

And it's getting worse. Custom rules (CLAUDE.md), skills, agent instructions, MCP tool definitions — each layer adds more static tokens to every request. The models now support 128K–1M token contexts, but having more space doesn't solve the problem. You're stuffing everything into a bigger bag instead of organizing what you need.

RAG helps, but it's brute force — it searches everything, retrieves long text chunks, and floods the context with tokens that are mostly noise. A 5-chunk retrieval at 500 tokens each costs 2,500 tokens per turn, and most of that text isn't relevant to the current question.

And then there's the session problem. Close the conversation, open a new one — everything is gone. You start from scratch every time.

What Acervo does differently

Acervo builds a knowledge graph from every conversation. When a new message arrives, it retrieves only the relevant context — not raw text chunks, but compressed knowledge nodes. The graph is the summary.

Without Acervo:   turn 1 → 200tk  │  turn 50 → 9,000tk  │  turn 100 → context limit
Classic RAG:      turn 1 → 200tk  │  turn 50 → 2,500tk  │  turn 100 → 2,500tk (noisy)
With Acervo:      turn 1 → 200tk  │  turn 50 → ~400tk   │  turn 100 → ~400tk (signal)
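
A rough worked version of the arithmetic behind these figures. The 20-token system prompt and 180 tokens per turn are illustrative assumptions, chosen only so the totals land near the 200 and 9,000 token figures quoted above; they are not measurements.

```python
def full_history_tokens(turn: int, system_tokens: int = 20, tokens_per_turn: int = 180) -> int:
    """Tokens sent on a given turn when the whole conversation is replayed."""
    return system_tokens + tokens_per_turn * turn


def rag_tokens(chunks: int = 5, tokens_per_chunk: int = 500) -> int:
    """Tokens added per turn by a fixed-size chunk retrieval."""
    return chunks * tokens_per_chunk


def cumulative(per_turn_cost, turns: int) -> int:
    """Total tokens sent over a conversation of `turns` turns."""
    return sum(per_turn_cost(turn) for turn in range(1, turns + 1))


print(full_history_tokens(1), full_history_tokens(50))  # per-turn cost grows linearly
print(rag_tokens())                                     # flat, but mostly noise
print(cumulative(full_history_tokens, 50))              # total with full replay
print(cumulative(lambda turn: 400, 50))                 # total at a constant ~400 tokens
```

The per-turn cost is only part of the picture: because full replay grows linearly per turn, the cumulative cost over a conversation grows quadratically.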

A question like "What tech does Project Alpha use?" doesn't need 2,500 tokens of retrieved paragraphs. It needs the compressed knowledge:

Project Alpha: web app, e-commerce platform
  → uses: React, PostgreSQL, Redis
  → maintained by: Alice, Bob
  → deployed on: AWS, web + mobile

Pure signal, zero noise. And when you close the session and come back tomorrow, the graph is still there. No more starting from scratch.
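
As a minimal sketch of what "compressed knowledge" could look like in code, the example below renders one node into the text form shown above. The `Node` class and `render` function are hypothetical names for illustration, not Acervo's internal types.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical graph node: a name, a short summary, and outgoing relations."""
    name: str
    summary: str
    relations: dict[str, list[str]] = field(default_factory=dict)


def render(node: Node) -> str:
    """Render a node as a few compact lines instead of raw paragraphs."""
    lines = [f"{node.name}: {node.summary}"]
    for relation, targets in node.relations.items():
        lines.append(f"  → {relation}: {', '.join(targets)}")
    return "\n".join(lines)


alpha = Node(
    name="Project Alpha",
    summary="web app, e-commerce platform",
    relations={
        "uses": ["React", "PostgreSQL", "Redis"],
        "maintained by": ["Alice", "Bob"],
        "deployed on": ["AWS", "web + mobile"],
    },
)
print(render(alpha))
```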

How it works

Acervo is a context proxy — it sits between your app and the LLM. Transparent, stateless, zero code changes required.

User message
     │
     ▼
 S1 Unified        ← Topic detection + knowledge extraction (one LLM call)
     │
     ▼
 Context build     ← Reads the graph, assembles compressed context from relevant nodes
     │
     ▼
 Your LLM          ← Responds with enriched context (you control model, streaming, tools)
     │
     ▼
 S1.5 Graph Update ← Async: extracts from response, curates graph, merges duplicates
     │
     ▼
 Graph grows       ← Next turn has more knowledge available, still constant tokens

The pipeline

S1 Unified (sync, before response) — Classifies the topic and extracts entities, relations, and facts from the user's message. One LLM call replaces what used to be three separate steps.

S1.5 Graph Update (async, after response) — Runs in the background without blocking the user. Extracts knowledge from the assistant's response, merges duplicate nodes, corrects types, and creates missing relations. This is what makes Acervo stateless — the graph is always up to date and ready for the next message, whenever it comes.

Stateless by design

Acervo has no session state. The graph is the state. Every turn follows the same pipeline: read from graph → compress → inject → extract → write. If your app crashes and restarts, if you close the session and come back next week — the next message works identically. The graph remembers.
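
The read → compress → inject → extract → write cycle can be sketched as a function that keeps nothing in memory between calls. This is a toy under stated assumptions: the graph is a flat JSON dict of entity name to facts, relevance is a plain name match, and `handle_turn`, `llm`, and `extract` are hypothetical stand-ins rather than Acervo's API.

```python
import json
from pathlib import Path
from typing import Callable

Graph = dict[str, list[str]]


def load_graph(path: Path) -> Graph:
    """Read the graph from disk; a missing file means an empty graph."""
    return json.loads(path.read_text()) if path.exists() else {}


def handle_turn(
    graph_path: Path,
    user_msg: str,
    llm: Callable[[str], str],
    extract: Callable[[str], Graph],
) -> str:
    graph = load_graph(graph_path)                                   # read
    relevant = {name: facts for name, facts in graph.items()
                if name.lower() in user_msg.lower()}                 # compress (toy: name match)
    context = "\n".join(f"{name}: {'; '.join(facts)}"
                        for name, facts in relevant.items())
    reply = llm(f"{context}\n\nUser: {user_msg}")                    # inject
    for name, facts in extract(user_msg).items():                    # extract
        known = graph.setdefault(name, [])
        known.extend(fact for fact in facts if fact not in known)
    graph_path.write_text(json.dumps(graph, indent=2))               # write
    return reply
```

Because every call starts by reading the file and ends by writing it, a crash or restart between turns loses nothing that was already written.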

Quick start

1. Install

pip install acervo

2. Start a local LLM

Acervo works best with our fine-tuned extraction model. Load it in LM Studio or Ollama:

With LM Studio: Search for acervo-extractor-qwen3.5-9b and load it. One model handles everything — chat and extraction.

With Ollama:

ollama run sandyeveliz/acervo-extractor

Any OpenAI-compatible model also works as a fallback (e.g., qwen2.5:3b, gpt-4o-mini).

3. Initialize and run

cd your-project
acervo init          # Creates .acervo/ directory
acervo serve         # Starts proxy on port 9470

Point your app's base_url to http://localhost:9470 — that's it. Acervo intercepts every request, enriches it with graph context, and forwards it to your LLM.
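
For an app that talks to the LLM over plain HTTP, the redirect might look like the sketch below. It assumes the proxy serves the OpenAI-style route `/v1/chat/completions` directly under the base URL, as the proxy-mode diagram suggests; `build_chat_request` is a hypothetical helper, and only the standard library is used.

```python
import json
import urllib.request

PROXY_BASE_URL = "http://localhost:9470"  # the Acervo proxy, not the LLM server


def build_chat_request(base_url: str, model: str, user_msg: str) -> urllib.request.Request:
    """Build an OpenAI-style chat completion request aimed at `base_url`."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_msg}],
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_chat_request(
    PROXY_BASE_URL, "acervo-extractor-qwen3.5-9b", "What do you know about Beacon?"
)
print(request.full_url)

# With `acervo serve` running, the request could be sent like this:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

The only change from a direct call is the host and port; the payload stays the same.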

4. Or use as a library

import asyncio
from acervo import Acervo, OpenAIClient

async def main():
    llm = OpenAIClient(
        base_url="http://localhost:1234/v1",
        model="acervo-extractor-qwen3.5-9b",
    )
    memory = Acervo(llm=llm, owner="demo-user")

    history = [{"role": "system", "content": "You are a helpful assistant."}]

    # Turn 1: user shares information
    user_msg = "I work at Acme Corp, we're building a React app called Beacon with PostgreSQL"
    history.append({"role": "user", "content": user_msg})

    prep = await memory.prepare(user_msg, history)
    # Call YOUR LLM with prep.context_stack
    assistant_msg = "Got it! Tell me more about Beacon."
    history.append({"role": "assistant", "content": assistant_msg})

    await memory.process(user_msg, assistant_msg)
    # Graph now has: Acme Corp (organization), Beacon (project), React + PostgreSQL (technology)

    # Turn 2: ask about something stored
    user_msg = "What do you know about our project?"
    history.append({"role": "user", "content": user_msg})

    prep = await memory.prepare(user_msg, history)
    # prep.has_context → True
    # Context includes Beacon's full node — not the raw turn 1 text

asyncio.run(main())

Proxy mode (`acervo serve`)

The recommended way to use Acervo. Zero code changes — just redirect your app's base_url.

acervo serve --port 9470 --forward-to http://localhost:1234/v1

Your app                    Acervo proxy (:9470)                LLM server (:1234)
   │                              │                                   │
   ├─ POST /v1/chat/completions ─►│                                   │
   │                              ├─ S1: topic + extraction           │
   │                              ├─ inject compressed context         │
   │                              ├─ POST /v1/chat/completions ──────►│
   │                              │◄──────────── stream response ─────┤
   │◄──── stream response ────────┤                                   │
   │                              ├─ S1.5: async graph curation       │
   │                              │                                   │

Supports OpenAI (/v1/chat/completions) and Anthropic (/v1/messages) formats, streaming and non-streaming.

curl http://localhost:9470/acervo/last-turn    # What Acervo did on the last turn
curl http://localhost:9470/acervo/status        # Graph stats

Data storage

All data lives in .acervo/ in your project directory, following the .git/ pattern:

your-project/
├── .acervo/
│   ├── graph/
│   │   ├── nodes.json
│   │   └── edges.json
│   ├── vectordb/
│   └── config.toml
├── src/
└── ...

acervo init      # Create .acervo/ directory
acervo status    # Show graph stats
acervo reset     # Clear all data (graph + vector store)

The knowledge graph

As conversations happen, Acervo builds a persistent graph of entities, relations, and facts. Every node is a compressed representation — not raw text, but structured knowledge.

What gets extracted

Entities — real, named things with 8 types:

person, organization, project, technology, place, event, document, concept

Relations — 16 precise connection types:

part_of, created_by, maintains, works_at, member_of,
uses_technology, depends_on, alternative_to,
located_in, deployed_on, produces, serves, documented_in,
participated_in, triggered_by, resulted_in

Facts — specific claims attached to entities:

Project Beacon: "In production since March, 50k monthly users"
Alice Chen: "Lead developer, joined in 2024"
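
One way to picture the three kinds of extracted knowledge is as small validated records. The sketch below uses the type names listed above, but the `Entity`, `Relation`, and `Fact` classes are hypothetical and do not describe the on-disk schema of `nodes.json` or `edges.json`.

```python
from dataclasses import dataclass

ENTITY_TYPES = frozenset({
    "person", "organization", "project", "technology",
    "place", "event", "document", "concept",
})

RELATION_TYPES = frozenset({
    "part_of", "created_by", "maintains", "works_at", "member_of",
    "uses_technology", "depends_on", "alternative_to",
    "located_in", "deployed_on", "produces", "serves", "documented_in",
    "participated_in", "triggered_by", "resulted_in",
})


@dataclass(frozen=True)
class Entity:
    name: str
    type: str

    def __post_init__(self) -> None:
        if self.type not in ENTITY_TYPES:
            raise ValueError(f"unknown entity type: {self.type}")


@dataclass(frozen=True)
class Relation:
    source: str
    type: str
    target: str

    def __post_init__(self) -> None:
        if self.type not in RELATION_TYPES:
            raise ValueError(f"unknown relation type: {self.type}")


@dataclass(frozen=True)
class Fact:
    entity: str
    text: str


alice = Entity("Alice Chen", "person")
edge = Relation("Alice Chen", "maintains", "Project Beacon")
fact = Fact("Alice Chen", "Lead developer, joined in 2024")
```

Rejecting unknown types at construction time keeps free-form labels from an extractor out of the graph.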

Two knowledge layers

PERSONAL — User-specific: projects, preferences, relationships. Scoped to that user.

UNIVERSAL — World knowledge: technologies, cities, public figures. Shareable across users.

The extractor assigns layers automatically. "We use React" creates a PERSONAL edge from your project to the UNIVERSAL React node.

Events as super-summaries

Instead of storing 5,000 tokens of meeting notes:

Event: "Sprint review Q1"
  participants: [Alice, Bob, CTO]
  description: "Shipped auth module, discussed perf issues, decided to add Redis cache"
  temporal_marker: "End of Q1 2026"

The LLM gets ~40 tokens instead of the full transcript.

Topic-based context layers

Topic-based layers are what make Acervo fundamentally different from a cache or a RAG system.

The layers move with the conversation

Traditional systems use temporal layers — recent = hot, old = cold. Acervo uses topic-based layers: what's relevant depends on what you're talking about right now.

09:00 — "Let's work on the auth bug in Beacon"
  HOT:  Beacon, React, PostgreSQL, auth module, Alice
  → LLM receives ~200 tokens of graph context

09:45 — "Now let's look at Project Compass, the mobile app"
  HOT:  Compass, React Native, Firebase
  WARM: Beacon, React, auth module          ← dropped from hot
  → LLM receives ~200 tokens about Compass only

14:00 — "Back to the Beacon auth bug, did we fix it?"
  HOT:  Beacon, React, auth module          ← jumps back to hot instantly
  → LLM receives Beacon context including this morning's facts

17:00 — "Switching gears — have you read Dune?"
  HOT:  Dune, Arrakis, Paul Atreides       ← completely new cluster
  COLD: ALL work context (0 tokens)

Progressive retrieval

Default: inject only hot layer (~200-400 tokens)
If user asks for more: bring in warm layer
If needed: bring in cold layer

In 80% of turns, hot is enough.
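
The layering and the progressive widening can be sketched together. This is an illustration only, not Acervo's implementation: it assumes each node carries a single topic label and a token cost, that "warm" means the previously active topic, and that `assign_layers` and `select_context` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class GraphNode:
    name: str
    topic: str
    tokens: int


def assign_layers(nodes, current_topic, previous_topic=None):
    """Hot = current topic, warm = previous topic, cold = everything else."""
    layers = {"hot": [], "warm": [], "cold": []}
    for node in nodes:
        if node.topic == current_topic:
            layers["hot"].append(node)
        elif node.topic == previous_topic:
            layers["warm"].append(node)
        else:
            layers["cold"].append(node)
    return layers


def select_context(layers, depth=1, budget=400):
    """depth=1 uses hot only, 2 adds warm, 3 adds cold; skips nodes over budget."""
    chosen, used = [], 0
    for layer in ("hot", "warm", "cold")[:depth]:
        for node in layers[layer]:
            if used + node.tokens <= budget:
                chosen.append(node)
                used += node.tokens
    return chosen


nodes = [
    GraphNode("Beacon", "beacon", 80),
    GraphNode("React", "beacon", 40),
    GraphNode("Compass", "compass", 70),
    GraphNode("Dune", "books", 60),
]
layers = assign_layers(nodes, current_topic="compass", previous_topic="beacon")
print([node.name for node in select_context(layers)])           # hot only
print([node.name for node in select_context(layers, depth=2)])  # hot + warm
```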

Fine-tuned extraction model

Acervo includes a fine-tuned model specifically trained for knowledge graph extraction:

acervo-extractor-qwen3.5-9b — Based on Qwen 3.5 9B, trained on 612 examples across 5 domains.

Metric	Score
JSON parse rate	100%
Extraction accuracy	85%
Languages	English + Spanish

Single model architecture — The same model handles both chat and extraction. The system prompt determines behavior. No need for separate models. ~6GB VRAM total.

Training data, notebooks, and evaluation scripts are in the acervo-models repository.

Index a codebase

Acervo can index an entire project directory — parsing source code with tree-sitter, resolving import dependencies, and building a structural knowledge graph without any LLM calls.

acervo init /path/to/project
acervo index /path/to/project

For a 50-file project, structural indexing takes under 2 seconds. Each file stores a SHA-256 hash — unchanged files are skipped on re-index.
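
Hash-based skipping of unchanged files can be sketched with the standard library. The manifest file and the `files_to_reindex` helper are assumptions for illustration; where Acervo actually stores its hashes is not specified here.

```python
import hashlib
import json
from pathlib import Path


def file_sha256(path: Path) -> str:
    """SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def files_to_reindex(root: Path, manifest_path: Path, pattern: str = "*.py") -> list[Path]:
    """Return files whose hash differs from the stored one, then update the manifest."""
    manifest = json.loads(manifest_path.read_text()) if manifest_path.exists() else {}
    changed = []
    for path in sorted(root.rglob(pattern)):
        digest = file_sha256(path)
        key = str(path.relative_to(root))
        if manifest.get(key) != digest:
            changed.append(path)
            manifest[key] = digest
    manifest_path.write_text(json.dumps(manifest, indent=2))
    return changed
```

On a second run over an untouched tree the function returns an empty list, so only the hashing cost is paid.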

What it extracts

Phase 1 — Structural (tree-sitter, no LLM): functions, classes, interfaces, imports/exports, markdown sections.

Phase 2 — Semantic (optional): embeddings per entity, LLM summaries, topic tags.

acervo index /path/to/project \
  --embedding-model nomic-embed-text \
  --embedding-endpoint http://localhost:11434
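
To give a feel for what Phase 1 structural extraction produces, the sketch below pulls functions, classes, and imports out of Python source without any LLM call. Note the swap: it uses Python's built-in `ast` module as a stand-in, not tree-sitter, and `extract_structure` is a hypothetical name.

```python
import ast


def extract_structure(source: str) -> dict[str, list[str]]:
    """Collect function, class, and import names from Python source code."""
    found: dict[str, list[str]] = {"functions": [], "classes": [], "imports": []}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            found["functions"].append(node.name)
        elif isinstance(node, ast.ClassDef):
            found["classes"].append(node.name)
        elif isinstance(node, ast.Import):
            found["imports"].extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            # Relative imports such as `from . import x` have no module name.
            found["imports"].append(node.module or ".")
    return found


SAMPLE = '''
import os
from pathlib import Path

class Indexer:
    def run(self):
        return os.getcwd()

async def main():
    pass
'''

structure = extract_structure(SAMPLE)
print(structure)
```

`ast.walk` visits nodes in no guaranteed source order, so sort the lists if order matters.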

Supported files

Extension	Parser	Extracts
`.py`	tree-sitter	functions, classes, methods, imports, decorators
`.ts` `.tsx`	tree-sitter	functions, classes, interfaces, types, imports, exports
`.js` `.jsx`	tree-sitter	functions, classes, imports, exports
`.html`	regex	component references, element IDs
`.css`	regex	selectors, custom properties
`.md`	heading parser	sections with hierarchy context

Architecture

How nodes replace chunks

Traditional RAG: "When did Alice and Bob first work together?" → 5 chunks × 500 tokens = 2,500 tokens.

Acervo:

Alice → participated_in → Project Beacon launch (event)
Bob   → participated_in → Project Beacon launch (event)

Event: "Project Beacon launch"
  description: "First joint project, shipped MVP in 2 weeks"
  temporal_marker: "March 2025"

~60 tokens. Same answer quality.

Pipeline components

Component	What it does
S1 Unified	Topic classification + entity/relation/fact extraction in one call
S1.5 Graph Update	Async graph curation: merges, type corrections, assistant extraction
Context index	Selects hot/warm/cold nodes, assembles compressed context
Topic detector	Cascade: keywords → embeddings → LLM (most resolved without LLM)
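
The topic detector's cascade (cheap checks first, LLM last) can be sketched as an ordered list of stages that stops at the first answer. Everything here is hypothetical: the stage functions are stubs, and the real detector's thresholds and signatures are not documented in this README.

```python
from typing import Callable, Optional

Detector = Callable[[str], Optional[str]]


def keyword_detector(keywords: dict[str, set[str]]) -> Detector:
    """Build a detector that matches whole lowercase words against per-topic terms."""
    def detect(message: str) -> Optional[str]:
        words = set(message.lower().split())
        for topic, terms in keywords.items():
            if words & terms:
                return topic
        return None
    return detect


def detect_topic(
    message: str, stages: list[tuple[str, Detector]]
) -> tuple[Optional[str], Optional[str]]:
    """Run stages in order; return (topic, stage name) from the first that answers."""
    for stage_name, detector in stages:
        topic = detector(message)
        if topic is not None:
            return topic, stage_name
    return None, None


stages = [
    ("keywords", keyword_detector({"beacon": {"beacon", "auth"}})),
    ("embeddings", lambda message: None),   # stub: pretend no confident match
    ("llm", lambda message: "general"),     # stub: the expensive fallback always answers
]
print(detect_topic("fix the beacon auth bug", stages))
print(detect_topic("hello there", stages))
```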

History windowing

Traditional (turn 50):  system + msg1 + ... + msg50 = 9,000 tokens (growing)
Acervo (turn 50):       system + [graph context] + msg49 + msg50 = ~400 tokens (constant)
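
A minimal sketch of the windowing idea, assuming OpenAI-style message dicts and that the graph context is injected as an extra system message. `window_history` and its `keep_last` parameter are illustrative names, not part of Acervo's API.

```python
Message = dict[str, str]


def window_history(history: list[Message], graph_context: str, keep_last: int = 2) -> list[Message]:
    """Keep system messages, add graph context, and retain only the newest turns."""
    system = [m for m in history if m["role"] == "system"]
    dialogue = [m for m in history if m["role"] != "system"]
    context = [{"role": "system", "content": graph_context}] if graph_context else []
    recent = dialogue[-keep_last:] if keep_last > 0 else []
    return system + context + recent


history = [{"role": "system", "content": "You are a helpful assistant."}]
history += [{"role": "user", "content": f"msg{i}"} for i in range(1, 51)]

windowed = window_history(history, "Beacon: uses React, PostgreSQL")
print(len(history), "->", len(windowed))
```

The windowed list stays the same length at turn 50 or turn 500; only the graph context changes.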

Works with any LLM

from typing import Protocol


class LLMClient(Protocol):
    async def chat(
        self,
        messages: list[dict[str, str]],
        *,
        temperature: float = 0.0,
        max_tokens: int = 500,
    ) -> str: ...
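
Because `Protocol` uses structural typing, any class with a matching async `chat` method satisfies it without inheriting from anything. The sketch below repeats the protocol so it runs on its own and adds a hypothetical `EchoClient` that could serve as a test double.

```python
import asyncio
from typing import Protocol


class LLMClient(Protocol):
    async def chat(
        self,
        messages: list[dict[str, str]],
        *,
        temperature: float = 0.0,
        max_tokens: int = 500,
    ) -> str: ...


class EchoClient:
    """Hypothetical test double: echoes the last user message, ignores sampling options."""

    async def chat(
        self,
        messages: list[dict[str, str]],
        *,
        temperature: float = 0.0,
        max_tokens: int = 500,
    ) -> str:
        last_user = next(
            (m["content"] for m in reversed(messages) if m["role"] == "user"), ""
        )
        return f"echo: {last_user}"


async def ask(client: LLMClient, question: str) -> str:
    """Works with any object that matches the LLMClient protocol."""
    return await client.chat([{"role": "user", "content": question}])


reply = asyncio.run(ask(EchoClient(), "ping"))
print(reply)
```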

Tested setup

Component	Tool	Model
LLM + Extraction	LM Studio	`acervo-extractor-qwen3.5-9b` (single model for everything)
Embeddings	Ollama	`qwen3-embedding` (optional)
Client app	AVS-Agents	Python Web UI

VRAM requirement: ~6GB (one model handles chat + extraction)

Project status

v0.2.0 — Changelog

Feature	Status
Knowledge graph (JSON persistence)	✅ Working
UNIVERSAL / PERSONAL layers	✅ Working
`prepare()` / `process()` context proxy API	✅ Working
S1 Unified extraction (topic + entities in one call)	✅ Working
S1.5 Async graph curation (merges, corrections)	✅ Working
Fine-tuned extraction model (Qwen 3.5 9B)	✅ Published
Single model architecture (chat + extraction)	✅ Working
Topic-based context layers (HOT/WARM/COLD)	✅ Working
Topic detector (keywords → embeddings → LLM)	✅ Working
Context index with token budgeting	✅ Working
History windowing (constant token usage)	✅ Working
Entity + relation + event extraction	✅ Working
`.acervo/` project data directory	✅ Working
`acervo index` — structural + semantic	✅ Working
REST API (`acervo serve`)	✅ Working
Reproducible benchmarks (100-turn comparison)	🔜 v0.3
Progressive retrieval (hot → warm → cold)	🔜 v0.3
Docker Compose (one-command setup)	🔜 v0.3
Interactive demo (GitHub Pages)	🔜 v0.3
Graph → Vector DB chunk refs	🔜 v0.4

Documentation

Tutorial — Build a chat with persistent memory in 5 minutes
Getting Started — Installation, quick start, LLMClient protocol
Configuration — SDK parameters, environment variables
Knowledge Layers — UNIVERSAL vs PERSONAL, node lifecycle, topic layers
Roadmap — Planned features
Blog series — Development journey, version by version

Why "Acervo"?

In library science, an acervo is the complete collection of a library — every book, document, and record, organized so anything can be found when needed.

Your agent's memory should work like a library: knowledge organized by subject, retrievable in an instant. Not like someone who reads every book from cover to cover every time you ask a question.

Contributing

Open source under Apache 2.0. See CONTRIBUTING.md.

License

Apache 2.0 — see LICENSE.

GitHub · Docs · PyPI · Model

Release history

0.2.2 (this version): Mar 25, 2026
0.2.1: Mar 25, 2026
0.2.0: Mar 25, 2026
0.1.1: Mar 18, 2026

