Anticipatory memory for LLMs — predicts what context you'll need next and pre-stages it before you ask.

presage

Anticipatory memory for LLMs. Retrieves before you ask.
A novel memory architecture that predicts what context an LLM will need next — and pre-stages it before the question arrives.


Every memory system for LLMs works the same way: the user asks something, the system retrieves relevant context, the LLM responds. Retrieve, then respond.

Presage inverts this. It models the conversation as a trajectory moving through semantic space — with position, velocity, and acceleration — and uses that trajectory to predict what memories will be needed next. By the time the user sends their next message, the relevant context is already staged and ready to inject.

No retrieval latency on the critical path. No cold cache. A system that gets smarter every turn.


How It Works

User types a message
        │
        ├── [HOT PATH — blocks until response] ──────────────────────┐
        │   Observer:  embed + intent extraction        (~2ms)        │
        │   Staging:   grab pre-fetched memory          (~1ms)        │  
        │   Reranker:  refine against actual message    (~1ms)        │
        │   Injector:  0/1 knapsack → token budget      (~1ms)        │
        │   LLM:       call with enriched context                     │
        │                                                             │
        └── [BACKGROUND — while LLM generates] ──────────────────────┘
            Predictor:  geodesic extrapolation → predictions
            Prefetcher: async fetch from vector + graph + annotations
            Feedback:   hit/miss → bandit update → training log
            Writer:     distill + chunk + store new memories

By the next turn, the prefetch is already done. The user never waits for retrieval.
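
The overlap between generation and prefetching can be sketched with `asyncio`. This is a toy illustration rather than Presage's actual code: `call_llm`, `prefetch`, `handle_turn`, and the "follow-up to ..." prediction are hypothetical stand-ins, and the sleeps only simulate latency.

```python
import asyncio

async def call_llm(prompt: str) -> str:
    # Hypothetical stand-in for the LLM call; the sleep simulates generation time.
    await asyncio.sleep(0.05)
    return f"answer to: {prompt}"

async def prefetch(predicted_query: str, staging: dict) -> None:
    # Hypothetical stand-in for the background fetch; it fills the staging cache.
    await asyncio.sleep(0.02)
    staging[predicted_query] = f"memory for {predicted_query}"

async def handle_turn(message: str, staging: dict) -> str:
    # Hot path: use whatever was staged while the previous turn was generating.
    staged = staging.pop(message, None)
    prompt = message if staged is None else f"{staged}\n\n{message}"
    # Background: start prefetching for a (toy) predicted next message
    # before awaiting the LLM, so both run concurrently.
    predicted_next = f"follow-up to {message}"
    task = asyncio.create_task(prefetch(predicted_next, staging))
    reply = await call_llm(prompt)
    await task  # awaited here only to keep the demo deterministic
    return reply

async def demo():
    staging = {}
    first = await handle_turn("explain verify_token", staging)
    second = await handle_turn("follow-up to explain verify_token", staging)
    return first, second, staging

first, second, staging = asyncio.run(demo())
print(second)
```

The second turn finds its context already staged, so the only work left on its hot path is a dictionary lookup.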


The Math

Presage treats conversation as a particle moving through the embedding hypersphere.

Conversation state — an exponentially-decayed weighted sum of turn embeddings, normalized onto the unit sphere:

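C_t = normalize( Σ_{i=1..N} λ^(N-i) * e_i )
where e_i is the embedding of turn i, N is the number of turns in the window, and λ is the decay factor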
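
A minimal pure-Python sketch of this state update, assuming embeddings are plain lists of floats ordered oldest to newest. The function names are illustrative and not taken from the Presage codebase; the default decay of 0.85 mirrors PRESAGE_DECAY_LAMBDA_BASE.

```python
import math

def normalize(v):
    """Scale v to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize the zero vector")
    return [x / norm for x in v]

def conversation_state(embeddings, decay=0.85):
    """Decay-weighted sum of turn embeddings, projected onto the unit sphere.

    embeddings[0] is the oldest turn and embeddings[-1] the newest,
    so the newest turn gets weight decay**0 == 1.
    """
    n = len(embeddings)
    dim = len(embeddings[0])
    total = [0.0] * dim
    for i, e in enumerate(embeddings, start=1):
        weight = decay ** (n - i)
        for d in range(dim):
            total[d] += weight * e[d]
    return normalize(total)

# Two orthogonal toy "embeddings": the newer turn dominates the state.
turns = [[1.0, 0.0], [0.0, 1.0]]
state = conversation_state(turns)
print(state)
```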

Momentum, projected onto the tangent plane at C_t (respects spherical geometry):

M_tan = M_t - (M_t • C_t) * C_t

Geodesic extrapolation — moves along the great circle rather than punching through the sphere's interior:

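C_{t+k} = cos(θ)*C_t + sin(θ)*(M_tan / ‖M_tan‖)
where θ = velocity * k * δ, and δ is the step size (PRESAGE_SLERP_STEP_SIZE)

M_tan is scaled to unit length here; without that, the result would not in general have unit norm.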

The predicted state C_{t+k} is used as the query vector for prefetching — always on the unit sphere, always a valid cosine similarity query.
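
The projection and extrapolation steps can be sketched as below. This is an illustration under stated assumptions, not the library's implementation: the tangent vector is normalized to unit length so the result stays on the sphere, the default `step_size` mirrors PRESAGE_SLERP_STEP_SIZE, and the function names and the zero-momentum fallback are hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(v):
    return math.sqrt(dot(v, v))

def tangent_component(momentum, state):
    """Remove the part of `momentum` that points along the unit vector `state`."""
    along = dot(momentum, state)
    return [m - along * s for m, s in zip(momentum, state)]

def extrapolate(state, momentum, velocity, k, step_size=0.30):
    """Move `state` along the great circle in the direction of `momentum`."""
    tangent = tangent_component(momentum, state)
    length = norm(tangent)
    if length < 1e-12:
        return list(state)  # assumed fallback: no usable direction, stay put
    unit = [t / length for t in tangent]
    theta = velocity * k * step_size
    return [math.cos(theta) * s + math.sin(theta) * u for s, u in zip(state, unit)]

state = [1.0, 0.0, 0.0]
momentum = [0.5, 0.5, 0.0]
predicted = extrapolate(state, momentum, velocity=1.0, k=1)
print(predicted, norm(predicted))
```

Because `state` and `unit` are orthonormal, the cos/sin combination always has unit norm, which is what keeps the prediction a valid cosine-similarity query.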

Confidence — each prediction strategy is tracked by a Bayesian Beta-Bernoulli bandit:

P(hit) = (hits + 1) / (hits + misses + 2)

No training required. Each bandit starts from a uniform Beta(1,1) prior (mean 0.5) and updates every turn.
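
As a worked example of the formula above, here is a small sketch of a Beta-Bernoulli tracker. The class name and its interface are illustrative assumptions, not the API of Presage's bandit module.

```python
from dataclasses import dataclass

@dataclass
class BetaBernoulliBandit:
    """Tracks hit/miss counts for one prediction strategy."""
    hits: int = 0
    misses: int = 0

    def update(self, hit: bool) -> None:
        if hit:
            self.hits += 1
        else:
            self.misses += 1

    @property
    def confidence(self) -> float:
        # Posterior mean of Beta(hits + 1, misses + 1).
        return (self.hits + 1) / (self.hits + self.misses + 2)

bandit = BetaBernoulliBandit()
prior = bandit.confidence  # 0.5 before any feedback
for outcome in [True, True, True, False]:
    bandit.update(outcome)
print(prior, bandit.confidence)
```

After three hits and one miss the estimate is (3 + 1) / (4 + 2) = 2/3, still below the 0.80 automatic-injection threshold.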

Injection — context allocation solved as a 0/1 knapsack over pre-chunked semantic units:

Maximize: Σ (value_i * x_i)
Subject to: Σ (tokens_i * x_i) ≤ MAX_TOKENS
where x_i ∈ {0, 1}

Content is never truncated. The knapsack selects whole chunks only — split at AST node, sentence, or header boundaries at write time.
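
The selection step can be sketched with the standard dynamic-programming solution to 0/1 knapsack. The chunk names, values, and token counts below are made up for illustration, and `select_chunks` is a hypothetical name; token counts are assumed to be positive integers.

```python
def select_chunks(chunks, max_tokens):
    """Pick whole chunks maximizing total value within the token budget.

    chunks: list of (name, value, tokens) tuples.
    Returns the chosen chunk names in input order.
    """
    # best[b] = (best total value, indices chosen) using at most b tokens
    best = [(0.0, ())] * (max_tokens + 1)
    for idx, (_, value, tokens) in enumerate(chunks):
        # Walk budgets downward so each chunk is used at most once.
        for budget in range(max_tokens, tokens - 1, -1):
            candidate = best[budget - tokens][0] + value
            if candidate > best[budget][0]:
                best[budget] = (candidate, best[budget - tokens][1] + (idx,))
    return [chunks[i][0] for i in best[max_tokens][1]]

chunks = [
    ("verify_token", 0.9, 300),
    ("auth_middleware", 0.6, 250),
    ("jwt_readme", 0.5, 200),
]
selected = select_chunks(chunks, max_tokens=500)
print(selected)
```

With a 500-token budget the two highest-value chunks do not fit together (550 tokens), so the solver pairs the best chunk with the smaller third one instead of truncating anything.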


Architecture

┌─────────────────────────────────────────────────────────────┐
│  Surface Layer    token stream · intent classifier           │
│  (core/surface)   symbol extractor · file detector          │
├─────────────────────────────────────────────────────────────┤
│  Nerve Layer      conversation state manager                 │
│  (core/nerve)     trajectory predictor · bandit registry    │
├─────────────────────────────────────────────────────────────┤
│  Staging Layer    P0–P9 async prefetch cache                 │
│  (core/staging)   prefetcher · reranker · knapsack injector │
├─────────────────────────────────────────────────────────────┤
│  Store Layer      SQLite (source of truth)                   │
│  (core/store)     Qdrant (vector) · Kuzu (graph)            │
│                   outbox worker · read-your-writes fallback  │
├─────────────────────────────────────────────────────────────┤
│  Write Layer      memory distiller · conflict resolver       │
│  (core/write)     semantic chunker · forward annotator      │
├─────────────────────────────────────────────────────────────┤
│  Feedback Layer   trigram + semantic hit detection           │
│  (core/feedback)  bandit updater · trajectory dataset       │
├─────────────────────────────────────────────────────────────┤
│  API Layer        FastAPI REST · WebSocket streaming         │
│  (api/)           session factory · session manager         │
└─────────────────────────────────────────────────────────────┘

Storage

| Store | Backend | Role |
|---|---|---|
| MetaStore | SQLite + aiosqlite | Source of truth. All writes here first. |
| VectorStore | Qdrant (local) | Semantic search over chunk embeddings. |
| GraphStore | Kuzu (embedded) | Causal graph: calls, imports, conflicts. |
| Outbox | SQLite table | Eventual consistency to Qdrant + Kuzu. |

SQLite is always the source of truth. Qdrant and Kuzu are derived projections — if they get corrupted, run presage rebuild-index to reconstruct them from SQLite in minutes.
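
The write-first-then-propagate pattern described here is commonly called a transactional outbox. The sketch below shows the general idea with the standard-library `sqlite3` module; the table layout, column names, and helper functions are assumptions for illustration and do not reflect Presage's real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE memories (id INTEGER PRIMARY KEY, content TEXT NOT NULL);
CREATE TABLE outbox (
    id INTEGER PRIMARY KEY,
    memory_id INTEGER NOT NULL,
    target TEXT NOT NULL,            -- 'vector' or 'graph'
    done INTEGER NOT NULL DEFAULT 0
);
""")

def write_memory(content):
    # One transaction: the memory row and its outbox entries commit together.
    with conn:
        cur = conn.execute("INSERT INTO memories (content) VALUES (?)", (content,))
        memory_id = cur.lastrowid
        conn.executemany(
            "INSERT INTO outbox (memory_id, target) VALUES (?, ?)",
            [(memory_id, "vector"), (memory_id, "graph")],
        )
    return memory_id

def drain_outbox(apply):
    # A worker would run this periodically; `apply` stands in for the
    # calls that update the derived stores.
    rows = conn.execute(
        "SELECT o.id, o.target, m.id, m.content FROM outbox o "
        "JOIN memories m ON m.id = o.memory_id WHERE o.done = 0 ORDER BY o.id"
    ).fetchall()
    for outbox_id, target, memory_id, content in rows:
        apply(target, memory_id, content)
        with conn:
            conn.execute("UPDATE outbox SET done = 1 WHERE id = ?", (outbox_id,))
    return len(rows)

projected = []
write_memory("example memory content")
processed = drain_outbox(lambda target, mid, content: projected.append((target, mid)))
print(processed, projected)
```

Because the outbox rows live in the same database as the memory itself, a crash between the write and the propagation loses nothing: the pending rows are simply picked up on the next drain.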


Quickstart

Prerequisites

  • Python 3.12+
  • An Anthropic or OpenAI API key (or Ollama for local models)

Install

pip install presage

Or with all optional backends:

pip install "presage[all]"

Initialize

presage init

Ingest your codebase

presage ingest ./your_project/

Presage walks the directory, chunks every file at natural boundaries (AST nodes for code, headers for markdown, top-level keys for JSON/YAML), embeds the chunks, and writes them to the store with forward annotations.
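
For Python files, chunking at AST boundaries might look like the following sketch, which uses the standard-library `ast` module to split a file at its top-level statements. It is an illustration of the idea only; `chunk_python_source` is a hypothetical name, and the real chunker may pick different boundaries.

```python
import ast

def chunk_python_source(source: str):
    """Split Python source into (name, text) chunks at top-level AST nodes."""
    tree = ast.parse(source)
    chunks = []
    for node in tree.body:
        # Exact source text of this node. Note that for decorated
        # definitions the segment starts at the def/class line.
        text = ast.get_source_segment(source, node)
        # Functions and classes have a name; other nodes fall back to their type.
        name = getattr(node, "name", type(node).__name__)
        chunks.append((name, text))
    return chunks

sample = '''import os

def verify_token(token):
    return token.strip()

class Session:
    pass
'''
chunks = chunk_python_source(sample)
for name, text in chunks:
    print(name, len(text))
```

Each chunk is a complete syntactic unit, which is what later allows whole chunks to be injected without truncation.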

Chat

presage chat
Session: a3f7c2d1-...
Type your message. Ctrl+C to exit.

You: why does verify_token throw an AttributeError?

Presage [DEBUG | v=0.18 | 3 mem | 241ms]:
The AttributeError in verify_token() is caused by...

The header shows: detected intent, conversation velocity, memories injected, and turn latency.

Start the API

presage serve

API docs: http://localhost:8000/docs


REST API

# Create a session
curl -X POST http://localhost:8000/v1/session

# Submit a turn
curl -X POST http://localhost:8000/v1/turn \
  -H "Content-Type: application/json" \
  -d '{"session_id": "...", "message": "explain verify_token"}'

# Manually ingest a memory
curl -X POST http://localhost:8000/v1/ingest \
  -H "Content-Type: application/json" \
  -d '{"content": "...", "source": "auth.py", "source_type": "code"}'

# Search memories
curl "http://localhost:8000/v1/memory/search?query=authentication&top_k=5"

# View staging slot state (SESSION_ID holds your session id)
curl "http://localhost:8000/v1/session/$SESSION_ID/slots"

# Health check
curl http://localhost:8000/v1/health
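
The same turn submission can be made from Python with only the standard library. This sketch mirrors the curl call above; the helper names are hypothetical, and it assumes the server replies with a JSON body, which the original does not document.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"

def build_turn_request(session_id: str, message: str) -> urllib.request.Request:
    """Build the POST /v1/turn request shown in the curl example."""
    body = json.dumps({"session_id": session_id, "message": message}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/v1/turn",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def send(request: urllib.request.Request):
    # Requires a running `presage serve`; assumes a JSON response body.
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))

# "example-session-id" is a placeholder, not a real session id.
req = build_turn_request("example-session-id", "explain verify_token")
print(req.get_method(), req.full_url)
```

Calling `send(req)` against a running server performs the request; building and sending are kept separate so the payload can be inspected first.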

Docker

cd docker
cp .env.example .env   # add API keys
docker-compose up -d

| Service | URL | Purpose |
|---|---|---|
| Presage API | http://localhost:8000 | REST + WebSocket |
| API Docs | http://localhost:8000/docs | Swagger UI |
| Metrics | http://localhost:8000/metrics | Prometheus scrape |
| Grafana | http://localhost:3000 | Metrics dashboard |

Configuration

All settings are environment variables with the PRESAGE_ prefix (or set in .env).

| Variable | Default | Description |
|---|---|---|
| PRESAGE_LLM_BACKEND | anthropic | anthropic · openai · ollama |
| PRESAGE_EMBEDDER_BACKEND | openai | openai · nomic · bge |
| PRESAGE_DECAY_LAMBDA_BASE | 0.85 | Exponential decay for conversation state |
| PRESAGE_CONTEXT_SWITCH_THRESHOLD | 0.40 | Cosine distance that triggers momentum reset |
| PRESAGE_SLERP_STEP_SIZE | 0.30 | Arc length per velocity unit in geodesic extrapolation |
| PRESAGE_AUTO_INJECT_THRESHOLD | 0.80 | Bandit confidence required for automatic injection |
| PRESAGE_MAX_INJECT_TOKENS | 4096 | Token budget for context injection per turn |
| PRESAGE_SLOT_COUNT | 10 | Number of prefetch staging slots (P0–P9) |
| PRESAGE_SLOT_TTL_SECONDS | 120 | How long a staged memory stays warm |
| PRESAGE_STATE_WINDOW_MAX | 6 | Max turn lookback for conversation state |
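
To show how prefixed environment variables map onto typed settings, here is a sketch covering the numeric options. The `Settings` class and `load_settings` function are hypothetical and not Presage's own configuration loader; only the variable names and defaults come from the settings listed above.

```python
import os
from dataclasses import dataclass, fields

@dataclass(frozen=True)
class Settings:
    # Defaults copied from the documented configuration.
    decay_lambda_base: float = 0.85
    context_switch_threshold: float = 0.40
    slerp_step_size: float = 0.30
    auto_inject_threshold: float = 0.80
    max_inject_tokens: int = 4096
    slot_count: int = 10
    slot_ttl_seconds: int = 120
    state_window_max: int = 6

def load_settings(environ=os.environ, prefix="PRESAGE_"):
    """Override defaults with any PRESAGE_* variables found in `environ`."""
    overrides = {}
    for f in fields(Settings):
        raw = environ.get(prefix + f.name.upper())
        if raw is not None:
            # Convert the string using the type of the default (int or float).
            overrides[f.name] = type(f.default)(raw)
    return Settings(**overrides)

settings = load_settings({"PRESAGE_SLOT_COUNT": "4", "PRESAGE_DECAY_LAMBDA_BASE": "0.9"})
print(settings)
```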

Prediction Strategies

Presage uses different retrieval strategies depending on detected intent:

| Intent | Signals | Strategy |
|---|---|---|
| DEBUG | "error", "fix", "crash", "exception" | Graph walk → semantic |
| IMPLEMENT | "write", "create", "build", "add" | Semantic → symbol lookup |
| NAVIGATE | "where is", "find", "which file" | Symbol lookup → semantic |
| COMPARE | "vs", "difference", "better than" | Hybrid (vector + graph) |
| EXPLORE | "what is", "explain", "how does" | Semantic → annotation |
| REFLECT | "earlier", "we decided", "before" | Annotation → semantic |
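
A keyword-based classifier over these signals could be sketched as follows. The whole-word matching, the tie-breaking by listing order, and the EXPLORE fallback are assumptions made for this example; the actual intent classifier may work quite differently.

```python
import re

INTENT_SIGNALS = {
    "DEBUG": ("error", "fix", "crash", "exception"),
    "IMPLEMENT": ("write", "create", "build", "add"),
    "NAVIGATE": ("where is", "find", "which file"),
    "COMPARE": ("vs", "difference", "better than"),
    "EXPLORE": ("what is", "explain", "how does"),
    "REFLECT": ("earlier", "we decided", "before"),
}

def classify(message: str, default: str = "EXPLORE") -> str:
    """Return the intent whose signal phrases occur most often as whole words."""
    text = message.lower()
    scores = {
        intent: sum(
            1 for signal in signals
            if re.search(rf"\b{re.escape(signal)}\b", text)
        )
        for intent, signals in INTENT_SIGNALS.items()
    }
    # max() keeps the first intent in listing order when scores tie.
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else default

print(classify("Why does verify_token crash with an exception?"))
print(classify("where is the session factory defined?"))
```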

Staging Slots

The 10 staging slots are tiered by confidence:

P0 ─── AUTO  (conf ≥ 0.80) → injected automatically every turn
P1 ─── AUTO  (conf ≥ 0.80) → injected automatically every turn
P2 ─── HOT   (conf ≥ 0.50) → injected when soft trigger matches
P3 ─── HOT   (conf ≥ 0.50) → injected when soft trigger matches
P4 ─── HOT   (conf ≥ 0.50) → injected when soft trigger matches
P5 ─── WARM  (conf ≥ 0.30) → available on explicit request
...
P9 ─── WARM  (conf ≥ 0.30) → available on explicit request

A soft trigger fires when the user's message mentions a symbol or file that matches a staged memory's annotation tags — e.g., typing "verify_token" fires any HOT memory tagged symbol:verify_token.
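
The tiering and soft-trigger rules can be sketched as below. This simplification assigns tiers from confidence alone and only handles `symbol:` tags; the `StagedMemory` class, the function names, and the identifier regex are hypothetical.

```python
import re
from dataclasses import dataclass, field

@dataclass
class StagedMemory:
    content: str
    confidence: float
    tags: set = field(default_factory=set)

def tier(confidence, auto=0.80, hot=0.50, warm=0.30):
    """Map a bandit confidence to a staging tier, or None if too low to stage."""
    if confidence >= auto:
        return "AUTO"
    if confidence >= hot:
        return "HOT"
    if confidence >= warm:
        return "WARM"
    return None

def memories_to_inject(message, staged):
    """AUTO memories always; HOT memories only when a tagged symbol is mentioned."""
    identifiers = re.findall(r"[A-Za-z_][A-Za-z0-9_]*", message)
    mentioned = {f"symbol:{name}" for name in identifiers}
    selected = []
    for memory in staged:
        level = tier(memory.confidence)
        if level == "AUTO" or (level == "HOT" and memory.tags & mentioned):
            selected.append(memory.content)
    return selected

staged = [
    StagedMemory("auth overview", 0.85),
    StagedMemory("verify_token source", 0.60, {"symbol:verify_token"}),
    StagedMemory("session factory", 0.60, {"symbol:SessionFactory"}),
    StagedMemory("old notes", 0.35),
]
injected = memories_to_inject("why does verify_token fail?", staged)
print(injected)
```

The WARM memory and the untriggered HOT memory stay staged but are left out of the prompt.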


Benchmarks

python tests/bench/bench_momentum.py   # math layer
python tests/bench/bench_staging.py    # staging layer
python tests/bench/bench_store.py      # store layer

Target latencies (P99 on a modern laptop):

| Operation | Target | Layer |
|---|---|---|
| Intent classification | < 0.5ms | Surface |
| Momentum update | < 2ms | Nerve |
| Geodesic extrapolation | < 2ms | Nerve |
| Knapsack injection | < 1ms | Staging |
| Reranker | < 1ms | Staging |
| Annotation search | < 5ms | Store |
| Total hot-path overhead | < 10ms | All |

The prefetch (retrieval) runs in the background while the LLM generates its response — it does not contribute to user-perceived latency.


Observability

Presage exposes Prometheus metrics at /metrics and OpenTelemetry traces via OTLP.

Key metrics:

presage_session_turn_latency_seconds    # end-to-end hot path latency
presage_feedback_hit_rate               # prediction hit rate per turn
presage_nerve_momentum_velocity         # conversation velocity histogram
presage_staging_slot_hits_total         # successful memory injections
presage_store_outbox_pending            # propagation lag gauge
presage_trajectory_samples_total        # training data accumulated

Enable tracing:

PRESAGE_OTEL_ENDPOINT=http://jaeger:4317 presage serve

Training Data Export

Every session accumulates trajectory samples — (conversation_state, predictions, outcomes) triples — that can be used to fine-tune the trajectory predictor from heuristic rules into a learned model.

presage export trajectory_data.jsonl

The JSONL format is compatible with standard fine-tuning pipelines. As a quality filter, only sessions with at least 100 turns are exported.


Project Structure

presage/
├── math_core/           # Core mathematics
│   ├── momentum.py      # Conversation state, SLERP extrapolation
│   ├── entropy.py       # Context switch detection, adaptive decay
│   ├── knapsack.py      # 0/1 DP knapsack for token budget
│   ├── diffusion.py     # Personalized PageRank over memory graph
│   └── bandit.py        # Beta-Bernoulli bandits + registry
├── core/
│   ├── surface/         # Observer, intent classifier, signal extractor
│   ├── nerve/           # Trajectory predictor, state manager
│   ├── staging/         # Prefetch cache, prefetcher, injector, reranker
│   ├── store/           # MetaStore, VectorStore, OutboxWorker
│   ├── write/           # Chunker, distiller, conflict resolver, annotator
│   ├── feedback/        # Hit/miss detector, feedback loop, dataset
│   └── session/         # SessionManager, SessionFactory
├── adapters/
│   ├── embedder/        # OpenAI, nomic, bge (local)
│   └── llm/             # Anthropic, OpenAI, Ollama
├── api/                 # FastAPI REST + WebSocket
├── observability/       # Prometheus metrics, OpenTelemetry tracing
├── cli/                 # presage CLI
├── docker/              # Dockerfile, docker-compose, .env.example
└── tests/
    ├── unit/            # Per-module unit tests (100% coverage target)
    ├── integration/     # Cross-layer integration tests
    └── bench/           # Latency benchmarks

What Makes Presage Different

Every existing LLM memory system — Mem0, MemGPT, Zep, LangChain memory, AriGraph — is pull-based. The LLM asks, the store answers.

Presage is push-based. The store predicts, prefetches, and pushes. By the time the LLM asks, the answer is already there.

| System | Architecture | Retrieval | Self-improving |
|---|---|---|---|
| RAG | Vector search | Reactive | No |
| MemGPT | Episodic compression | Reactive | No |
| Zep | Bi-temporal KG | Reactive | No |
| A-MEM | Zettelkasten | Reactive | No |
| Presage | Kinematic trajectory | Proactive | Yes (bandits) |

Roadmap

  • Cross-encoder reranker for P0-P1 slots (Phase 7 upgrade)
  • Token streaming in WebSocket (type: token events)
  • Fine-tuned trajectory predictor from accumulated dataset
  • Multi-user shared memory with access control
  • VSCode extension for native IDE integration
  • MCP server adapter (plug into Claude, Cursor, Zed)

License

MIT © 2025


Built with kinematic trajectory math, Bayesian bandits, and the conviction that memory should anticipate — not react.
