Anticipatory memory for LLMs — predicts what context you'll need next and pre-stages it before you ask.

These details have not been verified by PyPI

Project links

Project description

presage

Anticipatory memory for LLMs. Retrieves before you ask.
_{A novel memory architecture that predicts what context an LLM will need next — and pre-stages it before the question arrives.}

Every memory system for LLMs works the same way: the user asks something, the system retrieves relevant context, the LLM responds. Retrieve, then respond.

Presage inverts this. It models the conversation as a trajectory moving through semantic space — with position, velocity, and acceleration — and uses that trajectory to predict what memories will be needed next. By the time the user sends their next message, the relevant context is already staged and ready to inject.

No retrieval latency on the critical path. No cold cache. A system that gets smarter every turn.

How It Works

User types a message
        │
        ├── [HOT PATH — blocks until response] ──────────────────────┐
        │   Observer:  embed + intent extraction        (~2ms)        │
        │   Staging:   grab pre-fetched memory          (~1ms)        │  
        │   Reranker:  refine against actual message    (~1ms)        │
        │   Injector:  0/1 knapsack → token budget      (~1ms)        │
        │   LLM:       call with enriched context                     │
        │                                                             │
        └── [BACKGROUND — while LLM generates] ──────────────────────┘
            Predictor:  geodesic extrapolation → predictions
            Prefetcher: async fetch from vector + graph + annotations
            Feedback:   hit/miss → bandit update → training log
            Writer:     distill + chunk + store new memories

By the next turn, the prefetch is already done. The user never waits for retrieval.

The Math

Presage treats conversation as a particle moving through the embedding hypersphere.

Conversation state — an exponentially-decayed weighted sum of turn embeddings, normalized onto the unit sphere:

$$C_t = \text{normalize}\left(\sum_{i=0}^{N} \lambda^{(N-i)} \cdot e_i\right)$$

Momentum — projected onto the tangent plane at $C_t$ (respects spherical geometry):

$$M_{\text{tan}} = M_t - (M_t \cdot C_t),C_t$$

Geodesic extrapolation — moves along the great circle rather than punching through the sphere's interior:

$$\hat{C}{t+k} = \cos(\theta),C_t + \sin(\theta),\hat{M}{\text{tan}}, \quad \theta = v \cdot k \cdot \delta$$

The predicted state $\hat{C}_{t+k}$ is used as the query vector for prefetching — always on the unit sphere, always a valid cosine similarity query.

Confidence — each prediction strategy is tracked by a Bayesian Beta-Bernoulli bandit:

$$P(\text{hit}) = \frac{\alpha_{\text{hits}} + 1}{\alpha_{\text{hits}} + \beta_{\text{misses}} + 2}$$

No training required. Starts calibrated (Beta(1,1) = 0.5), updates every turn.

Injection — context allocation solved as a 0/1 knapsack over pre-chunked semantic units:

$$\max \sum v_i x_i \quad \text{s.t.} \quad \sum w_i x_i \leq B, \quad x_i \in {0, 1}$$

Content is never truncated. The knapsack selects whole chunks only — split at AST node, sentence, or header boundaries at write time.

Architecture

┌─────────────────────────────────────────────────────────────┐
│  Surface Layer    token stream · intent classifier           │
│  (core/surface)   symbol extractor · file detector          │
├─────────────────────────────────────────────────────────────┤
│  Nerve Layer      conversation state manager                 │
│  (core/nerve)     trajectory predictor · bandit registry    │
├─────────────────────────────────────────────────────────────┤
│  Staging Layer    P0–P9 async prefetch cache                 │
│  (core/staging)   prefetcher · reranker · knapsack injector │
├─────────────────────────────────────────────────────────────┤
│  Store Layer      SQLite (source of truth)                   │
│  (core/store)     Qdrant (vector) · Kuzu (graph)            │
│                   outbox worker · read-your-writes fallback  │
├─────────────────────────────────────────────────────────────┤
│  Write Layer      memory distiller · conflict resolver       │
│  (core/write)     semantic chunker · forward annotator      │
├─────────────────────────────────────────────────────────────┤
│  Feedback Layer   trigram + semantic hit detection           │
│  (core/feedback)  bandit updater · trajectory dataset       │
├─────────────────────────────────────────────────────────────┤
│  API Layer        FastAPI REST · WebSocket streaming         │
│  (api/)           session factory · session manager         │
└─────────────────────────────────────────────────────────────┘

Storage

Store	Backend	Role
MetaStore	SQLite + aiosqlite	Source of truth. All writes here first.
VectorStore	Qdrant (local)	Semantic search over chunk embeddings.
GraphStore	Kuzu (embedded)	Causal graph: calls, imports, conflicts.
Outbox	SQLite table	Eventual consistency to Qdrant + Kuzu.

SQLite is always the source of truth. Qdrant and Kuzu are derived projections — if they get corrupted, run presage rebuild-index to reconstruct them from SQLite in minutes.

Quickstart

Prerequisites

Python 3.12+
An Anthropic or OpenAI API key (or Ollama for local models)

Install

git clone https://github.com/yourname/presage
cd presage
pip install -e ".[all]"
cp docker/.env.example .env
# Edit .env — add your API key

Initialize

presage init

Ingest your codebase

presage ingest ./your_project/

Presage walks the directory, chunks every file at natural boundaries (AST nodes for code, headers for markdown, top-level keys for JSON/YAML), embeds the chunks, and writes them to the store with forward annotations.

Chat

presage chat

Session: a3f7c2d1-...
Type your message. Ctrl+C to exit.

You: why does verify_token throw an AttributeError?

Presage [DEBUG | v=0.18 | 3 mem | 241ms]:
The AttributeError in verify_token() is caused by...

The header shows: detected intent, conversation velocity, memories injected, and turn latency.

Start the API

presage serve

API docs: http://localhost:8000/docs

REST API

# Create a session
curl -X POST http://localhost:8000/v1/session

# Submit a turn
curl -X POST http://localhost:8000/v1/turn \
  -H "Content-Type: application/json" \
  -d '{"session_id": "...", "message": "explain verify_token"}'

# Manually ingest a memory
curl -X POST http://localhost:8000/v1/ingest \
  -H "Content-Type: application/json" \
  -d '{"content": "...", "source": "auth.py", "source_type": "code"}'

# Search memories
curl "http://localhost:8000/v1/memory/search?query=authentication&top_k=5"

# View staging slot state
curl http://localhost:8000/v1/session/{id}/slots

# Health check
curl http://localhost:8000/v1/health

Docker

cd docker
cp .env.example .env   # add API keys
docker-compose up -d

Service	URL	Purpose
Presage API	`http://localhost:8000`	REST + WebSocket
API Docs	`http://localhost:8000/docs`	Swagger UI
Metrics	`http://localhost:8000/metrics`	Prometheus scrape
Grafana	`http://localhost:3000`	Metrics dashboard

Configuration

All settings are environment variables with the PRESAGE_ prefix (or set in .env).

Variable	Default	Description
`PRESAGE_LLM_BACKEND`	`anthropic`	`anthropic` · `openai` · `ollama`
`PRESAGE_EMBEDDER_BACKEND`	`openai`	`openai` · `nomic` · `bge`
`PRESAGE_DECAY_LAMBDA_BASE`	`0.85`	Exponential decay for conversation state
`PRESAGE_CONTEXT_SWITCH_THRESHOLD`	`0.40`	Cosine distance that triggers momentum reset
`PRESAGE_SLERP_STEP_SIZE`	`0.30`	Arc length per velocity unit in geodesic extrapolation
`PRESAGE_AUTO_INJECT_THRESHOLD`	`0.80`	Bandit confidence required for automatic injection
`PRESAGE_MAX_INJECT_TOKENS`	`4096`	Token budget for context injection per turn
`PRESAGE_SLOT_COUNT`	`10`	Number of prefetch staging slots (P0–P9)
`PRESAGE_SLOT_TTL_SECONDS`	`120`	How long a staged memory stays warm
`PRESAGE_STATE_WINDOW_MAX`	`6`	Max turn lookback for conversation state

Prediction Strategies

Presage uses different retrieval strategies depending on detected intent:

Intent	Signals	Strategy
`DEBUG`	"error", "fix", "crash", "exception"	Graph walk → semantic
`IMPLEMENT`	"write", "create", "build", "add"	Semantic → symbol lookup
`NAVIGATE`	"where is", "find", "which file"	Symbol lookup → semantic
`COMPARE`	"vs", "difference", "better than"	Hybrid (vector + graph)
`EXPLORE`	"what is", "explain", "how does"	Semantic → annotation
`REFLECT`	"earlier", "we decided", "before"	Annotation → semantic

Staging Slots

The 10 staging slots are tiered by confidence:

P0 ─── AUTO  (conf ≥ 0.80) → injected automatically every turn
P1 ─── AUTO  (conf ≥ 0.80) → injected automatically every turn
P2 ─── HOT   (conf ≥ 0.50) → injected when soft trigger matches
P3 ─── HOT   (conf ≥ 0.50) → injected when soft trigger matches
P4 ─── HOT   (conf ≥ 0.50) → injected when soft trigger matches
P5 ─── WARM  (conf ≥ 0.30) → available on explicit request
...
P9 ─── WARM  (conf ≥ 0.30) → available on explicit request

A soft trigger fires when the user's message mentions a symbol or file that matches a staged memory's annotation tags — e.g., typing "verify_token" fires any HOT memory tagged symbol:verify_token.

Benchmarks

python tests/bench/bench_momentum.py   # math layer
python tests/bench/bench_staging.py    # staging layer
python tests/bench/bench_store.py      # store layer

Target latencies (P99 on a modern laptop):

Operation	Target	Layer
Intent classification	< 0.5ms	Surface
Momentum update	< 2ms	Nerve
Geodesic extrapolation	< 2ms	Nerve
Knapsack injection	< 1ms	Staging
Reranker	< 1ms	Staging
Annotation search	< 5ms	Store
Total hot-path overhead	< 10ms	All

The prefetch (retrieval) runs in the background while the LLM generates its response — it does not contribute to user-perceived latency.

Observability

Presage exposes Prometheus metrics at /metrics and OpenTelemetry traces via OTLP.

Key metrics:

presage_session_turn_latency_seconds    # end-to-end hot path latency
presage_feedback_hit_rate               # prediction hit rate per turn
presage_nerve_momentum_velocity         # conversation velocity histogram
presage_staging_slot_hits_total         # successful memory injections
presage_store_outbox_pending            # propagation lag gauge
presage_trajectory_samples_total        # training data accumulated

Enable tracing:

PRESAGE_OTEL_ENDPOINT=http://jaeger:4317 presage serve

Training Data Export

Every session accumulates trajectory samples — (conversation_state, predictions, outcomes) triples — that can be used to fine-tune the trajectory predictor from heuristic rules into a learned model.

presage export trajectory_data.jsonl

The JSONL format is compatible with standard fine-tuning pipelines. Export requires sessions with ≥ 100 turns for quality filtering.

Project Structure

presage/
├── math_core/           # Core mathematics
│   ├── momentum.py      # Conversation state, SLERP extrapolation
│   ├── entropy.py       # Context switch detection, adaptive decay
│   ├── knapsack.py      # 0/1 DP knapsack for token budget
│   ├── diffusion.py     # Personalized PageRank over memory graph
│   └── bandit.py        # Beta-Bernoulli bandits + registry
├── core/
│   ├── surface/         # Observer, intent classifier, signal extractor
│   ├── nerve/           # Trajectory predictor, state manager
│   ├── staging/         # Prefetch cache, prefetcher, injector, reranker
│   ├── store/           # MetaStore, VectorStore, OutboxWorker
│   ├── write/           # Chunker, distiller, conflict resolver, annotator
│   ├── feedback/        # Hit/miss detector, feedback loop, dataset
│   └── session/         # SessionManager, SessionFactory
├── adapters/
│   ├── embedder/        # OpenAI, nomic, bge (local)
│   └── llm/             # Anthropic, OpenAI, Ollama
├── api/                 # FastAPI REST + WebSocket
├── observability/       # Prometheus metrics, OpenTelemetry tracing
├── cli/                 # presage CLI
├── docker/              # Dockerfile, docker-compose, .env.example
└── tests/
    ├── unit/            # Per-module unit tests (100% coverage target)
    ├── integration/     # Cross-layer integration tests
    └── bench/           # Latency benchmarks

What Makes Presage Different

Every existing LLM memory system — Mem0, MemGPT, Zep, LangChain memory, AriGraph — is pull-based. The LLM asks, the store answers.

Presage is push-based. The store predicts, prefetches, and pushes. By the time the LLM asks, the answer is already there.

System	Architecture	Retrieval	Self-improving
RAG	Vector search	Reactive	No
MemGPT	Episodic compression	Reactive	No
Zep	Bi-temporal KG	Reactive	No
A-MEM	Zettelkasten	Reactive	No
Presage	Kinematic trajectory	Proactive	Yes (bandits)

Roadmap

Cross-encoder reranker for P0-P1 slots (Phase 7 upgrade)
Token streaming in WebSocket (type: token events)
Fine-tuned trajectory predictor from accumulated dataset
Multi-user shared memory with access control
VSCode extension for native IDE integration
MCP server adapter (plug into Claude, Cursor, Zed)

License

_{Built with kinematic trajectory math, Bayesian bandits, and the conviction that memory should anticipate — not react.}

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

0.1.1

Mar 1, 2026

This version

0.1.0

Mar 1, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

presage-0.1.0.tar.gz (199.4 kB view details)

Uploaded Mar 1, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

presage-0.1.0-py3-none-any.whl (92.1 kB view details)

Uploaded Mar 1, 2026 Python 3

File details

Details for the file presage-0.1.0.tar.gz.

File metadata

Download URL: presage-0.1.0.tar.gz
Upload date: Mar 1, 2026
Size: 199.4 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: Hatch/1.16.5 cpython/3.12.4 HTTPX/0.27.0

File hashes

Hashes for presage-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`662283051882831ccf2755ab65cd48a4161d59641dbfccd75ad78dd0c0b425e1`
MD5	`b6095a107ec37f60a09995d97db4da62`
BLAKE2b-256	`0102663cead631a3a27431dfc02118d3abc37617497dcfc20f34a0a9b87e34da`

See more details on using hashes here.

File details

Details for the file presage-0.1.0-py3-none-any.whl.

File metadata

Download URL: presage-0.1.0-py3-none-any.whl
Upload date: Mar 1, 2026
Size: 92.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: Hatch/1.16.5 cpython/3.12.4 HTTPX/0.27.0

File hashes

Hashes for presage-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`1207ee9511bcbab559f010b32e7a34d9b814b060f5bc09943b81fe74a5d07323`
MD5	`9f89e73661e88c5c670ab894050ac9df`
BLAKE2b-256	`4650e10b01568b92ab72482bc96f06fd2d9936fa12b31c7ece8dbab5d6a24b53`

See more details on using hashes here.

presage 0.1.0

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

presage

How It Works

The Math

Architecture

Storage

Quickstart

Prerequisites

Install

Initialize

Ingest your codebase

Chat

Start the API

REST API

Docker

Configuration

Prediction Strategies

Staging Slots

Benchmarks

Observability

Training Data Export

Project Structure

What Makes Presage Different

Roadmap

License

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes