Local context cache for LLM agents with semantic chunking and vector search
Stele Context
Stele Context helps LLM agents avoid re-reading unchanged files by caching chunk data with semantic search. Documents are routed through modality-specific chunkers, chunk content is stored in SQLite, and an HNSW vector index enables fast O(log n) retrieval. Only modified chunks trigger reprocessing.
For LLM agents: Stele is built so you can index a codebase once, optionally enrich chunks with your own summaries or vectors (no bundled model), then retrieve across later sessions from persistent storage. Read AGENTS.md, Design philosophy, and Agent workflow.
Key Features
- 100% Offline & Local-Only: No internet access, no external API calls, no cloud components
- Zero Required Dependencies: Runs on Python stdlib alone — no supply chain risks
- Multi-Modal Support: Text, code, images, PDFs, audio, and video (optional dependencies)
- HNSW Vector Index: O(log n) semantic search across all indexed chunks
- Hybrid Search: HNSW cosine similarity + BM25 keyword matching, auto-tuned blending; falls back to BM25 when vector/keyword signals disagree, scores are flat, or top raw cosine is weak; optional search_mode=keyword for BM25-only (deterministic keyword ranking)
- Scoped map/search: optional path_prefix (project-relative) to limit documents when the index spans multiple projects
- impact_radius summary mode: summary_mode + top_n_files for depth counts and a bounded top-files list (large-hub files)
- Index health: map and stats expose index_health (counts, staleness, alerts, project_root); see CHANGELOG for 1.0.5
- Agent orientation: doctor / project_brief (MCP + CLI), bounded search / map / stats, get_context trust + agent_notes; see AGENTS.md
- Tree-Sitter Chunking: AST-aware code chunking for 9 languages (optional, falls back to regex)
- Symbol Graph: Cross-file reference tracking (find_references, find_definition, impact_radius)
- Multi-Agent Safe: Per-document locking, optimistic versioning, cross-worktree coordination
- MCP Server: JSON-RPC over stdio for Claude Desktop, HTTP REST for other agents
- Project Config: .stele-context.toml file for per-project settings
- Session Management: Sessions with rollback, pruning, and KV-cache persistence
Architecture
graph TB
subgraph API["API Layer"]
CLI["CLI<br/>stele-context index / search / serve"]
HTTP["HTTP REST<br/>unified tool registry, threaded"]
MCP["MCP stdio<br/>unified tool registry, JSON-RPC"]
end
subgraph Engine["Engine (engine.py)"]
CFG["Config<br/>.stele-context.toml loader"]
SEARCH["Hybrid Search<br/>HNSW + BM25"]
IDX["index_documents()<br/>detect_changes()"]
SYM["Symbol Graph<br/>12 languages"]
SESS["Sessions<br/>rollback, pruning"]
LOCK["Document Locking<br/>ownership, versioning"]
end
subgraph Chunkers["Chunkers"]
TXT["TextChunker"]
CODE["CodeChunker<br/>Python AST<br/>tree-sitter (9 langs)<br/>regex fallback"]
IMG["ImageChunker<br/>(Pillow)"]
PDF["PDFChunker<br/>(pymupdf)"]
AUD["AudioChunker<br/>(librosa)"]
VID["VideoChunker<br/>(opencv)"]
end
subgraph Storage["Storage"]
SQLITE["SQLite<br/>chunks, symbols,<br/>sessions, history"]
HNSW["HNSW Index<br/>128-dim vectors"]
BM25["BM25 Index<br/>keyword scoring"]
KV["KV Cache<br/>JSON + zlib"]
COORD["Coordination DB<br/>cross-worktree locks"]
end
CLI --> Engine
HTTP --> Engine
MCP --> Engine
Engine --> Chunkers
Engine --> Storage
Comparison
Stele targets offline, zero-core-dependency agent memory for a local codebase; other stacks often assume cloud APIs or large dependency trees. A durable design-dimensions view (and why Tier 2 is agent-driven) is in docs/philosophy.md.
| Dimension | Stele Context |
|---|---|
| Core runtime dependencies | Zero (stdlib only) |
| Network / cloud required | No |
| Who supplies “semantic” embeddings | Optional: you (summaries / vectors) or built-in Tier 1 stats only |
| Primary storage | SQLite + on-disk indices (project-local) |
| MCP / tool surface | Native CLI + HTTP + MCP |
Installation
# From PyPI (import as: import stele_context)
pip install stele-context
# With optional extras
pip install stele-context[performance] # faster vector math
pip install stele-context[tree-sitter] # AST-aware code chunking
pip install stele-context[all] # everything
Note: The PyPI package is stele-context and the import name is stele_context.
# From source
git clone https://github.com/IronAdamant/stele-context.git
cd stele-context
pip install -e .
# With dev dependencies
pip install -e ".[dev]"
Requirements
- Python 3.9+
- Zero required dependencies
Optional Extras (all 100% offline)
| Extra | Packages | Use Case |
|---|---|---|
| performance | msgspec, numpy | Faster serialization & vector math |
| image | Pillow | Image indexing & similarity |
| pdf | pymupdf | PDF text extraction |
| audio | librosa, numpy | Audio segmentation & features |
| video | opencv-python, numpy | Video keyframe extraction |
| tree-sitter | tree-sitter + 9 grammar packages | AST-aware code chunking for JS/TS, Java, C/C++, Go, Rust, Ruby, PHP |
| mcp | mcp | MCP stdio server for Claude Desktop |
| all | All of the above | Everything |
pip install stele-context[tree-sitter] # AST-aware code chunking
pip install stele-context[image,pdf] # Multi-modal
pip install stele-context[all] # Everything
Quick Start
1. Index Documents
stele-context index src/*.py docs/*.md
stele-context index --force document.py # Force re-index
2. Semantic Search
stele-context search "authentication logic" --top-k 5
stele-context search "exact keyword ranking" --search-mode keyword
stele-context search "error handling" --json
3. MCP Server (for Claude Code / Claude Desktop)
pip install stele-context[mcp]
stele-context serve-mcp
Claude Code (~/.claude/settings.json):
{
  "mcpServers": {
    "stele-context": {
      "command": "stele-context",
      "args": ["serve-mcp"]
    }
  }
}
Claude Desktop (~/.config/Claude/claude_desktop_config.json):
{
  "mcpServers": {
    "stele-context": {
      "command": "stele-context",
      "args": ["serve-mcp"]
    }
  }
}
Tip: If installed in a virtualenv, use the full path to the stele-context binary.
4. HTTP REST Server
stele-context serve --port 9876
5. Project Configuration
Create .stele-context.toml in your project root:
[stele-context]
chunk_size = 512
max_chunk_size = 8192
merge_threshold = 0.75
change_threshold = 0.90
search_alpha = 0.6
skip_dirs = [".git", "node_modules", "dist", "vendor"]
All values are optional. Constructor params override config file values; the full priority order is listed under Configuration Reference.
Python API
from stele_context import Stele
engine = Stele()
# Index documents (auto-detects modality, walks directories)
result = engine.index_documents(["src/", "README.md"])
print(f"Indexed {result['total_chunks']} chunks")
# Hybrid semantic search (HNSW + BM25)
results = engine.search("authentication logic", top_k=5)
for r in results:
    print(f"[{r['relevance_score']:.3f}] {r['document_path']}")
    print(f" {r['content'][:100]}...")
# Get cached context — unchanged chunks skip reprocessing
context = engine.get_context(["src/main.py", "src/utils.py"])
for doc in context["unchanged"]:
    print(f"{doc['path']}: {len(doc['chunks'])} cached chunks")
# Symbol graph — cross-file reference tracking
refs = engine.find_references("Stele")
defn = engine.find_definition("StorageBackend")
# Impact analysis — what breaks if this changes?
impact = engine.impact_radius(chunk_id="abc123", depth=2)
# Staleness detection — find chunks with stale dependencies
stale = engine.stale_chunks(threshold=0.3)
# Chunk version history
history = engine.get_chunk_history(document_path="src/main.py")
# Session management
engine.save_kv_state("session-1", {"chunk_id": {"key": "value"}})
engine.rollback("session-1", target_turn=2)
engine.prune_chunks("session-1", max_tokens=100000)
# Multi-agent document locking
engine.acquire_document_lock("src/main.py", agent_id="agent-alpha")
engine.index_documents(["src/main.py"], agent_id="agent-alpha")
engine.release_document_lock("src/main.py", agent_id="agent-alpha")
Configuration
engine = Stele(
    chunk_size=256,         # Target tokens per initial chunk
    max_chunk_size=4096,    # Maximum tokens per merged chunk
    merge_threshold=0.7,    # Similarity threshold for merging
    change_threshold=0.85,  # Similarity threshold for "unchanged"
    search_alpha=0.42,      # Blend: 1.0 = pure vector, 0.0 = pure keyword (0.42 is the default)
)
Or use .stele-context.toml (see above) — constructor params override config file values.
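The effect of search_alpha can be illustrated with a minimal sketch. It assumes a plain linear blend of two scores already normalized to [0, 1]; the function name blend_scores is hypothetical, and Stele's auto-tuned blending and BM25 fallback rules are not reproduced here.

```python
def blend_scores(vector_score: float, keyword_score: float, alpha: float = 0.42) -> float:
    """Linearly blend two scores assumed to be normalized to [0, 1].

    alpha = 1.0 keeps only the vector score; alpha = 0.0 keeps only the
    keyword score. Illustrative assumption, not Stele's scoring code.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0.0 and 1.0")
    return alpha * vector_score + (1.0 - alpha) * keyword_score


if __name__ == "__main__":
    # Same pair of scores under three different blends.
    for a in (1.0, 0.42, 0.0):
        print(a, blend_scores(0.8, 0.2, alpha=a))
```

A lower alpha favors exact keyword hits, which tends to suit identifier-heavy queries; a higher alpha favors signature similarity.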
Agent-Supplied Semantic Embeddings
LLM agents already understand the semantics of every chunk they read. Instead of using a separate embedding model, Stele Context captures the agent's understanding directly:
# After indexing, the agent describes what each chunk does
engine.store_semantic_summary(
chunk_id="abc123",
summary="JWT authentication middleware that validates bearer tokens and attaches user identity to request context"
)
# Now searches like "token validation" match far better than
# statistical signatures on raw code would
results = engine.search("token validation middleware")
The agent IS the embedding model. Stele Context just stores and indexes what the agent tells it — zero new dependencies, no model downloads, no API calls.
How it works:
- Tier 1 (always): 128-dim statistical signatures — trigrams, bigrams, structural features. Used for change detection.
- Tier 2 (optional): Agent-supplied semantic summaries. Stele computes a signature from the summary text and uses it for HNSW search. ~9% improvement on semantic queries.
- Tier 2 alt: store_embedding(chunk_id, vector) for agents with direct embedding API access.
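To make the idea of a statistical signature concrete, here is a minimal stdlib-only sketch. It hashes character trigrams into a 128-slot vector and compares vectors by cosine similarity. The functions trigram_signature and cosine are hypothetical names, and the feature set is an assumption: Stele's real Tier 1 signatures also use bigrams and structural features that are not shown.

```python
import hashlib
import math
from typing import List

DIM = 128  # matches the 128-dim signatures described above


def trigram_signature(text: str, dim: int = DIM) -> List[float]:
    """Hash character trigrams into a fixed-size, L2-normalized vector."""
    vec = [0.0] * dim
    for i in range(len(text) - 2):
        trigram = text[i:i + 3].encode("utf-8")
        # A stable hash keeps signatures reproducible across runs
        # (the built-in hash() is randomized per process).
        bucket = int.from_bytes(hashlib.sha256(trigram).digest()[:4], "big") % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity for vectors that are already L2-normalized."""
    return sum(x * y for x, y in zip(a, b))


if __name__ == "__main__":
    summary = trigram_signature("JWT middleware that validates bearer tokens")
    query = trigram_signature("token validation middleware")
    print(round(cosine(summary, query), 3))
```

Because the signature is computed from text alone, the same function can be applied to an agent-written summary, which is how a Tier 2 summary can be indexed without a model.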
MCP Tools
Both servers (unified registry)
The HTTP REST server and MCP stdio server expose the same tool set via tool_registry.py (see stele-context doctor / MCP doctor for a live-oriented snapshot).
| Category | Tools |
|---|---|
| Indexing | index, remove, detect_changes, detect_modality, get_supported_formats |
| Search | search, search_text, get_context, get_relevant_kv |
| Annotations | annotate, get_annotations, delete_annotation, update_annotation, search_annotations, bulk_annotate |
| Sessions | save_kv_state, rollback, prune_chunks, list_sessions |
| Symbols | find_references, find_definition, impact_radius, rebuild_symbols, stale_chunks |
| Locking | acquire_document_lock, release_document_lock, refresh_document_lock, get_document_lock_status, release_agent_locks, reap_expired_locks |
| History | get_conflicts, get_chunk_history, get_notifications, history, prune_history |
| Stats & Map | stats, map |
| Embeddings | store_semantic_summary, store_embedding |
| Utilities | list_agents, environment_check, clean_bytecache |
How It Works
Change Detection
For each chunk:
1. SHA-256 hash → exact match → instant cache hit (0 tokens)
2. Hash differs → compute 128-dim semantic signature
3. Cosine similarity > threshold → semantically similar → restore KV
4. Similarity ≤ threshold → significant change → reprocess
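The four steps above can be sketched as a small decision function. This is an illustration under stated assumptions: toy_signature is a stand-in (character counts in 128 buckets) rather than Stele's real signature, and the returned labels are hypothetical names for the three outcomes.

```python
import hashlib
import math
from typing import List


def toy_signature(text: str, dim: int = 128) -> List[float]:
    """Stand-in signature: character counts bucketed by code point."""
    vec = [0.0] * dim
    for ch in text:
        vec[ord(ch) % dim] += 1.0
    return vec


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def classify_change(old: str, new: str, threshold: float = 0.85) -> str:
    """Return 'unchanged', 'similar', or 'reprocess' for a chunk pair."""
    # Step 1: identical SHA-256 digests mean an exact cache hit.
    old_hash = hashlib.sha256(old.encode("utf-8")).hexdigest()
    new_hash = hashlib.sha256(new.encode("utf-8")).hexdigest()
    if old_hash == new_hash:
        return "unchanged"
    # Steps 2-3: hashes differ, so compare signatures against the threshold.
    if cosine(toy_signature(old), toy_signature(new)) > threshold:
        return "similar"
    # Step 4: similarity at or below the threshold means a significant change.
    return "reprocess"
```

The hash check comes first because it is cheap and exact; the signature comparison only runs for chunks whose bytes actually changed.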
Token Savings
| Scenario | Without Stele Context | With Stele Context | Savings |
|---|---|---|---|
| Unchanged document | 10,000 tokens | 0 tokens | 100% |
| Minor edit (typo) | 10,000 tokens | ~100 tokens | 99% |
| Moderate edit | 10,000 tokens | ~1,000 tokens | 90% |
| Major rewrite | 10,000 tokens | 10,000 tokens | 0% |
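The Savings column follows directly from the two token counts. A minimal sketch of that arithmetic, with savings_percent as a hypothetical helper name:

```python
def savings_percent(tokens_without: int, tokens_with: int) -> float:
    """Percentage of tokens saved relative to re-reading everything."""
    if tokens_without <= 0:
        raise ValueError("tokens_without must be positive")
    return 100.0 * (tokens_without - tokens_with) / tokens_without


if __name__ == "__main__":
    # The four scenarios from the table above.
    for with_cache in (0, 100, 1_000, 10_000):
        print(with_cache, savings_percent(10_000, with_cache))
```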
Code Chunking Strategy
| Language | Parser | Fallback |
|---|---|---|
| Python | stdlib ast (always) | regex |
| JS/TS, Java, C/C++, Go, Rust, Ruby, PHP | tree-sitter (optional) | regex patterns |
| Shell, Swift, SQL, config files | regex patterns | line-based |
Tree-sitter provides proper AST boundary detection for function/class definitions.
Install with pip install stele-context[tree-sitter].
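For Python sources, boundary detection with the stdlib ast module can be sketched as follows. This is an illustration only: top_level_boundaries is a hypothetical helper, and the actual CodeChunker may split, merge, or size chunks differently.

```python
import ast
from typing import List, Tuple


def top_level_boundaries(source: str) -> List[Tuple[str, int, int]]:
    """Return (name, start_line, end_line) for top-level functions and classes."""
    tree = ast.parse(source)
    spans = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # end_lineno is populated by ast.parse on Python 3.8+.
            spans.append((node.name, node.lineno, node.end_lineno))
    return spans


if __name__ == "__main__":
    sample = (
        "import os\n"
        "\n"
        "def f():\n"
        "    return 1\n"
        "\n"
        "class C:\n"
        "    def m(self):\n"
        "        pass\n"
    )
    print(top_level_boundaries(sample))
```

Splitting on these boundaries keeps each function or class in one chunk, so an edit inside one definition does not invalidate its neighbors.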
Storage Layout
<project_root>/.stele-context/ # Per-worktree (default)
├── stele_context.db # SQLite: chunks, symbols, sessions, history
├── kv_cache/ # JSON + zlib compressed KV states
└── indices/ # HNSW + BM25 persistent indices
<git-common-dir>/stele-context/ # Shared across worktrees
└── coordination.db # Agent registry, shared locks, notifications
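The "JSON + zlib" format used for kv_cache/ can be illustrated with a short round trip. This sketch assumes the simplest possible encoding (UTF-8 JSON compressed with zlib); pack_state and unpack_state are hypothetical names, and the on-disk layout Stele actually uses is not shown.

```python
import json
import zlib
from typing import Any, Dict


def pack_state(state: Dict[str, Any]) -> bytes:
    """Serialize a state dict to JSON and compress it with zlib."""
    return zlib.compress(json.dumps(state, sort_keys=True).encode("utf-8"))


def unpack_state(blob: bytes) -> Dict[str, Any]:
    """Reverse pack_state: decompress, then parse the JSON payload."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))


if __name__ == "__main__":
    state = {"chunk_id": {"key": "value"}}
    blob = pack_state(state)
    print(len(blob), unpack_state(blob) == state)
```

Unlike pickle, loading JSON cannot execute code, so a tampered cache file can at worst fail to parse.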
Multi-Agent Support
Stele Context supports multiple LLM agents sharing one store on the same machine.
| Layer | Protection |
|---|---|
| Thread safety | RWLock — concurrent reads, exclusive writes |
| Process safety | fcntl.flock() on index files |
| Document ownership | acquire_document_lock() with TTL expiry |
| Optimistic locking | doc_version compare-and-swap |
| Cross-worktree | Shared coordination DB for locks, agent registry, notifications |
| Conflict log | Full audit trail of ownership violations |
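Two of the layers above, document ownership with TTL expiry and doc_version compare-and-swap, can be sketched in memory. The class DocumentLocks and the function compare_and_swap are hypothetical; Stele persists this state in SQLite and the coordination DB, which the sketch does not attempt to model.

```python
import time
from typing import Dict, Optional, Tuple


class DocumentLocks:
    """In-memory stand-in for per-document ownership with TTL expiry."""

    def __init__(self) -> None:
        # path -> (agent_id, expires_at)
        self._locks: Dict[str, Tuple[str, float]] = {}

    def acquire(self, path: str, agent_id: str, ttl: float = 30.0,
                now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        holder = self._locks.get(path)
        if holder and holder[0] != agent_id and holder[1] > now:
            return False  # another agent holds an unexpired lock
        self._locks[path] = (agent_id, now + ttl)
        return True

    def release(self, path: str, agent_id: str) -> bool:
        holder = self._locks.get(path)
        if holder and holder[0] == agent_id:
            del self._locks[path]
            return True
        return False  # only the current owner may release


def compare_and_swap(versions: Dict[str, int], path: str, expected: int) -> bool:
    """Bump the version only if it still equals the value the writer read."""
    if versions.get(path, 0) != expected:
        return False  # someone else wrote first; caller should re-read
    versions[path] = expected + 1
    return True
```

The TTL keeps a crashed agent from holding a document forever, while the version check catches writers that never took a lock at all.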
Performance
Run benchmarks:
python benchmarks/run_all.py # Full suite
python benchmarks/run_all.py --quick # CI mode
Representative results (quick mode):
| Operation | Size | Time | Throughput |
|---|---|---|---|
| TextChunker | 10KB | 1.6ms | 6,100 KB/s |
| CodeChunker (AST) | 10KB | 5.7ms | 1,750 KB/s |
| store_chunk (batch) | 100 | 27ms | 3,700 ops/s |
| VectorIndex.search (k=10) | 500 nodes | 4.7ms | 212 qps |
| BM25.score_batch | 100 docs | 0.18ms | 556K docs/s |
| engine.search (hybrid) | 50 docs | 9.9ms | 101 qps |
Security & Supply Chain
- Zero required dependencies — no supply chain attack surface for core functionality
- No model downloads — semantic signatures use statistical features, not ML models
- No API calls — everything runs locally, no data leaves your machine
- No pickle — session data serialized with JSON+zlib
- Minimal codebase — ~13,000 lines of Python, easy to audit
# Maximum security: install with zero dependencies
pip install stele-context --no-deps
Supported Formats
Text & Code (Zero Dependencies)
.txt, .md, .rst, .csv, .log, .py, .js, .ts, .jsx, .tsx, .java, .cpp, .c, .h, .go, .rs, .rb, .php, .swift, .sh, .json, .yaml, .toml, .html, .css, .sql
Images (requires Pillow)
.png, .jpg, .jpeg, .gif, .webp, .bmp, .tiff, .ico
PDFs (requires pymupdf)
.pdf
Audio (requires librosa)
.mp3, .wav, .ogg, .flac, .m4a, .aac, .wma
Video (requires opencv-python)
.mp4, .avi, .mov, .mkv, .webm, .flv, .wmv
Configuration Reference
Environment Variables
| Variable | Description |
|---|---|
| STELE_CONTEXT_STORAGE_DIR | Override default storage directory |
| STELE_CONTEXT_LOG_LEVEL | Logging level (DEBUG, INFO, WARNING, ERROR) |
Config File (.stele-context.toml)
[stele-context]
storage_dir = ".stele-context" # Storage directory (relative to project root)
chunk_size = 256 # Target tokens per initial chunk
max_chunk_size = 4096 # Maximum tokens per merged chunk
merge_threshold = 0.7 # Similarity threshold for merging chunks
change_threshold = 0.85 # Similarity threshold for "unchanged"
search_alpha = 0.42 # Hybrid search blend (1.0=vector, 0.0=keyword)
skip_dirs = [".git", "node_modules", "__pycache__"]
Priority: constructor params > .stele-context.toml > STELE_CONTEXT_STORAGE_DIR env var > defaults.
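The priority order can be read as a series of overlays, lowest priority first. The sketch below is an assumption about how such a merge could look, not Stele's loader: resolve_settings is a hypothetical name, only three of the documented defaults are included, and TOML parsing is left out so the example stays stdlib-only on Python 3.9.

```python
import os
from typing import Any, Dict, Mapping, Optional

# Subset of the documented defaults, for illustration.
DEFAULTS: Dict[str, Any] = {
    "storage_dir": ".stele-context",
    "chunk_size": 256,
    "search_alpha": 0.42,
}


def resolve_settings(
    constructor: Mapping[str, Any],
    file_config: Mapping[str, Any],
    environ: Optional[Mapping[str, str]] = None,
) -> Dict[str, Any]:
    """Apply defaults, then env var, then config file, then constructor params."""
    environ = os.environ if environ is None else environ
    settings = dict(DEFAULTS)
    env_dir = environ.get("STELE_CONTEXT_STORAGE_DIR")
    if env_dir:
        settings["storage_dir"] = env_dir
    settings.update(file_config)
    # Constructor values of None are treated as "not provided".
    settings.update({k: v for k, v in constructor.items() if v is not None})
    return settings
```

For example, passing chunk_size=512 to the constructor wins over chunk_size = 128 in the config file, and a storage_dir in the config file wins over STELE_CONTEXT_STORAGE_DIR.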
FAQ
Q: Does Stele Context require an internet connection? No. Stele Context is 100% offline. No API calls, no model downloads, no telemetry. All operations run locally using Python stdlib.
Q: How does Stele Context compare to RAG (Retrieval-Augmented Generation)? Stele Context is not RAG — it's a context cache. RAG retrieves chunks at query time from an external store. Stele Context caches chunk KV-states so the LLM skips re-reading unchanged content. It can be used alongside RAG, but its primary value is token savings through change detection.
Q: What happens if tree-sitter isn't installed?
Code chunking falls back to regex patterns for non-Python languages. Python always uses stdlib ast. Install tree-sitter for better accuracy on JS/TS, Java, C/C++, Go, Rust, Ruby, PHP: pip install stele-context[tree-sitter].
Q: Can multiple agents use Stele Context simultaneously? Yes. Stele Context provides per-document locking, optimistic versioning, and a cross-worktree coordination DB. Both HTTP and MCP servers auto-register agents and inject agent IDs into write operations.
Q: How accurate are the semantic signatures? The 128-dim statistical signatures (trigrams, bigrams, structural features) are approximate. They're designed for change detection (same vs different), not for embedding-quality similarity. For typical code and documentation, they achieve ~95% accuracy on change detection.
Q: Where is data stored?
By default, <project_root>/.stele-context/ (each git worktree gets its own). Override with STELE_CONTEXT_STORAGE_DIR or storage_dir in .stele-context.toml. Cross-worktree coordination data lives in <git-common-dir>/stele-context/coordination.db.
Troubleshooting
ImportError: No module named 'stele_context'
Ensure Stele Context is installed: pip install -e . from the repo root. If using a virtualenv, make sure it's activated.
MCP server not connecting in Claude Desktop
Use the full path to the stele-context binary. Check with which stele-context and update your config. If installed in a virtualenv: /path/to/.venv/bin/stele-context.
PermissionError when indexing
Another agent holds a lock on the document. Check the current owner with get_document_lock_status(), or call reap_expired_locks() to clean up stale locks.
Slow search on large indices
The HNSW index adapts search width automatically. For 10K+ chunks, search uses 4x ef_search. If still slow, reduce top_k or check that the BM25 index isn't being rebuilt on every query (it's lazy-loaded once).
Tree-sitter not working for a language
Verify the grammar package is installed: pip install tree-sitter-javascript (etc.). Check with: python -c "from stele_context.chunkers.code import HAS_TREE_SITTER; print(HAS_TREE_SITTER)".
Stale .pyc files causing issues
Run the environment_check MCP tool, or call engine.check_environment(). Use engine.clean_bytecache() to remove orphaned .pyc files.
Releasing (maintainers)
- Bump version in pyproject.toml and stele_context/__init__.py, update CHANGELOG.md.
- Tag and push: git tag -a vX.Y.Z -m "..." && git push origin main && git push origin vX.Y.Z
- PyPI (no repo secret): Use Trusted Publishing. On pypi.org open the project → Publishing → Add a new publisher → GitHub → set repository owner / repository name / workflow name publish.yml to match this repo. GitHub Actions then uses OIDC; you do not need PYPI_API_TOKEN in GitHub. If publish fails with invalid-publisher, the publisher entry on PyPI does not match the repo or workflow file name.
- Create a GitHub Release from the tag (or Actions → Publish to PyPI → Run workflow).
The package has no runtime dependencies (dependencies = []); optional extras stay in [project.optional-dependencies].
Development
pip install -e ".[dev]"
pytest # 860+ tests
pytest --cov=stele_context # With coverage
python benchmarks/run_all.py # Performance benchmarks
mypy stele_context/ # Type checking
ruff check stele_context/ # Linting
Entry points: stele-context (CLI), stele-context-mcp (MCP stdio server)
Contributing
See CONTRIBUTING.md for guidelines.
License
MIT License — see LICENSE for details.