Fast code and document search for AI agents via MCP
Tessera
Persistent codebase intelligence for autonomous AI agents. Tessera gives agents bottom-up file access and top-down code understanding — across every project they're authorized to touch, with security from the ground up.
The Problem
Persistent AI agents — orchestrators like AutoJack, task agents like OpenClaw — need to understand codebases the way a senior developer does. Not just "find this string in a file," but "what calls this function, across which projects, and what breaks if I change it?"
Today's agents burn context window and wall-clock time on repeated grep / find / cat cycles. They lose track of project structure between conversations. They can't safely delegate to sub-agents without leaking access to projects those agents shouldn't see. And they can't search documentation, config files, or assets alongside code.
What Tessera Does
Tessera indexes everything — code, documents, config files, media assets, binary files — into a structured, chunked, searchable database. It exposes that through 19 MCP tools that any agent can call. Responses come back in milliseconds, not seconds.
For orchestrator agents: Full system visibility. Register projects, group them into collections, search across all of them. Understand cross-project dependencies. Delegate scoped access to sub-agents via session tokens.
For task agents: Deep code intelligence within their authorized scope. Symbol lookup, reference tracing, impact analysis, document search — everything an IDE provides, but through tool calls.
For security: Deny-by-default scope gating. Sub-agents only see what the orchestrator explicitly grants. Credentials and secrets are blocked from indexing by un-negatable security patterns. No ambient access, no scope creep.
Code Intelligence
- Symbol search — Functions, classes, methods, hooks by name or pattern
- Reference tracing — Call graphs, imports, inheritance chains
- Impact analysis — "What breaks if I change this?" — traced N levels deep
- File context — Complete structural overview of any file in one call
- Cross-project references — Track where project A's exports are used in project B
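The "what breaks if I change this?" question can be pictured as a breadth-first walk over reversed reference edges, stopping after N levels. The following is a minimal sketch under stated assumptions, not Tessera's implementation: `trace_impact`, the `callers` mapping, and the symbol names are all hypothetical.

```python
from collections import deque


def trace_impact(callers: dict[str, set[str]], symbol: str, max_depth: int) -> dict[str, int]:
    """Return {dependent_symbol: depth} for dependents within max_depth levels."""
    seen: dict[str, int] = {}
    queue = deque([(symbol, 0)])
    while queue:
        current, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the requested depth
        for caller in sorted(callers.get(current, ())):
            if caller != symbol and caller not in seen:
                seen[caller] = depth + 1
                queue.append((caller, depth + 1))
    return seen


# Hypothetical reverse call graph: each key is called by the symbols in its set.
CALLERS = {
    "format_price": {"render_cart", "render_invoice"},
    "render_cart": {"checkout_page"},
    "checkout_page": {"router"},
}

print(trace_impact(CALLERS, "format_price", max_depth=2))
```

Because the walk is breadth-first, each dependent is recorded at its shortest distance from the changed symbol, which is a natural way to rank how directly it is affected.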
Event & Hook Analysis
- Cross-language event detection — WordPress hooks (PHP), EventEmitter/DOM (JS/TS), Django signals (Python), `@wordpress/hooks` (JS)
- Directional edges — Distinguish who registers a listener from who fires an event
- Mismatch detection — Find orphaned listeners (registered, never fired) and unfired events (fired, nobody listening)
- Action vs filter classification — WordPress-specific subtyping preserved without runtime analysis
- Wildcard queries — `events("pum_%")` to explore hook namespaces
- Pagination — Browse large event sets without blowing context windows
# Who listens to this hook?
events("pum_popup_content", direction="registers_on")
# Find all unfired events (potential dead code)
events(detect_mismatches=True, mismatch_filter="unfired")
# Explore a hook namespace
events("pum_%", direction="fires", limit=20)
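Conceptually, mismatch detection compares the set of event names that have listeners with the set of event names that are fired. The sketch below illustrates that idea with plain set differences; `find_mismatches` and the hook names are hypothetical, and it ignores dynamic or wildcard hook names that a real analyzer would have to handle.

```python
def find_mismatches(registered: set[str], fired: set[str]) -> dict[str, list[str]]:
    """Classify event names using the two categories described above."""
    return {
        # Listeners registered on an event that nothing fires.
        "orphaned": sorted(registered - fired),
        # Events that are fired but have no listener.
        "unfired": sorted(fired - registered),
    }


# Hypothetical hook names for illustration only.
registered = {"example_before_render", "example_after_save"}
fired = {"example_before_render", "example_on_delete"}

print(find_mismatches(registered, fired))
```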
Document & Text Search
- Chunked indexing — Files are split into focused, searchable chunks with metadata (by header, key path, or line group) — not stored as monolithic blobs
- Code + docs unified — Query across everything, or filter by source type (`code`, `asset`, `document`)
- Structural formats — PDF, Markdown (break-point scoring with distance decay), YAML/JSON (key-path chunking)
- Markup — HTML/XML with tag stripping
- Plaintext — `.txt`, `.rst`, `.csv`, `.log`, `.ini`, `.cfg`, `.toml`, config files, dotfiles
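To make key-path chunking concrete, here is one way a parsed JSON or YAML document could be flattened into chunks addressed by their key path. This is a simplified sketch, not Tessera's chunker: `key_path_chunks` is a hypothetical name, and emitting one chunk per leaf value is an assumption made to keep the example short (a real chunker would likely group related keys).

```python
import json


def key_path_chunks(data, prefix=""):
    """Yield (key_path, text) pairs, one per leaf value."""
    if isinstance(data, dict):
        for key, value in data.items():
            path = f"{prefix}.{key}" if prefix else str(key)
            yield from key_path_chunks(value, path)
    elif isinstance(data, list):
        for index, value in enumerate(data):
            yield from key_path_chunks(value, f"{prefix}[{index}]")
    else:
        # Leaf: serialize the scalar so every chunk is plain text.
        yield prefix, json.dumps(data)


doc = json.loads('{"name": "demo", "scripts": {"test": "pytest", "lint": "ruff"}, "tags": ["a", "b"]}')
for path, text in key_path_chunks(doc):
    print(path, "=", text)
```

Storing the path alongside the text means a search hit can report where in the file a value lives, rather than only which file matched.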
Media & Binary File Indexing
- Asset discovery — Images, videos, audio, fonts, and archives are automatically discovered and indexed
- Metadata extraction — Filename, path, MIME type, file size, and image dimensions (PNG, JPEG, GIF, BMP) — zero external dependencies
- FTS5 searchable — Search for assets by name, category, format, or path components
- Source type filtering — Filter search results to `asset`, `code`, or `document` via the `source_type` parameter
- SVG dual-indexing — SVGs indexed as both searchable XML documents and image assets
Multi-Project Federation
- Project collections — Group related projects (e.g., a plugin ecosystem) and query across them
- Scope-gated access — Session tokens control what each agent can see. Orchestrators create scoped tokens for sub-agents.
- Search-time federation — Data stays at project level, merged at query time. No duplication.
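Search-time federation can be pictured as a lazy merge of per-project result lists that are already sorted by score. The following is an illustrative sketch only: `federated_search`, the project names, and the assumption that scores are directly comparable across projects are all hypothetical.

```python
import heapq
from itertools import islice


def federated_search(per_project: dict[str, list[tuple[float, str]]], limit: int):
    """Merge per-project hits (each list sorted by descending score)."""
    tagged = (
        [(score, project, path) for score, path in results]
        for project, results in per_project.items()
    )
    # heapq.merge streams the inputs in order; negate the score to sort descending.
    merged = heapq.merge(*tagged, key=lambda hit: -hit[0])
    return list(islice(merged, limit))


# Hypothetical per-project results: (score, file path).
hits = {
    "plugin-a": [(0.9, "a.php"), (0.4, "b.php")],
    "plugin-b": [(0.7, "c.js")],
}
print(federated_search(hits, limit=2))
```

Nothing is copied into a shared index here: each project keeps its own results, and only the top `limit` entries are materialized at query time.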
Security
- Deny-by-default — No access without a valid session token
- `.tesseraignore` — Per-project ignore config with `.gitignore` syntax
- Two-tier ignore system — Security-critical patterns (`.env*`, `*.pem`, `*credentials*`) are locked and cannot be overridden by project config
- `trusted` field — Search results from code are marked trusted; document content is marked untrusted so agents can handle prompt injection risk
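The two-tier idea can be sketched as follows: locked patterns are checked first and always win, and only then are project patterns, including `!` negations, applied in order. This is a simplified illustration, not Tessera's filter: `is_ignored` is a hypothetical name, matching is done on bare filenames with `fnmatchcase`, and full `.gitignore` semantics (directory rules, `**`) are not implemented.

```python
from fnmatch import fnmatchcase

# Security tier, taken from the patterns listed above; project config cannot negate these.
LOCKED_PATTERNS = (".env*", "*.pem", "*credentials*")


def is_ignored(filename: str, project_patterns: list[str]) -> bool:
    """Return True if the file must be skipped by the indexer."""
    if any(fnmatchcase(filename, pattern) for pattern in LOCKED_PATTERNS):
        return True  # locked tier: no project rule can override this
    ignored = False
    for pattern in project_patterns:
        negated = pattern.startswith("!")
        glob = pattern[1:] if negated else pattern
        if fnmatchcase(filename, glob):
            ignored = not negated  # last matching rule wins, as in .gitignore
    return ignored


print(is_ignored(".env.local", ["!.env.local"]))
print(is_ignored("keep.log", ["*.log", "!keep.log"]))
```

Checking the locked tier before any project rule is what makes those patterns un-negatable: a `!` rule in project config is never consulted for a file the security tier already matched.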
Infrastructure
- Fully embedded — SQLite + FAISS. No Docker, no daemons, no external servers
- Incremental indexing — Git-aware, only re-indexes changed files
- Schema migration — Versioned database schema with automatic upgrades
- Drift adapter — Switch embedding models without re-indexing (Orthogonal Procrustes)
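For readers unfamiliar with Orthogonal Procrustes, the general problem is shown below. As an assumption for illustration, let the rows of A be embeddings of a sample of texts under the new model and the rows of B be embeddings of the same texts under the old model, both of dimension d. Which direction Tessera actually maps, and how it samples training texts, is not stated here.

```latex
\min_{R \in \mathbb{R}^{d \times d}} \; \lVert A R - B \rVert_F
\quad \text{subject to} \quad R^\top R = I,
\qquad
A^\top B = U \Sigma V^\top
\;\Longrightarrow\;
R^\star = U V^\top
```

Under that assumption, a query vector q from the new model would be rotated to q R* before searching the existing index, which is why the stored vectors would not need to be recomputed.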
Supported Languages
Go, Ruby, Swift, PHP, TypeScript, JavaScript, Python — via tree-sitter grammars.
MCP Tools (19)
Search & Navigation
| Tool | Purpose |
|---|---|
| `search` | Hybrid keyword + semantic search across code, documents, and assets (filterable by `source_type`) |
| `doc_search_tool` | Document-only search (filterable by format or `source_type`) |
| `symbols` | Look up functions, classes, methods by name/pattern/kind |
| `references` | Find all references to a symbol (calls, imports, extends) |
| `file_context` | Complete context for a file (symbols, refs, structure) |
| `impact` | Trace downstream impact of changing a symbol |
| `cross_refs` | Cross-project references to a symbol |
| `events` | Analyze event/hook registrations, emissions, and mismatches across languages |
| `collection_map` | Overview of projects in a collection with stats |
Administration
| Tool | Purpose |
|---|---|
| `register_project` | Register a project for indexing |
| `reindex` | Trigger full or incremental re-index |
| `status` | Project indexing status and health |
| `drift_train` | Train embedding drift adapter for model migration |
Access Control
| Tool | Purpose |
|---|---|
| `create_scope_tool` | Create scoped session tokens for sub-agents |
| `revoke_scope_tool` | Revoke agent session tokens |
| `create_collection_tool` | Create a project collection |
| `add_to_collection_tool` | Add a project to a collection |
| `list_collections_tool` | List all collections |
| `delete_collection_tool` | Delete a collection |
Quick Start
Requirements
- Python 3.11+
- uv (recommended) or pip
Install
git clone https://github.com/danieliser/tessera.git
cd tessera
uv sync
Run as MCP Server
Add to your .mcp.json:
{
  "mcpServers": {
    "tessera": {
      "command": "uv",
      "args": [
        "--directory", "/path/to/tessera",
        "run", "python", "-m", "tessera", "serve"
      ]
    }
  }
}
Lock to a specific project (single-project mode):
uv run python -m tessera serve --project /path/to/your/project
Embedding Setup (Optional)
Tessera works without embeddings (keyword search only via FTS5). For semantic search, point it at any local OpenAI-compatible embedding endpoint. The embedding dimension is auto-detected — no configuration needed.
Recommended: LM Studio with nomic-embed-text or any embedding model serving on /v1/embeddings.
Search Quality
Validated against Next.js v16.1.6 (3,677 chunks, 1,729 files) and Popup Maker (~580 files):
| Dataset | Doc Top-10 | Cross Top-10 | Code Top-10 | Blend MRR |
|---|---|---|---|---|
| Next.js | 100% | 100% | 70% | 0.748 |
| Popup Maker | — | — | 90% | 0.542 |
Default stack: BGE-base (768d, ~210MB) with filename-aware RRF boosting. No reranker needed — BGE-base vectors are strong enough on their own. PPR graph ranking available for impact/reference analysis but disabled in search (neutral-to-harmful in benchmarks).
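Reciprocal rank fusion (RRF) combines ranked lists by summing 1 / (k + rank) for each item across the lists. The sketch below shows plain RRF only; `rrf_merge` is a hypothetical name, `k = 60` is a commonly used constant rather than a value confirmed for Tessera, and the filename-aware boosting mentioned above is not modeled.

```python
def rrf_merge(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists of ids into a single ranking using reciprocal rank fusion."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; break ties by id for a deterministic order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


keyword_hits = ["a", "b", "c"]   # e.g. a keyword ranking
semantic_hits = ["b", "d", "a"]  # e.g. a vector-similarity ranking
print(rrf_merge([keyword_hits, semantic_hits]))
```

RRF uses only ranks, so keyword scores and vector similarities never need to be put on a common scale before merging.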
Run Tests
uv run pytest tests/ -v
Architecture
MCP Server (stdio)
├── Scope Validator (session-based, deny-by-default)
├── Query Router (project / collection / global)
│   ├── Search (FTS5 keyword + FAISS semantic + RRF merge)
│   ├── Symbols / References / Impact (SQLite graph)
│   └── Document Search (source_type filtering)
├── Per-Project Indexes
│   ├── SQLite (symbols, references, edges, files, chunk_meta)
│   └── FAISS (vector embeddings)
├── Global SQLite (~/.tessera/global.db)
│   ├── projects, collections, sessions
│   └── indexing_jobs
└── Indexer Pipeline
    ├── Tree-sitter parser (PHP, TS, JS, Python, Swift)
    ├── AST-aware code chunking
    ├── Document extraction (PDF, MD, YAML, JSON, HTML, XML, plaintext)
    ├── Asset metadata extraction (images, video, audio, fonts, archives)
    └── Ignore filter (.tesseraignore, two-tier security)
Design Principles
- No external dependencies at runtime — SQLite + FAISS, fully embedded
- Tree-sitter for deterministic parsing — no LLM-extracted graphs, no hallucinated edges
- Chunked everything — every file is split into focused, searchable units with structural metadata
- Security-first scope model — deny-by-default, session-scoped, un-negatable credential protection
- Federation over duplication — data stays at project level, merged at query time
Project Status
v0.10.1 — Event/hook analysis with directional edges, per-language parser plugins, mismatch detection, action/filter subtyping, @wordpress/hooks support.
| Phase | Status | What |
|---|---|---|
| 1 | Done | Single-project indexer + scoped MCP server |
| 2 | Done | Incremental indexing + persistence |
| 3 | Done | Collection federation + cross-project refs |
| 4 | Done | Document indexing + drift adapter + ignore config + text formats |
| 4.5 | Done | Media/binary file metadata catalog |
| 5 | Done | PPR graph ranking + semantic snippet scoring |
| 6 | Planned | Always-on file watcher |
License
MIT