
Hierarchical, scope-gated codebase indexing and persistent memory system for AI agents


Tessera

Persistent codebase intelligence for autonomous AI agents. Tessera gives agents bottom-up file access and top-down code understanding — across every project they're authorized to touch, with security from the ground up.

The Problem

Persistent AI agents — orchestrators like AutoJack, task agents like OpenClaw — need to understand codebases the way a senior developer does. Not just "find this string in a file," but "what calls this function, across which projects, and what breaks if I change it?"

Today's agents burn context window and wall-clock time on repeated grep / find / cat cycles. They lose track of project structure between conversations. They can't safely delegate to sub-agents without leaking access to projects those agents shouldn't see. And they can't search documentation, config files, or assets alongside code.

What Tessera Does

Tessera indexes everything — code, documents, config files, media assets, binary files — into a structured, chunked, searchable database. It exposes that through 18 MCP tools that any agent can call. Responses come back in milliseconds, not seconds.

For orchestrator agents: Full system visibility. Register projects, group them into collections, search across all of them. Understand cross-project dependencies. Delegate scoped access to sub-agents via session tokens.

For task agents: Deep code intelligence within their authorized scope. Symbol lookup, reference tracing, impact analysis, document search — everything an IDE provides, but through tool calls.

For security: Deny-by-default scope gating. Sub-agents only see what the orchestrator explicitly grants. Credentials and secrets are blocked from indexing by un-negatable security patterns. No ambient access, no scope creep.

Code Intelligence

  • Symbol search — Functions, classes, methods, hooks by name or pattern
  • Reference tracing — Call graphs, imports, inheritance chains
  • Impact analysis — "What breaks if I change this?" — traced N levels deep
  • File context — Complete structural overview of any file in one call
  • Cross-project references — Track where project A's exports are used in project B
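
As a rough illustration of depth-limited impact analysis, the sketch below walks a reverse call graph breadth-first and records how many hops away each affected symbol is. The `callers` mapping and the `impact` function are hypothetical names used only for this example; Tessera's own storage and traversal may differ.

```python
from collections import deque

def impact(callers, symbol, max_depth):
    """Return {affected_symbol: depth} for symbols that transitively depend on `symbol`."""
    seen = {}
    queue = deque([(symbol, 0)])
    while queue:
        current, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the requested number of levels
        for caller in callers.get(current, ()):
            if caller not in seen and caller != symbol:
                seen[caller] = depth + 1
                queue.append((caller, depth + 1))
    return seen

# Hypothetical reverse edges: each key is called by the symbols in its list.
callers = {
    "parse_config": ["load_project", "reload"],
    "load_project": ["main"],
    "reload": ["main", "watch"],
}
affected = impact(callers, "parse_config", max_depth=2)
```

Breadth-first order means each symbol is reported at its shortest distance from the changed symbol, which is usually the most useful number when judging risk.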

Document & Text Search

  • Chunked indexing — Files are split into focused, searchable chunks with metadata (by header, key path, or line group) — not stored as monolithic blobs
  • Code + docs unified — Query across everything, or filter by source type (code, asset, document)
  • Structural formats — PDF, Markdown (break-point scoring with distance decay), YAML/JSON (key-path chunking)
  • Markup — HTML/XML with tag stripping
  • Plaintext: .txt, .rst, .csv, .log, .ini, .cfg, .toml, config files, dotfiles
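
To make key-path chunking concrete, here is a minimal sketch that flattens a parsed JSON tree into one chunk per leaf value, each labeled with its key path. The function name and the one-chunk-per-leaf granularity are assumptions made for illustration; the actual chunk boundaries Tessera uses are not specified here.

```python
import json

def key_path_chunks(node, prefix=""):
    """Yield (key_path, text) pairs, one per leaf value of a parsed JSON/YAML tree."""
    if isinstance(node, dict):
        for key, value in node.items():
            path = f"{prefix}.{key}" if prefix else str(key)
            yield from key_path_chunks(value, path)
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from key_path_chunks(value, f"{prefix}[{index}]")
    else:
        # Leaf: serialize the value so strings, numbers, and booleans are all searchable text.
        yield prefix, json.dumps(node)

doc = json.loads('{"server": {"host": "localhost", "ports": [8080, 8443]}, "debug": false}')
chunks = list(key_path_chunks(doc))
```

Storing the path alongside the value lets a search hit point at `server.ports[1]` rather than at the whole file.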

Media & Binary File Indexing

  • Asset discovery — Images, videos, audio, fonts, and archives are automatically discovered and indexed
  • Metadata extraction — Filename, path, MIME type, file size, and image dimensions (PNG, JPEG, GIF, BMP) — zero external dependencies
  • FTS5 searchable — Search for assets by name, category, format, or path components
  • Source type filtering — Filter search results to asset, code, or document via the source_type parameter
  • SVG dual-indexing — SVGs indexed as both searchable XML documents and image assets
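
Image dimensions can be read without any imaging library because formats such as PNG store them at fixed offsets in the header. The sketch below shows this for PNG only, using the standard library; `png_dimensions` is a hypothetical helper, not Tessera's extractor.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def png_dimensions(data):
    """Return (width, height) from the first 24 bytes of a PNG, or None if it is not a PNG."""
    if len(data) < 24 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        return None
    # IHDR data begins at byte 16: two big-endian unsigned 32-bit integers.
    width, height = struct.unpack(">II", data[16:24])
    return width, height

# Build a header by hand so the example needs no image file on disk.
header = PNG_SIGNATURE + struct.pack(">I", 13) + b"IHDR" + struct.pack(">II", 640, 480)
size = png_dimensions(header)
```

Only the first 24 bytes are needed, so a file can be cataloged without reading it in full.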

Multi-Project Federation

  • Project collections — Group related projects (e.g., a plugin ecosystem) and query across them
  • Scope-gated access — Session tokens control what each agent can see. Orchestrators create scoped tokens for sub-agents.
  • Search-time federation — Data stays at project level, merged at query time. No duplication.
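
The two ideas above, scope-gated access and search-time federation, can be sketched together: a token resolves to a set of projects, unknown tokens are rejected, and results are merged from per-project indexes at query time. Everything here (the `Scope` class, `federated_search`, the in-memory indexes) is hypothetical and only illustrates the shape of the approach.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Scope:
    """Hypothetical session scope: the projects a token is allowed to query."""
    projects: frozenset

def federated_search(sessions, token, indexes, term):
    """Search only the projects granted to `token`, merging hits at query time."""
    scope = sessions.get(token)
    if scope is None:
        # Deny by default: no valid session, no results.
        raise PermissionError("unknown or revoked session token")
    hits = []
    for project in sorted(scope.projects):
        for line in indexes.get(project, []):
            if term in line:
                hits.append((project, line))
    return hits

sessions = {
    "orchestrator-token": Scope(frozenset({"core", "plugin"})),
    "sub-agent-token": Scope(frozenset({"plugin"})),
}
indexes = {
    "core": ["def render(): ..."],
    "plugin": ["render() is called from the plugin hook"],
}
sub_agent_hits = federated_search(sessions, "sub-agent-token", indexes, "render")
```

Because each project keeps its own index, narrowing a scope is just a matter of iterating over fewer projects; nothing has to be copied or filtered after the fact.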

Security

  • Deny-by-default — No access without a valid session token
  • .tesseraignore — Per-project ignore config with .gitignore syntax
  • Two-tier ignore system — Security-critical patterns (.env*, *.pem, *credentials*) are locked and cannot be overridden by project config
  • trusted field — Search results from code are marked trusted; document content is marked untrusted so agents can handle prompt injection risk
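
A minimal sketch of the two-tier ignore check, assuming locked patterns are evaluated first and project patterns (including `!` negations) can never undo them. It matches on file names with `fnmatch`, which is far simpler than full .gitignore semantics, and `is_ignored` is a hypothetical helper rather than Tessera's API.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath

# Security-critical patterns named in the list above; treated as locked.
LOCKED_PATTERNS = (".env*", "*.pem", "*credentials*")

def is_ignored(path, project_patterns):
    """Return True if `path` must be skipped by the indexer."""
    name = PurePosixPath(path).name
    # Tier 1: locked patterns always win and cannot be negated.
    if any(fnmatch(name, pattern) for pattern in LOCKED_PATTERNS):
        return True
    # Tier 2: project patterns, last match wins, "!" re-includes a file.
    ignored = False
    for pattern in project_patterns:
        negate = pattern.startswith("!")
        if fnmatch(name, pattern[1:] if negate else pattern):
            ignored = not negate
    return ignored
```

With this ordering, a project-level `!.env.local` has no effect, while `!keep.log` can still re-include a file excluded by `*.log`.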

Infrastructure

  • Fully embedded — SQLite + FAISS. No Docker, no daemons, no external servers
  • Incremental indexing — Git-aware, only re-indexes changed files
  • Schema migration — Versioned database schema with automatic upgrades
  • Drift adapter — Switch embedding models without re-indexing (Orthogonal Procrustes)
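
For reference, the Orthogonal Procrustes problem behind the drift adapter has a closed-form solution. Assume, for illustration, that the rows of X are embeddings of a sample of texts under one model and the rows of Y are embeddings of the same texts under the other, with equal dimension. Which model plays which role, and how Tessera samples the training texts, are not stated here.

```latex
R^{*} = \arg\min_{R^{\top} R = I} \lVert X R - Y \rVert_{F},
\qquad
X^{\top} Y = U \Sigma V^{\top}
\;\Longrightarrow\;
R^{*} = U V^{\top}
```

Because R* is orthogonal, it preserves norms and inner products, so vectors mapped through it can be compared against the existing index without re-embedding stored content.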

Supported Languages

PHP, TypeScript, JavaScript, Python, Swift — via tree-sitter grammars.
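
Tessera extracts symbols with tree-sitter. To show what deterministic, parser-based symbol extraction looks like without that dependency, the sketch below substitutes Python's standard-library `ast` module and therefore handles Python source only. The `extract_symbols` function and its tuple layout are hypothetical.

```python
import ast

def extract_symbols(source):
    """Return (kind, name, line) for each function and class definition, in source order."""
    symbols = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            symbols.append(("function", node.name, node.lineno))
        elif isinstance(node, ast.ClassDef):
            symbols.append(("class", node.name, node.lineno))
    return sorted(symbols, key=lambda symbol: symbol[2])

source = (
    "class Indexer:\n"
    "    def run(self):\n"
    "        pass\n"
    "\n"
    "def main():\n"
    "    Indexer().run()\n"
)
symbols = extract_symbols(source)
```

The same input always yields the same symbols, which is the property that parser-based indexing offers over model-extracted graphs.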

MCP Tools (18)

Search & Navigation

| Tool | Purpose |
| --- | --- |
| search | Hybrid keyword + semantic search across code, documents, and assets (filterable by source_type) |
| doc_search_tool | Document-only search (filterable by format or source_type) |
| symbols | Look up functions, classes, methods by name/pattern/kind |
| references | Find all references to a symbol (calls, imports, extends) |
| file_context | Complete context for a file (symbols, refs, structure) |
| impact | Trace downstream impact of changing a symbol |
| cross_refs | Cross-project references to a symbol |
| collection_map | Overview of projects in a collection with stats |
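
Agents reach these tools through MCP's JSON-RPC `tools/call` method. The sketch below builds such a request for the search tool. The `source_type` argument comes from the description above, while the `query` argument name is an assumption; check the tool's published input schema before relying on it.

```python
import json

def tool_call(request_id, tool, arguments):
    """Build a JSON-RPC 2.0 `tools/call` request in the shape MCP defines."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }

# "query" is a hypothetical argument name used for illustration.
request = tool_call(1, "search", {"query": "session token", "source_type": "code"})
wire = json.dumps(request)
```

In practice an MCP client library sends this message for you; the raw form is shown only to make the call structure visible.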

Administration

| Tool | Purpose |
| --- | --- |
| register_project | Register a project for indexing |
| reindex | Trigger full or incremental re-index |
| status | Project indexing status and health |
| drift_train | Train embedding drift adapter for model migration |

Access Control

| Tool | Purpose |
| --- | --- |
| create_scope_tool | Create scoped session tokens for sub-agents |
| revoke_scope_tool | Revoke agent session tokens |
| create_collection_tool | Create a project collection |
| add_to_collection_tool | Add a project to a collection |
| list_collections_tool | List all collections |
| delete_collection_tool | Delete a collection |

Quick Start

Requirements

  • Python 3.11+
  • uv (recommended) or pip

Install

git clone https://github.com/danieliser/tessera.git
cd tessera
uv sync

Run as MCP Server

Add to your .mcp.json:

{
  "mcpServers": {
    "tessera": {
      "command": "uv",
      "args": [
        "--directory", "/path/to/tessera",
        "run", "python", "-m", "tessera", "serve"
      ]
    }
  }
}

Lock to a specific project (single-project mode):

uv run python -m tessera serve --project /path/to/your/project

Embedding Setup (Optional)

Tessera works without embeddings (keyword search only via FTS5). For semantic search, point it at any local OpenAI-compatible embedding endpoint. The embedding dimension is auto-detected — no configuration needed.

Recommended: LM Studio with nomic-embed-text or any embedding model serving on /v1/embeddings.
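
The sketch below shows how a client might call an OpenAI-compatible `/v1/embeddings` endpoint and infer the embedding dimension from the first vector returned. It relies on the standard request and response shape of that API; the function names, the base URL, and the model name passed in are placeholders, and this is not Tessera's internal client.

```python
import json
from urllib import request

def vectors_from(payload):
    """Extract embedding vectors from an OpenAI-compatible response, in input order."""
    items = sorted(payload["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]

def detect_dimension(payload):
    """Infer the embedding dimension from the first returned vector."""
    return len(vectors_from(payload)[0])

def embed(texts, base_url, model):
    """POST texts to {base_url}/v1/embeddings and return the vectors."""
    body = json.dumps({"model": model, "input": texts}).encode("utf-8")
    req = request.Request(
        f"{base_url}/v1/embeddings",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return vectors_from(json.load(resp))

# A hand-written response, so the example runs without a server.
sample_response = {"data": [{"index": 0, "embedding": [0.1, 0.2, 0.3]}]}
dimension = detect_dimension(sample_response)
```

Probing once at startup like this is one way the dimension can be discovered without a configuration value.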

Run Tests

uv run pytest tests/ -v

Architecture

MCP Server (stdio)
├── Scope Validator (session-based, deny-by-default)
├── Query Router (project / collection / global)
│   ├── Search (FTS5 keyword + FAISS semantic + RRF merge)
│   ├── Symbols / References / Impact (SQLite graph)
│   └── Document Search (source_type filtering)
├── Per-Project Indexes
│   ├── SQLite (symbols, references, edges, files, chunk_meta)
│   └── FAISS (vector embeddings)
├── Global SQLite (~/.tessera/global.db)
│   ├── projects, collections, sessions
│   └── indexing_jobs
└── Indexer Pipeline
    ├── Tree-sitter parser (PHP, TS, JS, Python, Swift)
    ├── AST-aware code chunking
    ├── Document extraction (PDF, MD, YAML, JSON, HTML, XML, plaintext)
    ├── Asset metadata extraction (images, video, audio, fonts, archives)
    └── Ignore filter (.tesseraignore, two-tier security)
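
The RRF merge in the Search branch refers to Reciprocal Rank Fusion, which combines ranked lists using only rank positions, so keyword and vector scores never need to share a scale. A minimal sketch follows. The constant k=60 is the value commonly used in the literature, not necessarily Tessera's setting, and `rrf_merge` is a hypothetical name.

```python
def rrf_merge(rankings, k=60):
    """Fuse several ranked lists of ids into one list using Reciprocal Rank Fusion."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for a stable order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))

keyword_hits = ["a.py", "b.py", "c.py"]   # e.g. from a keyword index
semantic_hits = ["b.py", "d.py", "a.py"]  # e.g. from a vector index
merged = rrf_merge([keyword_hits, semantic_hits])
```

Items that appear in both lists rise to the top, while items found by only one retriever are still kept.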

Design Principles

  • No external services at runtime — SQLite + FAISS, fully embedded
  • Tree-sitter for deterministic parsing — no LLM-extracted graphs, no hallucinated edges
  • Chunked everything — every file is split into focused, searchable units with structural metadata
  • Security-first scope model — deny-by-default, session-scoped, un-negatable credential protection
  • Federation over duplication — data stays at project level, merged at query time

Project Status

v0.6.0 — Hybrid search with semantic snippet scoring, PPR graph ranking, collapsed ancestry context, and stale index detection.

| Phase | Status | What |
| --- | --- | --- |
| 1 | Done | Single-project indexer + scoped MCP server |
| 2 | Done | Incremental indexing + persistence |
| 3 | Done | Collection federation + cross-project refs |
| 4 | Done | Document indexing + drift adapter + ignore config + text formats |
| 4.5 | Done | Media/binary file metadata catalog |
| 5 | Done | PPR graph ranking + semantic snippet scoring |
| 6 | Planned | Always-on file watcher |
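
PPR here presumably stands for Personalized PageRank, which ranks graph nodes by how reachable they are from a set of seed nodes. The sketch below is a plain power-iteration version over a small symbol graph. The function name, the damping factor, the iteration count, and the dangling-node handling are all assumptions chosen for illustration.

```python
def personalized_pagerank(edges, seeds, damping=0.85, iterations=50):
    """Rank nodes by proximity to `seeds` using power iteration."""
    nodes = sorted({n for edge in edges for n in edge} | set(seeds))
    out = {n: [] for n in nodes}
    for src, dst in edges:
        out[src].append(dst)
    # Restart distribution: uniform over the seed nodes, zero elsewhere.
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(restart)
    for _ in range(iterations):
        nxt = {n: (1.0 - damping) * restart[n] for n in nodes}
        for n in nodes:
            targets = out[n]
            if targets:
                share = damping * rank[n] / len(targets)
                for target in targets:
                    nxt[target] += share
            else:
                # Dangling node: hand its mass back to the seeds.
                for s in nodes:
                    nxt[s] += damping * rank[n] * restart[s]
        rank = nxt
    return rank

# Hypothetical call edges: (caller, callee).
edges = [("search", "parse"), ("search", "score"), ("parse", "tokenize"), ("unrelated", "tokenize")]
ranks = personalized_pagerank(edges, seeds={"search"})
```

Nodes that cannot be reached from the seeds receive no score, which is what makes the ranking query-specific rather than global.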

License

MIT

