SymDex

Universal code-indexer MCP server for AI coding agents.
Claude · Cursor · Codex CLI · Gemini CLI · GitHub Copilot · Windsurf · Zed · OpenCode · Any agent that speaks MCP.

Pre-index your codebase once. Let AI agents find any symbol in ~200 tokens instead of reading whole files at ~7,500 tokens.
That is a 97% reduction — per lookup, every lookup.

The part no other code indexer does:
Don't know the function name? semantic_search("validate email addresses") finds it anyway.
No grep. No file reading. No guessing. One query, exact location.

pip install symdex

The Problem

Every time an AI coding agent needs to find a function, it reads the entire file that might contain it. Here is what that looks like in practice:

Agent thought: "I need to find the validate_email function."
Agent action: Read auth/utils.py          → 7,500 tokens consumed
Agent action: Read auth/validators.py     → 6,200 tokens consumed
Agent action: Read core/helpers.py        → 8,100 tokens consumed
Agent finds it on the third try.          → 21,800 tokens wasted

This is the equivalent of reading an entire book from page one every time you want to find a single paragraph — when the book has an index sitting right there.

On a large codebase, a single development session can burn hundreds of thousands of tokens this way. That is real money, real slowness, and real context-window pressure.

SymDex is the index.
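
The arithmetic behind these figures can be checked in a few lines. This is only a sketch that reuses the approximate token counts quoted above; the helper name reduction_percent is made up for illustration.

```python
def reduction_percent(before_tokens: int, after_tokens: int) -> float:
    """Percentage of tokens saved when a lookup costs after_tokens instead of before_tokens."""
    return (1 - after_tokens / before_tokens) * 100


# Approximate figures quoted in this README, not measurements.
full_file_read = 7_500
symdex_lookup = 200
failed_search = [7_500, 6_200, 8_100]  # the three file reads in the trace above

print(f"{reduction_percent(full_file_read, symdex_lookup):.1f}% saved per lookup")
print(f"{sum(failed_search):,} tokens for the three-file search")
```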


How It Works

┌─────────────────────────────────────────────────────────────────┐
│  STEP 1 — Index once (you run this, takes seconds to minutes)   │
│                                                                 │
│  symdex index ./myproject                                       │
│         │                                                       │
│         ▼                                                       │
│  tree-sitter parses every source file                           │
│         │                                                       │
│         ▼                                                       │
│  Every function, class, method extracted                        │
│  with name · kind · file · exact byte offsets · docstring       │
│         │                                                       │
│         ▼                                                       │
│  Stored in SQLite database  +  vector embeddings (sqlite-vec)   │
└─────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────┐
│  STEP 2 — Agent queries SymDex instead of reading files         │
│                                                                 │
│  Without SymDex:                                                │
│  Agent → read auth/utils.py (full) → 7,500 tokens              │
│                                                                 │
│  With SymDex:                                                   │
│  Agent → search_symbols("validate_email")                       │
│        → { file: "auth/utils.py", start_byte: 1024,            │
│            end_byte: 1340 }          → ~200 tokens             │
│  Agent → read bytes 1024–1340 only  → done                     │
└─────────────────────────────────────────────────────────────────┘

SymDex does not read files for the agent. It tells the agent exactly where to look — file path and byte offset — so the agent reads only the bytes it needs. Nothing more.
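
Reading a byte range rather than a whole file takes only a seek and a bounded read. A minimal sketch, assuming UTF-8 source files; read_span is a hypothetical helper for illustration, not SymDex's own get_symbol implementation.

```python
def read_span(path: str, start_byte: int, end_byte: int) -> str:
    """Return only the bytes in [start_byte, end_byte) of a file, decoded as UTF-8."""
    with open(path, "rb") as handle:
        handle.seek(start_byte)  # jump straight to the symbol
        return handle.read(end_byte - start_byte).decode("utf-8")


# With the offsets from the diagram above, an agent would read 316 bytes
# instead of the whole file:
# source = read_span("auth/utils.py", 1024, 1340)
```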


Real-World Example

Here is a complete session showing how an agent uses SymDex to navigate a codebase:

Setup — index the project once:

symdex index ./myproject --name myproject
symdex serve   # start the MCP server

Agent calls search_symbols to locate a function:

// Tool call
{ "tool": "search_symbols", "query": "validate_email", "repo": "myproject" }

// Response (~200 tokens)
{
  "symbols": [
    {
      "name": "validate_email",
      "kind": "function",
      "file": "auth/utils.py",
      "start_byte": 1024,
      "end_byte": 1340,
      "signature": "def validate_email(email: str) -> bool"
    }
  ]
}

Agent calls get_symbol to read only that function:

// Tool call — reads bytes 1024 to 1340 only
{ "tool": "get_symbol", "file": "auth/utils.py", "start_byte": 1024, "end_byte": 1340, "repo": "myproject" }

// Response — the exact function source, nothing else
{
  "source": "def validate_email(email: str) -> bool:\n    \"\"\"Validate email format.\"\"\"\n    pattern = r'^[\\w.-]+@[\\w.-]+\\.\\w+$'\n    return bool(re.match(pattern, email))"
}

Agent calls get_callers to understand impact before changing it:

// Tool call
{ "tool": "get_callers", "name": "validate_email", "repo": "myproject" }

// Response
{
  "callers": [
    { "name": "register_user",  "file": "auth/views.py",  "kind": "function" },
    { "name": "update_profile", "file": "users/views.py", "kind": "function" }
  ]
}

Agent uses semantic_search when it doesn't know the exact name:

// Tool call
{ "tool": "semantic_search", "query": "check if user email address is valid", "repo": "myproject" }

// Response — finds by meaning, not by name
{
  "symbols": [
    { "name": "validate_email", "score": 0.91, "file": "auth/utils.py" },
    { "name": "is_valid_address", "score": 0.74, "file": "core/validators.py" }
  ]
}

Total tokens for this entire session: ~800 tokens. Without SymDex, answering the same questions by reading whole files would cost ~25,000 tokens.


For AI Agents

If you are an AI agent reading this: Follow these rules to use SymDex correctly.

Rule 1 — Index before anything else. Call index_folder with the project path at the start of every session. Without this, the index is empty and all searches return nothing.

{ "tool": "index_folder", "path": "/path/to/project", "name": "myproject" }

Rule 2 — Never read a full file to find a function. Always call search_symbols first. Use the returned start_byte and end_byte to read only the relevant bytes via get_symbol.

Rule 3 — Use the right tool for the right query.

What you want                 | Tool to call
Find a function by name       | search_symbols
Read a function's source      | get_symbol (use byte offsets from search)
Find by meaning / description | semantic_search
See all symbols in a file     | get_file_outline
Understand project structure  | get_repo_outline
Find who calls a function     | get_callers
Find what a function calls    | get_callees
Search for a string in code   | search_text

Rule 4 — Re-index after code changes. Call index_folder again (or invalidate_cache for a specific file) after modifying source files so the index reflects the latest state.


SymDex vs. Conventional Approach

Capability                               | Conventional (read files)   | SymDex
Find a function by name                  | Read entire file(s)         | Byte-offset lookup: read only those bytes
Token cost per lookup                    | ~7,500 tokens (one file)    | ~200 tokens
Token cost across a session              | Compounds per lookup        | Fixed per lookup, does not compound
Search by meaning                        | Not possible                | Semantic embedding search, finds by intent
"Who calls this function?"               | Read every file manually    | Pre-built call graph, instant answer
"What does this function call?"          | Read function body manually | Pre-built call graph, instant answer
"What API routes does this repo expose?" | Read every route file       | search_routes: instant, no file reading
Search across multiple projects          | Not possible                | Cross-repo registry: one SymDex, many projects
Keep index current after edits           | Manual re-run               | symdex watch: auto-reindex on save
Context window pressure                  | High: full files accumulate | Low: precise snippets only
Works with any AI agent                  | Agent-specific plugins      | Any MCP-compatible agent, one config
Requires editor / language server        | Often yes                   | No: standalone, terminal-native
Command-line access                      | Not available               | Full CLI included
Re-index on changes                      | Full re-read every time     | SHA-256 change detection: only re-indexes changed files

Features

Symbol Search

Find any function, class, method, or variable by name across your entire indexed codebase. Returns file path and exact byte offsets. No file reading required.

Semantic Search

Can't remember the exact function name? Search by what it does.

symdex semantic "parse and validate an authentication token" --repo myproject

SymDex embeds every symbol's signature and docstring into a vector and finds the closest matches by meaning — not by keyword. Powered by sentence-transformers running fully locally, no API calls required.
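
Ranking by meaning comes down to comparing vectors. The sketch below shows cosine-similarity ranking over toy 3-dimensional vectors that are invented for illustration; real embeddings come from a sentence-transformers model, and the function names here are hypothetical rather than SymDex internals.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_symbols(query_vec, symbol_vecs, top_k=5):
    """Return (name, score) pairs, best match first."""
    scored = [(name, cosine_similarity(query_vec, vec)) for name, vec in symbol_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy vectors standing in for embedded signatures and docstrings (assumption).
toy_index = {
    "process_payment": [0.0, 0.2, 0.9],
    "is_valid_address": [0.7, 0.6, 0.1],
    "validate_email": [0.9, 0.1, 0.0],
}
toy_query = [1.0, 0.2, 0.0]  # stands in for an embedded natural-language query

for name, score in rank_symbols(toy_query, toy_index):
    print(f"{name}: {score:.2f}")
```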

Call Graph

Understand the impact of any change before you make it.

symdex callers process_payment --repo myproject   # Who calls this? (impact analysis)
symdex callees process_payment --repo myproject   # What does this call? (dependency trace)

Call relationships are extracted during indexing and stored as a graph. No file reading at query time.
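
To make the idea of extracting call edges at index time concrete, here is a rough sketch for Python sources only. It swaps in the standard-library ast module for tree-sitter, handles only top-level functions, and matches calls by bare name, so it illustrates the concept rather than SymDex's actual extractor.

```python
import ast


def extract_call_edges(source: str) -> list[tuple[str, str]]:
    """Return (caller, callee_name) pairs for calls made inside top-level functions."""
    edges = []
    for node in ast.parse(source).body:
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        for inner in ast.walk(node):
            if not isinstance(inner, ast.Call):
                continue
            target = inner.func
            if isinstance(target, ast.Name):         # plain call: foo()
                edges.append((node.name, target.id))
            elif isinstance(target, ast.Attribute):  # attribute call: obj.foo()
                edges.append((node.name, target.attr))
    return edges


# Hypothetical module used only to exercise the function.
sample = '''
def validate_email(email):
    return "@" in email

def register_user(email):
    if validate_email(email):
        save(email)

def update_profile(user, email):
    validate_email(email)
'''

for caller, callee in sorted(extract_call_edges(sample)):
    print(f"{caller} -> {callee}")
```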

Cross-Repo Registry

Index multiple projects and search across all of them from one place.

symdex index ./frontend --name frontend
symdex index ./backend  --name backend
symdex search "validate_token"           # searches both repos simultaneously

Each repo gets its own SQLite database. The registry tracks all of them.
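
A registry over per-repo databases can be searched with a simple loop. The sketch below assumes the registry is available as a plain dict of repo name to database path and that each database has a symbols table with name, file, start_byte and end_byte columns; search_all_repos is a hypothetical name.

```python
import sqlite3


def search_all_repos(registry: dict[str, str], symbol_name: str) -> list[tuple]:
    """Query every registered repo database and tag each hit with its repo name."""
    hits = []
    for repo_name, db_path in registry.items():
        conn = sqlite3.connect(db_path)
        try:
            rows = conn.execute(
                "SELECT file, start_byte, end_byte FROM symbols WHERE name = ?",
                (symbol_name,),
            ).fetchall()
        finally:
            conn.close()
        hits.extend((repo_name, file, start, end) for file, start, end in rows)
    return hits
```

Keeping one database per repo means a repo can be dropped by deleting a single file, at the cost of one connection per repo during a cross-repo search.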

Change Detection

SymDex stores a SHA-256 hash of every indexed file. Re-indexing only processes files that have actually changed. On large codebases this makes incremental updates take seconds, not minutes.
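
Hash-based change detection can be sketched with the standard library alone. The helper names and the in-memory stored_hashes dict are assumptions for illustration; SymDex keeps its hashes in SQLite.

```python
import hashlib


def file_sha256(path: str) -> str:
    """Hex SHA-256 of a file's contents, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def files_to_reindex(paths: list[str], stored_hashes: dict[str, str]) -> list[str]:
    """Return the paths whose current hash differs from the stored one (or is missing)."""
    return [path for path in paths if stored_hashes.get(str(path)) != file_sha256(path)]
```

A file that was never indexed has no stored hash, so it is treated the same as a changed file.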

Full CLI

Every MCP tool is also available as a CLI command. Use SymDex without an AI agent — in scripts, in CI, or just to explore your codebase.

Auto-Watch — Live Index

Run symdex watch once. Every time you save a file, SymDex automatically re-indexes only the changed file. Delete a file — SymDex removes it from the index. No more manual symdex index after every edit.

symdex watch ./myproject              # Index now, then watch for changes
symdex watch ./myproject --interval 3 # Check every 3 seconds (default: 5)

Works as a background process alongside your development workflow. The index stays current without any agent interruption.

HTTP Route Indexing

SymDex automatically extracts HTTP API routes during indexing and makes them searchable. No more reading route files to understand an API surface.

Supported frameworks: Flask · FastAPI · Django · Express

symdex routes myproject               # All routes in the repo
symdex routes myproject -m POST       # Only POST routes
symdex routes myproject -p /users     # Routes matching a path pattern

Via MCP tool (agents can call this directly):

{ "tool": "search_routes", "repo": "myproject", "method": "GET" }
// → [{ "method": "GET", "path": "/users", "handler": "list_users", "file": "api/views.py" }, ...]
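
As a rough idea of how route detection by pattern matching can work, the sketch below recognises only decorator shortcuts of the form @app.get("/path") placed directly above a function, as used by FastAPI and recent Flask versions. The pattern and the extract_routes name are illustrative assumptions and cover far less than the four frameworks listed above.

```python
import re

# Matches e.g. @app.get("/users") followed on the next line by "def handler" or "async def handler".
ROUTE_PATTERN = re.compile(
    r"@\w+\.(get|post|put|delete|patch)\(\s*[\"']([^\"']+)[\"'][^\n]*\n"
    r"\s*(?:async\s+)?def\s+(\w+)"
)


def extract_routes(source: str, file: str) -> list[dict]:
    """Return one dict per detected route, shaped like the search_routes response above."""
    return [
        {"method": m.group(1).upper(), "path": m.group(2), "handler": m.group(3), "file": file}
        for m in ROUTE_PATTERN.finditer(source)
    ]


# Hypothetical source file used only to exercise the pattern.
sample = '''
@app.get("/users")
def list_users():
    return []

@app.post("/users")
async def create_user(payload):
    return payload
'''

routes = extract_routes(sample, "api/views.py")
post_only = [route for route in routes if route["method"] == "POST"]
```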

HTTP + stdio Transport

Run SymDex as a local stdio server (default, for desktop agents) or as an HTTP server for remote access.

symdex serve              # stdio — for Claude, Cursor, Copilot, Gemini CLI, Codex CLI, etc.
symdex serve --port 8080  # HTTP — for remote agents or services

Supported Languages

SymDex parses source files using tree-sitter, a fast, robust, incremental parser used by editors such as Neovim and Helix and by GitHub.

Language   | File Extensions
Python     | .py
JavaScript | .js .mjs
TypeScript | .ts .tsx
Go         | .go
Rust       | .rs
Java       | .java
PHP        | .php
C#         | .cs
C          | .c .h
C++        | .cpp .cc .h
Elixir     | .ex .exs
Ruby       | .rb
Vue        | .vue

13 languages. More can be added by installing additional tree-sitter grammar packages.
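
Selecting files by extension can be sketched as a lookup table plus a directory walk. The mapping below mirrors the table above; because .h is listed for both C and C++, this sketch arbitrarily files it under C, and the names used are hypothetical.

```python
from pathlib import Path

LANGUAGE_BY_EXTENSION = {
    ".py": "Python",
    ".js": "JavaScript", ".mjs": "JavaScript",
    ".ts": "TypeScript", ".tsx": "TypeScript",
    ".go": "Go",
    ".rs": "Rust",
    ".java": "Java",
    ".php": "PHP",
    ".cs": "C#",
    ".c": "C", ".h": "C",  # assumption: .h treated as C in this sketch
    ".cpp": "C++", ".cc": "C++",
    ".ex": "Elixir", ".exs": "Elixir",
    ".rb": "Ruby",
    ".vue": "Vue",
}


def iter_source_files(root: str):
    """Yield (path, language) for every file under root with a recognised extension."""
    for path in sorted(Path(root).rglob("*")):
        language = LANGUAGE_BY_EXTENSION.get(path.suffix.lower())
        if path.is_file() and language is not None:
            yield path, language
```

Files with any other extension are simply never yielded, so supported and unsupported files can sit side by side in one project.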


Supported Platforms

SymDex speaks the Model Context Protocol (MCP) — the open standard for connecting AI agents to external tools. If a platform supports MCP, SymDex works with it — no custom integration required.

Platform                    | By             | How to Connect
Claude Desktop              | Anthropic      | Add to claude_desktop_config.json
Claude Code                 | Anthropic      | claude mcp add symdex -- symdex serve
Codex CLI                   | OpenAI         | Add to MCP settings
Codex App                   | OpenAI         | Add to MCP settings
Gemini CLI                  | Google         | Add to MCP settings
Cursor                      | Anysphere      | Add to .cursor/mcp.json
Windsurf                    | Codeium        | Add to MCP settings
GitHub Copilot (agent mode) | Microsoft      | Add to .vscode/mcp.json
Continue.dev                | Continue       | Add to config.json
Cline                       | Cline          | Add to MCP settings
Zed                         | Zed Industries | Add to MCP settings
OpenCode                    | OpenCode       | Add to opencode.json
Any custom MCP client       |                | stdio or HTTP transport

Configuration (same pattern for all platforms)

{
  "mcpServers": {
    "symdex": {
      "command": "symdex",
      "args": ["serve"]
    }
  }
}

For HTTP mode (remote agents):

{
  "mcpServers": {
    "symdex": {
      "url": "http://localhost:8080/mcp"
    }
  }
}

Installation

Available on PyPI:

pip install symdex

Requires Python 3.11 or higher.


Quickstart

1. Index your project

symdex index ./myproject --name myproject

SymDex walks the directory, parses every supported source file, and writes the index to a local SQLite database. Run this once. Re-run it when your code changes (only modified files are re-processed).

2. Search for a symbol

symdex search "validate_email" --repo myproject
┏━━━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━┓
┃ Repo           ┃ Kind     ┃ Name           ┃ File                                    ┃ Start ┃
┡━━━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━┩
│ myproject      │ function │ validate_email │ auth/utils.py                           │ 1024  │
└────────────────┴──────────┴────────────────┴─────────────────────────────────────────┴───────┘

3. Start the MCP server

symdex serve

Point your agent at it using the config above. The agent can now use all 15 MCP tools.


MCP Tool Reference

These are the tools your AI agent can call once SymDex is running as an MCP server.

Tool             | Description
index_folder     | Index a local folder (run once per project)
index_repo       | Index a named, registered repo
search_symbols   | Find function or class by name; returns byte offsets
get_symbol       | Retrieve one symbol's full source by byte offset
get_symbols      | Bulk symbol retrieval by a list of offsets
get_file_outline | All symbols in a file; no file content transferred
get_repo_outline | Directory structure and symbol statistics for a repo
get_file_tree    | Directory tree: structure only, no content
search_text      | Text or regex search; returns matching lines only
list_repos       | List all indexed repos in the registry
invalidate_cache | Force re-index on next request
semantic_search  | Find symbols by meaning using embedding similarity
get_callers      | Find all functions that call a named function
get_callees      | Find all functions called by a named function
search_routes    | Find HTTP routes indexed from a repo (Flask/FastAPI/Django/Express); filter by method or path

CLI Reference

# Indexing
symdex index ./myproject                            # Index a folder
symdex index ./myproject --name myproj             # Index with a custom name
symdex invalidate --repo myproj                    # Force re-index a repo
symdex invalidate --repo myproj --file auth.py     # Force re-index one file

# Symbol search
symdex search "validate email" --repo myproj       # Search by name
symdex search "validate email"                     # Search across all repos
symdex find MyClass --repo myproj                  # Exact name lookup

# Semantic search
symdex semantic "authentication token parsing" --repo myproj

# File and repo inspection
symdex outline myproj/auth/utils.py --repo myproj  # All symbols in a file
symdex repos                                       # List all indexed repos
symdex text "TODO" --repo myproj                   # Text search

# Call graph
symdex callers process_payment --repo myproj       # Who calls this function
symdex callees process_payment --repo myproj       # What this function calls

# Watch (auto-reindex on file changes)
symdex watch ./myproject                           # Auto-reindex on file changes
symdex watch ./myproject --interval 10             # Custom poll interval (seconds)

# Routes
symdex routes myproject                            # List all indexed HTTP routes
symdex routes myproject -m GET                     # Filter by HTTP method

# Server
symdex serve                                       # Start MCP server (stdio)
symdex serve --port 8080                           # Start MCP server (HTTP)

How SymDex Differs from Other Tools

vs. LSP (Language Server Protocol)

LSP servers (pylsp, typescript-language-server, rust-analyzer, etc.) are excellent tools, but they are designed for editors, not for standalone agents. They expect an LSP client (usually an editor) to drive them, need a separate language server installed and running for each language, and operate on live files in real time.

SymDex is terminal-native and editor-free. It runs wherever Python runs. No editor, no language server, no per-language daemon required. An agent running in a terminal (Claude Code, Codex CLI, OpenCode) gets the same symbol lookup capability with zero editor dependency.

The other thing LSP cannot do: semantic search. LSP can find validate_email if you know the name. SymDex can find it if you describe what it does — "check that an email address is properly formatted" — without knowing the name exists.

vs. Graph-database code indexers

Some tools build a full graph database (Neo4j, KùzuDB) over your code. This enables powerful queries — complexity analysis, cycle detection, deep inheritance trees. The tradeoff is operational complexity: choosing a backend, installing it, keeping it running.

SymDex uses SQLite — one file per repo, zero configuration. No backend to choose, no server to run, no Docker. The index lives in ~/.symdex/. Delete it and it rebuilds in seconds.

SymDex adds what graph-db tools lack: semantic search (find by meaning, not just name) and HTTP route indexing (expose your API surface without reading files).

vs. LSP-wrapper tools (Serena, etc.)

Tools that wrap real language servers get true type-aware analysis — they can resolve which concrete implementation is called through an interface, track generics, follow pointer dispatch. That is genuinely powerful for large, strongly-typed codebases.

The tradeoff: they require language servers installed per language, and queries hit live files rather than a pre-built index. SymDex is faster per query (pre-indexed), works offline, and adds semantic search and route indexing — capabilities no language server provides.


Architecture

Storage

Each indexed repo gets its own SQLite database file stored in ~/.symdex/. A shared registry database tracks all repos.

-- Every extracted symbol
CREATE TABLE symbols (
    id          INTEGER PRIMARY KEY,
    repo        TEXT NOT NULL,
    file        TEXT NOT NULL,
    name        TEXT NOT NULL,
    kind        TEXT NOT NULL,   -- function | class | method | constant | variable
    start_byte  INTEGER NOT NULL,
    end_byte    INTEGER NOT NULL,
    signature   TEXT,
    docstring   TEXT,
    embedding   BLOB             -- float32 vector stored via sqlite-vec
);

-- Call graph edges
CREATE TABLE edges (
    caller_id   INTEGER REFERENCES symbols(id),
    callee_name TEXT NOT NULL,
    callee_file TEXT
);

-- Change detection
CREATE TABLE files (
    repo        TEXT NOT NULL,
    path        TEXT NOT NULL,
    hash        TEXT NOT NULL,   -- SHA-256 of file contents
    indexed_at  DATETIME NOT NULL,
    PRIMARY KEY (repo, path)
);

-- Cross-repo registry
CREATE TABLE repos (
    name         TEXT PRIMARY KEY,
    root_path    TEXT NOT NULL,
    db_path      TEXT NOT NULL,
    last_indexed DATETIME
);

-- HTTP routes (Flask, FastAPI, Django, Express)
CREATE TABLE routes (
    repo        TEXT NOT NULL,
    file        TEXT NOT NULL,
    method      TEXT NOT NULL,   -- GET | POST | PUT | DELETE | PATCH | ANY
    path        TEXT NOT NULL,   -- /users/{id}
    handler     TEXT,            -- function name
    start_byte  INTEGER NOT NULL,
    end_byte    INTEGER NOT NULL
);
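
To show how the symbols and edges tables fit together, the sketch below builds a trimmed-down copy of both in an in-memory database and answers a "who calls this?" query with a join. The rows and byte offsets are invented, and the get_callers function here is an illustration, not the MCP tool's implementation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE symbols (
        id INTEGER PRIMARY KEY, repo TEXT NOT NULL, file TEXT NOT NULL,
        name TEXT NOT NULL, kind TEXT NOT NULL,
        start_byte INTEGER NOT NULL, end_byte INTEGER NOT NULL
    );
    CREATE TABLE edges (
        caller_id INTEGER REFERENCES symbols(id),
        callee_name TEXT NOT NULL, callee_file TEXT
    );
    """
)
# Invented sample rows.
conn.executemany(
    "INSERT INTO symbols (id, repo, file, name, kind, start_byte, end_byte) VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (1, "myproject", "auth/utils.py", "validate_email", "function", 1024, 1340),
        (2, "myproject", "auth/views.py", "register_user", "function", 200, 640),
        (3, "myproject", "users/views.py", "update_profile", "function", 90, 410),
    ],
)
conn.executemany(
    "INSERT INTO edges (caller_id, callee_name, callee_file) VALUES (?, ?, ?)",
    [(2, "validate_email", "auth/utils.py"), (3, "validate_email", "auth/utils.py")],
)


def get_callers(conn: sqlite3.Connection, name: str) -> list[tuple]:
    """Join each edge back to its caller's symbol row."""
    return conn.execute(
        "SELECT s.name, s.file, s.kind FROM edges e "
        "JOIN symbols s ON s.id = e.caller_id "
        "WHERE e.callee_name = ? ORDER BY s.name",
        (name,),
    ).fetchall()


print(get_callers(conn, "validate_email"))
```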

Parsing

Source files are parsed using tree-sitter. tree-sitter produces a concrete syntax tree for each file. SymDex walks the tree and extracts nodes matching known symbol types per language (e.g. function_definition for Python, function_declaration for Go, method_definition for JavaScript).
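
The walk-and-extract step can be illustrated for Python alone using the standard-library ast module in place of tree-sitter. This is a substitute technique chosen so the sketch runs without extra packages; it labels every def as a function (it does not distinguish methods), and extract_symbols is a hypothetical name.

```python
import ast


def extract_symbols(source: str) -> list[dict]:
    """Return name, kind and UTF-8 byte offsets for each function and class in a Python source."""
    data = source.encode("utf-8")
    # Byte offset at which each line starts; ast line numbers are 1-based.
    line_starts = [0]
    for line in data.splitlines(keepends=True):
        line_starts.append(line_starts[-1] + len(line))

    symbols = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            kind = "function"
        elif isinstance(node, ast.ClassDef):
            kind = "class"
        else:
            continue
        # col_offset and end_col_offset are byte offsets within their line.
        start = line_starts[node.lineno - 1] + node.col_offset
        end = line_starts[node.end_lineno - 1] + node.end_col_offset
        symbols.append({"name": node.name, "kind": kind, "start_byte": start, "end_byte": end})
    return symbols
```

Slicing the encoded source with the returned offsets yields exactly the symbol's text, which is the property byte-offset retrieval relies on.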

Semantic Embeddings

When a symbol has a docstring or signature, SymDex generates a vector embedding using sentence-transformers (model: all-MiniLM-L6-v2 by default). Embeddings are stored as raw float32 blobs and queried using sqlite-vec — a SQLite extension for vector similarity search. Everything runs locally. No embedding API calls.
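
Storing a vector as a raw float32 blob means packing each component into 4 bytes. The sketch below does this with the struct module, assuming little-endian byte order; the helper names are hypothetical. A 384-dimensional vector, the output size of all-MiniLM-L6-v2, occupies 1,536 bytes in this form.

```python
import struct


def to_float32_blob(vector: list[float]) -> bytes:
    """Pack floats as consecutive little-endian 32-bit values (4 bytes each)."""
    return struct.pack(f"<{len(vector)}f", *vector)


def from_float32_blob(blob: bytes) -> list[float]:
    """Inverse of to_float32_blob."""
    count = len(blob) // 4
    return list(struct.unpack(f"<{count}f", blob))


blob = to_float32_blob([0.5, -1.0, 2.25])
print(len(blob), from_float32_blob(blob))
```

Values that are not exactly representable in 32 bits come back slightly rounded, which is normally acceptable for similarity search.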

MCP Server

Built on FastMCP. Supports both stdio transport (for desktop agents) and streamable HTTP transport (for remote access).

Project Layout

symdex/
├── cli.py                  — Typer CLI (all user-facing commands)
├── core/
│   ├── parser.py           — tree-sitter symbol extraction (12 languages + Vue)
│   ├── storage.py          — SQLite read/write, vector storage, route storage
│   ├── indexer.py          — orchestrates parse → store pipeline
│   ├── watcher.py          — file-system watcher (watchdog), auto-reindex
│   ├── route_extractor.py  — regex-based HTTP route detection
│   └── schema.sql          — database schema (symbols, edges, files, repos, routes)
├── mcp/
│   ├── server.py           — FastMCP server definition
│   └── tools.py            — 15 MCP tool implementations
├── search/
│   ├── symbol_search.py    — name-based FTS search
│   ├── text_search.py      — regex/text search
│   └── semantic.py         — embedding similarity search
└── graph/
    ├── call_graph.py       — call edge extraction and query
    └── registry.py         — cross-repo registry and multi-DB search

FAQ

Do I need to re-index every time I change my code?
Only if you want SymDex to reflect your latest changes. SymDex uses SHA-256 hashes to track which files have changed, so re-indexing only processes modified files and stays fast on large codebases.

Does semantic search send my code to an API?
No. Embeddings are generated locally using sentence-transformers. Nothing leaves your machine.

Can I use SymDex without an AI agent?
Yes. The full CLI gives you direct access to every search capability (symbol search, semantic search, call graph, file outlines) without any agent involved.

Does it work with monorepos?
Yes. Index each sub-project separately with a unique --name, then search across all of them using symdex search without a --repo flag.

What happens if a language is not supported?
SymDex skips files with unrecognised extensions. Supported and unsupported files can coexist in the same project; only the supported ones are indexed.

Is the index portable?
Yes. The SQLite .db files can be copied to another machine. As long as SymDex is installed there, the index works. The only caveat is that absolute file paths in the index will point to the original machine.


License

MIT — see LICENSE

Contributing

Issues and pull requests are welcome at github.com/husnainpk/SymDex.
