
Project description

TokenNuke

Intelligent code indexing MCP server. 15 tools, 10 languages, tree-sitter AST extraction, hybrid search (FTS5 + vector), call graphs, remote repo indexing, incremental indexing.

Save 99% of tokens — get exact function source via byte-offset seek instead of reading entire files.

Formerly codemunch-pro. Same code, better name.

Install

pip install tokennuke

Quick Start

Claude Desktop / Cline

Add to your MCP client config:

{
  "mcpServers": {
    "tokennuke": {
      "command": "tokennuke"
    }
  }
}

HTTP Server

tokennuke --transport streamable-http --port 5002

15 MCP Tools

Tool             | Description
-----------------|--------------------------------------------------------------
index_folder     | Index a local directory (incremental, SHA-256 based)
index_repo       | Index a GitHub/GitLab repo (tarball download, no git needed)
list_repos       | List all indexed repositories with stats
invalidate_cache | Force re-index a repository
file_tree        | Get directory tree with file counts
file_outline     | List symbols in a single file
repo_outline     | List all symbols in a repo (summary)
get_symbol       | Get full source of one symbol (O(1) byte seek)
get_symbols      | Batch-get multiple symbols
search_symbols   | Hybrid search (FTS5 + vector RRF)
search_text      | Full-text search in file contents
get_callees      | What does this function call?
get_callers      | Who calls this function?
diff_symbols     | What changed since the last index? (PR review)
dependency_map   | What does this file depend on? What depends on it?

10 Languages

Python, JavaScript, TypeScript, Go, Rust, Java, C, C++, C#, Ruby

All via tree-sitter-language-pack — zero compilation, pre-built binaries.

Key Features

O(1) Symbol Retrieval

Every symbol stores its byte offset and length. get_symbol seeks directly to the function source — no reading entire files. Retrieving a 200-byte function from a 40 KB file saves roughly 99.5% of the tokens.
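The retrieval pattern is simple to sketch in Python. The file name and offset computation below are illustrative only; in the real index, offsets and lengths are precomputed at parse time and stored per symbol:

```python
def read_symbol(path: str, byte_offset: int, byte_length: int) -> str:
    """Return one symbol's source by seeking straight to its stored offset."""
    with open(path, "rb") as f:
        f.seek(byte_offset)              # jump directly to the symbol
        return f.read(byte_length).decode("utf-8")

# Demo: write a small module, then fetch just one function from it.
source = "def unused():\n    pass\n\ndef target():\n    return 42\n"
with open("demo_module.py", "w") as f:
    f.write(source)

offset = source.index("def target")      # precomputed in a real index
length = len(source) - offset
print(read_symbol("demo_module.py", offset, length))
```

The seek means retrieval cost depends on the symbol's size, not the file's.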

Incremental Indexing

Files are hashed (SHA-256). Only changed files are re-parsed. Re-indexing a 10K file repo after changing one file takes milliseconds.
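A minimal sketch of hash-based change detection, assuming a stored path-to-hash map (the function names here are illustrative, not TokenNuke's API):

```python
import hashlib

def file_sha256(path: str) -> str:
    """Hash a file's bytes in chunks so large files stay memory-cheap."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def changed_files(paths, stored_hashes):
    """Return only files whose current hash differs from the stored one."""
    return [p for p in paths if stored_hashes.get(p) != file_sha256(p)]

# Demo: index two files, edit one, and see that only it needs re-parsing.
with open("a.txt", "w") as f:
    f.write("one")
with open("b.txt", "w") as f:
    f.write("two")
stored = {"a.txt": file_sha256("a.txt"), "b.txt": file_sha256("b.txt")}
with open("b.txt", "w") as f:
    f.write("two, edited")
print(changed_files(["a.txt", "b.txt"], stored))
```

Unchanged files are skipped entirely, which is why re-indexing after a one-file edit is fast regardless of repo size.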

Hybrid Search (FTS5 + Vector)

Combines BM25 keyword matching with semantic vector similarity using Reciprocal Rank Fusion. Search "authentication middleware" and find auth_middleware, verify_token, and login_handler.
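Reciprocal Rank Fusion merges two ranked lists by summing 1/(k + rank) per document, so a symbol ranked well by either BM25 or the vector search surfaces near the top. A minimal sketch (the symbol names are the example from above; k=60 is the commonly used default, not necessarily TokenNuke's):

```python
def rrf(rankings, k: int = 60):
    """Fuse ranked lists: score(d) = sum over lists of 1 / (k + rank_in_list)."""
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits = ["auth_middleware", "login_handler", "parse_config"]
vector_hits = ["verify_token", "auth_middleware", "login_handler"]
print(rrf([bm25_hits, vector_hits]))
```

`auth_middleware` wins because both rankers rank it highly, while `verify_token` still appears even though only the semantic ranker found it.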

Call Graphs

Traces function calls through the AST. get_callees("main") shows what main calls. get_callers("authenticate") shows who calls authenticate. Supports depth traversal.
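Depth-limited traversal over stored call edges can be sketched as a BFS over an adjacency map (the edge data here is illustrative, not real index contents):

```python
from collections import deque

def callees(graph, start, max_depth=2):
    """BFS over call edges: every function `start` reaches within max_depth hops."""
    seen, queue = set(), deque([(start, 0)])
    while queue:
        fn, depth = queue.popleft()
        if depth == max_depth:
            continue                      # stop expanding past the depth limit
        for callee in graph.get(fn, []):
            if callee not in seen:
                seen.add(callee)
                queue.append((callee, depth + 1))
    return seen

edges = {
    "main": ["authenticate", "serve"],
    "authenticate": ["verify_token"],
    "serve": ["handle_request"],
}
print(callees(edges, "main", max_depth=2))
```

Reversing the edge direction before the same traversal answers the get_callers question instead.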

Remote Repo Indexing

Index any public GitHub or GitLab repo by URL — no git binary needed. Downloads the tarball via API, extracts, and indexes. Cached locally with SHA-based freshness checks. Supports private repos with auth tokens and sparse paths.
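For GitHub, the tarball comes from the REST API's `GET /repos/{owner}/{repo}/tarball/{ref}` endpoint, which needs only HTTP — no git binary. A sketch of the URL construction (the helper name is illustrative; GitLab uses a different archive endpoint):

```python
from urllib.parse import urlparse

def github_tarball_url(repo_url: str, ref: str = "HEAD") -> str:
    """Build the GitHub API tarball endpoint for a repository page URL."""
    owner, repo = urlparse(repo_url).path.strip("/").split("/")[:2]
    return f"https://api.github.com/repos/{owner}/{repo}/tarball/{ref}"

print(github_tarball_url("https://github.com/psf/requests", "main"))
```

For private repos, the same request is sent with an `Authorization: Bearer <token>` header.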

Full-Text Content Search

Search raw file contents — string literals, TODO comments, config values, error messages. Not just symbol names.
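SQLite's FTS5 extension makes this kind of content search a one-liner query. A self-contained sketch with made-up file contents (assumes your Python's SQLite build ships FTS5, which standard CPython builds do):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE VIRTUAL TABLE file_content_fts USING fts5(path, body)")
con.executemany(
    "INSERT INTO file_content_fts VALUES (?, ?)",
    [
        ("config.py", "TIMEOUT = 30  # TODO: make configurable"),
        ("errors.py", 'raise ValueError("invalid auth token")'),
    ],
)
# MATCH searches the raw file contents, not just symbol names.
rows = con.execute(
    "SELECT path FROM file_content_fts WHERE file_content_fts MATCH ?", ("TODO",)
).fetchall()
print(rows)
```

The same query shape finds string literals, config values, and error messages that never appear in any symbol name.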

How It Works

  1. Parse — tree-sitter builds an AST for each source file
  2. Extract — Walk AST to find functions, classes, methods, types, interfaces
  3. Store — SQLite database per repo with FTS5 virtual tables
  4. Embed — FastEmbed (ONNX, CPU-only) generates 384-dim vectors for semantic search
  5. Graph — Call expressions extracted from function bodies, edges stored and resolved
  6. Serve — FastMCP exposes 15 tools via stdio or HTTP

Architecture

~/.tokennuke/
├── myproject_a1b2c3d4e5f6.db    # Per-repo SQLite database
├── otherproject_7890abcdef.db
└── ...

Each DB contains:
├── files          # Indexed files with SHA-256 hashes
├── symbols        # Functions, classes, methods, types
├── symbols_fts    # FTS5 full-text search index
├── symbols_vec    # sqlite-vec 384-dim vector index
├── call_edges     # Call graph (caller → callee)
└── file_content_fts  # Raw file content search
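A minimal illustrative version of such a per-repo schema, built with the stdlib sqlite3 module. Table and column names here are assumptions for the sketch, not TokenNuke's actual DDL, and the FTS5/sqlite-vec virtual tables are omitted:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE files (
    path   TEXT PRIMARY KEY,
    sha256 TEXT NOT NULL               -- drives incremental re-indexing
);
CREATE TABLE symbols (
    id          INTEGER PRIMARY KEY,
    file_path   TEXT REFERENCES files(path),
    name        TEXT,
    kind        TEXT,                  -- function / class / method / type
    byte_offset INTEGER,               -- enables O(1) source retrieval
    byte_length INTEGER
);
CREATE TABLE call_edges (
    caller_id   INTEGER REFERENCES symbols(id),
    callee_name TEXT                   -- resolved to symbol ids after extraction
);
""")
tables = [r[0] for r in con.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)
```

Keeping one database per repo means dropping a repo's index is just deleting one file.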

Use Cases

  • AI Coding Agents: Give your agent surgical access to codebases without burning context
  • Code Review: Find all callers of a function before changing its signature
  • Onboarding: Search symbols semantically — "where is error handling?" finds relevant code
  • Refactoring: Map call graphs before moving functions between modules
  • Documentation: Extract all public APIs with signatures and docstrings

License

MIT

Download files

Download the file for your platform.

Source Distribution

tokennuke-1.3.0.tar.gz (33.6 kB)


Built Distribution


tokennuke-1.3.0-py3-none-any.whl (31.1 kB)


File details

Details for the file tokennuke-1.3.0.tar.gz.

File metadata

  • Download URL: tokennuke-1.3.0.tar.gz
  • Upload date:
  • Size: 33.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for tokennuke-1.3.0.tar.gz
Algorithm   | Hash digest
------------|------------------------------------------------------------------
SHA256      | b9912867d028ff0bc533ff65cae29decf4ea9cb5bb76d1e9ae78e87437b2929c
MD5         | bf08351b1bca34ba5ffd9c8a0ef37be1
BLAKE2b-256 | ab9e76994e5271c71f8212093780d79c88d01618536b6ecba17c449dbf278fe7


File details

Details for the file tokennuke-1.3.0-py3-none-any.whl.

File metadata

  • Download URL: tokennuke-1.3.0-py3-none-any.whl
  • Upload date:
  • Size: 31.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for tokennuke-1.3.0-py3-none-any.whl
Algorithm   | Hash digest
------------|------------------------------------------------------------------
SHA256      | b6ea30a6ca1ebb162b5d32224058f21320f3ca198cd089c3f5dc060a7e159fa5
MD5         | 5b0a6492d569f117f8eda7564298903d
BLAKE2b-256 | 19f9d46a260232364902184c70e38f857e63cca9e0b551be667990e7047b42a7

