codebeacon

Source code AST analysis and AI context generation — unified multi-framework knowledge graph


Why codebeacon?

Every time you open a new AI coding session, your assistant starts blind. It doesn't know your routes, your service layer, your entity model, or how your microservices call each other. You spend the first chunk of every session just getting the AI back up to speed — pasting files, explaining structure, re-establishing context.

Existing tools solve this partially. Route analyzers map your controllers but miss service dependencies. Knowledge graph tools capture relationships but ignore your API surface. You end up running both, stitching output manually, and repeating it every time the codebase changes.

codebeacon unifies both approaches in a single CLI. One command scans your entire codebase with tree-sitter AST parsing, resolves dependency injection across files, detects community clusters in your architecture, and writes a ready-to-use context map directly into CLAUDE.md, .cursorrules, and AGENTS.md — so your AI assistant walks into every session already knowing your codebase.


Key Features

  • Unified pipeline — route/controller analysis + knowledge graph in one tool, no manual stitching
  • 27 frameworks, 9 languages — Spring Boot, NestJS, Django, FastAPI, Flask, Rails, Express, Fastify, Koa, React, Next.js, Vue, Nuxt, Angular, SvelteKit, Gin, Echo, Fiber, Laravel, Actix-Web, Axum, Tauri, Rocket, Warp, ASP.NET Core, Vapor, Ktor
  • Tree-sitter based — structural AST parsing, not regex; all language grammars included out of the box
  • Two-pass DI resolution — Pass 1 extracts local AST nodes; Pass 2 builds a global symbol table and resolves Interface → Implementation mappings that single-pass tools miss
  • Wave merge architecture — files processed in parallel chunks, results merged globally; handles large monorepos without memory blowouts
  • Multiple output formats — JSON knowledge graph, Markdown wiki, Obsidian vault, AI context maps, MCP server
  • Community detection — Leiden/Louvain clustering reveals your actual architectural boundaries
  • Incremental cache — SHA-256 based; only re-extracts files that changed since the last scan
  • Zero configuration — auto-detects frameworks and languages; generates codebeacon.yaml for repeat runs
  • Deep-dive mode: --deep-dive generates per-project .codebeacon/ + CLAUDE.md for every sub-project; running codebeacon scan . --update from any sub-project folder automatically syncs all projects in the workspace

Quick Start

pip install codebeacon

codebeacon scan .

That's it. codebeacon detects your project types, extracts routes/services/entities/components, builds a knowledge graph, and writes everything to .codebeacon/.

For a multi-project workspace:

codebeacon scan /path/to/workspace   # auto-detects all projects, generates codebeacon.yaml
codebeacon sync                      # subsequent runs via config

Supported Frameworks

Language                 Frameworks
Java / Kotlin            Spring Boot, Ktor
Python                   Django, FastAPI, Flask
JavaScript / TypeScript  Express, Fastify, Koa, NestJS, React, Next.js, Vue, Nuxt, Angular, SvelteKit
Go                       Gin, Echo, Fiber
Ruby                     Rails
PHP                      Laravel
Rust                     Actix-Web, Axum, Tauri, Rocket, Warp
C#                       ASP.NET Core
Swift                    Vapor

Architecture

codebeacon runs a two-pass extraction pipeline:

[Config] → [Discover] → [Wave / Extract] → [Resolve] → [Filter] → [Enrich] → [Graph] → [Wiki] → [ContextMap] → [Export]
                              │                  │           │          │
                         Local AST           Symbol      Cross-lang  HTTP API
                         per chunk           table       artifact    Shared DB
                         (Pass 1)           matching    removal     entity edges
                                            (Pass 2)

Pass 1 — Wave extraction: Files are processed in parallel chunks via ThreadPoolExecutor. Each file runs through five extractors: routes, services, entities, components, and dependencies. Results are cached by SHA-256 for incremental re-scans.
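The chunked, cached extraction described above can be sketched roughly as follows. This is an illustration only, not codebeacon's actual code: `extract_file`, `extract_chunk`, `run_waves`, and the cache layout are hypothetical names, and the placeholder extractor only counts lines instead of running the five real extractors. The defaults mirror the `chunk_size` and `max_parallel` values shown in the configuration example.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def chunked(items, size):
    """Yield successive slices of `items` with at most `size` elements."""
    for i in range(0, len(items), size):
        yield items[i:i + size]


def extract_file(path, content):
    # Hypothetical stand-in for the real extractors (routes, services, ...).
    return {"path": path, "lines": len(content.splitlines())}


def extract_chunk(chunk, cache):
    """Extract one chunk, reusing cached entries whose SHA-256 still matches."""
    results = {}
    for path, content in chunk:
        digest = hashlib.sha256(content).hexdigest()
        cached = cache.get(path)
        if cached is not None and cached["sha256"] == digest:
            results[path] = cached  # unchanged file: skip re-extraction
        else:
            results[path] = {"sha256": digest, "data": extract_file(path, content)}
    return results


def run_waves(files, cache, chunk_size=300, max_parallel=5):
    """Process (path, bytes) pairs in parallel chunks and merge the results."""
    merged = {}
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        chunks = chunked(files, chunk_size)
        for partial in pool.map(lambda c: extract_chunk(c, cache), chunks):
            merged.update(partial)
    return merged
```

Passing the result of one run as the `cache` argument of the next run is what makes a re-scan incremental in this sketch: only files whose digest changed go through `extract_file` again.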

Pass 2 — Graph build: All wave results are merged. A global symbol table resolves unresolved dependency injection references — mapping interfaces to implementations in the way Spring's implicit Bean wiring or TypeScript's injection tokens require. Filters remove build artifacts, spurious cross-language imports, and false cross-service edges.
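To make the Interface → Implementation step concrete, here is a minimal sketch of a second pass over Pass 1 output. The node shape (`implements`, `injects`) and the function names are assumptions made for this example; the real symbol table is likely richer (qualifiers, generics, injection tokens).

```python
from collections import defaultdict

# Hypothetical Pass 1 output: one record per class or interface.
local_nodes = [
    {"name": "PaymentService", "implements": [], "injects": []},
    {"name": "StripePaymentService", "implements": ["PaymentService"], "injects": []},
    {"name": "OrderController", "implements": [], "injects": ["PaymentService"]},
]


def build_symbol_table(nodes):
    """Map each interface name to the classes that declare they implement it."""
    implementations = defaultdict(list)
    for node in nodes:
        for interface in node["implements"]:
            implementations[interface].append(node["name"])
    return implementations


def resolve_injections(nodes, implementations):
    """Turn each injected type into edges that point at concrete classes."""
    edges = []
    for node in nodes:
        for dependency in node["injects"]:
            # Fall back to the declared type when no implementation is known.
            targets = implementations.get(dependency) or [dependency]
            for target in targets:
                edges.append((node["name"], target))
    return edges


edges = resolve_injections(local_nodes, build_symbol_table(local_nodes))
```

A single-pass tool that only sees `OrderController` would record a dependency on `PaymentService`; the global table is what lets the edge land on `StripePaymentService` instead.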

Post-processing: HTTP API edges connect frontend URL calls to matching backend routes. Community detection (Leiden → Louvain → connected components fallback) partitions the graph into architectural clusters. A structural report identifies god nodes, surprising cross-cluster connections, and hub files.
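One way the HTTP API edge step could work is to compile each backend route into a pattern and test frontend URLs against it. The sketch below assumes routes use `{id}` or `:id` path parameters and that frontend calls have already been reduced to literal URLs; the record shapes and function names are hypothetical, not codebeacon's internal model.

```python
import re


def route_to_regex(pattern):
    """Compile a route such as /api/users/{id} into a full-match regex."""
    parts = []
    for segment in pattern.strip("/").split("/"):
        is_param = segment.startswith(":") or (
            segment.startswith("{") and segment.endswith("}")
        )
        # A path parameter matches exactly one URL segment.
        parts.append(r"[^/]+" if is_param else re.escape(segment))
    return re.compile("^/" + "/".join(parts) + "/?$")


def link_calls_to_routes(calls, routes):
    """Return (frontend source, backend handler) edges for matching calls."""
    compiled = [(route, route_to_regex(route["path"])) for route in routes]
    edges = []
    for call in calls:
        for route, regex in compiled:
            if call["method"] == route["method"] and regex.match(call["url"]):
                edges.append((call["source"], route["handler"]))
    return edges


routes = [
    {"method": "GET", "path": "/api/users/{id}", "handler": "UserController.get"},
    {"method": "POST", "path": "/api/users", "handler": "UserController.create"},
]
calls = [
    {"source": "UserPage.svelte", "method": "GET", "url": "/api/users/42"},
    {"source": "Signup.tsx", "method": "POST", "url": "/api/users"},
    {"source": "Orders.tsx", "method": "GET", "url": "/api/orders"},
]
api_edges = link_calls_to_routes(calls, routes)
```

Calls with no matching route (here `/api/orders`) simply produce no edge. Template-literal URLs and base-URL prefixes would need normalizing first, which this sketch leaves out.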


Output Structure

After a scan, context map files are updated at the project root (existing user content is preserved) and the knowledge graph lands in .codebeacon/:

project-root/
  CLAUDE.md              ← AI context map (codebeacon block merged; user content kept)
  .cursorrules           ← Cursor IDE context (same merge strategy)
  AGENTS.md              ← OpenAI Agents / Codex context (same merge strategy)
  .codebeacon/
    beacon.json          ← full knowledge graph (node-link JSON, queryable)
    REPORT.md            ← god nodes, surprising connections, hub files
    wiki/
      index.md           ← global index (~200 tokens)
      overview.md        ← platform stats + cross-project connections
      routes.md          ← all routes table
      cross-project/
        connections.md   ← cross-service edges
      <project>/
        index.md
        routes.md
        controllers/<Name>.md
        services/<Name>.md
        entities/<Name>.md
        components/<Name>.md
    obsidian/            ← Obsidian vault (one note per graph node)
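
The "user content kept" merge for CLAUDE.md, .cursorrules, and AGENTS.md can be pictured as replacing a delimited block and leaving everything else alone. The marker strings and `merge_block` below are assumptions for illustration; the source does not say which delimiters codebeacon actually writes.

```python
START = "<!-- codebeacon:start -->"  # hypothetical marker
END = "<!-- codebeacon:end -->"      # hypothetical marker


def merge_block(existing, generated):
    """Insert or replace the generated block, preserving all other text."""
    block = f"{START}\n{generated.strip()}\n{END}"
    start = existing.find(START)
    end = existing.find(END)
    if start != -1 and end > start:
        # A previous block exists: swap it out in place.
        return existing[:start] + block + existing[end + len(END):]
    if not existing.strip():
        return block + "\n"
    # First run on a hand-written file: append below the user's content.
    return existing.rstrip("\n") + "\n\n" + block + "\n"
```

With this approach repeated scans are idempotent: merging the same generated text twice yields the same file, and hand-written notes outside the markers are never touched.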

Deep Dive Mode

With --deep-dive, each sub-project also gets its own .codebeacon/ directory and CLAUDE.md, so AI sessions opened inside a sub-project have full project-specific context:

workspace/
  CLAUDE.md                   ← combined (all projects)
  .cursorrules
  AGENTS.md
  codebeacon.yaml             ← deep_dive: true
  .codebeacon/                ← combined knowledge graph
    beacon.json
    wiki/
    obsidian/
  api-server/
    CLAUDE.md                 ← api-server only
    .codebeacon/              ← api-server graph
      beacon.json
      wiki/
      obsidian/
  frontend/
    CLAUDE.md                 ← frontend only
    .codebeacon/              ← frontend graph
      beacon.json
      wiki/
      obsidian/

Claude Code loads CLAUDE.md hierarchically, so opening a session in api-server/ loads both the parent workspace overview and the project-specific details.

To update from any sub-project directory after the initial scan:

# Initial deep-dive scan
codebeacon scan /workspace --deep-dive

# Later, from any sub-project — finds the parent config and updates ALL projects
cd /workspace/api-server
codebeacon scan . --update
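
Finding the parent config from a sub-project usually comes down to walking up the directory tree. The following is a small sketch of that lookup under the assumption that the workspace is identified by the nearest codebeacon.yaml; `find_workspace_config` is a hypothetical helper, not part of codebeacon's documented API.

```python
from pathlib import Path
from typing import Optional


def find_workspace_config(start: Path, filename: str = "codebeacon.yaml") -> Optional[Path]:
    """Return the nearest config file at or above `start`, or None."""
    current = start.resolve()
    for directory in (current, *current.parents):
        candidate = directory / filename
        if candidate.is_file():
            return candidate
    return None
```

Called with `Path.cwd()` from inside /workspace/api-server, such a lookup would return /workspace/codebeacon.yaml if that file exists, which is enough to know which projects belong to the workspace.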

AI Integration

Claude Code Skill (/codebeacon)

Install codebeacon as a Claude Code slash command:

pip install codebeacon
codebeacon install

This copies SKILL.md to ~/.claude/skills/codebeacon/ and registers the /codebeacon trigger in ~/.claude/CLAUDE.md. Restart your Claude Code session, then type /codebeacon to scan the current directory.

/codebeacon                  # scan current directory
/codebeacon /path/to/project # scan a specific path
/codebeacon sync             # re-scan from codebeacon.yaml

MCP Server

Run codebeacon as a persistent MCP server so any MCP-compatible client can query your knowledge graph directly.

Step 1 — scan your project:

codebeacon scan .

Step 2 — add to your MCP client config:

Claude Code (.claude.json in project root or ~/.claude.json globally):

{
  "mcpServers": {
    "codebeacon": {
      "command": "codebeacon",
      "args": ["serve"]
    }
  }
}

Cursor (~/.cursor/mcp.json):

{
  "mcpServers": {
    "codebeacon": {
      "command": "codebeacon",
      "args": ["serve", "--dir", "/path/to/.codebeacon"]
    }
  }
}

Available MCP tools once connected:

Tool                 Description
beacon_wiki_index    Global project overview (routes, services, entities count)
beacon_wiki_article  Read a specific wiki article by path
beacon_query         Search nodes by label substring
beacon_path          Shortest dependency path between two nodes
beacon_blast_radius  Upstream callers + downstream affected nodes
beacon_routes        List all HTTP routes, filterable by project
beacon_services      List all services/classes, filterable by project
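
To show what a blast-radius query computes, here is a sketch over a tiny node-link graph. It assumes beacon.json follows the common node-link layout with top-level "nodes" and "links" lists and "source"/"target" keys, and that an edge points from a caller to the thing it depends on; the sample data and function names are made up for the example.

```python
from collections import deque

graph = {
    "nodes": [
        {"id": "OrderController"},
        {"id": "OrderService"},
        {"id": "OrderRepository"},
        {"id": "Order"},
    ],
    "links": [
        {"source": "OrderController", "target": "OrderService"},
        {"source": "OrderService", "target": "OrderRepository"},
        {"source": "OrderRepository", "target": "Order"},
    ],
}


def adjacency(graph, reverse=False):
    """Build an adjacency list, optionally with every edge flipped."""
    adj = {node["id"]: [] for node in graph["nodes"]}
    for link in graph["links"]:
        src, dst = link["source"], link["target"]
        if reverse:
            src, dst = dst, src
        adj.setdefault(src, []).append(dst)
    return adj


def reachable(adj, start):
    """Breadth-first search: every node reachable from `start`, excluding it."""
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in adj.get(node, []):
            if neighbour != start and neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


def blast_radius(graph, node_id):
    return {
        "upstream": reachable(adjacency(graph, reverse=True), node_id),
        "downstream": reachable(adjacency(graph), node_id),
    }
```

Here `blast_radius(graph, "OrderService")` reports `OrderController` upstream and `OrderRepository` plus `Order` downstream.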

Installation Options

pip install codebeacon              # all language grammars included
pip install "codebeacon[cluster]"   # + Leiden community detection (graspologic); quotes keep shells such as zsh from expanding the brackets
pip install --upgrade codebeacon    # upgrade to latest version with all dependencies

All language parsers (Java, Kotlin, Python, JavaScript, TypeScript, Go, Ruby, PHP, C#, Rust, Swift, HTML, Svelte) are bundled by default — no extra flags needed.


CLI Reference

# Scan a project or workspace
codebeacon scan <path> [options]
codebeacon scan .                         # current directory
codebeacon scan /workspace                # workspace root (multi-project)
codebeacon scan . --update                # incremental: only re-extract changed files
codebeacon scan . --wiki-only             # skip re-extraction, regenerate wiki/obsidian/context map from existing beacon.json
codebeacon scan . --obsidian-dir <path>   # write Obsidian vault to custom location
codebeacon scan . --semantic              # enable LLM semantic extraction
codebeacon scan . --list-only             # detect frameworks only, don't extract
codebeacon scan /workspace --deep-dive    # per-project + combined workspace outputs

# Config-driven mode
codebeacon init [path]                    # auto-generate codebeacon.yaml
codebeacon sync                           # run from codebeacon.yaml
codebeacon sync --config <file>           # use a specific config file

# Query the knowledge graph (coming soon)
codebeacon query <term>                   # search nodes and edges
codebeacon path <source> <target>         # shortest path between two nodes

# Integrations
codebeacon serve [--dir .codebeacon]      # start MCP server (stdio)
codebeacon install                        # install Claude Code skill

Configuration

Run codebeacon init to generate codebeacon.yaml, or write it manually:

version: 1

projects:
  - name: api-server
    path: ./api-server
    type: spring-boot          # optional: auto-detected if omitted

  - name: frontend
    path: ./frontend
    type: react

output:
  dir: .codebeacon
  wiki: true
  obsidian: true
  context_map:
    targets: [CLAUDE.md, .cursorrules, AGENTS.md]

wave:
  auto: true
  chunk_size: 300              # files per chunk
  max_parallel: 5              # parallel threads

semantic:
  enabled: false               # override with --semantic flag

deep_dive: false               # set to true to generate per-project outputs

.codebeaconignore

Place a .codebeaconignore file at your project root to exclude directories or files from scanning. Syntax is the same as .gitignore — one pattern per line, # for comments.

# .codebeaconignore
generated/
build/
*.generated.ts
fixtures/
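
For a feel of how such patterns are applied, the sketch below implements a deliberately simplified matcher using only the standard library. It covers the two forms in the example above (directory names ending in `/` and filename globs) and is not a full .gitignore implementation: negation with `!`, leading-slash anchoring, and `**` are left out. `parse_ignore` and `is_ignored` are hypothetical names.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def parse_ignore(text):
    """Return the non-empty, non-comment lines of an ignore file."""
    patterns = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            patterns.append(line)
    return patterns


def is_ignored(path, patterns):
    """Check a POSIX-style relative file path against the simplified patterns."""
    parts = PurePosixPath(path).parts
    for pattern in patterns:
        if pattern.endswith("/"):
            # Directory pattern: matches any directory component of the path.
            if pattern.rstrip("/") in parts[:-1]:
                return True
        elif any(fnmatchcase(part, pattern) for part in parts):
            return True
    return False


patterns = parse_ignore(
    "# .codebeaconignore\ngenerated/\nbuild/\n*.generated.ts\nfixtures/\n"
)
```

With these patterns, `src/generated/api.ts` and `src/client.generated.ts` are skipped, while `src/build.ts` is still scanned because `build/` only matches a directory.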

How It Compares

codesight graphify codebeacon
Route / controller analysis
Service / DI graph partial
Interface → Impl resolution
Entity / ORM model extraction
Frontend component analysis
Community detection
Obsidian vault export
MCP server
AI context map (CLAUDE.md)
Multi-project workspace partial
Python-based

codebeacon is not a replacement for either tool — it's the union of what both do, built around a shared extraction and graph layer.


Benchmarks

Codebase                Stack                                           Files  Nodes  Edges  Communities  Scan time
multi-service SaaS app  SvelteKit + Next.js + Spring Boot (3 projects)  444    382    553    175          ~12s

Privacy & Security

All processing is local by default. Your source code never leaves your machine unless you opt in to LLM inference by passing --semantic with ANTHROPIC_API_KEY set.

  • Tree-sitter AST parsing runs entirely in-process
  • No telemetry, no analytics, no network calls during normal operation
  • The --semantic flag (disabled by default) activates two extraction modes:
    1. Structured comment parsing (no LLM required) — infers cross-references from Javadoc (@see, {@link}), Python docstrings (:class:, :func:), and JSDoc (@see, @param types)
    2. LLM inference (optional) — when ANTHROPIC_API_KEY is set, sends code excerpts to the Claude API for deeper relationship inference; only enable it explicitly
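
The structured comment mode needs no model at all; it amounts to pattern matching on documentation tags. As a rough sketch (the regular expressions and `extract_references` are assumptions for this example, and they cover only the Javadoc and Python docstring forms named above, not JSDoc):

```python
import re

# Javadoc: "@see Target" or "{@link Target optional label}"
JAVADOC_REF = re.compile(r"@see\s+([\w.#]+)|\{@link\s+([\w.#]+)[^}]*\}")
# Sphinx-style docstring roles: :class:`pkg.Name` or :func:`~pkg.name`
SPHINX_REF = re.compile(r":(?:class|func):`~?([\w.]+)`")


def extract_references(comment: str) -> list[str]:
    """Collect the symbols a doc comment points at."""
    refs = []
    for match in JAVADOC_REF.finditer(comment):
        refs.append(match.group(1) or match.group(2))
    refs.extend(SPHINX_REF.findall(comment))
    return refs


javadoc = """/**
 * Charges a card.
 * @see PaymentGateway
 * Uses {@link OrderRepository#save} internally.
 */"""
docstring = "Wraps :class:`billing.Invoice` and calls :func:`~billing.total`."
```

Each extracted name can then be looked up in the graph and, if it resolves to a known node, recorded as a cross-reference edge.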

Contributing

git clone https://github.com/Wandererer/codebeacon
cd codebeacon
pip install -e ".[dev,cluster]"
pytest

The easiest entry point for adding new framework support is writing a tree-sitter query file in codebeacon/extract/queries/. See codebeacon/extract/queries/README.md for the full guide — it walks through grammar setup, .scm query syntax, capture naming conventions, and how to wire up a new extractor.

Contributions welcome: new framework queries, language parsers, output formats, and benchmark datasets.


License

MIT — see LICENSE.


Acknowledgments

Built on tree-sitter for structural AST parsing, NetworkX for graph operations, and graspologic for Leiden community detection.

Inspired by the complementary approaches of codesight and graphify.
