codebeacon

Source code AST analysis and AI context generation — unified multi-framework knowledge graph


Why codebeacon?

Every time you open a new AI coding session, your assistant starts blind. It doesn't know your routes, your service layer, your entity model, or how your microservices call each other. You spend the first chunk of every session just getting the AI back up to speed — pasting files, explaining structure, re-establishing context.

Existing tools solve this partially. Route analyzers map your controllers but miss service dependencies. Knowledge graph tools capture relationships but ignore your API surface. You end up running both, stitching output manually, and repeating it every time the codebase changes.

codebeacon unifies both approaches in a single CLI. One command scans your entire codebase with tree-sitter AST parsing, resolves dependency injection across files, detects community clusters in your architecture, and writes a ready-to-use context map directly into CLAUDE.md, .cursorrules, and AGENTS.md — so your AI assistant walks into every session already knowing your codebase.


Key Features

  • Unified pipeline — route/controller analysis + knowledge graph in one tool, no manual stitching
  • 17 frameworks, 9 language groups — Spring Boot, Ktor, Django, FastAPI, Flask, Express, NestJS, React, Vue, Angular, Svelte, Gin, Rails, Laravel, Actix-Web, ASP.NET Core, and Vapor
  • Tree-sitter based — structural AST parsing, not regex; all language grammars are bundled out of the box
  • Two-pass DI resolution — Pass 1 extracts local AST nodes; Pass 2 builds a global symbol table and resolves Interface → Implementation mappings that single-pass tools miss
  • Wave merge architecture — files processed in parallel chunks, results merged globally; handles large monorepos without memory blowouts
  • Multiple output formats — JSON knowledge graph, Markdown wiki, Obsidian vault, AI context maps, MCP server
  • Community detection — Leiden/Louvain clustering reveals your actual architectural boundaries
  • Incremental cache — SHA-256 based; only re-extracts files that changed since the last scan
  • Zero configuration — auto-detects frameworks and languages; generates codebeacon.yaml for repeat runs

Quick Start

pip install codebeacon

codebeacon scan .

That's it. codebeacon detects your project types, extracts routes/services/entities/components, builds a knowledge graph, and writes everything to .codebeacon/.

For a multi-project workspace:

codebeacon scan /path/to/workspace   # auto-detects all projects, generates codebeacon.yaml
codebeacon sync                      # subsequent runs via config

Supported Frameworks

Language                 Frameworks
Java / Kotlin            Spring Boot, Ktor
Python                   Django, FastAPI, Flask
JavaScript / TypeScript  Express, NestJS, React, Vue, Angular, Svelte
Go                       Gin
Ruby                     Rails
PHP                      Laravel
Rust                     Actix-Web
C#                       ASP.NET Core
Swift                    Vapor

Architecture

codebeacon runs a two-pass extraction pipeline:

[Config] → [Discover] → [Wave / Extract] → [Resolve] → [Filter] → [Enrich] → [Graph] → [Wiki] → [ContextMap] → [Export]
                              │                  │           │          │
                         Local AST           Symbol      Cross-lang  HTTP API
                         per chunk           table       artifact    Shared DB
                         (Pass 1)           matching    removal     entity edges
                                            (Pass 2)

Pass 1 — Wave extraction: Files are processed in parallel chunks via ThreadPoolExecutor. Each file runs through five extractors: routes, services, entities, components, and dependencies. Results are cached by SHA-256 for incremental re-scans.
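The chunked, cached extraction described above can be sketched roughly as follows. This is a minimal illustration, not codebeacon's actual implementation: `run_waves`, `extract_file`, and the in-memory cache layout are hypothetical names chosen for the example, and the extractor is a stub standing in for the five real extractors.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def sha256_of(content: bytes) -> str:
    """Content hash used as the cache key for a file."""
    return hashlib.sha256(content).hexdigest()


def chunked(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def extract_file(path: str, content: bytes) -> dict:
    # Hypothetical stand-in for the route/service/entity/component/dependency extractors.
    return {"path": path, "size": len(content)}


def run_waves(files: dict, cache: dict, chunk_size: int = 300, max_parallel: int = 5):
    """files maps path -> bytes; cache maps path -> (digest, result).

    Returns (results, extracted), where `extracted` lists the paths that were re-extracted.
    """
    results, changed = {}, []
    for path, content in files.items():
        digest = sha256_of(content)
        cached = cache.get(path)
        if cached is not None and cached[0] == digest:
            results[path] = cached[1]          # unchanged file: reuse cached result
        else:
            changed.append((path, content, digest))

    def process_chunk(chunk):
        return [(p, d, extract_file(p, c)) for p, c, d in chunk]

    extracted = []
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        # Each chunk is one unit of parallel work; results are merged afterwards.
        for chunk_result in pool.map(process_chunk, chunked(changed, chunk_size)):
            for path, digest, result in chunk_result:
                cache[path] = (digest, result)
                results[path] = result
                extracted.append(path)
    return results, extracted
```

Calling `run_waves` twice with the same `cache` re-extracts only the files whose bytes changed between calls. The default arguments mirror the `chunk_size` and `max_parallel` values in the sample codebeacon.yaml.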

Pass 2 — Graph build: All wave results are merged. A global symbol table then resolves the dependency injection references that Pass 1 left unresolved, mapping each interface to its implementation, as Spring's implicit bean wiring and TypeScript injection tokens require. Filters remove build artifacts, spurious cross-language imports, and false cross-service edges.
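To make the Interface → Implementation step concrete, here is a small sketch of a global symbol table built from merged per-file results. The node shape (`implements`, `injects` keys) and the function names are assumptions made for this example and may not match codebeacon's internal data model.

```python
from collections import defaultdict

# Hypothetical Pass 1 output: one record per declaration, extracted file by file.
local_nodes = [
    {"file": "UserService.java", "kind": "interface", "name": "UserService"},
    {"file": "UserServiceImpl.java", "kind": "class", "name": "UserServiceImpl",
     "implements": ["UserService"]},
    {"file": "UserController.java", "kind": "class", "name": "UserController",
     "injects": ["UserService"]},
]


def build_symbol_table(nodes):
    """Map each interface name to the classes that implement it, across all files."""
    implementations = defaultdict(list)
    for node in nodes:
        for interface in node.get("implements", []):
            implementations[interface].append(node["name"])
    return implementations


def resolve_injections(nodes):
    """Turn injected type names into edges that point at concrete implementations."""
    implementations = build_symbol_table(nodes)
    known = {node["name"] for node in nodes}
    edges = []
    for node in nodes:
        for dep in node.get("injects", []):
            # Prefer implementations; fall back to the declared type if it is a known node.
            targets = implementations.get(dep) or ([dep] if dep in known else [])
            for target in targets:
                edges.append((node["name"], target))
    return edges
```

A single-pass tool that only sees UserController.java can record the dependency on `UserService` but cannot know that `UserServiceImpl` satisfies it; the lookup only works once every file's declarations are in one table.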

Post-processing: HTTP API edges connect frontend URL calls to matching backend routes. Community detection (Leiden → Louvain → connected components fallback) partitions the graph into architectural clusters. A structural report identifies god nodes, surprising cross-cluster connections, and hub files.
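The HTTP API edge step can be pictured as matching concrete URLs found in frontend code against backend route templates. The sketch below is illustrative only: the record fields, the `{id}` and `:id` placeholder styles it handles, and the function names are assumptions, not codebeacon's actual matching rules.

```python
import re


def route_to_regex(template: str) -> re.Pattern:
    """Compile a route template such as /api/users/{id} into a regex."""
    parts = []
    for segment in template.strip("/").split("/"):
        is_braced = segment.startswith("{") and segment.endswith("}")
        if is_braced or segment.startswith(":"):
            parts.append(r"[^/]+")             # path parameter: any single segment
        else:
            parts.append(re.escape(segment))   # literal segment
    return re.compile("^/" + "/".join(parts) + "/?$")


def link_calls(calls, routes):
    """Return (frontend component, backend handler) edges for matching calls."""
    edges = []
    for call in calls:
        for route in routes:
            same_method = call["method"] == route["method"]
            if same_method and route_to_regex(route["path"]).match(call["url"]):
                edges.append((call["component"], route["handler"]))
    return edges


# Hypothetical extracted data for illustration.
routes = [
    {"method": "GET", "path": "/api/users/{id}", "handler": "UserController.getUser"},
    {"method": "POST", "path": "/api/users", "handler": "UserController.createUser"},
]
calls = [
    {"component": "UserProfile.svelte", "method": "GET", "url": "/api/users/42"},
    {"component": "SignupForm.svelte", "method": "POST", "url": "/api/users"},
    {"component": "Footer.svelte", "method": "GET", "url": "/static/logo.png"},
]
```

Here `link_calls(calls, routes)` yields one edge per matched call and none for the static asset request, which is the kind of cross-project connection the wiki's cross-project pages would list.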


Output Structure

After a scan, everything lands in .codebeacon/:

.codebeacon/
  beacon.json          ← full knowledge graph (node-link JSON, queryable)
  REPORT.md            ← god nodes, surprising connections, hub files
  CLAUDE.md            ← AI context map (also written to project root)
  .cursorrules         ← Cursor IDE context
  AGENTS.md            ← OpenAI Agents / Codex context
  wiki/
    index.md           ← global index (~200 tokens)
    overview.md        ← platform stats + cross-project connections
    routes.md          ← all routes table
    cross-project/
      connections.md   ← cross-service edges
    <project>/
      index.md
      routes.md
      controllers/<Name>.md
      services/<Name>.md
      entities/<Name>.md
      components/<Name>.md
  obsidian/            ← Obsidian vault (one note per graph node)
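Because beacon.json is described as node-link JSON, it can in principle be queried with a few lines of standard-library Python. The sketch below assumes the common node-link layout with `nodes` (each carrying an `id`) and `links` (each carrying `source` and `target`); the actual keys in beacon.json may differ, and the sample graph is made up for the example. Edges are treated as undirected here.

```python
import json
from collections import deque

# Hypothetical sample in node-link form; a real run would json.load() the beacon.json file.
sample = json.loads("""
{
  "nodes": [{"id": "UserProfile"}, {"id": "UserController"},
            {"id": "UserServiceImpl"}, {"id": "UserEntity"}, {"id": "Footer"}],
  "links": [
    {"source": "UserProfile", "target": "UserController"},
    {"source": "UserController", "target": "UserServiceImpl"},
    {"source": "UserServiceImpl", "target": "UserEntity"}
  ]
}
""")


def build_adjacency(graph):
    """Build an undirected adjacency map from node-link data."""
    adjacency = {node["id"]: set() for node in graph["nodes"]}
    for link in graph["links"]:
        adjacency.setdefault(link["source"], set()).add(link["target"])
        adjacency.setdefault(link["target"], set()).add(link["source"])
    return adjacency


def shortest_path(graph, source, target):
    """Breadth-first search; returns the list of node ids or None if unreachable."""
    adjacency = build_adjacency(graph)
    if source not in adjacency or target not in adjacency:
        return None
    previous = {source: None}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        if current == target:
            path = []
            while current is not None:         # walk predecessor links back to the source
                path.append(current)
                current = previous[current]
            return path[::-1]
        for neighbor in sorted(adjacency[current]):
            if neighbor not in previous:
                previous[neighbor] = current
                queue.append(neighbor)
    return None
```

For example, `shortest_path(sample, "UserProfile", "UserEntity")` walks from the component through the controller and service to the entity, while an isolated node such as `Footer` returns `None`.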

Installation Options

pip install codebeacon              # all language grammars included
pip install codebeacon[cluster]     # + Leiden community detection (graspologic)
pip install --upgrade codebeacon    # upgrade to latest version with all dependencies

All language parsers (Java, Kotlin, Python, JavaScript, TypeScript, Go, Ruby, PHP, C#, Rust, Swift, HTML, Svelte) are bundled by default — no extra flags needed.


CLI Reference

# Scan a project or workspace
codebeacon scan <path> [options]
codebeacon scan .                         # current directory
codebeacon scan /workspace                # workspace root (multi-project)
codebeacon scan . --update                # incremental: only re-extract changed files
codebeacon scan . --wiki-only             # regenerate wiki without re-extracting
codebeacon scan . --obsidian-dir <path>   # write Obsidian vault to custom location
codebeacon scan . --semantic              # enable LLM semantic extraction
codebeacon scan . --list-only             # detect frameworks only, don't extract

# Config-driven mode
codebeacon init [path]                    # auto-generate codebeacon.yaml
codebeacon sync                           # run from codebeacon.yaml
codebeacon sync --config <file>           # use a specific config file

# Query the knowledge graph (coming soon)
codebeacon query <term>                   # search nodes and edges
codebeacon path <source> <target>         # shortest path between two nodes

# Integrations
codebeacon serve [--dir .codebeacon]      # start MCP server (stdio)
codebeacon install                        # install Claude Code skill

Configuration

Run codebeacon init to generate codebeacon.yaml, or write it manually:

version: 1

projects:
  - name: api-server
    path: ./api-server
    type: spring-boot          # optional: auto-detected if omitted

  - name: frontend
    path: ./frontend
    type: react

output:
  dir: .codebeacon
  wiki: true
  obsidian: true
  graph_html: true
  context_map:
    targets: [CLAUDE.md, .cursorrules, AGENTS.md]

wave:
  auto: true
  chunk_size: 300              # files per chunk
  max_parallel: 5              # parallel threads

semantic:
  enabled: false               # override with --semantic flag

How It Compares

codesight graphify codebeacon
Route / controller analysis
Service / DI graph partial
Interface → Impl resolution
Entity / ORM model extraction
Frontend component analysis
Community detection
Obsidian vault export
MCP server
AI context map (CLAUDE.md)
Multi-project workspace partial
Python-based

codebeacon is not a replacement for either tool — it's the union of what both do, built around a shared extraction and graph layer.


Benchmarks

Codebase                Stack                                           Files  Nodes  Edges  Communities  Scan time
multi-service SaaS app  SvelteKit + Next.js + Spring Boot (3 projects)  444    382    553    175          ~12s

Privacy & Security

All processing is local. Your source code never leaves your machine.

  • Tree-sitter AST parsing runs entirely in-process
  • No telemetry, no analytics, no network calls during normal operation
  • The --semantic flag (disabled by default) sends code excerpts to your configured LLM API — only enable it explicitly

Contributing

git clone https://github.com/codebeacon/codebeacon
cd codebeacon
pip install -e ".[dev,cluster]"
pytest

The easiest entry point for adding new framework support is writing a tree-sitter query file in codebeacon/extract/queries/. See codebeacon/extract/queries/README.md for the full guide — it walks through grammar setup, .scm query syntax, capture naming conventions, and how to wire up a new extractor.

Contributions welcome: new framework queries, language parsers, output formats, and benchmark datasets.


License

MIT — see LICENSE.


Acknowledgments

Built on tree-sitter for structural AST parsing, NetworkX for graph operations, and graspologic for Leiden community detection.

Inspired by the complementary approaches of codesight and graphify.
