
AST, Call Graph, CFG, DFG, PDG.


DIG: Code Analysis for AI Agents

Give LLMs exactly the code they need. Nothing more.

# One-liner: Install, index, search
pip install axe-dig && chop warm . && chop semantic "what you're looking for" .

Your codebase is 100K lines. Claude's context window is 200K tokens. Raw code won't fit, and even if it did, the LLM would drown in irrelevant detail.

DIG extracts structure instead of dumping text. The result: 95% fewer tokens while preserving everything needed to understand and edit code correctly.

pip install axe-dig
chop warm .                    # Index your project
chop context main --project .  # Get LLM-ready summary

How It Works

DIG builds 5 analysis layers, each answering a different question:

┌─────────────────────────────────────────────────────────────┐
│ Layer 5: Program Dependence  → "What affects line 42?"      │
│ Layer 4: Data Flow           → "Where does this value go?"  │
│ Layer 3: Control Flow        → "How complex is this?"       │
│ Layer 2: Call Graph          → "Who calls this function?"   │
│ Layer 1: AST                 → "What functions exist?"      │
└─────────────────────────────────────────────────────────────┘

Why layers? Different tasks need different depth:

  • Browsing code? Layer 1 (structure) is enough
  • Refactoring? Layer 2 (call graph) shows what breaks
  • Debugging a null value? Layer 5 (program slice) shows only the relevant lines
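As a rough illustration of the first and third layers, the sketch below uses Python's standard-library ast module to list the functions in a snippet and approximate each one's cyclomatic complexity by counting branch nodes. The ast module stands in for the tree-sitter parsers DIG uses, and the sample source and the summarize() helper are hypothetical; DIG's real output format will differ.

```python
import ast

# Hypothetical sample code to analyze.
SOURCE = '''
def login(username, password):
    user = get_user(username)
    if user is None:
        raise ValueError("unknown user")
    for attempt in range(3):
        if check(user, password):
            return create_token(user)
    return None

def get_user(name):
    return DB.get(name)
'''

# Node types treated as decision points (a rough approximation of CFG branching).
BRANCH_NODES = (ast.If, ast.For, ast.While, ast.Try, ast.BoolOp, ast.IfExp)


def summarize(source: str) -> dict[str, dict]:
    """Map each function name to its line, parameters and approximate complexity."""
    summary = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            branches = sum(isinstance(n, BRANCH_NODES) for n in ast.walk(node))
            summary[node.name] = {
                "line": node.lineno,                      # Layer 1: where it is defined
                "args": [a.arg for a in node.args.args],  # Layer 1: signature
                "complexity": branches + 1,               # Layer 3: decision points + 1
            }
    return summary


for name, info in summarize(SOURCE).items():
    print(name, info)
```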

The daemon keeps indexes in memory, so queries return in about 100 ms instead of waiting 30 seconds for a fresh CLI process to spawn.
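The speed-up comes from paying the indexing cost once and answering later queries from memory. The sketch below shows that idea with a hypothetical IndexDaemon class and a dict-based index. Neither is DIG's real API, and the sleep only simulates an expensive build.

```python
import time


class IndexDaemon:
    """Hypothetical sketch: build an index once, then answer queries from memory."""

    def __init__(self, build_fn):
        self._build_fn = build_fn
        self._index = None
        self.builds = 0  # how many times the expensive build actually ran

    def query(self, key):
        if self._index is None:        # cold start: pay the build cost once
            self._index = self._build_fn()
            self.builds += 1
        return self._index.get(key)    # warm path: plain in-memory lookup

    def invalidate(self):
        """Drop the cached index so the next query rebuilds it."""
        self._index = None


def slow_build():
    time.sleep(0.2)  # stands in for parsing and analyzing a whole project
    return {"login": ["handle_request", "api_login"]}


daemon = IndexDaemon(slow_build)
for _ in range(5):
    daemon.query("login")
print(daemon.builds)  # 1
```

Only the first query pays for the build. In a real daemon, invalidate() is where a change notification would hook in.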

Architecture

┌──────────────────────────────────────────────────────────────────┐
│                         YOUR CODE                                │
│  src/*.py, lib/*.ts, pkg/*.go                                    │
└───────────────────────────┬──────────────────────────────────────┘
                            │ tree-sitter
                            ▼
┌──────────────────────────────────────────────────────────────────┐
│                     5-LAYER ANALYSIS                             │
│  ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐     │
│  │   AST   │→│  Calls  │→│   CFG   │→│   DFG   │→│   PDG   │     │
│  │   L1    │ │   L2    │ │   L3    │ │   L4    │ │   L5    │     │
│  └─────────┘ └─────────┘ └─────────┘ └─────────┘ └─────────┘     │
└───────────────────────────┬──────────────────────────────────────┘
                            │ bge-large-en-v1.5
                            ▼
┌──────────────────────────────────────────────────────────────────┐
│                    SEMANTIC INDEX                                │
│  1024-dim embeddings in FAISS  →  "find JWT validation"          │
└───────────────────────────┬──────────────────────────────────────┘
                            │
                            ▼
┌──────────────────────────────────────────────────────────────────┐
│                       DAEMON                                     │
│  In-memory indexes  •  100ms queries  •  Auto-lifecycle          │
└──────────────────────────────────────────────────────────────────┘

The Semantic Layer: Search by Behavior

The real power comes from combining all 5 layers into searchable embeddings.

Every function gets indexed with:

  • Signature + docstring (L1)
  • What it calls + who calls it (L2)
  • Complexity metrics (L3)
  • Data flow patterns (L4)
  • Dependencies (L5)
  • First ~10 lines of actual code

This gets encoded into 1024-dimensional vectors using bge-large-en-v1.5. The result: search by what code does, not just what it says.
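The sketch below shows one way per-function facts from the five layers might be flattened into a single text for the embedding model. The FunctionInfo fields, the label strings, and embedding_text() are assumptions for illustration, not DIG's index format.

```python
from dataclasses import dataclass, field


@dataclass
class FunctionInfo:
    # Hypothetical record; field names are illustrative, not DIG's schema.
    signature: str
    docstring: str = ""
    calls: list[str] = field(default_factory=list)       # L2: what it calls
    called_by: list[str] = field(default_factory=list)   # L2: who calls it
    complexity: int = 1                                   # L3: e.g. cyclomatic complexity
    data_flow: str = ""                                   # L4: short data-flow summary
    depends_on: list[str] = field(default_factory=list)  # L5: dependencies
    code: list[str] = field(default_factory=list)        # source lines


def embedding_text(fn: FunctionInfo, max_code_lines: int = 10) -> str:
    """Flatten every layer into one string to feed an embedding model."""
    parts = [
        fn.signature,
        fn.docstring,
        "calls: " + ", ".join(fn.calls),
        "called by: " + ", ".join(fn.called_by),
        f"complexity: {fn.complexity}",
        "data flow: " + fn.data_flow,
        "depends on: " + ", ".join(fn.depends_on),
        *fn.code[:max_code_lines],  # first ~10 lines of actual code
    ]
    return "\n".join(p for p in parts if p)


fn = FunctionInfo(
    signature="def verify_access_token(token: str) -> dict",
    docstring="Decode a bearer token and reject it if expired.",
    calls=["jwt.decode", "is_expired"],
    called_by=["auth_middleware"],
    complexity=3,
    data_flow="token -> jwt.decode -> claims -> is_expired",
    depends_on=["SECRET_KEY"],
    code=["def verify_access_token(token: str) -> dict:",
          "    claims = jwt.decode(token, SECRET_KEY, algorithms=['HS256'])"],
)
text = embedding_text(fn)
print(text)
# Assumed next step (not run here): with sentence-transformers installed,
# SentenceTransformer("BAAI/bge-large-en-v1.5").encode(text) returns a 1024-dim vector.
```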

# "validate JWT" finds verify_access_token() even without that exact text
chop semantic "validate JWT tokens and check expiration" .

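Why this works: traditional text search only finds "authentication" where that word literally appears in names and comments. Semantic search can match verify_access_token() to a JWT-validation query because its indexed summary includes what it calls and how data flows through it, which reveals its purpose even when the name does not.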
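At query time, ranking by meaning reduces to nearest-neighbour search over the vectors. The sketch below stands in for FAISS over 1024-dimensional bge embeddings, using toy 3-dimensional vectors and brute-force cosine similarity in pure Python. The function names and vector values are made up.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query_vec: list[float], index: dict[str, list[float]], k: int = 2) -> list[str]:
    """Brute-force nearest neighbours by cosine similarity, best first."""
    scored = [(cosine(query_vec, vec), name) for name, vec in index.items()]
    scored.sort(reverse=True)
    return [name for _, name in scored[:k]]


# Toy vectors standing in for real function embeddings.
index = {
    "verify_access_token": [0.9, 0.1, 0.0],
    "render_invoice_pdf": [0.0, 0.2, 0.9],
    "refresh_session": [0.6, 0.5, 0.1],
}
query = [0.8, 0.2, 0.0]  # pretend embedding of "validate JWT tokens"
print(search(query, index))  # ['verify_access_token', 'refresh_session']
```

FAISS performs the same kind of search efficiently over large indexes. With L2-normalized vectors, an inner-product index ranks results exactly as cosine similarity does.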

Setting Up Semantic Search

# Build the semantic index (one-time, ~2 min for typical project)
chop warm /path/to/project

# Search by behavior
chop semantic "database connection pooling" .

Embedding dependencies (sentence-transformers, faiss-cpu) are included with pip install axe-dig. The index is cached in .dig/cache/semantic.faiss.

Keeping the Index Fresh

The daemon tracks changed ("dirty") files and rebuilds automatically once 20 have changed, but it only learns about a change when you notify it:

# Notify daemon of a changed file
chop daemon notify src/auth.py --project .

Integration options:

  1. Git hook (post-commit):

    git diff-tree --root --no-commit-id --name-only -r HEAD | xargs -I{} chop daemon notify {} --project .
    
  2. Editor hook (on save):

    chop daemon notify "$FILE" --project .
    
  3. Manual rebuild (when needed):

    chop warm .  # Full rebuild
    

Once the dirty threshold is reached (default: 20 files, set by auto_reindex_threshold in .dig/config.json), the daemon rebuilds the semantic embeddings in the background.
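The notify-then-rebuild behaviour can be modeled as a set of dirty paths plus a threshold. The DirtyTracker below is hypothetical, not DIG's code. It assumes repeated notifications for the same file count once, and it calls the rebuild synchronously, whereas the daemon rebuilds in the background.

```python
class DirtyTracker:
    """Hypothetical sketch of threshold-based reindexing."""

    def __init__(self, rebuild, threshold: int = 20):
        self.rebuild = rebuild        # callback that re-embeds the given files
        self.threshold = threshold    # mirrors the default of 20 files
        self.dirty: set[str] = set()

    def notify(self, path: str) -> bool:
        """Record a changed file; return True if this call triggered a rebuild."""
        self.dirty.add(path)          # a set, so repeated saves count once
        if len(self.dirty) >= self.threshold:
            self.rebuild(sorted(self.dirty))
            self.dirty.clear()
            return True
        return False


rebuilt = []
tracker = DirtyTracker(rebuilt.append, threshold=3)
tracker.notify("src/auth.py")
tracker.notify("src/auth.py")   # same file again: still 1 dirty file
tracker.notify("src/db.py")
tracker.notify("src/api.py")    # third distinct file reaches the threshold
print(rebuilt)  # [['src/api.py', 'src/auth.py', 'src/db.py']]
```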


The Workflow

Before Reading Code

chop tree src/                      # See file structure
chop structure src/ --lang python   # See functions/classes

Before Editing

chop extract src/auth.py            # Full file analysis
chop context login --project .      # LLM-ready summary (95% savings)

Before Refactoring

chop impact login .                 # Who calls this? (reverse call graph)
chop change-impact                  # Which tests need to run?
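chop impact answers "who calls this?" by walking the call graph backwards. The sketch below uses a hand-written, hypothetical edge list. It inverts the caller-to-callee map, then uses breadth-first search to collect every direct and transitive caller. This is one plausible approach, not DIG's implementation.

```python
from collections import defaultdict, deque

# Hypothetical call edges (caller -> callees), like a layer-2 pass might produce.
CALLS = {
    "main": ["handle_request"],
    "handle_request": ["login", "logout"],
    "api_login": ["login"],
    "login": ["get_user", "create_token"],
    "test_login": ["login"],
}


def reverse_graph(calls: dict[str, list[str]]) -> dict[str, set[str]]:
    """Invert caller -> callees into callee -> callers."""
    callers = defaultdict(set)
    for caller, callees in calls.items():
        for callee in callees:
            callers[callee].add(caller)
    return callers


def impact(func: str, calls: dict[str, list[str]]) -> set[str]:
    """All direct and transitive callers of `func` (breadth-first)."""
    callers = reverse_graph(calls)
    seen, queue = set(), deque([func])
    while queue:
        for caller in callers[queue.popleft()]:
            if caller not in seen:
                seen.add(caller)
                queue.append(caller)
    return seen


print(sorted(impact("login", CALLS)))
# ['api_login', 'handle_request', 'main', 'test_login']
```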

Debugging

chop slice src/auth.py login 42     # What affects line 42?
chop dfg src/auth.py login          # Trace data flow

Finding Code by Behavior

chop semantic "validate JWT tokens" .   # Natural language search

Advanced Analysis & Visualization

chop cycles .                           # Detect recursion loops
chop path src_func tgt_func             # Find shortest call path
# Run inject_data.py to generate the dig_visualizer.html
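chop path and chop cycles can be understood as classic graph searches over the call graph. The sketch below shows one plausible approach on a hypothetical call map: breadth-first search finds the path with the fewest call hops, and depth-first search reports a cycle whenever it reaches a function still on the current search stack. DIG's actual algorithms are not documented here.

```python
from collections import deque

# Hypothetical call map (caller -> callees).
CALLS = {
    "main": ["parse_args", "run"],
    "run": ["load_config", "serve"],
    "serve": ["handle_request"],
    "handle_request": ["login", "serve"],  # serve <-> handle_request: a cycle
    "login": ["get_user"],
}


def shortest_path(src, tgt, calls):
    """Breadth-first search: fewest call hops from src to tgt, or None."""
    prev = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == tgt:
            path = []
            while node is not None:   # walk predecessors back to src
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nxt in calls.get(node, []):
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    return None


def find_cycles(calls):
    """Depth-first search; each back edge found is reported as one cycle."""
    cycles, state, stack = [], {}, []  # state: 1 = on stack, 2 = finished

    def visit(node):
        state[node] = 1
        stack.append(node)
        for nxt in calls.get(node, []):
            if state.get(nxt) == 1:    # back edge closes a cycle
                cycles.append(stack[stack.index(nxt):] + [nxt])
            elif nxt not in state:
                visit(nxt)
        stack.pop()
        state[node] = 2

    for node in calls:
        if node not in state:
            visit(node)
    return cycles


print(shortest_path("main", "get_user", CALLS))
# ['main', 'run', 'serve', 'handle_request', 'login', 'get_user']
print(find_cycles(CALLS))
# [['serve', 'handle_request', 'serve']]
```

This DFS reports one cycle per back edge, which is enough to flag recursion. It does not enumerate every elementary cycle in a dense graph.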

Quick Setup

1. Install

pip install axe-dig

2. Index Your Project

chop warm /path/to/project

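This builds all analysis layers and starts the daemon. Indexing takes roughly 30-60 seconds for a typical project (building the semantic embeddings can take a couple of minutes). After that, the daemon answers queries in about 100 ms.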

3. Start Using

chop context main --project .   # Get context for a function
chop impact helper_func .       # See who calls it
chop semantic "error handling"  # Find by behavior

Real Example: Why This Matters

Scenario: Debug why user is null on line 42.

Without DIG:

  1. Read the 150-line function
  2. Trace every variable manually
  3. Miss the bug because it's hidden in control flow

With DIG:

chop slice src/auth.py login 42

Output: Only 6 lines that affect line 42:

3:   user = db.get_user(username)
7:   if user is None:
12:      raise NotFound
28:  token = create_token(user)  # ← BUG: skipped null check
35:  session.token = token
42:  return session

With everything else stripped away, the bug is easy to see: line 28 uses user on a path that bypasses the null check.
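A backward slice is a reachability query on the program dependence graph: start at the line of interest and follow data and control dependences backwards. The sketch below uses dependence edges that are hand-written for the login example and partly assumed. In particular, it assumes line 28 depends on the raise at line 12 not firing. A real PDG would be derived from the CFG and DFG layers.

```python
# Hypothetical dependence edges: line -> lines it depends on (data + control).
DEPENDS_ON = {
    42: [35],      # return session reads session
    35: [28],      # session.token = token reads token
    28: [3, 12],   # reads user (data, line 3); runs only if 12 did not raise (assumed)
    12: [7],       # raise NotFound is control-dependent on the check at line 7
    7: [3],        # the check reads user
    3: [],
    20: [15],      # unrelated lines, not reachable from 42
    15: [],
}


def backward_slice(criterion: int, deps: dict[int, list[int]]) -> list[int]:
    """Every line the criterion transitively depends on, including itself."""
    seen, stack = set(), [criterion]
    while stack:
        line = stack.pop()
        if line not in seen:
            seen.add(line)
            stack.extend(deps.get(line, []))
    return sorted(seen)


print(backward_slice(42, DEPENDS_ON))  # [3, 7, 12, 28, 35, 42]
```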


Command Reference

Exploration

Command                               What It Does
chop tree [path]                      File tree
chop structure [path] --lang <lang>   Functions, classes, methods
chop search <pattern> [path]          Text pattern search
chop extract <file>                   Full file analysis

Analysis

Command                                What It Does
chop context <func> --project <path>   LLM-ready summary (95% savings)
chop cfg <file> <function>             Control flow graph
chop dfg <file> <function>             Data flow graph
chop slice <file> <func> <line>        Program slice

Cross-File

Command                          What It Does
chop calls [path]                Build call graph
chop impact <func> [path]        Find all callers (reverse call graph)
chop cycles [path]               Detect recursive loops and circular dependencies
chop path <src> <tgt> [path]     Find shortest path between functions
chop dead [path]                 Find unreachable code
chop arch [path]                 Detect architecture layers
chop imports <file>              Parse imports
chop importers <module> [path]   Find files that import a module

Visualization

Command                          What It Does
python3 axe-dig/inject_data.py   Generate the interactive knowledge graph (dig_visualizer.html)
open dig_visualizer.html         Open the interactive 5-layout visualizer

Semantic

Command                        What It Does
chop warm <path>               Build all indexes (including embeddings)
chop semantic <query> [path]   Natural language code search

Diagnostics

Command                      What It Does
chop diagnostics <file>      Type check + lint
chop change-impact [files]   Find tests affected by changes
chop doctor                  Check/install diagnostic tools

Daemon

Command                                     What It Does
chop daemon start                           Start background daemon
chop daemon stop                            Stop daemon
chop daemon status                          Check status
chop daemon notify <file> --project <path>  Mark a file as changed

Supported Languages

Python, TypeScript, JavaScript, Go, Rust, Java, C, C++, Ruby, PHP, C#, Kotlin, Scala, Swift, Lua, Elixir

The language is auto-detected; override it with --lang.


MCP Integration

For AI tools (Claude Desktop, Claude Code):

Claude Desktop - Add to ~/Library/Application Support/Claude/claude_desktop_config.json:

{
  "mcpServers": {
    "dig": {
      "command": "dig-mcp",
      "args": ["--project", "/path/to/your/project"]
    }
  }
}

Claude Code - Add to .claude/settings.json:

{
  "mcpServers": {
    "dig": {
      "command": "dig-mcp",
      "args": ["--project", "."]
    }
  }
}

Configuration

.digignore - Exclude Files

DIG respects .digignore (gitignore syntax) for all commands including tree, structure, search, calls, and semantic indexing:

# Auto-create with sensible defaults
chop warm .  # Creates .digignore if missing

Default exclusions:

  • node_modules/, .venv/, __pycache__/
  • dist/, build/, *.egg-info/
  • Binary files (*.so, *.dll, *.whl)
  • Security files (.env, *.pem, *.key)

Customize by editing .digignore:

# Add your patterns
large_test_fixtures/
vendor/
data/*.csv
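To show how gitignore-style patterns like these apply, here is a simplified matcher. It approximates gitignore rules and is not DIG's implementation. It handles trailing-slash directory patterns, root-anchored patterns containing a slash, and bare file-name globs. It does not handle negation (!), leading-slash anchors, or **.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def is_ignored(path: str, patterns: list[str]) -> bool:
    """Simplified gitignore-style check for a project-relative POSIX path."""
    parts = PurePosixPath(path).parts
    for pat in patterns:
        if pat.endswith("/"):
            # "vendor/": ignore anything under a directory with that name, at any depth.
            if pat.rstrip("/") in parts[:-1]:
                return True
        elif "/" in pat:
            # "data/*.csv": anchored to the project root; "*" does not cross "/".
            pat_parts = PurePosixPath(pat).parts
            if len(pat_parts) == len(parts) and all(
                fnmatchcase(seg, p) for seg, p in zip(parts, pat_parts)
            ):
                return True
        elif fnmatchcase(parts[-1], pat):
            # "*.pem": match the file name at any depth.
            return True
    return False


PATTERNS = ["node_modules/", "*.pem", "large_test_fixtures/", "vendor/", "data/*.csv"]
for path in ["vendor/lib/x.go", "data/users.csv", "data/raw/users.csv",
             "keys/server.pem", "src/app.py"]:
    print(path, is_ignored(path, PATTERNS))
```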

CLI Flags:

# Add patterns from command line (can be repeated)
chop --ignore "packages/old/" --ignore "*.generated.ts" tree .

# Bypass all ignore patterns
chop --no-ignore tree .

Settings - Daemon Behavior

Create .dig/config.json for daemon settings:

{
  "semantic": {
    "enabled": true,
    "auto_reindex_threshold": 20
  }
}

Setting                  Default   Description
enabled                  true      Enable semantic search
auto_reindex_threshold   20        Files changed before auto-rebuild
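The sketch below shows how a tool might merge .dig/config.json over these defaults key by key, so a file that sets only auto_reindex_threshold keeps enabled: true. The load_settings() helper and its merge behaviour are assumptions, not DIG's documented behaviour.

```python
import json
import tempfile
from pathlib import Path

# Defaults as listed in the settings table.
DEFAULTS = {"semantic": {"enabled": True, "auto_reindex_threshold": 20}}


def load_settings(project: str) -> dict:
    """Hypothetical: merge .dig/config.json (if present) over the defaults."""
    settings = {section: dict(values) for section, values in DEFAULTS.items()}
    path = Path(project) / ".dig" / "config.json"
    if path.is_file():
        user = json.loads(path.read_text())
        for section, values in user.items():
            settings.setdefault(section, {}).update(values)  # per-key override
    return settings


with tempfile.TemporaryDirectory() as tmp:
    print(load_settings(tmp)["semantic"])
    # {'enabled': True, 'auto_reindex_threshold': 20}
    (Path(tmp) / ".dig").mkdir()
    (Path(tmp) / ".dig" / "config.json").write_text(
        '{"semantic": {"auto_reindex_threshold": 50}}'
    )
    print(load_settings(tmp)["semantic"])
    # {'enabled': True, 'auto_reindex_threshold': 50}
```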

Monorepo Support

For monorepos, create .claude/workspace.json to scope indexing:

{
  "active_packages": ["packages/core", "packages/api"],
  "exclude_patterns": ["**/fixtures/**"]
}
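One reading of this file is a two-step filter: a file is indexed only if it lives under one of the active_packages and matches none of the exclude_patterns. The in_scope() helper below is a hypothetical sketch under that assumption. Note that fnmatch treats ** like * and lets it cross /, which only approximates typical glob semantics.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

WORKSPACE = {
    "active_packages": ["packages/core", "packages/api"],
    "exclude_patterns": ["**/fixtures/**"],
}


def in_scope(path: str, workspace: dict) -> bool:
    """Hypothetical: True if `path` should be indexed under this workspace."""
    p = PurePosixPath(path)
    # Compares whole path components, so "packages/core2" is not "packages/core".
    in_active = any(p.is_relative_to(pkg) for pkg in workspace["active_packages"])
    excluded = any(fnmatchcase(path, pat) for pat in workspace["exclude_patterns"])
    return in_active and not excluded


for path in ["packages/core/src/app.py",
             "packages/api/tests/fixtures/user.json",
             "packages/web/index.ts"]:
    print(path, in_scope(path, WORKSPACE))
```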

Performance

Metric                         Raw Code   DIG      Improvement
Tokens for function context    21,000     175      99% savings
Tokens for codebase overview   104,000    12,000   89% savings
Query latency (daemon)         30s        100ms    300x faster

Deep Dive

For the full architecture explanation, benchmarks, and advanced workflows:

Full Documentation


License

AGPL-3.0 - See LICENSE file.



Download files


Source Distribution

axe_dig-1.6.0.tar.gz (198.6 kB)

Uploaded Source

Built Distribution


axe_dig-1.6.0-py3-none-any.whl (190.0 kB)

Uploaded Python 3

File details

Details for the file axe_dig-1.6.0.tar.gz.

File metadata

  • Download URL: axe_dig-1.6.0.tar.gz
  • Upload date:
  • Size: 198.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.9.6

File hashes

Hashes for axe_dig-1.6.0.tar.gz
Algorithm Hash digest
SHA256 92b029b53c1ac2dc26069f5b7f4237c1e1524c236381718192151c0ced48832f
MD5 2745fb7532fd7c452da516be69e53942
BLAKE2b-256 00d1fa983f0e74d74310baab64c8689d153b946808e94b5f8ca2eb0d4c2351fd


File details

Details for the file axe_dig-1.6.0-py3-none-any.whl.

File metadata

  • Download URL: axe_dig-1.6.0-py3-none-any.whl
  • Upload date:
  • Size: 190.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.9.6

File hashes

Hashes for axe_dig-1.6.0-py3-none-any.whl
Algorithm Hash digest
SHA256 a8575b2b0089b83528923bcc10f77d4a1e5c42740c5cbac1595b51bff3c92429
MD5 3f09a18c77cdf3ac985131c9c29cd6d5
BLAKE2b-256 f899dee250ff224e1601ae8d4b14e8fbe47a4892c17a642040a8fcfcbc088427

