AST, Call Graph, CFG, DFG, PDG.
DIG: Code Analysis for AI Agents
Give LLMs exactly the code they need. Nothing more.
# One-liner: Install, index, search
pip install axe-dig && chop warm . && chop semantic "what you're looking for" .
Your codebase is 100K lines. Claude's context window is 200K tokens. Raw code won't fit, and even if it did, the LLM would drown in irrelevant details.
DIG extracts structure instead of dumping text. The result: 95% fewer tokens while preserving everything needed to understand and edit code correctly.
pip install axe-dig
chop warm . # Index your project
chop context main --project . # Get LLM-ready summary
How It Works
DIG builds 5 analysis layers, each answering different questions:
┌─────────────────────────────────────────────────────────────┐
│ Layer 5: Program Dependence → "What affects line 42?" │
│ Layer 4: Data Flow → "Where does this value go?" │
│ Layer 3: Control Flow → "How complex is this?" │
│ Layer 2: Call Graph → "Who calls this function?" │
│ Layer 1: AST → "What functions exist?" │
└─────────────────────────────────────────────────────────────┘
Why layers? Different tasks need different depth:
- Browsing code? Layer 1 (structure) is enough
- Refactoring? Layer 2 (call graph) shows what breaks
- Debugging null? Layer 5 (slice) shows only relevant lines
The daemon keeps indexes in memory for 100ms queries instead of 30-second CLI spawns.
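To make Layer 1 concrete, here is a minimal sketch of the "What functions exist?" question. It uses Python's standard `ast` module as a stand-in; DIG itself parses with tree-sitter, and the `list_definitions` helper and sample source are hypothetical names chosen for illustration.

```python
import ast

# Sample source to analyze (illustrative only).
SOURCE = '''
def login(username):
    user = get_user(username)
    return create_token(user)

class Session:
    def close(self):
        pass
'''

def list_definitions(source: str) -> list:
    """Return (kind, name, line) for every function and class in the source."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            found.append(("function", node.name, node.lineno))
        elif isinstance(node, ast.ClassDef):
            found.append(("class", node.name, node.lineno))
    # ast.walk is breadth-first, so sort by line for a readable outline.
    return sorted(found, key=lambda item: item[2])

for kind, name, line in list_definitions(SOURCE):
    print(f"{line:>3}  {kind:<8} {name}")
```

An outline like this is far smaller than the source it summarizes, which is the idea behind sending structure rather than raw text.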
Architecture
┌──────────────────────────────────────────────────────────────────┐
│ YOUR CODE │
│ src/*.py, lib/*.ts, pkg/*.go │
└───────────────────────────┬──────────────────────────────────────┘
│ tree-sitter
▼
┌──────────────────────────────────────────────────────────────────┐
│ 5-LAYER ANALYSIS │
│ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ │
│ │ AST │→│ Calls │→│ CFG │→│ DFG │→│ PDG │ │
│ │ L1 │ │ L2 │ │ L3 │ │ L4 │ │ L5 │ │
│ └─────────┘ └─────────┘ └─────────┘ └─────────┘ └─────────┘ │
└───────────────────────────┬──────────────────────────────────────┘
│ bge-large-en-v1.5
▼
┌──────────────────────────────────────────────────────────────────┐
│ SEMANTIC INDEX │
│ 1024-dim embeddings in FAISS → "find JWT validation" │
└───────────────────────────┬──────────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────────────┐
│ DAEMON │
│ In-memory indexes • 100ms queries • Auto-lifecycle │
└──────────────────────────────────────────────────────────────────┘
The Semantic Layer: Search by Behavior
The real power comes from combining all 5 layers into searchable embeddings.
Every function gets indexed with:
- Signature + docstring (L1)
- What it calls + who calls it (L2)
- Complexity metrics (L3)
- Data flow patterns (L4)
- Dependencies (L5)
- First ~10 lines of actual code
This gets encoded into 1024-dimensional vectors using bge-large-en-v1.5. The result: search by what code does, not just what it says.
# "validate JWT" finds verify_access_token() even without that exact text
chop semantic "validate JWT tokens and check expiration" .
Why this works: Traditional search finds authentication in variable names and comments. Semantic search understands that verify_access_token() performs JWT validation because the call graph and data flow reveal its purpose.
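As a rough illustration of how per-layer facts might be flattened into one text before embedding, consider the sketch below. The `FunctionRecord` class, its field names, and the text layout are all assumptions made for this example; the README does not document DIG's actual record format.

```python
from dataclasses import dataclass, field

@dataclass
class FunctionRecord:
    """Hypothetical per-function record; field names are illustrative only."""
    signature: str                                   # L1
    docstring: str = ""                              # L1
    calls: list = field(default_factory=list)        # L2: callees
    called_by: list = field(default_factory=list)    # L2: callers
    complexity: int = 1                              # L3
    code_preview: str = ""                           # first lines of the body

    def to_embedding_text(self) -> str:
        """Join all fields into a single text that an embedding model could encode."""
        lines = [
            f"Signature: {self.signature}",
            f"Doc: {self.docstring}",
            f"Calls: {', '.join(self.calls)}",
            f"Called by: {', '.join(self.called_by)}",
            f"Complexity: {self.complexity}",
            f"Code: {self.code_preview}",
        ]
        return "\n".join(lines)

record = FunctionRecord(
    signature="verify_access_token(token: str) -> dict",
    docstring="Check signature and expiry.",
    calls=["jwt.decode", "is_expired"],
    called_by=["login_required"],
    complexity=4,
    code_preview="payload = jwt.decode(token, SECRET)",
)
print(record.to_embedding_text())
```

Because the callee names (`jwt.decode`, `is_expired`) end up in the embedded text, a query about JWT validation can land near this function even though its own name never mentions JWT.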
Setting Up Semantic Search
# Build the semantic index (one-time, ~2 min for typical project)
chop warm /path/to/project
# Search by behavior
chop semantic "database connection pooling" .
Embedding dependencies (sentence-transformers, faiss-cpu) are included with pip install axe-dig. The index is cached in .dig/cache/semantic.faiss.
Keeping the Index Fresh
The daemon tracks dirty files and auto-rebuilds once 20 files have changed, but it only learns about a change when you notify it:
# Notify daemon of a changed file
chop daemon notify src/auth.py --project .
Integration options:
- Git hook (post-commit):
  git diff --name-only HEAD~1 | xargs -I{} chop daemon notify {} --project .
- Editor hook (on save):
  chop daemon notify "$FILE" --project .
- Manual rebuild (when needed):
  chop warm . # Full rebuild
The daemon auto-rebuilds semantic embeddings in the background once the dirty threshold (default: 20 files) is reached.
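The dirty-threshold behavior can be pictured with a small sketch. `DirtyTracker` is a hypothetical class, not DIG's implementation, and it assumes the threshold counts distinct files rather than individual notifications.

```python
class DirtyTracker:
    """Toy model of a dirty-file counter that triggers a rebuild at a threshold."""

    def __init__(self, threshold: int = 20):
        self.threshold = threshold
        self.dirty = set()   # distinct changed paths since the last rebuild
        self.rebuilds = 0

    def notify(self, path: str) -> bool:
        """Record a changed file; return True if this call triggered a rebuild."""
        self.dirty.add(path)
        if len(self.dirty) >= self.threshold:
            self.rebuild()
            return True
        return False

    def rebuild(self) -> None:
        """Stand-in for re-embedding; here it only resets the dirty set."""
        self.rebuilds += 1
        self.dirty.clear()

tracker = DirtyTracker(threshold=3)
for changed in ["src/auth.py", "src/auth.py", "src/db.py", "src/api.py"]:
    print(changed, "-> rebuild" if tracker.notify(changed) else "-> queued")
```

In this model, saving the same file repeatedly does not move the counter, so only the third distinct path triggers the rebuild.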
The Workflow
Before Reading Code
chop tree src/ # See file structure
chop structure src/ --lang python # See functions/classes
Before Editing
chop extract src/auth.py # Full file analysis
chop context login --project . # LLM-ready summary (95% savings)
Before Refactoring
chop impact login . # Who calls this? (reverse call graph)
chop change-impact # Which tests need to run?
Debugging
chop slice src/auth.py login 42 # What affects line 42?
chop dfg src/auth.py login # Trace data flow
Finding Code by Behavior
chop semantic "validate JWT tokens" . # Natural language search
Advanced Analysis & Visualization
chop cycles . # Detect recursion loops
chop path src_func tgt_func # Find shortest call path
# Run inject_data.py to generate the dig_visualizer.html
Quick Setup
1. Install
pip install axe-dig
2. Index Your Project
chop warm /path/to/project
This builds all analysis layers and starts the daemon. Takes 30-60 seconds for a typical project, then queries are instant.
3. Start Using
chop context main --project . # Get context for a function
chop impact helper_func . # See who calls it
chop semantic "error handling" # Find by behavior
Real Example: Why This Matters
Scenario: Debug why user is null on line 42.
Without DIG:
- Read the 150-line function
- Trace every variable manually
- Miss the bug because it's hidden in control flow
With DIG:
chop slice src/auth.py login 42
Output: Only 6 lines that affect line 42:
3: user = db.get_user(username)
7: if user is None:
12: raise NotFound
28: token = create_token(user) # ← BUG: skipped null check
35: session.token = token
42: return session
The bug is obvious. Line 28 uses user without going through the null check path.
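The idea behind a backward slice can be sketched as reachability over dependence edges. The `DEPENDS_ON` table below is a hand-written, hypothetical set of edges for the example above (lines 15 and 20 are invented unrelated statements); DIG derives such edges from its control-flow and data-flow layers, and its real algorithm is not shown here.

```python
# Hypothetical dependence edges: line -> lines it depends on (data or control).
DEPENDS_ON = {
    7: [3],        # the null check reads `user`
    12: [7],       # the raise is guarded by the check
    15: [3],       # unrelated statement that also reads `user`
    20: [15],      # unrelated follow-up
    28: [3],       # create_token(user) reads `user`
    35: [28],      # session.token reads `token`
    42: [35, 12],  # the return depends on the session and on not raising
}

def backward_slice(depends_on: dict, target: int) -> list:
    """Return every line the target transitively depends on, plus the target."""
    seen = set()
    stack = [target]
    while stack:
        line = stack.pop()
        if line in seen:
            continue
        seen.add(line)
        stack.extend(depends_on.get(line, []))
    return sorted(seen)

print(backward_slice(DEPENDS_ON, 42))  # [3, 7, 12, 28, 35, 42]
```

Lines 15 and 20 never appear in the result because nothing on the path to line 42 depends on them, which is how a slice trims a long function down to a handful of lines.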
Command Reference
Exploration
| Command | What It Does |
|---|---|
| chop tree [path] | File tree |
| chop structure [path] --lang <lang> | Functions, classes, methods |
| chop search <pattern> [path] | Text pattern search |
| chop extract <file> | Full file analysis |
Analysis
| Command | What It Does |
|---|---|
| chop context <func> --project <path> | LLM-ready summary (95% savings) |
| chop cfg <file> <function> | Control flow graph |
| chop dfg <file> <function> | Data flow graph |
| chop slice <file> <func> <line> | Program slice |
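For a feel of what a control-flow metric measures, the sketch below approximates cyclomatic complexity by counting decision points. It uses Python's `ast` module and a simplified counting rule; it is an assumption-laden illustration, not the metric `chop cfg` reports.

```python
import ast

# Node types that add one extra path through the code (simplified set).
DECISION_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.IfExp)

def cyclomatic_complexity(source: str) -> int:
    """Approximate complexity: 1 + decision points + extra and/or operands."""
    complexity = 1
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, DECISION_NODES):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            # `a or b or c` has two short-circuit branches.
            complexity += len(node.values) - 1
    return complexity

SOURCE = '''
def login(user, password):
    if user is None or password is None:
        raise ValueError("missing credentials")
    for attempt in range(3):
        if check(user, password):
            return True
    return False
'''

print(cyclomatic_complexity(SOURCE))  # 5
```

The sample scores 5: the base path, two `if` statements, one `for` loop, and one `or`.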
Cross-File
| Command | What It Does |
|---|---|
| chop calls [path] | Build call graph |
| chop impact <func> [path] | Find all callers (reverse call graph) |
| chop cycles [path] | Detect recursive loops and circular deps |
| chop path <src> <tgt> [path] | Find shortest path between functions |
| chop dead [path] | Find unreachable code |
| chop arch [path] | Detect architecture layers |
| chop imports <file> | Parse imports |
| chop importers <module> [path] | Find files that import a module |
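A reverse call graph, the structure behind an impact query, can be sketched in a few lines. This example works on a single Python source string with the standard `ast` module and only resolves plain-name calls; `build_call_graph` and `callers_of` are hypothetical helpers, and DIG's cross-file resolution is not modeled.

```python
import ast
from collections import defaultdict

SOURCE = '''
def get_user(name):
    return {"name": name}

def create_token(user):
    return "token-" + user["name"]

def login(name):
    return create_token(get_user(name))

def signup(name):
    return login(name)
'''

def build_call_graph(source: str) -> dict:
    """Map each top-level function to the set of plain names it calls."""
    graph = {}
    for node in ast.parse(source).body:
        if isinstance(node, ast.FunctionDef):
            graph[node.name] = {
                call.func.id
                for call in ast.walk(node)
                if isinstance(call, ast.Call) and isinstance(call.func, ast.Name)
            }
    return graph

def callers_of(graph: dict, target: str) -> set:
    """Return every function that reaches target, directly or transitively."""
    reverse = defaultdict(set)
    for caller, callees in graph.items():
        for callee in callees:
            reverse[callee].add(caller)
    seen, stack = set(), [target]
    while stack:
        for caller in reverse[stack.pop()]:
            if caller not in seen:
                seen.add(caller)
                stack.append(caller)
    return seen

graph = build_call_graph(SOURCE)
print(sorted(callers_of(graph, "get_user")))  # ['login', 'signup']
```

Changing `get_user` therefore puts both `login` and `signup` at risk, even though `signup` never calls it directly.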
Visualization
| Command | What It Does |
|---|---|
| python3 axe-dig/inject_data.py | Generate interactive knowledge graph |
| open dig_visualizer.html | Open premium 5-layout visualizer |
Semantic
| Command | What It Does |
|---|---|
| chop warm <path> | Build all indexes (including embeddings) |
| chop semantic <query> [path] | Natural language code search |
Diagnostics
| Command | What It Does |
|---|---|
| chop diagnostics <file> | Type check + lint |
| chop change-impact [files] | Find tests affected by changes |
| chop doctor | Check/install diagnostic tools |
Daemon
| Command | What It Does |
|---|---|
| chop daemon start | Start background daemon |
| chop daemon stop | Stop daemon |
| chop daemon status | Check status |
Supported Languages
Python, TypeScript, JavaScript, Go, Rust, Java, C, C++, Ruby, PHP, C#, Kotlin, Scala, Swift, Lua, Elixir
Language is auto-detected; to override it, pass --lang.
MCP Integration
For AI tools (Claude Desktop, Claude Code):
Claude Desktop - Add to ~/Library/Application Support/Claude/claude_desktop_config.json:
{
  "mcpServers": {
    "dig": {
      "command": "dig-mcp",
      "args": ["--project", "/path/to/your/project"]
    }
  }
}
Claude Code - Add to .claude/settings.json:
{
  "mcpServers": {
    "dig": {
      "command": "dig-mcp",
      "args": ["--project", "."]
    }
  }
}
Configuration
.digignore - Exclude Files
DIG respects .digignore (gitignore syntax) for all commands including tree, structure, search, calls, and semantic indexing:
# Auto-create with sensible defaults
chop warm . # Creates .digignore if missing
Default exclusions:
- node_modules/, .venv/, __pycache__/
- dist/, build/, *.egg-info/
- Binary files (*.so, *.dll, *.whl)
- Security files (.env, *.pem, *.key)
Customize by editing .digignore:
# Add your patterns
large_test_fixtures/
vendor/
data/*.csv
CLI Flags:
# Add patterns from command line (can be repeated)
chop --ignore "packages/old/" --ignore "*.generated.ts" tree .
# Bypass all ignore patterns
chop --no-ignore tree .
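To show roughly how gitignore-style patterns filter paths, here is a simplified matcher built on Python's `fnmatch`. It is a sketch under stated assumptions: `is_ignored` is a hypothetical helper, it handles only three pattern shapes, and unlike real gitignore matching its `*` also matches `/`. It does not reproduce DIG's actual matcher.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# A few patterns taken from the defaults and the customization example.
PATTERNS = ["node_modules/", "vendor/", "*.pem", "data/*.csv"]

def is_ignored(path: str, patterns: list) -> bool:
    """Return True if a POSIX-style relative path matches any pattern."""
    name = PurePosixPath(path).name
    for pattern in patterns:
        if pattern.endswith("/"):
            # Directory pattern: match the directory anywhere in the path.
            if f"/{pattern}" in f"/{path}":
                return True
        elif "/" in pattern:
            # Pattern with a slash: match against the whole relative path.
            if fnmatchcase(path, pattern):
                return True
        elif fnmatchcase(name, pattern):
            # Bare pattern: match against the file name only.
            return True
    return False

for candidate in ["src/node_modules/x.js", "keys/server.pem", "data/big.csv", "src/auth.py"]:
    print(candidate, "ignored" if is_ignored(candidate, PATTERNS) else "kept")
```

Only `src/auth.py` survives the filter here; the other three paths hit a directory pattern, a bare glob, and a path glob respectively.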
Settings - Daemon Behavior
Create .dig/config.json for daemon settings:
{
  "semantic": {
    "enabled": true,
    "auto_reindex_threshold": 20
  }
}
| Setting | Default | Description |
|---|---|---|
| enabled | true | Enable semantic search |
| auto_reindex_threshold | 20 | Files changed before auto-rebuild |
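If you want to read these settings from your own tooling, a sketch like the following merges a project's .dig/config.json over the documented defaults. The `load_config` helper and its merge behavior are assumptions for illustration; how DIG itself resolves missing keys is not documented here.

```python
import json
from pathlib import Path

# Defaults as listed in the settings table.
DEFAULTS = {"semantic": {"enabled": True, "auto_reindex_threshold": 20}}

def load_config(project_root: str) -> dict:
    """Return defaults overlaid with values from .dig/config.json, if present."""
    config = {section: dict(values) for section, values in DEFAULTS.items()}
    path = Path(project_root) / ".dig" / "config.json"
    if path.exists():
        user_config = json.loads(path.read_text())
        for section, values in user_config.items():
            config.setdefault(section, {}).update(values)
    return config

print(load_config("."))
```

With no config file present, the call simply returns the defaults.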
Monorepo Support
For monorepos, create .claude/workspace.json to scope indexing:
{
"active_packages": ["packages/core", "packages/api"],
"exclude_patterns": ["**/fixtures/**"]
}
Performance
| Metric | Raw Code | DIG | Improvement |
|---|---|---|---|
| Tokens for function context | 21,000 | 175 | 99% savings |
| Tokens for codebase overview | 104,000 | 12,000 | 89% savings |
| Query latency (daemon) | 30s | 100ms | 300x faster |
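The improvement column follows directly from the raw and DIG figures in the table. A quick check of the arithmetic (the `savings` helper is just for this example):

```python
def savings(raw: float, reduced: float) -> float:
    """Percentage of tokens saved when `raw` shrinks to `reduced`."""
    return 100 * (1 - reduced / raw)

function_context = savings(21_000, 175)       # tokens for function context
codebase_overview = savings(104_000, 12_000)  # tokens for codebase overview
speedup = 30_000 / 100                        # 30 s vs 100 ms, both in milliseconds

print(f"{function_context:.1f}%")   # 99.2%
print(f"{codebase_overview:.1f}%")  # 88.5%
print(f"{speedup:.0f}x")            # 300x
```

The table rounds these to 99%, 89%, and 300x.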
Deep Dive
For the full architecture explanation, benchmarks, and advanced workflows:
License
AGPL-3.0 - See LICENSE file.