AI-native code indexing tool for large codebases
codeindex
Universal Code Parser - Multi-language AST parser for AI-assisted development.
codeindex focuses on code parsing and structured data extraction using tree-sitter. It extracts symbols, inheritance relationships, call relationships, and imports from Python, PHP, Java (and more languages coming). Perfect for feeding structured code data to AI tools, knowledge graphs, and code intelligence platforms.
For LoomGraph developers: looking to integrate codeindex for code parsing? Start here:
- Quick Start: FOR_LOOMGRAPH.md (5 min read)
- Complete Guide: docs/guides/loomgraph-integration.md (20 min, with code examples)
Features
- AI-Powered Documentation: Generate comprehensive README files using Claude, GPT, or any AI CLI
- Tree-sitter Parsing: Accurate symbol extraction (classes, functions, methods, imports) for Python, PHP & Java
- Single File Parse (v0.13.0+): Parse individual files with JSON output for loose coupling with downstream tools
- Parallel Scanning: Scan multiple directories concurrently for fast indexing
- Smart Filtering: Include/exclude patterns with glob support
- Flexible Integration: Works with any AI CLI tool via configurable commands
- Coverage Tracking: Check which directories have been indexed
- Fallback Mode: Generate basic documentation without AI
- KISS Universal Description (v0.4.0+): Language-agnostic, zero-assumption module descriptions
- Modular Architecture (v0.3.1+): Clean, maintainable 6-module CLI design
- Adaptive Symbols (v0.2.0+): Dynamic symbol extraction (5-150 per file based on size)
- Technical Debt Analysis (v0.3.0+): Detect code quality issues and complexity metrics
- Symbol Indexing (v0.1.2+): Global symbol search and project-wide navigation
- Template-Based Test Generation (v0.14.0+): AI-assisted test generation with 88-91% time savings
  - YAML-driven specifications: Declarative language definitions
  - Jinja2 templating: Automated test code generation
  - 100% quality validation: Python syntax + language syntax checks
  - Community-friendly: Enables non-Python developers to contribute language support
- Framework Route Extraction (v0.5.0+): Auto-detect and extract routes from web frameworks
  - ThinkPHP (v0.5.0+): Convention-based routing with line numbers and PHPDoc descriptions
  - Spring Boot (v0.8.0+): @GetMapping, @PostMapping, REST controllers with path variables
  - Laravel (planned for v0.16.0, Epic 17): Explicit route definitions
  - FastAPI (planned for v0.16.0, Epic 17): Decorator-based routes
  - Django (planned for v0.16.0, Epic 17): URL patterns
  - Express.js (planned for v0.16.0, Epic 17): TypeScript/JavaScript routes
- AI Docstring Extraction (v0.4.0+, Epic 9): Multi-language documentation normalization
  - Hybrid mode: Selective AI processing (<$1 per 250 directories)
  - All-AI mode: Maximum quality for critical projects
  - Language support: PHP (PHPDoc + inline comments), Python (coming soon)
  - Mixed language: Normalize Chinese + English comments to clean English
Installation
codeindex uses lazy loading - language parsers are only imported when needed. Install only the languages you use to keep dependencies minimal.
Basic Installation (Core Only)
# Install core only (no language parsers)
pip install ai-codeindex
Language-Specific Installation
Install only the languages you need:
# Python projects
pip install ai-codeindex[python]
# PHP projects
pip install ai-codeindex[php]
# Java projects
pip install ai-codeindex[java]
# Multiple languages
pip install ai-codeindex[python,php]
# All languages
pip install ai-codeindex[all]
Using pipx (Recommended)
# All languages
pipx install ai-codeindex[all]
# Or specific languages
pipx install ai-codeindex[python,php]
From Source
git clone https://github.com/dreamlx/codeindex.git
cd codeindex
pip install -e ".[all]" # Development mode with all languages
Quick Start
1. Initialize Configuration
cd /your/project
codeindex init
This creates .codeindex.yaml in your project.
2. Configure AI CLI
Edit .codeindex.yaml:
# AI CLI command to use for generating documentation
ai_command: 'claude -p "{prompt}" --allowedTools "Read"'
# List of patterns to include for scanning
include:
- src/
# List of patterns to exclude from scanning
exclude:
- "**/test/**"
- "**/__pycache__/**"
# Supported languages
languages:
- python
- php
# Output filename
output_file: "README_AI.md"
Other AI CLI examples:
# OpenAI
ai_command: 'openai chat "{prompt}" --model gpt-4'
# Gemini
ai_command: 'gemini "{prompt}"'
# Custom script
ai_command: '/path/to/my-ai-wrapper.sh "{prompt}"'
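The `{prompt}` placeholder is filled in with the generated prompt before the command runs. This README does not document how codeindex performs that substitution. The sketch below shows one safe way to do it: tokenize the template with `shlex.split` first, then substitute the prompt into each argument. That keeps quotes and `$` in the prompt out of any shell. `build_ai_command` and `run_ai_command` are hypothetical helpers, not part of codeindex.

```python
import shlex
import subprocess


def build_ai_command(template: str, prompt: str) -> list[str]:
    """Turn an ai_command template into an argv list (hypothetical helper).

    The template is tokenized *before* the prompt is substituted, so quotes,
    spaces, or `$` characters in the prompt stay inside a single argument and
    are never re-interpreted by a shell.
    """
    return [arg.replace("{prompt}", prompt) for arg in shlex.split(template)]


def run_ai_command(template: str, prompt: str, timeout: float = 300.0) -> str:
    """Run the AI CLI without a shell and return its stdout."""
    result = subprocess.run(
        build_ai_command(template, prompt),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,  # raise CalledProcessError on a non-zero exit
    )
    return result.stdout
```

For example, `build_ai_command('claude -p "{prompt}" --allowedTools "Read"', 'Summarize "auth"')` yields `['claude', '-p', 'Summarize "auth"', '--allowedTools', 'Read']`. Formatting the prompt into a shell string instead would break on prompts that contain double quotes.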
3. Scan a Directory
# Scan single directory
codeindex scan ./src/auth
# Preview prompt without executing
codeindex scan ./src/auth --dry-run
# Generate without AI (fallback mode)
codeindex scan ./src/auth --fallback
Pro Tip: When scanning web framework directories (like Application/Admin/Controller for ThinkPHP), codeindex automatically:
- Detects the framework
- Extracts routes with line numbers
- Includes method descriptions from PHPDoc/docstrings
- Generates route tables in README_AI.md
4. Batch Processing
# Scan all directories (generates SmartWriter READMEs)
codeindex scan-all
# Traditional batch processing (for AI-enhanced docs)
codeindex list-dirs | xargs -P 4 -I {} codeindex scan {}
codeindex list-dirs | parallel -j 4 codeindex scan {}
Example output:
Generating READMEs (SmartWriter)...
  Application (50KB)
  Admin (20KB)
  api (15KB)
Completed: 3/3 directories
5. Generate Structured Data (JSON)
NEW in v0.5.0: For tool integration (e.g., LoomGraph, custom scripts, CI/CD pipelines), generate machine-readable JSON output.
# Single directory
codeindex scan ./src --output json
# Entire project
codeindex scan-all --output json > parse_results.json
# View formatted JSON
codeindex scan ./src --output json | jq .
JSON Output Structure:
{
"success": true,
"results": [
{
"file": "src/parser.py",
"symbols": [
{
"name": "Parser",
"kind": "class",
"signature": "class Parser:",
"line_start": 15,
"line_end": 120
}
],
"imports": [
{"module": "pathlib", "names": ["Path"], "is_from": true}
],
"error": null
}
],
"summary": {
"total_files": 1,
"total_symbols": 1,
"total_imports": 1,
"errors": 0
}
}
Error Handling:
When errors occur, the JSON response includes structured error information:
{
"success": false,
"error": {
"code": "DIRECTORY_NOT_FOUND",
"message": "Directory does not exist: /path/to/dir",
"detail": null
},
"results": [],
"summary": {
"total_files": 0,
"errors": 1
}
}
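A downstream script usually checks `success` first, then walks `results` and `summary`. The sketch below assumes only the fields shown in the two JSON examples above. `summarize_scan` is a hypothetical helper name.

```python
import json


def summarize_scan(raw: str) -> str:
    """Return a one-line summary of `codeindex scan --output json` output."""
    data = json.loads(raw)
    if not data.get("success", False):
        err = data.get("error") or {}
        return f"scan failed: {err.get('code', 'UNKNOWN')}: {err.get('message', '')}"

    # Files whose per-file "error" field is set are still listed in results.
    failed = [r["file"] for r in data.get("results", []) if r.get("error")]
    summary = data.get("summary", {})
    line = (
        f"{summary.get('total_files', 0)} files, "
        f"{summary.get('total_symbols', 0)} symbols, "
        f"{summary.get('total_imports', 0)} imports"
    )
    if failed:
        line += f"; errors in: {', '.join(failed)}"
    return line
```

In a pipeline you could feed it the captured stdout of `codeindex scan ./src --output json`, for instance via `subprocess.run(..., capture_output=True, text=True).stdout`.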
Use Cases:
- Tool Integration: Feed parse results to visualization tools like LoomGraph
- CI/CD Pipelines: Validate code structure in automated workflows
- Analytics: Analyze codebase metrics across versions
- Testing: Verify expected code structure in tests
6. Parse Single Files
NEW in v0.13.0: Parse individual source files for loose coupling with downstream tools.
For LoomGraph Integration: See the complete guide at docs/guides/loomgraph-integration.md
# Parse a Python file
codeindex parse src/auth/user.py
# Parse a PHP file
codeindex parse Application/Controller/User.php
# Parse a Java file
codeindex parse src/main/java/User.java
# Pretty print with jq
codeindex parse myfile.py | jq .
# Extract specific fields
codeindex parse myfile.py | jq '.symbols[] | {name, kind}'
JSON Output Structure (single file):
{
"file_path": "src/auth/user.py",
"language": "python",
"symbols": [
{
"name": "User",
"kind": "class",
"signature": "class User:",
"docstring": "User authentication model",
"line_start": 10,
"line_end": 50,
"annotations": []
}
],
"imports": [
{"module": "typing", "names": ["Dict"], "is_from": true, "alias": null}
],
"namespace": "",
"error": null
}
Exit Codes:
- 0: Success (includes partial parses with errors)
- 1: File not found or permission denied
- 2: Unsupported language
- 3: Parse error
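Because the exit codes are documented, a caller can turn them into errors without scraping stderr. The following is a minimal sketch. It assumes `codeindex` is on `PATH` and prints the single-file JSON shown above to stdout. `parse_file`, `interpret_parse_result`, and `ParseFailed` are hypothetical names.

```python
import json
import subprocess

# Exit codes as documented for `codeindex parse`.
EXIT_MEANINGS = {
    0: "success (may include a partial parse; check the 'error' field)",
    1: "file not found or permission denied",
    2: "unsupported language",
    3: "parse error",
}


class ParseFailed(RuntimeError):
    """Raised when `codeindex parse` exits with a non-zero status."""


def interpret_parse_result(returncode: int, stdout: str) -> dict:
    """Turn a (returncode, stdout) pair from `codeindex parse` into a dict."""
    if returncode != 0:
        meaning = EXIT_MEANINGS.get(returncode, "unknown exit code")
        raise ParseFailed(f"codeindex parse exited {returncode}: {meaning}")
    return json.loads(stdout)


def parse_file(path: str) -> dict:
    """Run `codeindex parse <path>` and return the parsed JSON document."""
    proc = subprocess.run(["codeindex", "parse", path], capture_output=True, text=True)
    return interpret_parse_result(proc.returncode, proc.stdout)
```

Usage: `names = [s["name"] for s in parse_file("src/auth/user.py")["symbols"]]`. Keeping the exit-code logic in a pure function makes it testable without the CLI installed.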
Integration Example (with LoomGraph):
# Parse and pipe to downstream tool
codeindex parse myfile.py | loomgraph import --format codeindex
# Batch parse multiple files
find src/ -name "*.py" -exec codeindex parse {} \; | \
jq -s '.' > all_symbols.json
See also:
- Quick examples: examples/parse_integration_example.sh
- For LoomGraph developers: see docs/guides/loomgraph-integration.md for a detailed integration guide with Python/Node.js code examples
7. Check Status
codeindex status
Output:
Indexing Status
───────────────────────────────────────
✅ src/auth/
✅ src/utils/
⚠️ src/api/ (no README_AI.md)
✅ src/db/
Indexed: 3/4 (75%)
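The status report amounts to checking whether each source directory has an output file. The sketch below shows one way to do this. It assumes a directory counts as a source directory when it directly contains `.py`, `.php`, or `.java` files. codeindex's real rules (include/exclude patterns, configured languages) may differ. `index_coverage` is a hypothetical helper.

```python
from pathlib import Path


def index_coverage(root, extensions=(".py", ".php", ".java"), output_file="README_AI.md"):
    """Split source directories under `root` into (indexed, missing) lists."""
    base = Path(root)
    # A directory is a "source directory" if it directly holds a source file.
    source_dirs = sorted(
        {p.parent for p in base.rglob("*") if p.is_file() and p.suffix in extensions}
    )
    indexed, missing = [], []
    for d in source_dirs:
        rel = d.relative_to(base).as_posix() + "/"
        (indexed if (d / output_file).exists() else missing).append(rel)
    return indexed, missing
```

A status-style line follows directly: `done, todo = index_coverage("src")`, then `print(f"Indexed: {len(done)}/{len(done) + len(todo)}")`.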
8. Generate Symbol Indexes (v0.1.2+)
Global symbol index - Find any class/function across your codebase:
# Generate PROJECT_SYMBOLS.md (global symbol index)
codeindex symbols
# Generate PROJECT_INDEX.md (module overview)
codeindex index
# Analyze git changes and affected directories
codeindex affected --since HEAD~5 --until HEAD
codeindex affected --json # For scripting/CI
What you get:
PROJECT_SYMBOLS.md provides:
- Quick class/function lookup across all files
- Cross-file references and imports
- Symbol locations with line numbers
- Grouped by directory
PROJECT_INDEX.md provides:
- Module overview with descriptions
- Directory structure
- Entry points and CLI commands
- Generated from README_AI.md files
Affected analysis helps with incremental updates:
- Shows which directories changed in git commits
- Suggests which README_AI.md files need regeneration
- JSON output for CI/CD integration
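Here is a rough version of affected analysis. It lists the files changed between two revisions with `git diff --name-only`, a standard git command, then collapses them to their parent directories. Each of those directories is a candidate for README_AI.md regeneration. This is only a sketch. codeindex's actual `affected` logic (thresholds, language filtering) is not shown in this README. `changed_files` and `affected_dirs` are hypothetical names.

```python
import subprocess
from pathlib import PurePosixPath


def changed_files(since: str = "HEAD~5", until: str = "HEAD") -> list[str]:
    """List paths changed between two revisions (git prints POSIX paths)."""
    out = subprocess.run(
        ["git", "diff", "--name-only", since, until],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return [line for line in out.splitlines() if line]


def affected_dirs(paths: list[str], extensions=(".py", ".php", ".java")) -> list[str]:
    """Map changed source files to the sorted set of directories containing them."""
    dirs = {
        str(PurePosixPath(p).parent)
        for p in paths
        if PurePosixPath(p).suffix in extensions
    }
    return sorted(dirs)
```

Usage inside a repository: `affected_dirs(changed_files("HEAD~5", "HEAD"))`.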
9. Analyze Technical Debt (v0.3.0+)
NEW in v0.3.0: Detect code quality issues and technical debt patterns.
# Analyze directory for technical debt
codeindex tech-debt ./src
# Output formats
codeindex tech-debt ./src --format console # Human-readable (default)
codeindex tech-debt ./src --format markdown # Documentation
codeindex tech-debt ./src --format json # API/scripting
# Save to file
codeindex tech-debt ./src --output debt_report.md
# Recursive analysis
codeindex tech-debt ./src --recursive
# Quiet mode (minimal output)
codeindex tech-debt ./src --quiet
What it detects:
- Super large files (>5000 lines) - CRITICAL
- Large files (>2000 lines) - HIGH
- God Classes (>50 methods) - CRITICAL
- Symbol overload (>100 symbols) - CRITICAL
- High noise ratio (>50% low-quality symbols) - HIGH
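Each rule above is a threshold check. The sketch below applies the listed thresholds to metrics that have already been computed. `FileMetrics` and its fields are assumptions for illustration. Every issue name except `super_large_file` (which appears in the sample report) is made up here.

```python
from dataclasses import dataclass, field


@dataclass
class FileMetrics:
    path: str
    lines: int
    symbol_count: int
    noise_ratio: float  # share of low-quality symbols, 0.0-1.0
    methods_per_class: dict[str, int] = field(default_factory=dict)


def detect_issues(m: FileMetrics) -> list[tuple[str, str]]:
    """Return (severity, issue) pairs using the thresholds listed above."""
    issues = []
    # File size: the CRITICAL rule takes precedence over the HIGH one.
    if m.lines > 5000:
        issues.append(("CRITICAL", "super_large_file"))
    elif m.lines > 2000:
        issues.append(("HIGH", "large_file"))
    for cls, count in m.methods_per_class.items():
        if count > 50:
            issues.append(("CRITICAL", f"god_class:{cls}"))
    if m.symbol_count > 100:
        issues.append(("CRITICAL", "symbol_overload"))
    if m.noise_ratio > 0.5:
        issues.append(("HIGH", "high_noise_ratio"))
    return issues
```

With `FileMetrics("src/models/user.py", 6000, 40, 0.1, {"User": 12})`, this reports only `("CRITICAL", "super_large_file")`, which matches the sample report.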
Example output:
────────────────────────────────────
Technical Debt Report
────────────────────────────────────
Summary
────────────────────────────────────
Files analyzed: 15
Issues found: 3
Quality Score: 78.3/100
Severity Breakdown
────────────────────────────────────
CRITICAL: 1
HIGH: 2
MEDIUM: 0
LOW: 0
File Details
────────────────────────────────────
src/models/user.py (Quality: 70.0)
  🔴 CRITICAL - super_large_file
  File has 6000 lines (threshold: 5000)
  Split into 3-5 smaller files
10. Generate Test Suite for New Languages (v0.14.0+)
NEW in v0.14.0: Use the template-based test generation system to quickly add language support.
cd test_generator
# Create language specification (or copy template)
cp specs/_template.yaml specs/go.yaml
# Edit go.yaml with Go code examples
# Generate tests automatically
python generator.py \
--spec specs/go.yaml \
--template templates/inheritance_test.py.j2 \
--output test_go_inheritance.py
# Validate generated code
python -m py_compile test_go_inheritance.py # Python syntax
# Review Go code syntax manually
# Output: 500-700 lines of high-quality test code in 5 minutes!
Benefits:
- 88-91% less time than writing tests by hand
- 100% Python syntax correctness (automated validation)
- Language-agnostic (just provide code examples in YAML)
- Community-friendly (non-Python developers can contribute)
Example output:
✅ Loaded spec: Go (extension: .go)
✅ Loaded template: inheritance_test.py.j2
🔧 Generating Go tests...
✅ Code validation passed
✅ Generated test file:
  File: test_go_inheritance.py
  Lines: 587
  Test classes: 7
  Test methods: 22
See CONTRIBUTING_LANGUAGE_SUPPORT.md for complete guide.
11. Framework Route Extraction (v0.5.0+)
NEW in v0.5.0: Automatically detect and extract routes from web frameworks with line numbers and descriptions.
codeindex automatically identifies web frameworks and extracts route information when scanning Controller/View directories. Routes are rendered as markdown tables in your README_AI.md files.
Supported Frameworks
| Framework | Language | Status | Features |
|---|---|---|---|
| ThinkPHP | PHP | Stable | Line numbers, PHPDoc descriptions, module-based routing |
| Spring Boot | Java | Stable | @GetMapping, @PostMapping, REST controllers with path variables |
| Laravel | PHP | Planned (v0.16.0) | Named routes, route groups, middleware |
| FastAPI | Python | Planned (v0.16.0) | Path operations, dependencies, tags |
| Django | Python | Planned (v0.16.0) | URL patterns, namespaces, view classes |
Example Output
ThinkPHP Controller (Application/Admin/Controller/UserController.php); the second PHPDoc comment is Chinese for "Create new user":
class UserController {
/**
* Get user list with pagination
*/
public function index() {
// ...
}
/**
* 创建新用户
*/
public function create() {
// ...
}
}
Generated Route Table in README_AI.md:
## Routes (ThinkPHP)
| URL | Controller | Action | Location | Description |
|-----|------------|--------|----------|-------------|
| `/admin/user/index` | UserController | index | `UserController.php:12` | Get user list with pagination |
| `/admin/user/create` | UserController | create | `UserController.php:20` | 创建新用户 |
How It Works
- Auto-Detection: Scans directory structure to detect web frameworks
- Symbol Extraction: Parses controllers/views using tree-sitter
- Route Inference: Applies framework-specific routing conventions
- Documentation Extraction: Extracts docstrings/PHPDoc comments
- Table Generation: Formats as markdown table in README_AI.md
Features:
- Line Numbers: Clickable `file:line` locations
- Descriptions: From PHPDoc/docstrings (auto-truncated to 60 chars)
- Multi-language: Supports Chinese and English descriptions
- Smart Filtering: Only public methods, excludes magic methods
- Zero Configuration: Just scan, routes auto-appear
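For ThinkPHP, route inference follows the module/controller/action convention visible in the route table above. The sketch below makes several assumptions: the `Application/<Module>/Controller/<Name>Controller.php` layout, lowercase URLs, and description text that has already been pulled out of the PHPDoc. The ellipsis truncation style is also a guess. `infer_thinkphp_route`, `truncate_description`, and `is_routable` are hypothetical helpers. codeindex's real extractor may handle more cases.

```python
from pathlib import PurePosixPath

MAX_DESCRIPTION = 60  # documented truncation length


def infer_thinkphp_route(controller_path: str, action: str) -> str:
    """Build /module/controller/action for a ThinkPHP controller method."""
    path = PurePosixPath(controller_path)
    parts = path.parts
    # The module is the directory just above "Controller" (ValueError if absent).
    module = parts[parts.index("Controller") - 1]
    name = path.stem
    if name.endswith("Controller"):
        name = name[: -len("Controller")]
    return f"/{module.lower()}/{name.lower()}/{action.lower()}"


def truncate_description(text: str, limit: int = MAX_DESCRIPTION) -> str:
    """Keep the first non-empty line and cap it at `limit` characters."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    first = lines[0] if lines else ""
    return first if len(first) <= limit else first[: limit - 3] + "..."


def is_routable(method: str, visibility: str) -> bool:
    """Only public, non-magic methods become routes."""
    return visibility == "public" and not method.startswith("__")
```

Applied to the controller above, `infer_thinkphp_route("Application/Admin/Controller/UserController.php", "index")` gives `/admin/user/index`, matching the first row of the generated table.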
Usage
# Routes are automatically extracted when scanning
codeindex scan-all
# Or scan specific controller directory
codeindex scan ./Application/Admin/Controller
No configuration needed! Routes are detected and extracted automatically.
For Developers
Want to add support for your favorite framework? See CLAUDE.md for the complete developer guide on creating custom route extractors.
Recent Updates
Current version: v0.15.0
Key Features
- Template-Based Test Generation (v0.14.0): AI-assisted test generation system
  - 88-91% time savings (11-17 hours → ~2 hours per language)
  - YAML-driven specifications: Declarative language definitions
  - Jinja2 templating: Automated test code generation
  - 100% quality validation: TypeScript tests ready (25 methods)
  - Community-friendly: Enable non-Python developers to contribute
- Call Relationship Extraction (v0.12.0): Function/method call graphs and dependency analysis
- Framework Route Extraction: Auto-detect routes from ThinkPHP and Spring frameworks
- AI Docstring Extraction: Multi-language documentation normalization (PHP, Python)
- KISS Universal Descriptions: Language-agnostic module summaries with actual symbol names
- Technical Debt Analysis: Detect code quality issues and complexity metrics
- Automated Release Workflow: One-command releases with GitHub Actions + PyPI Trusted Publisher
Latest Improvements (v0.14.0)
- Interactive Setup Wizard with smart auto-detection
- Makefile automation for development and releases
- Git hooks for code quality (pre-commit, post-commit, pre-push)
- Modular CLI architecture (6 focused modules)
- Adaptive symbol extraction (5-150 symbols per file)
- Parallel scanning for faster indexing
See: CHANGELOG.md for complete version history
Documentation
User Guides
- Getting Started - Detailed installation and setup
- Configuration Guide - All config options explained
- Configuration Changelog - Version-by-version config changes
- Advanced Usage - Parallel scanning, custom prompts
- Git Hooks Integration - Automated code quality checks
Developer Guides
- CONTRIBUTING.md - Development setup, TDD workflow, code style guidelines
- CLAUDE.md - Quick reference for Claude Code and contributors
- Design Philosophy - Core design principles and architecture
- Release Automation - 5-minute automated release workflow
- Multi-Language Support - Guide for adding new language support
- Requirements Workflow - Planning, issues, and development process
Planning
- Strategic Roadmap - Long-term vision and priorities
- Changelog - Version history and breaking changes
Configuration Reference
Complete .codeindex.yaml
codeindex: 1
# AI CLI command (required)
ai_command: 'claude -p "{prompt}" --allowedTools "Read"'
# Directory patterns
include:
- src/ # Include all subdirectories recursively
- modules/
exclude:
- "**/test/**"
- "**/__pycache__/**"
- "**/node_modules/**"
# Language support
languages:
- python
- php
# Output settings
output_file: "README_AI.md"
parallel_workers: 8
batch_size: 50
# Smart indexing (generates tiered documentation)
indexing:
  max_readme_size: 51200
  root_level: "overview"
  module_level: "navigation"
  leaf_level: "detailed"
# Adaptive symbol extraction (v0.2.0+)
symbols:
  adaptive_symbols:
    enabled: true        # Enable dynamic symbol limits based on file size
    min_symbols: 5       # Minimum symbols for tiny files
    max_symbols: 150     # Maximum symbols for huge files
    thresholds:          # File size thresholds (lines)
      tiny: 100          # <100 lines → 5 symbols
      small: 500         # 100-500 lines → 15 symbols
      medium: 1500       # 500-1500 lines → 30 symbols
      large: 3000        # 1500-3000 lines → 50 symbols
      xlarge: 5000       # 3000-5000 lines → 80 symbols
      huge: 8000         # 5000-8000 lines → 120 symbols
      mega: null         # >8000 lines → 150 symbols
    limits:              # Symbol limits per category
      tiny: 5
      small: 15
      medium: 30
      large: 50
      xlarge: 80
      huge: 120
      mega: 150
# Incremental updates
incremental:
  enabled: true
  thresholds:
    skip_lines: 5
    current_only: 50
    suggest_full: 200
# Git Hooks configuration (v0.7.0+, Story 6)
hooks:
  post_commit:
    mode: auto           # auto | disabled | async | sync | prompt
    max_dirs_sync: 2     # Auto mode: ≤2 dirs = sync, >2 = async
    enabled: true        # Master switch
    log_file: ~/.codeindex/hooks/post-commit.log
Hooks Modes:
- auto (default): Decides based on how many directories a commit affects (up to max_dirs_sync runs sync, more runs async)
- disabled: Completely disabled
- async: Always non-blocking (background updates)
- sync: Always blocking (immediate updates)
- prompt: Reminder only, no auto-execution
See Git Hooks Integration Guide for detailed configuration.
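Two parts of the reference config encode small decision rules: `symbols.adaptive_symbols`, which sets a symbol budget by file length, and `hooks.post_commit.mode`. The sketch below reads both rules straight from the config comments. It treats each threshold as an exclusive upper bound, which is an assumption, because the comments ("100-500 lines") leave the boundaries ambiguous. `symbol_limit` and `hook_run_mode` are hypothetical names, not codeindex functions.

```python
# Mirrors the `symbols.adaptive_symbols` block in the reference config.
THRESHOLDS = [  # (category, exclusive upper bound in lines)
    ("tiny", 100), ("small", 500), ("medium", 1500),
    ("large", 3000), ("xlarge", 5000), ("huge", 8000),
]
LIMITS = {"tiny": 5, "small": 15, "medium": 30, "large": 50,
          "xlarge": 80, "huge": 120, "mega": 150}


def symbol_limit(lines: int, min_symbols: int = 5, max_symbols: int = 150) -> int:
    """Return the symbol budget for a file with `lines` lines."""
    category = "mega"  # `mega: null` means no upper bound
    for name, upper in THRESHOLDS:
        if lines < upper:
            category = name
            break
    # Clamp to the global min/max settings.
    return max(min_symbols, min(LIMITS[category], max_symbols))


def hook_run_mode(mode: str, affected_dir_count: int, max_dirs_sync: int = 2) -> str:
    """Resolve hooks.post_commit.mode to disabled/async/sync/prompt."""
    if mode == "auto":
        return "sync" if affected_dir_count <= max_dirs_sync else "async"
    if mode in {"disabled", "async", "sync", "prompt"}:
        return mode
    raise ValueError(f"unknown hooks mode: {mode!r}")
```

For example, a 1,200-line file gets a budget of 30 symbols, and a commit touching three directories runs the hook asynchronously in `auto` mode.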
Claude Code Integration
codeindex generates README_AI.md files that are perfect for Claude Code to understand your project architecture. By adding a CLAUDE.md file to your project, you can guide Claude Code to use these indexes effectively.
Why Use CLAUDE.md?
Without guidance, Claude Code might:
- Blindly search through all source files (slow and inefficient)
- Miss important architectural context
- Use Glob/Grep instead of semantic understanding
With CLAUDE.md, Claude Code will:
- Read README_AI.md files first (fast and structured)
- Understand your project architecture before diving into code
- Use Serena MCP tools for precise symbol navigation
Quick Setup
1. Copy the template to your project:
# After running codeindex scan-all
cp examples/CLAUDE.md.template CLAUDE.md
2. Customize the project-specific sections:
Edit the "Project Specific Configuration" section in your CLAUDE.md to document your project structure, key components, and development guidelines.
3. Commit and push:
git add CLAUDE.md README_AI.md **/README_AI.md
git commit -m "docs: add Claude Code integration"
What's Included in the Template
The template includes guidance for Claude Code to:
- Prioritize README_AI.md files when understanding architecture
- Use Serena MCP tools (find_symbol, find_referencing_symbols) for precise navigation
- Follow a structured workflow: README → find_symbol → read source → analyze dependencies
- Avoid inefficient patterns like Glob/Grep searches
Example Workflow
After setup, when you ask Claude Code about your project:
Without CLAUDE.md:
You: "Where is the authentication module?"
Claude: [Uses Glob to search for "auth*"]
[Scans 50 files, wastes time]
With CLAUDE.md:
You: "Where is the authentication module?"
Claude: [Reads /src/README_AI.md]
[Reads /src/auth/README_AI.md]
"The authentication module is in src/auth/authenticator.py:15
with UserAuthenticator class..."
Advanced Integration: MCP Skills
codeindex also includes MCP skills for Claude Code:
| Skill | Description |
|---|---|
| /mo:arch | Query code architecture using README_AI.md indexes |
| /mo:index | Generate repository index with codeindex |
Install skills:
# Navigate to codeindex directory
cd /path/to/codeindex
# Run install script
./skills/install.sh
For Git Hooks Users (v0.5.0+)
If you're using codeindex Git Hooks, help your AI Code CLI understand how hooks work:
Method 1: Let AI Code read the guide (Recommended)
# In your project directory, run:
codeindex docs show-ai-guide
Then tell your AI:
User: "Read the output above and update my CLAUDE.md with Git Hooks documentation"
AI Code: [Reads the guide]
[Understands Git Hooks]
[Updates your CLAUDE.md/AGENTS.md]
✅ Done!
Method 2: Direct AI integration
User: "Help my AI CLI understand codeindex Git Hooks"
AI Code: [User runs: codeindex docs show-ai-guide]
[AI reads output]
[Updates CLAUDE.md with Git Hooks section]
✅ Done! Future AI sessions will know about hooks.
What the guide contains:
- Complete Git Hooks functionality explanation
- Pre-commit and post-commit behaviors
- Ready-to-use section template for your CLAUDE.md
- Troubleshooting and common scenarios
- Expected behaviors (auto-commits are normal!)
Why this matters: Your AI CLI needs to know that post-commit will create auto-commits (normal behavior) and that lint failures will block commits (by design).
Full Documentation
- User Guide: docs/guides/claude-code-integration.md
- Git Hooks Guide: docs/guides/git-hooks-integration.md
- AI Integration: examples/ai-integration-guide.md
- Template File: examples/CLAUDE.md.template
- Skills Documentation: skills/README.md
Use Cases
Code Understanding
Generate comprehensive documentation for legacy codebases to help new developers onboard faster.
Codebase Navigation
Create structured overviews of large projects (10,000+ files) for efficient exploration.
AI Agent Integration
Use generated indexes with tools like Claude Code or Cursor for better code context.
Living Documentation
Keep documentation up-to-date by regenerating README_AI.md files as code changes.
How It Works
Code Parsing & Documentation
Directory → Scanner → Parser (tree-sitter) → Smart Writer → README_AI.md (≤50KB)
- Scanner: Walks directories, filters by config patterns
- Parser: Extracts symbols (classes, functions, imports) using tree-sitter
- Smart Writer: Generates tiered documentation with size limits
- Output: Optimized README_AI.md for AI consumption
Test Generation (v0.14.0+)
Language Spec (YAML) → Jinja2 Template → Python Generator → Test File (500-700 lines)
  ↓                                       ↓
Code Examples                           Validation (100%)
Expected Results                        Python + Target Language
- YAML Specification: Define language syntax patterns and test scenarios
- Jinja2 Template: Reusable test code template
- Generator: Automated test file creation with validation
- Output: High-quality pytest test suite
Key Innovation: Separate test definition (YAML) from test implementation (Python), enabling non-Python developers to contribute language support.
Smart Indexing Architecture
codeindex generates tiered documentation optimized for AI agents:
Project Root/
├── PROJECT_INDEX.md (~10KB)     # Overview level
│   └── Module list + descriptions
│
├── Module/
│   └── README_AI.md (~30KB)     # Navigation level
│       ├── Grouped files by type
│       └── Key classes summary
│
└── LeafDir/
    └── README_AI.md (≤50KB)     # Detailed level
        ├── Full symbol info
        └── Dependencies
Configuration
indexing:
  max_readme_size: 51200  # 50KB limit
  symbols:
    max_per_file: 15
    include_visibility: [public, protected]
    exclude_patterns: ["get*", "set*"]
  grouping:
    by: suffix
    patterns:
      Controller: "HTTP handlers"
      Service: "Business logic"
      Model: "Data models"
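The `symbols` and `grouping` options above describe a filter-then-group pipeline. The sketch below rests on three assumptions: each symbol is a dict with `name` and `visibility`, `exclude_patterns` are case-sensitive shell-style globs, and files that match no suffix go into an "Other" group. `select_symbols` and `group_files` are hypothetical names. codeindex's real behaviour may differ.

```python
from fnmatch import fnmatchcase

CONFIG = {
    "max_per_file": 15,
    "include_visibility": ["public", "protected"],
    "exclude_patterns": ["get*", "set*"],
    "grouping": {"Controller": "HTTP handlers", "Service": "Business logic",
                 "Model": "Data models"},
}


def select_symbols(symbols: list[dict], cfg: dict = CONFIG) -> list[dict]:
    """Keep visible, non-accessor symbols, capped at max_per_file."""
    kept = [
        s for s in symbols
        if s["visibility"] in cfg["include_visibility"]
        and not any(fnmatchcase(s["name"], p) for p in cfg["exclude_patterns"])
    ]
    return kept[: cfg["max_per_file"]]


def group_files(filenames: list[str], cfg: dict = CONFIG) -> dict[str, list[str]]:
    """Group files by class-name suffix, e.g. UserController.php -> HTTP handlers."""
    groups: dict[str, list[str]] = {}
    for name in filenames:
        stem = name.rsplit(".", 1)[0]
        label = next((desc for suffix, desc in cfg["grouping"].items()
                      if stem.endswith(suffix)), "Other")
        groups.setdefault(label, []).append(name)
    return groups
```

Filtering before capping means the `max_per_file` budget is spent on meaningful symbols rather than getters and setters.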
AI Coder Integration
For Claude Code Users
Add this to your project's CLAUDE.md:
## Code Index
This project uses codeindex for AI-friendly documentation.
### How to Read Code Index
1. **Start with overview**: Read `PROJECT_INDEX.md` or root `README_AI.md` to understand project structure
2. **Locate module**: Find the relevant module from the module list
3. **Deep dive**: Read module's `README_AI.md` for file/symbol details
4. **Read source**: Open specific files when you need implementation details
### Index Files
- `README_AI.md` - Directory-level documentation (≤50KB each)
- Each directory with source code has its own README_AI.md
### Example Workflow
Task: "Fix user authentication bug"
1. Read root README_AI.md → Find Auth/User module
2. Read Auth/README_AI.md → Find AuthService.php
3. Read AuthService.php → Understand implementation
Usage Tips
- Token efficient: Each README is ≤50KB, suitable for LLM context
- Progressive loading: Start from overview, drill down as needed
- Keep indexes updated: Run codeindex scan-all --fallback after major changes
CLAUDE.md Template
Copy the template to your project:
cp /path/to/codeindex/examples/CLAUDE.md.template your-project/CLAUDE.md
Or see examples/CLAUDE.md.template for the full template.
Integration with LoomGraph
codeindex and LoomGraph work together as complementary tools:
Architecture
codeindex (AST Parser)
↓ Structured Data (JSON)
LoomGraph (Knowledge Graph + AI)
↓ Insights & Analysis
Applications (IDE, CI/CD, Team Tools)
Division of Responsibilities
| Tool | Focus | Key Features |
|---|---|---|
| codeindex | Code Parsing | AST extraction, symbol extraction, call/inheritance relationships, multi-language support |
| LoomGraph | AI Analysis | Knowledge graph, vector embeddings, semantic search, refactoring suggestions, team collaboration |
What codeindex Provides
- Structured code data (symbols, calls, imports, inheritance)
- Multi-language support (Python, PHP, and Java today; TypeScript, Go, Rust, and C# planned)
- Framework awareness (ThinkPHP and Spring routes today; Laravel and FastAPI routes planned)
- JSON output for downstream tools (codeindex parse, codeindex scan --output json)
What LoomGraph Adds
- Code similarity search (vector embeddings + semantic search)
- Automated refactoring suggestions (graph analysis + AI)
- Team collaboration (shared knowledge graphs)
- IDE integration (LSP server for real-time features)
Integration Guide
See docs/guides/loomgraph-integration.md or FOR_LOOMGRAPH.md for complete integration examples.
Quick Example:
# Parse a file and get JSON output
codeindex parse myfile.py | jq .
# Parse all files in a directory
codeindex scan ./src --output json > parse_results.json
# LoomGraph consumes this JSON to build knowledge graph
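LoomGraph's actual input format is documented in its own integration guide, not here. As a hedged illustration of what "building a knowledge graph" from this JSON involves, the sketch below maps one `codeindex parse` document onto generic nodes and edges. The node/edge shape and the `to_graph` name are hypothetical. Only the input fields come from the single-file JSON example above.

```python
import json


def to_graph(parse_output: str) -> tuple[list[dict], list[tuple[str, str, str]]]:
    """Turn one `codeindex parse` JSON document into (nodes, edges)."""
    doc = json.loads(parse_output)
    file_id = doc["file_path"]
    nodes = [{"id": file_id, "kind": "file", "language": doc.get("language")}]
    edges = []
    for sym in doc.get("symbols", []):
        sym_id = f"{file_id}::{sym['name']}"  # file-qualified symbol id
        nodes.append({"id": sym_id, "kind": sym["kind"],
                      "lines": (sym["line_start"], sym["line_end"])})
        edges.append((file_id, "defines", sym_id))
    for imp in doc.get("imports", []):
        edges.append((file_id, "imports", imp["module"]))
    return nodes, edges
```

For the `src/auth/user.py` example, this yields a `defines` edge to `src/auth/user.py::User` and an `imports` edge to `typing`.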
Why This Separation?
- Single Responsibility: codeindex focuses on parsing, LoomGraph focuses on AI
- Independent Evolution: Each tool can evolve without affecting the other
- Flexible Integration: Use codeindex alone or with LoomGraph
- Performance: Lightweight parser vs. heavyweight graph+AI system
Language Support
| Language | Status | Version | Features |
|---|---|---|---|
| Python | Supported | v0.1.0+ | Classes, functions, methods, imports, docstrings, inheritance, calls |
| PHP | Supported | v0.5.0+ | Classes (extends/implements), methods (visibility, static, return types), properties, functions, inheritance, calls |
| Java | Supported | v0.7.0+ | Classes, interfaces, enums, records, annotations, Spring routes, Lombok, inheritance, calls |
| TypeScript/JS | Tests ready | v0.14.0 | Classes, functions, React components, JSDoc (Epic 15); parser implementation in progress |
| Go | Planned | v0.15.0 | Packages, interfaces, struct methods (Epic 16) |
| Rust | Planned | v0.17.0 | Structs, traits, modules (Epic 19) |
| C# | Planned | v0.18.0 | Classes, interfaces, .NET projects |
Test Architecture (v0.14.0+)
codeindex uses a template-based test generation system to accelerate language support development:
- YAML Language Specifications: Declarative syntax patterns and test scenarios
- Jinja2 Templates: Automated Python test code generation
- Quality Validation: 100% syntax correctness for both Python and target language
- Time Savings: 88-91% reduction (11-17 hours → ~2 hours per language)
Current test coverage:
- Python: 50+ test methods (hand-written)
- PHP: 30+ test methods (hand-written)
- Java: 60+ test methods (hand-written)
- TypeScript: 25 test methods (template-generated, 100% quality)
Want to contribute a new language? See Contributing Language Support below.
Contributing
We welcome contributions! See CONTRIBUTING.md for guidelines.
Quick Start for Contributors
# Clone and install
git clone https://github.com/dreamlx/codeindex.git
cd codeindex
# Install with dev dependencies
make install-dev
# or: pip install -e ".[dev,all]"
# Install Git hooks (pre-push checks)
make install-hooks
# Run tests
make test
# or: pytest
# Lint and auto-fix
make lint-fix
# or: ruff check --fix src/
# See all available commands
make help
Contributing Language Support
Want to add support for Go, Rust, C++, or other languages? You don't need to know Python!
We use a template-based test generation system that lets you contribute by only knowing your target language:
Quick Start (1-3 hours total)
1. Create YAML specification (1-2 hours)
   cd test_generator
   cp specs/_template.yaml specs/<language>.yaml
   # Fill in code examples in your language
2. Generate tests (5 minutes)
   python generator.py \
     --spec specs/<language>.yaml \
     --template templates/inheritance_test.py.j2 \
     --output test_<language>_inheritance.py
3. Review and submit PR (30-60 minutes)
   - Verify Python syntax: python -m py_compile test_*.py
   - Verify your language syntax (manual review)
   - Submit PR with both YAML and generated test file
What You Need
- Familiarity with target language (Go/Rust/C++/etc.)
- Ability to write code examples in that language
- 1-3 hours of time
- No Python knowledge required!
What You'll Create
- YAML file: 20-30 code templates with expected parsing results
- Test file: Auto-generated Python tests (you just review)
Quality Standards
- Minimum: 6 test classes, 15 test methods
- Target: 8 test classes, 25+ test methods
- Validation: 100% Python syntax + 100% target language syntax
Examples
- TypeScript: See test_generator/specs/typescript.yaml (351 lines, 28 templates)
- Template: See test_generator/specs/_template.yaml (fully documented starter)
Full Guide
See CONTRIBUTING_LANGUAGE_SUPPORT.md for:
- Detailed step-by-step instructions
- YAML specification guide
- PR template and checklist
- FAQ and troubleshooting
Currently seeking contributors for: Go, Rust, C++, C#, Ruby, Kotlin
Developer Documentation
- Quick Start Release Guide - 5-minute automated release workflow
- Release Workflow - Complete release process documentation
- Multi-Language Support - Guide for adding new language support
- CONTRIBUTING.md - Development setup, TDD workflow, code style guidelines
- Makefile - Run make help to see all available commands
Release Process (Maintainers)
# Automated one-command release
make release VERSION=0.13.0
# GitHub Actions will automatically:
# ✅ Run tests on Python 3.10, 3.11, 3.12
# ✅ Build and publish to PyPI
# ✅ Create GitHub Release
# See: docs/development/QUICK_START_RELEASE.md
Roadmap
See Strategic Roadmap for detailed plans.
Completed (v0.14.0):
- Python, PHP, Java language support (with LoomGraph integration)
- Single file parse command (loose coupling with downstream tools)
- Parser modularization (3622 → 374 lines refactoring)
- Windows platform compatibility (UTF-8 + path optimization)
- Call relationships extraction (Python/Java/PHP)
- Framework routes (ThinkPHP, Spring Boot)
- Interactive Setup Wizard (codeindex init)
In Progress (v0.15.0):
- Template-based test generation system (Epic 18)
- Test architecture migration (Python/PHP/Java → YAML specs)
Next (v0.16.0 - v0.18.0):
- Framework routes expansion: Express, Laravel, FastAPI, Django (v0.16.0, Epic 17)
- Rust language support (v0.17.0, Epic 19)
- C# language support (v0.18.0)
Not Included (Moved to LoomGraph):
- Code similarity search → LoomGraph v0.3.0
- Automated refactoring suggestions → LoomGraph v0.4.0
- Team collaboration features → LoomGraph v0.5.0
- IDE deep integration (LSP server) → LoomGraph v0.6.0
Reason: codeindex focuses on code parsing (AST → structured data), while LoomGraph focuses on AI analysis (structured data → knowledge graph → insights).
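To illustrate what the "AST → structured data" step means, here is a minimal sketch using Python's standard-library ast module. It is not codeindex's implementation, which uses tree-sitter. The output keys (classes, functions, imports, calls) are hypothetical and are not codeindex's JSON format. The sketch only shows the kind of symbols, inheritance, imports, and call relationships a parser can hand to a downstream tool.

```python
import ast
import json


def extract(source: str) -> dict:
    """Convert Python source into a plain dict of symbols and relationships.

    Illustrative only: this uses the standard-library ast module rather than
    tree-sitter, and the keys below are hypothetical, not codeindex's schema.
    """
    tree = ast.parse(source)
    result = {"classes": [], "functions": [], "imports": [], "calls": []}
    for node in ast.walk(tree):
        if isinstance(node, ast.ClassDef):
            # Base-class expressions become inheritance edges (ast.unparse: Python 3.9+).
            bases = [ast.unparse(base) for base in node.bases]
            result["classes"].append({"name": node.name, "bases": bases})
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # Covers both top-level functions and methods.
            result["functions"].append({"name": node.name, "line": node.lineno})
        elif isinstance(node, ast.Import):
            result["imports"].extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            # Record the source module; leading dots mark relative imports.
            result["imports"].append("." * (node.level or 0) + (node.module or ""))
        elif isinstance(node, ast.Call):
            # Callee as written in the source, e.g. "os.path.join".
            result["calls"].append(ast.unparse(node.func))
    return result


sample = '''
import os
from collections import OrderedDict

class Base:
    pass

class Child(Base):
    def run(self):
        return os.path.join("a", "b")
'''

data = extract(sample)
print(json.dumps(data, indent=2))
```

Because the result is plain JSON-serializable data, the consumer does not need to know anything about the parser that produced it. That is the loose coupling the division of labor above relies on.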
License
MIT License - see LICENSE file for details.
Acknowledgments
- tree-sitter - Fast, incremental parsing
- Claude CLI - AI integration inspiration
- All contributors and users
Support
- Questions: GitHub Discussions
- Bugs: GitHub Issues
- Feature Requests: GitHub Issues
Star History
If you find codeindex useful, please star the repository to show your support!
Made with ❤️ by the codeindex team
Download files
- Source distribution: ai_codeindex-0.15.0.tar.gz
- Built distribution: ai_codeindex-0.15.0-py3-none-any.whl
File details
Details for the file ai_codeindex-0.15.0.tar.gz.
File metadata
- Download URL: ai_codeindex-0.15.0.tar.gz
- Upload date:
- Size: 1.0 MB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b4d68be428c144d48f574cd07695d0e0b3c3d626c409409d3dce4d3f231c1497 |
| MD5 | 7da067c221c3507e31f902bcae681894 |
| BLAKE2b-256 | 47c48c4721ce8b1c7ca1efb12e165e7c41e97b10501c6846caed8d88dbad81dd |
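To confirm that a downloaded archive matches the SHA256 digest listed above for ai_codeindex-0.15.0.tar.gz, you can hash it locally. The same value also appears as the Subject digest in the provenance statement. This is a minimal sketch using only the standard library. The helper names sha256_of and verify are hypothetical. The file is read in chunks so large archives are never loaded into memory at once.

```python
import hashlib
from pathlib import Path

# SHA256 digest published above for ai_codeindex-0.15.0.tar.gz.
EXPECTED_SHA256 = "b4d68be428c144d48f574cd07695d0e0b3c3d626c409409d3dce4d3f231c1497"


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash a file in fixed-size chunks and return the lowercase hex digest."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    """Return True if the file's SHA256 matches the expected digest (case-insensitive)."""
    return sha256_of(path) == expected.strip().lower()
```

After downloading, call verify(Path("ai_codeindex-0.15.0.tar.gz")). For installs, pip's hash-checking mode (--require-hashes, with --hash=sha256:... entries in a requirements file) performs an equivalent check automatically.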
Provenance
The following attestation bundles were made for ai_codeindex-0.15.0.tar.gz:
Publisher: publish.yml on dreamlx/codeindex
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: ai_codeindex-0.15.0.tar.gz
- Subject digest: b4d68be428c144d48f574cd07695d0e0b3c3d626c409409d3dce4d3f231c1497
- Sigstore transparency entry: 942873318
- Sigstore integration time:
- Permalink: dreamlx/codeindex@d921838d477a83f0c6eea2a8c63ae6103936e471
- Branch / Tag: refs/tags/v0.15.0
- Owner: https://github.com/dreamlx
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@d921838d477a83f0c6eea2a8c63ae6103936e471
- Trigger Event: push
File details
Details for the file ai_codeindex-0.15.0-py3-none-any.whl.
File metadata
- Download URL: ai_codeindex-0.15.0-py3-none-any.whl
- Upload date:
- Size: 144.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b5729d7b72d071db985fed72c2b70d1436319dac7615418fe6c80f7a19a38cd3 |
| MD5 | 30746d48a3d61c84add1a4180a34624c |
| BLAKE2b-256 | 06dce0ce8665f6cda35f19dac802cc3b992fd3a32431b10f2cd200ddfc3f858c |
Provenance
The following attestation bundles were made for ai_codeindex-0.15.0-py3-none-any.whl:
Publisher: publish.yml on dreamlx/codeindex
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: ai_codeindex-0.15.0-py3-none-any.whl
- Subject digest: b5729d7b72d071db985fed72c2b70d1436319dac7615418fe6c80f7a19a38cd3
- Sigstore transparency entry: 942873321
- Sigstore integration time:
- Permalink: dreamlx/codeindex@d921838d477a83f0c6eea2a8c63ae6103936e471
- Branch / Tag: refs/tags/v0.15.0
- Owner: https://github.com/dreamlx
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@d921838d477a83f0c6eea2a8c63ae6103936e471
- Trigger Event: push