
AI-native code indexing tool for large codebases


codeindex

Python 3.10+ · License: MIT

Universal Code Parser - Best-in-class multi-language AST parser for AI-assisted development.

codeindex focuses on code parsing and structured data extraction using tree-sitter. It extracts symbols, inheritance relationships, call relationships, and imports from Python, PHP, Java (and more languages coming). Perfect for feeding structured code data to AI tools, knowledge graphs, and code intelligence platforms.


๐Ÿค For LoomGraph Developers: Looking to integrate codeindex for code parsing? Start here:


✨ Features

  • 🚀 AI-Powered Documentation: Generate comprehensive README files using Claude, GPT, or any AI CLI
  • 🌳 Tree-sitter Parsing: Accurate symbol extraction (classes, functions, methods, imports) for Python, PHP & Java
  • 📄 Single File Parse (v0.13.0+): Parse individual files with JSON output for loose coupling with downstream tools
  • ⚡ Parallel Scanning: Scan multiple directories concurrently for fast indexing
  • 🎯 Smart Filtering: Include/exclude patterns with glob support
  • 🔧 Flexible Integration: Works with any AI CLI tool via configurable commands
  • 📊 Coverage Tracking: Check which directories have been indexed
  • 🎨 Fallback Mode: Generate basic documentation without AI
  • 🎯 KISS Universal Description (v0.4.0+): Language-agnostic, zero-assumption module descriptions
  • 🏗️ Modular Architecture (v0.3.1+): Clean, maintainable 6-module CLI design
  • 🔄 Adaptive Symbols (v0.2.0+): Dynamic symbol extraction (5-150 per file based on size)
  • 📈 Technical Debt Analysis (v0.3.0+): Detect code quality issues and complexity metrics
  • 🔍 Symbol Indexing (v0.1.2+): Global symbol search and project-wide navigation
  • 🧪 Template-Based Test Generation (v0.14.0+): AI-assisted test generation with 88-91% time savings
    • YAML-driven specifications: Declarative language definitions
    • Jinja2 templating: Automated test code generation
    • 100% quality validation: Python syntax + language syntax checks
    • Community-friendly: Enable non-Python developers to contribute language support
  • ๐Ÿ›ฃ๏ธ Framework Route Extraction (v0.5.0+): Auto-detect and extract routes from web frameworks
    • ThinkPHP (v0.5.0+): Convention-based routing with line numbers and PHPDoc descriptions
    • Spring Boot (v0.8.0+): @GetMapping, @PostMapping, REST controllers with path variables
    • Laravel (v0.16.0): Explicit route definitions (Epic 17)
    • FastAPI (v0.16.0): Decorator-based routes (Epic 17)
    • Django (v0.16.0): URL patterns (Epic 17)
    • Express.js (v0.16.0): TypeScript/JavaScript routes (Epic 17)
  • ๐Ÿ“ AI Docstring Extraction (v0.4.0+, Epic 9): Multi-language documentation normalization
    • Hybrid mode: Selective AI processing (<$1 per 250 directories)
    • All-AI mode: Maximum quality for critical projects
    • Language support: PHP (PHPDoc + inline comments), Python (coming soon)
    • Mixed language: Normalize Chinese + English comments to clean English

📦 Installation

codeindex uses lazy loading - language parsers are only imported when needed. Install only the languages you use to keep dependencies minimal.

Basic Installation (Core Only)

# Install core only (no language parsers)
pip install ai-codeindex

Language-Specific Installation

Install only the languages you need (quote the package spec so shells such as zsh don't try to expand the square brackets):

# Python projects
pip install "ai-codeindex[python]"

# PHP projects
pip install "ai-codeindex[php]"

# Java projects
pip install "ai-codeindex[java]"

# Multiple languages
pip install "ai-codeindex[python,php]"

# All languages
pip install "ai-codeindex[all]"

Using pipx (Recommended)

# All languages
pipx install "ai-codeindex[all]"

# Or specific languages
pipx install "ai-codeindex[python,php]"

From Source

git clone https://github.com/dreamlx/codeindex.git
cd codeindex
pip install -e ".[all]"  # Development mode with all languages
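The lazy loading mentioned at the top of this section can be pictured with a few lines of `importlib`. This is a sketch under assumptions, not codeindex's actual loader: it assumes each extra installs a tree-sitter grammar importable as `tree_sitter_<language>` (the import names used by the upstream `tree-sitter-python`, `tree-sitter-php` and `tree-sitter-java` packages), and `load_grammar` and `GRAMMAR_MODULES` are hypothetical names.

```python
import importlib
from types import ModuleType

# Hypothetical mapping from codeindex language names to grammar modules.
GRAMMAR_MODULES = {
    "python": "tree_sitter_python",
    "php": "tree_sitter_php",
    "java": "tree_sitter_java",
}

# Grammars already imported in this process, keyed by language name.
_cache: dict[str, ModuleType] = {}


def load_grammar(language: str, modules: dict[str, str] = GRAMMAR_MODULES) -> ModuleType:
    """Import a grammar module only the first time its language is needed."""
    if language in _cache:
        return _cache[language]
    try:
        module_name = modules[language]
    except KeyError:
        raise ValueError(f"Unsupported language: {language}") from None
    try:
        module = importlib.import_module(module_name)
    except ImportError as exc:
        # Point the user at the matching extra instead of a bare ImportError.
        raise ImportError(
            f"Parser for {language!r} is not installed; "
            f"run: pip install 'ai-codeindex[{language}]'"
        ) from exc
    _cache[language] = module
    return module
```

With this pattern a Python-only install never imports the PHP or Java grammars, and a missing extra surfaces as an install hint rather than a bare `ModuleNotFoundError`.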

🚀 Quick Start

1. Initialize Configuration

cd /your/project
codeindex init

This creates .codeindex.yaml in your project.

2. Configure AI CLI

Edit .codeindex.yaml:

# AI CLI command to use for generating documentation
ai_command: 'claude -p "{prompt}" --allowedTools "Read"'

# List of patterns to include for scanning
include:
  - src/

# List of patterns to exclude from scanning
exclude:
  - "**/test/**"
  - "**/__pycache__/**"

# Supported languages
languages:
  - python
  - php

# Output filename
output_file: "README_AI.md"

Other AI CLI examples:

# OpenAI
ai_command: 'openai chat "{prompt}" --model gpt-4'

# Gemini
ai_command: 'gemini "{prompt}"'

# Custom script
ai_command: '/path/to/my-ai-wrapper.sh "{prompt}"'
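How codeindex expands `{prompt}` internally isn't covered here. If you write your own wrapper around an `ai_command` template, one safe approach is to tokenise the template first and substitute the prompt afterwards, so quotes or `$` characters in the prompt can't break the command line. `build_ai_argv` is a hypothetical helper; its result is meant for `subprocess.run(argv, capture_output=True, text=True)` rather than `shell=True`.

```python
import shlex


def build_ai_argv(template: str, prompt: str) -> list[str]:
    """Turn an ai_command template into an argv list for subprocess.run.

    The template is split into tokens *before* the prompt is substituted,
    so the prompt always ends up as a single argument and never reaches a shell.
    """
    tokens = shlex.split(template)
    return [token.replace("{prompt}", prompt) for token in tokens]


argv = build_ai_argv('claude -p "{prompt}" --allowedTools "Read"', 'Summarise "auth"')
print(argv)  # ['claude', '-p', 'Summarise "auth"', '--allowedTools', 'Read']
```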

3. Scan a Directory

# Scan single directory
codeindex scan ./src/auth

# Preview prompt without executing
codeindex scan ./src/auth --dry-run

# Generate without AI (fallback mode)
codeindex scan ./src/auth --fallback

💡 Pro Tip: When scanning web framework directories (like Application/Admin/Controller for ThinkPHP), codeindex automatically:

  • ✅ Detects the framework
  • ✅ Extracts routes with line numbers
  • ✅ Includes method descriptions from PHPDoc/docstrings
  • ✅ Generates route tables in README_AI.md

4. Batch Processing

# Scan all directories (generates SmartWriter READMEs)
codeindex scan-all

# Traditional batch processing (for AI-enhanced docs)
codeindex list-dirs | xargs -P 4 -I {} codeindex scan {}
codeindex list-dirs | parallel -j 4 codeindex scan {}

Example output:

๐Ÿ“ Generating READMEs (SmartWriter)...
โœ“ Application ( 50KB)
โœ“ Admin ( 20KB)
โœ“ api ( 15KB)
โ†’ Completed: 3/3 directories

5. Generate Structured Data (JSON)

NEW in v0.5.0: For tool integration (e.g., LoomGraph, custom scripts, CI/CD pipelines), generate machine-readable JSON output.

# Single directory
codeindex scan ./src --output json

# Entire project
codeindex scan-all --output json > parse_results.json

# View formatted JSON
codeindex scan ./src --output json | jq .

JSON Output Structure:

{
  "success": true,
  "results": [
    {
      "file": "src/parser.py",
      "symbols": [
        {
          "name": "Parser",
          "kind": "class",
          "signature": "class Parser:",
          "line_start": 15,
          "line_end": 120
        }
      ],
      "imports": [
        {"module": "pathlib", "names": ["Path"], "is_from": true}
      ],
      "error": null
    }
  ],
  "summary": {
    "total_files": 1,
    "total_symbols": 1,
    "total_imports": 1,
    "errors": 0
  }
}

Error Handling:

When errors occur, the JSON response includes structured error information:

{
  "success": false,
  "error": {
    "code": "DIRECTORY_NOT_FOUND",
    "message": "Directory does not exist: /path/to/dir",
    "detail": null
  },
  "results": [],
  "summary": {
    "total_files": 0,
    "errors": 1
  }
}
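A consumer of this output needs only the standard library. The sketch below reads the JSON produced by `codeindex scan ... --output json`, fails fast on a top-level error, and counts symbols by `kind`. It uses only the fields shown above; `summarize_scan` is a hypothetical name, and skipping files whose per-file `error` is non-null is an assumption about how you might want to treat failures.

```python
import json
from collections import Counter


def summarize_scan(raw: str) -> Counter:
    """Count symbols by kind in `codeindex scan --output json` output."""
    payload = json.loads(raw)
    if not payload["success"]:
        err = payload["error"]
        raise RuntimeError(f"{err['code']}: {err['message']}")
    kinds: Counter = Counter()
    for file_result in payload["results"]:
        if file_result.get("error"):
            continue  # skip files that failed to parse
        kinds.update(sym["kind"] for sym in file_result["symbols"])
    return kinds
```

For the successful sample above, `summarize_scan` returns `Counter({'class': 1})`; for the error sample it raises `RuntimeError` with a message starting `DIRECTORY_NOT_FOUND`.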

Use Cases:

  • 🔌 Tool Integration: Feed parse results to downstream tools such as LoomGraph
  • 🤖 CI/CD Pipelines: Validate code structure in automated workflows
  • 📊 Analytics: Analyze codebase metrics across versions
  • 🧪 Testing: Verify expected code structure in tests

6. Parse Single Files

NEW in v0.13.0: Parse individual source files for loose coupling with downstream tools.

💡 For LoomGraph integration: see the complete guide at docs/guides/loomgraph-integration.md

# Parse a Python file
codeindex parse src/auth/user.py

# Parse a PHP file
codeindex parse Application/Controller/User.php

# Parse a Java file
codeindex parse src/main/java/User.java

# Pretty print with jq
codeindex parse myfile.py | jq .

# Extract specific fields
codeindex parse myfile.py | jq '.symbols[] | {name, kind}'

JSON Output Structure (single file):

{
  "file_path": "src/auth/user.py",
  "language": "python",
  "symbols": [
    {
      "name": "User",
      "kind": "class",
      "signature": "class User:",
      "docstring": "User authentication model",
      "line_start": 10,
      "line_end": 50,
      "annotations": []
    }
  ],
  "imports": [
    {"module": "typing", "names": ["Dict"], "is_from": true, "alias": null}
  ],
  "namespace": "",
  "error": null
}

Exit Codes:

  • 0: Success (includes partial parse with errors)
  • 1: File not found or permission denied
  • 2: Unsupported language
  • 3: Parse error
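When calling `codeindex parse` from a script, check the exit code before trusting stdout. A minimal sketch that relies only on the exit codes listed above; `parse_file` is a hypothetical helper, and the `run` parameter exists purely so the function can be exercised without codeindex installed.

```python
import subprocess

# Exit codes documented for `codeindex parse`.
EXIT_MEANINGS = {
    0: "success (check the JSON 'error' field for partial parses)",
    1: "file not found or permission denied",
    2: "unsupported language",
    3: "parse error",
}


def parse_file(path: str, run=subprocess.run) -> str:
    """Run `codeindex parse <path>` and return its JSON stdout.

    `run` defaults to subprocess.run and can be replaced by a fake in tests.
    """
    proc = run(["codeindex", "parse", path], capture_output=True, text=True)
    if proc.returncode != 0:
        meaning = EXIT_MEANINGS.get(proc.returncode, "unknown exit code")
        raise RuntimeError(
            f"codeindex parse {path} failed with exit code {proc.returncode}: {meaning}"
        )
    return proc.stdout
```

Because exit code 0 also covers partial parses, callers should still inspect the `error` field of the returned JSON.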

Integration Example (with LoomGraph):

# Parse and pipe to downstream tool
codeindex parse myfile.py | loomgraph import --format codeindex

# Batch parse multiple files
find src/ -name "*.py" -exec codeindex parse {} \; | \
  jq -s '.' > all_symbols.json

See also:

  • Quick examples: examples/parse_integration_example.sh
  • For LoomGraph developers: See docs/guides/loomgraph-integration.md for detailed integration guide with Python/Node.js code examples

7. Check Status

codeindex status

Output:

Indexing Status
───────────────────────────────────────
✅ src/auth/
✅ src/utils/
⚠️  src/api/ (no README_AI.md)
✅ src/db/

Indexed: 3/4 (75%)
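The `Indexed: 3/4 (75%)` line is a simple ratio. A sketch of that calculation, assuming the candidate directories are already known (for example from `codeindex list-dirs`); `index_coverage` is a hypothetical helper and not how `codeindex status` is implemented internally.

```python
from pathlib import Path


def index_coverage(dirs: list[Path], output_file: str = "README_AI.md") -> tuple[int, int, float]:
    """Return (indexed, total, percent) for directories containing output_file."""
    indexed = sum(1 for d in dirs if (d / output_file).is_file())
    total = len(dirs)
    percent = 100.0 * indexed / total if total else 0.0
    return indexed, total, percent
```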

8. Generate Symbol Indexes (v0.1.2+)

Global symbol index - Find any class/function across your codebase:

# Generate PROJECT_SYMBOLS.md (global symbol index)
codeindex symbols

# Generate PROJECT_INDEX.md (module overview)
codeindex index

# Analyze git changes and affected directories
codeindex affected --since HEAD~5 --until HEAD
codeindex affected --json  # For scripting/CI

What you get:

PROJECT_SYMBOLS.md provides:

  • Quick class/function lookup across all files
  • Cross-file references and imports
  • Symbol locations with line numbers
  • Grouped by directory

PROJECT_INDEX.md provides:

  • Module overview with descriptions
  • Directory structure
  • Entry points and CLI commands
  • Generated from README_AI.md files

Affected analysis helps with incremental updates:

  • Shows which directories changed in git commits
  • Suggests which README_AI.md files need regeneration
  • JSON output for CI/CD integration
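The idea behind `codeindex affected` can be sketched as mapping changed files to their parent directories. This is an illustration, not the command's implementation: `affected_dirs` is hypothetical, the suffix list is an assumption, and the real command may also walk up to parent modules. The input list could come from `git diff --name-only HEAD~5 HEAD`.

```python
from pathlib import PurePosixPath


def affected_dirs(changed_files: list[str], source_suffixes=(".py", ".php", ".java")) -> list[str]:
    """Map changed file paths to the directories whose README_AI.md
    would need regenerating (one level only, no parent propagation)."""
    dirs = {
        str(PurePosixPath(path).parent)
        for path in changed_files
        if PurePosixPath(path).suffix in source_suffixes
    }
    return sorted(dirs)


changed = ["src/auth/user.py", "src/auth/token.py", "src/db/models.py", "README.md"]
print(affected_dirs(changed))  # ['src/auth', 'src/db']
```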

9. Analyze Technical Debt (v0.3.0+)

NEW in v0.3.0: Detect code quality issues and technical debt patterns.

# Analyze directory for technical debt
codeindex tech-debt ./src

# Output formats
codeindex tech-debt ./src --format console   # Human-readable (default)
codeindex tech-debt ./src --format markdown  # Documentation
codeindex tech-debt ./src --format json      # API/scripting

# Save to file
codeindex tech-debt ./src --output debt_report.md

# Recursive analysis
codeindex tech-debt ./src --recursive

# Quiet mode (minimal output)
codeindex tech-debt ./src --quiet

What it detects:

  • 🔴 Super large files (>5000 lines) - CRITICAL
  • 🟡 Large files (>2000 lines) - HIGH
  • 🔴 God Classes (>50 methods) - CRITICAL
  • 🟡 Symbol overload (>100 symbols) - CRITICAL
  • 🟠 High noise ratio (>50% low-quality symbols) - HIGH
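The thresholds above amount to a few comparisons per file. The sketch below encodes them exactly as listed (strictly greater than each limit). How codeindex measures "low-quality" symbols and computes the Quality Score isn't described here, so the noise ratio is taken as an input; apart from `super_large_file`, which appears in the example report, the issue names and the `FileMetrics`/`classify` names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FileMetrics:
    path: str
    lines: int
    max_methods_per_class: int
    symbols: int
    noise_ratio: float  # share of low-quality symbols, 0.0-1.0 (computed elsewhere)


def classify(m: FileMetrics) -> list[tuple[str, str]]:
    """Return (severity, issue) pairs using the thresholds listed above."""
    issues = []
    if m.lines > 5000:
        issues.append(("CRITICAL", "super_large_file"))
    elif m.lines > 2000:
        issues.append(("HIGH", "large_file"))
    if m.max_methods_per_class > 50:
        issues.append(("CRITICAL", "god_class"))
    if m.symbols > 100:
        issues.append(("CRITICAL", "symbol_overload"))
    if m.noise_ratio > 0.5:
        issues.append(("HIGH", "high_noise_ratio"))
    return issues
```

For instance, `classify(FileMetrics("src/models/user.py", 6000, 12, 80, 0.1))` returns `[("CRITICAL", "super_large_file")]`, which lines up with the file shown in the example report.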

Example output:

โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•
  Technical Debt Report
โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•

Summary
โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”
Files analyzed: 15
Issues found: 3
Quality Score: 78.3/100

Severity Breakdown
โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”
CRITICAL: 1
HIGH: 2
MEDIUM: 0
LOW: 0

File Details
โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”

๐Ÿ“„ src/models/user.py (Quality: 70.0)
  ๐Ÿ”ด CRITICAL - super_large_file
     File has 6000 lines (threshold: 5000)
     โ†’ Split into 3-5 smaller files

10. Generate Test Suite for New Languages (v0.14.0+)

NEW in v0.14.0: Use the template-based test generation system to quickly add language support.

cd test_generator

# Create language specification (or copy template)
cp specs/_template.yaml specs/go.yaml
# Edit go.yaml with Go code examples

# Generate tests automatically
python generator.py \
  --spec specs/go.yaml \
  --template templates/inheritance_test.py.j2 \
  --output test_go_inheritance.py

# Validate generated code
python -m py_compile test_go_inheritance.py  # Python syntax
# Review Go code syntax manually

# Output: 500-700 lines of high-quality test code in 5 minutes!

Benefits:

  • โฑ๏ธ 88-91% faster than manual test writing
  • โœ… 100% syntax correctness (automated validation)
  • ๐ŸŒ Language-agnostic (just provide code examples in YAML)
  • ๐Ÿค Community-friendly (non-Python developers can contribute)

Example output:

✅ Loaded spec: Go (extension: .go)
✅ Loaded template: inheritance_test.py.j2
🔧 Generating Go tests...
✅ Code validation passed

✅ Generated test file:
   File: test_go_inheritance.py
   Lines: 587
   Test classes: 7
   Test methods: 22

See CONTRIBUTING_LANGUAGE_SUPPORT.md for complete guide.

11. Framework Route Extraction (v0.5.0+)

NEW in v0.5.0: Automatically detect and extract routes from web frameworks with line numbers and descriptions.

codeindex automatically identifies web frameworks and extracts route information when scanning Controller/View directories. Routes are rendered as Markdown tables in your README_AI.md files.

Supported Frameworks

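| Framework | Language | Status | Features |
|-----------|----------|--------|----------|
| ThinkPHP | PHP | ✅ Stable | Line numbers, PHPDoc descriptions, module-based routing |
| Spring Boot | Java | ✅ Supported (v0.8.0+) | @GetMapping, @PostMapping, REST controllers with path variables |
| Laravel | PHP | 🔄 Planned for v0.16.0 | Named routes, route groups, middleware |
| FastAPI | Python | 🔄 Planned for v0.16.0 | Path operations, dependencies, tags |
| Django | Python | 🔄 Planned for v0.16.0 | URL patterns, namespaces, view classes |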

Example Output

ThinkPHP Controller (Application/Admin/Controller/UserController.php):

class UserController {
    /**
     * Get user list with pagination
     */
    public function index() {
        // ...
    }

    /**
     * 创建新用户
     */
    public function create() {
        // ...
    }
}

Generated Route Table in README_AI.md:

## Routes (ThinkPHP)

| URL | Controller | Action | Location | Description |
|-----|------------|--------|----------|-------------|
| `/admin/user/index` | UserController | index | `UserController.php:12` | Get user list with pagination |
| `/admin/user/create` | UserController | create | `UserController.php:20` | 创建新用户 |

How It Works

  1. Auto-Detection: Scans directory structure to detect web frameworks
  2. Symbol Extraction: Parses controllers/views using tree-sitter
  3. Route Inference: Applies framework-specific routing conventions
  4. Documentation Extraction: Extracts docstrings/PHPDoc comments
  5. Table Generation: Formats as markdown table in README_AI.md

Features:

  • ✅ Line Numbers: Clickable file:line locations
  • ✅ Descriptions: From PHPDoc/docstrings (auto-truncated to 60 chars)
  • ✅ Multi-language: Supports Chinese and English descriptions
  • ✅ Smart Filtering: Only public methods, excludes magic methods
  • ✅ Zero Configuration: Just scan, routes auto-appear
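The route inference and filtering described above can be sketched for the ThinkPHP layout used in the example. This is a simplified assumption of the convention (lower-cased module directory, plus the lower-cased controller name without its `Controller` suffix, plus the action), not codeindex's extractor; `thinkphp_route`, `route_rows`, and the `...` truncation style are hypothetical.

```python
from pathlib import PurePosixPath


def thinkphp_route(controller_path: str, action: str) -> str:
    """Map Application/<Module>/Controller/<Name>Controller.php + action to a URL.

    Raises ValueError if the path has no `Controller` directory.
    """
    path = PurePosixPath(controller_path)
    module = path.parts[path.parts.index("Controller") - 1]
    controller = path.stem.removesuffix("Controller")
    return f"/{module.lower()}/{controller.lower()}/{action}"


def route_rows(controller_path: str, methods, max_desc: int = 60):
    """Build (url, action, line, description) rows from parsed methods.

    `methods` holds (name, visibility, line, doc) tuples. Only public,
    non-magic methods become routes; long descriptions are shortened.
    """
    rows = []
    for name, visibility, line, doc in methods:
        if visibility != "public" or name.startswith("__"):
            continue  # private/protected and magic methods are not routes
        if len(doc) > max_desc:
            doc = doc[: max_desc - 3] + "..."
        rows.append((thinkphp_route(controller_path, name), name, line, doc))
    return rows
```

With the example controller, `thinkphp_route("Application/Admin/Controller/UserController.php", "index")` yields `/admin/user/index`, the first URL in the generated table.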

Usage

# Routes are automatically extracted when scanning
codeindex scan-all

# Or scan specific controller directory
codeindex scan ./Application/Admin/Controller

No configuration needed! Routes are detected and extracted automatically.

For Developers

Want to add support for your favorite framework? See CLAUDE.md for the complete developer guide on creating custom route extractors.


📋 Recent Updates

Current version: v0.15.1

Key Features

  • 🧪 Template-Based Test Generation (v0.14.0): AI-assisted test generation system
    • 88-91% time savings (11-17 hours → ~2 hours per language)
    • YAML-driven specifications: Declarative language definitions
    • Jinja2 templating: Automated test code generation
    • 100% quality validation: TypeScript tests ready (25 methods)
    • Community-friendly: Enable non-Python developers to contribute
  • 🔗 Call Relationship Extraction (v0.12.0): Function/method call graphs and dependency analysis
  • 🛣️ Framework Route Extraction: Auto-detect routes from ThinkPHP and Spring frameworks
  • 🤖 AI Docstring Extraction: Multi-language documentation normalization (PHP, Python)
  • 🎯 KISS Universal Descriptions: Language-agnostic module summaries with actual symbol names
  • 📊 Technical Debt Analysis: Detect code quality issues and complexity metrics
  • 🚀 Automated Release Workflow: One-command releases with GitHub Actions + PyPI Trusted Publisher

Latest Improvements (v0.14.0)

  • ✅ Interactive Setup Wizard with smart auto-detection
  • ✅ Makefile automation for development and releases
  • ✅ Git hooks for code quality (pre-commit, post-commit, pre-push)
  • ✅ Modular CLI architecture (6 focused modules)
  • ✅ Adaptive symbol extraction (5-150 symbols per file)
  • ✅ Parallel scanning for faster indexing

See: CHANGELOG.md for complete version history


📖 Documentation

User Guides

Developer Guides

Planning


โš™๏ธ Configuration Reference

Complete .codeindex.yaml

codeindex: 1

# AI CLI command (required)
ai_command: 'claude -p "{prompt}" --allowedTools "Read"'

# Directory patterns
include:
  - src/                # Include all subdirectories recursively
  - modules/

exclude:
  - "**/test/**"
  - "**/__pycache__/**"
  - "**/node_modules/**"

# Language support
languages:
  - python
  - php

# Output settings
output_file: "README_AI.md"
parallel_workers: 8
batch_size: 50

# Smart indexing (generates tiered documentation)
indexing:
  max_readme_size: 51200
  root_level: "overview"
  module_level: "navigation"
  leaf_level: "detailed"

# Adaptive symbol extraction (v0.2.0+)
symbols:
  adaptive_symbols:
    enabled: true           # Enable dynamic symbol limits based on file size
    min_symbols: 5          # Minimum symbols for tiny files
    max_symbols: 150        # Maximum symbols for huge files
    thresholds:             # File size thresholds (lines)
      tiny: 100             # <100 lines → 5 symbols
      small: 500            # 100-500 lines → 15 symbols
      medium: 1500          # 500-1500 lines → 30 symbols
      large: 3000           # 1500-3000 lines → 50 symbols
      xlarge: 5000          # 3000-5000 lines → 80 symbols
      huge: 8000            # 5000-8000 lines → 120 symbols
      mega: null            # >8000 lines → 150 symbols
    limits:                 # Symbol limits per category
      tiny: 5
      small: 15
      medium: 30
      large: 50
      xlarge: 80
      huge: 120
      mega: 150

# Incremental updates
incremental:
  enabled: true
  thresholds:
    skip_lines: 5
    current_only: 50
    suggest_full: 200

# Git Hooks configuration (v0.7.0+, Story 6)
hooks:
  post_commit:
    mode: auto            # auto | disabled | async | sync | prompt
    max_dirs_sync: 2      # Auto mode: ≤2 dirs = sync, >2 = async
    enabled: true         # Master switch
    log_file: ~/.codeindex/hooks/post-commit.log
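The `symbols.adaptive_symbols` section above boils down to a lookup from line count to symbol limit. A sketch of that lookup, assuming each threshold is an exclusive upper bound (so a 500-line file counts as `medium`) and that `min_symbols`/`max_symbols` clamp the result. The boundary handling and the `symbol_limit` name are assumptions, not taken from codeindex's source.

```python
# Category upper bounds (exclusive, in lines) from the YAML above.
THRESHOLDS = [
    ("tiny", 100), ("small", 500), ("medium", 1500),
    ("large", 3000), ("xlarge", 5000), ("huge", 8000),
]
LIMITS = {"tiny": 5, "small": 15, "medium": 30, "large": 50,
          "xlarge": 80, "huge": 120, "mega": 150}


def symbol_limit(line_count: int, min_symbols: int = 5, max_symbols: int = 150) -> int:
    """Pick the per-file symbol limit for a file of `line_count` lines."""
    limit = LIMITS["mega"]  # files at or past the last threshold
    for category, upper in THRESHOLDS:
        if line_count < upper:
            limit = LIMITS[category]
            break
    return max(min_symbols, min(limit, max_symbols))
```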

Hooks Modes:

  • auto (default): Smart detection based on project size
  • disabled: Completely disabled
  • async: Always non-blocking (background updates)
  • sync: Always blocking (immediate updates)
  • prompt: Reminder only, no auto-execution
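The `auto` rule from the config comment (≤2 changed directories update synchronously, more run in the background) can be expressed as a small decision function. This follows the YAML comment rather than codeindex's code; `post_commit_action` is hypothetical, and `"skip"` is just a label for "do nothing".

```python
def post_commit_action(mode: str, changed_dirs: int, max_dirs_sync: int = 2,
                       enabled: bool = True) -> str:
    """Resolve a hooks.post_commit.mode setting into what the hook should do."""
    if not enabled or mode == "disabled":
        return "skip"
    if mode == "auto":
        # Small changes update inline; larger ones go to the background.
        return "sync" if changed_dirs <= max_dirs_sync else "async"
    if mode in ("sync", "async", "prompt"):
        return mode
    raise ValueError(f"unknown hooks mode: {mode!r}")
```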

See Git Hooks Integration Guide for detailed configuration.


🤖 Claude Code Integration

codeindex generates README_AI.md files that are perfect for Claude Code to understand your project architecture. By adding a CLAUDE.md file to your project, you can guide Claude Code to use these indexes effectively.

Why Use CLAUDE.md?

Without guidance, Claude Code might:

  • โŒ Blindly search through all source files (slow and inefficient)
  • โŒ Miss important architectural context
  • โŒ Use Glob/Grep instead of semantic understanding

With CLAUDE.md, Claude Code will:

  • โœ… Read README_AI.md files first (fast and structured)
  • โœ… Understand your project architecture before diving into code
  • โœ… Use Serena MCP tools for precise symbol navigation

Quick Setup

1. Copy the template to your project:

# After running codeindex scan-all
cp examples/CLAUDE.md.template CLAUDE.md

2. Customize the project-specific sections:

Edit the "Project Specific Configuration" section in your CLAUDE.md to document your project structure, key components, and development guidelines.

3. Commit and push:

git add CLAUDE.md '*README_AI.md'   # quoted so git (not the shell) matches README_AI.md in every directory
git commit -m "docs: add Claude Code integration"

What's Included in the Template

The template includes guidance for Claude Code to:

  1. Prioritize README_AI.md files when understanding architecture
  2. Use Serena MCP tools (find_symbol, find_referencing_symbols) for precise navigation
  3. Follow a structured workflow: README → find_symbol → read source → analyze dependencies
  4. Avoid inefficient patterns like Glob/Grep searches

Example Workflow

After setup, when you ask Claude Code about your project:

โŒ Without CLAUDE.md:
You: "Where is the authentication module?"
Claude: [Uses Glob to search for "auth*"]
        [Scans 50 files, wastes time]

✅ With CLAUDE.md:
You: "Where is the authentication module?"
Claude: [Reads /src/README_AI.md]
        [Reads /src/auth/README_AI.md]
        "The authentication module is in src/auth/authenticator.py:15
         with UserAuthenticator class..."

Advanced Integration: MCP Skills

codeindex also includes MCP skills for Claude Code:

| Skill | Description |
|-------|-------------|
| /mo:arch | Query code architecture using README_AI.md indexes |
| /mo:index | Generate repository index with codeindex |

Install skills:

# Navigate to codeindex directory
cd /path/to/codeindex

# Run install script
./skills/install.sh

For Git Hooks Users (v0.5.0+)

If you're using codeindex Git Hooks, help your AI Code CLI understand how hooks work:

Method 1: Let AI Code read the guide ⭐️ (Recommended)

# In your project directory, run:
codeindex docs show-ai-guide

Then tell your AI:

User: "Read the output above and update my CLAUDE.md with Git Hooks documentation"
AI Code: [Reads the guide]
         [Understands Git Hooks]
         [Updates your CLAUDE.md/AGENTS.md]
         ✅ Done!

Method 2: Direct AI integration

User: "Help my AI CLI understand codeindex Git Hooks"
AI Code: [User runs: codeindex docs show-ai-guide]
         [AI reads output]
         [Updates CLAUDE.md with Git Hooks section]
         ✅ Done! Future AI sessions will know about hooks.

What the guide contains:

  • Complete Git Hooks functionality explanation
  • Pre-commit and post-commit behaviors
  • Ready-to-use section template for your CLAUDE.md
  • Troubleshooting and common scenarios
  • Expected behaviors (auto-commits are normal!)

Why this matters: Your AI CLI needs to know that post-commit will create auto-commits (normal behavior) and that lint failures will block commits (by design).

Full Documentation


🎯 Use Cases

📚 Code Understanding

Generate comprehensive documentation for legacy codebases to help new developers onboard faster.

🔍 Codebase Navigation

Create structured overviews of large projects (10,000+ files) for efficient exploration.

🤖 AI Agent Integration

Use generated indexes with tools like Claude Code or Cursor for better code context.

📝 Living Documentation

Keep documentation up-to-date by regenerating README_AI.md files as code changes.


๐Ÿ› ๏ธ How It Works

Code Parsing & Documentation

Directory โ†’ Scanner โ†’ Parser (tree-sitter) โ†’ Smart Writer โ†’ README_AI.md (โ‰ค50KB)
  1. Scanner: Walks directories, filters by config patterns
  2. Parser: Extracts symbols (classes, functions, imports) using tree-sitter
  3. Smart Writer: Generates tiered documentation with size limits
  4. Output: Optimized README_AI.md for AI consumption
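The Scanner step can be approximated with the standard `fnmatch` module. This sketch assumes `include` entries are path prefixes and `exclude` entries are fnmatch-style globs, in which `*` also matches `/` (so `**/test/**` behaves like `*/test/*`). The real Scanner may use different glob semantics, and `select_files` plus its suffix list are hypothetical.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def select_files(paths, include, exclude, suffixes=(".py", ".php", ".java")):
    """Keep source files under an `include` prefix that match no `exclude` glob."""
    selected = []
    for path in paths:
        posix = PurePosixPath(path).as_posix()
        if not any(posix.startswith(prefix) for prefix in include):
            continue  # outside every include root
        if any(fnmatchcase(posix, pattern) for pattern in exclude):
            continue  # explicitly excluded
        if PurePosixPath(posix).suffix in suffixes:
            selected.append(posix)
    return selected
```

Under these semantics `**/test/**` does not exclude a top-level `test/` directory, because the pattern requires a `/` before `test`.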

Test Generation (v0.14.0+)

Language Spec (YAML) → Jinja2 Template → Python Generator → Test File (500-700 lines)
                ↓                                              ↓
         Code Examples                                  Validation (100%)
         Expected Results                               Python + Target Language
  1. YAML Specification: Define language syntax patterns and test scenarios
  2. Jinja2 Template: Reusable test code template
  3. Generator: Automated test file creation with validation
  4. Output: High-quality pytest test suite

Key Innovation: Separate test definition (YAML) from test implementation (Python), enabling non-Python developers to contribute language support.


๐Ÿ“ Smart Indexing Architecture

codeindex generates tiered documentation optimized for AI agents:

Project Root/
├── PROJECT_INDEX.md (~10KB)     # Overview level
│   └── Module list + descriptions
│
├── Module/
│   └── README_AI.md (~30KB)     # Navigation level
│       ├── Grouped files by type
│       └── Key classes summary
│
└── LeafDir/
    └── README_AI.md (≤50KB)     # Detailed level
        ├── Full symbol info
        └── Dependencies

Configuration

indexing:
  max_readme_size: 51200    # 50KB limit
  symbols:
    max_per_file: 15
    include_visibility: [public, protected]
    exclude_patterns: ["get*", "set*"]
  grouping:
    by: suffix
    patterns:
      Controller: "HTTP handlers"
      Service: "Business logic"
      Model: "Data models"
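Read literally, the `indexing.symbols` block above describes a three-step filter: visibility, then name patterns, then a per-file cap. A sketch under those assumptions (case-sensitive glob matching, source order preserved when capping); `Symbol` and `pick_symbols` are hypothetical names, not codeindex's API.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass
class Symbol:
    name: str
    visibility: str  # "public", "protected", or "private"


def pick_symbols(symbols, max_per_file=15,
                 include_visibility=("public", "protected"),
                 exclude_patterns=("get*", "set*")):
    """Apply the indexing.symbols rules above to one file's symbols."""
    kept = [
        s for s in symbols
        if s.visibility in include_visibility
        and not any(fnmatchcase(s.name, p) for p in exclude_patterns)
    ]
    return kept[:max_per_file]
```

One consequence worth knowing: `set*` also matches a method named `settings`, so broad exclude patterns can hide legitimate symbols.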

🤖 AI Coder Integration

For Claude Code Users

Add this to your project's CLAUDE.md:

## Code Index

This project uses codeindex for AI-friendly documentation.

### How to Read Code Index

1. **Start with overview**: Read `PROJECT_INDEX.md` or root `README_AI.md` to understand project structure
2. **Locate module**: Find the relevant module from the module list
3. **Deep dive**: Read module's `README_AI.md` for file/symbol details
4. **Read source**: Open specific files when you need implementation details

### Index Files

- `README_AI.md` - Directory-level documentation (≤50KB each)
- Each directory with source code has its own README_AI.md

### Example Workflow

Task: "Fix user authentication bug"
1. Read root README_AI.md → Find Auth/User module
2. Read Auth/README_AI.md → Find AuthService.php
3. Read AuthService.php โ†’ Understand implementation

Usage Tips

  • Token efficient: Each README is ≤50KB, suitable for LLM context
  • Progressive loading: Start from overview, drill down as needed
  • Keep indexes updated: Run codeindex scan-all --fallback after major changes
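The progressive-loading workflow above (overview first, then drill down) is easy to script if you want to hand an agent only the relevant index files. A sketch, assuming index files sit at the project root and in each directory along the path to the target; `readme_chain` is a hypothetical helper.

```python
from pathlib import Path


def readme_chain(root: Path, target: Path, name: str = "README_AI.md") -> list[Path]:
    """Return the index files to read, from the project overview down to `target`."""
    root, target = root.resolve(), target.resolve()
    rel = target.relative_to(root)  # raises ValueError if target is outside root
    # Every directory from the root down to the target, inclusive.
    levels = [root] + [root.joinpath(*rel.parts[: i + 1]) for i in range(len(rel.parts))]
    overview = root / "PROJECT_INDEX.md"
    chain = [overview] if overview.is_file() else []
    chain += [d / name for d in levels if (d / name).is_file()]
    return chain
```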

CLAUDE.md Template

Copy the template to your project:

cp /path/to/codeindex/examples/CLAUDE.md.template your-project/CLAUDE.md

Or see examples/CLAUDE.md.template for the full template.


🔗 Integration with LoomGraph

codeindex and LoomGraph work together as complementary tools:

Architecture

codeindex (AST Parser)
    ↓ Structured Data (JSON)
LoomGraph (Knowledge Graph + AI)
    ↓ Insights & Analysis
Applications (IDE, CI/CD, Team Tools)

Division of Responsibilities

| Tool | Focus | Key Features |
|------|-------|--------------|
| codeindex | Code Parsing | AST extraction, symbol extraction, call/inheritance relationships, multi-language support |
| LoomGraph | AI Analysis | Knowledge graph, vector embeddings, semantic search, refactoring suggestions, team collaboration |

What codeindex Provides

  • ✅ Structured code data (symbols, calls, imports, inheritance)
  • ✅ Multi-language support: Python, PHP, and Java today; TypeScript, Go, Rust, and C# on the roadmap
  • ✅ Framework awareness: ThinkPHP and Spring Boot routes today; Laravel, FastAPI, Django, and Express planned
  • ✅ JSON output for downstream tools (codeindex parse, codeindex scan --output json)

What LoomGraph Adds

  • ๐Ÿ” Code similarity search (vector embeddings + semantic search)
  • ๐Ÿค– Automated refactoring suggestions (graph analysis + AI)
  • ๐Ÿ‘ฅ Team collaboration (shared knowledge graphs)
  • ๐Ÿ”Œ IDE integration (LSP server for real-time features)

Integration Guide

See docs/guides/loomgraph-integration.md or FOR_LOOMGRAPH.md for complete integration examples.

Quick Example:

# Parse a file and get JSON output
codeindex parse myfile.py | jq .

# Parse all files in a directory
codeindex scan ./src --output json > parse_results.json

# LoomGraph consumes this JSON to build knowledge graph
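How LoomGraph ingests this JSON is covered in its integration guide. As a rough illustration of the kind of structure a downstream tool can build, the sketch below turns a list of `codeindex parse` results into a file-to-module import map, using only `file_path` and `imports[].module` from the single-file JSON shown earlier. `import_graph` is hypothetical and is not LoomGraph's API.

```python
from collections import defaultdict


def import_graph(parsed_files: list[dict]) -> dict[str, list[str]]:
    """Map each parsed file to the sorted modules it imports.

    Files with a non-null `error` are kept, since exit code 0 can mean a
    partial parse that still recovered some imports.
    """
    graph: dict[str, set[str]] = defaultdict(set)
    for doc in parsed_files:
        graph[doc["file_path"]].update(imp["module"] for imp in doc.get("imports", []))
    return {path: sorted(modules) for path, modules in graph.items()}
```

Loading the `all_symbols.json` array produced by the earlier `find ... | jq -s '.'` pipeline with `json.load` gives exactly the list of dicts this function expects.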

Why This Separation?

  1. Single Responsibility: codeindex focuses on parsing, LoomGraph focuses on AI
  2. Independent Evolution: Each tool can evolve without affecting the other
  3. Flexible Integration: Use codeindex alone or with LoomGraph
  4. Performance: Lightweight parser vs. heavyweight graph+AI system

๐ŸŒ Language Support

| Language | Status | Version | Features |
|----------|--------|---------|----------|
| Python | ✅ Supported | v0.1.0+ | Classes, functions, methods, imports, docstrings, inheritance, calls |
| PHP | ✅ Supported | v0.5.0+ | Classes (extends/implements), methods (visibility, static, return types), properties, functions, inheritance, calls |
| Java | ✅ Supported | v0.7.0+ | Classes, interfaces, enums, records, annotations, Spring routes, Lombok, inheritance, calls |
| TypeScript/JS | 🧪 Tests Ready | v0.14.0 | Classes, functions, React components, JSDoc (Epic 15); parser implementation in progress |
| Go | 📋 Planned | v0.15.0 | Packages, interfaces, struct methods (Epic 16) |
| Rust | 📋 Planned | v0.17.0 | Structs, traits, modules (Epic 19) |
| C# | 📋 Planned | v0.18.0 | Classes, interfaces, .NET projects |

🎯 Test Architecture (v0.14.0+)

codeindex uses a template-based test generation system to accelerate language support development:

  • YAML Language Specifications: Declarative syntax patterns and test scenarios
  • Jinja2 Templates: Automated Python test code generation
  • Quality Validation: 100% syntax correctness for both Python and target language
  • Time Savings: 88-91% reduction (11-17 hours → ~2 hours per language)

Current test coverage:

  • ✅ Python: 50+ test methods (hand-written)
  • ✅ PHP: 30+ test methods (hand-written)
  • ✅ Java: 60+ test methods (hand-written)
  • ✅ TypeScript: 25 test methods (template-generated, 100% quality)

Want to contribute a new language? See Contributing Language Support below.


๐Ÿค Contributing

We welcome contributions! See CONTRIBUTING.md for guidelines.

🚀 Quick Start for Contributors

# Clone and install
git clone https://github.com/dreamlx/codeindex.git
cd codeindex

# Install with dev dependencies
make install-dev
# or: pip install -e ".[dev,all]"

# Install Git hooks (pre-push checks)
make install-hooks

# Run tests
make test
# or: pytest

# Lint and auto-fix
make lint-fix
# or: ruff check --fix src/

# See all available commands
make help

🌟 Contributing Language Support

Want to add support for Go, Rust, C++, or other languages? You don't need to know Python!

We use a template-based test generation system that lets you contribute by only knowing your target language:

Quick Start (1-3 hours total)

  1. Create YAML specification (1-2 hours)

    cd test_generator
    cp specs/_template.yaml specs/<language>.yaml
    # Fill in code examples in your language
    
  2. Generate tests (5 minutes)

    python generator.py \
      --spec specs/<language>.yaml \
      --template templates/inheritance_test.py.j2 \
      --output test_<language>_inheritance.py
    
  3. Review and submit PR (30-60 minutes)

    • Verify Python syntax: python -m py_compile test_*.py
    • Verify your language syntax (manual review)
    • Submit PR with both YAML and generated test file

What You Need

  • ✅ Familiarity with target language (Go/Rust/C++/etc.)
  • ✅ Ability to write code examples in that language
  • ✅ 1-3 hours of time
  • ❌ NO Python knowledge required!

What You'll Create

  • YAML file: 20-30 code templates with expected parsing results
  • Test file: Auto-generated Python tests (you just review)

Quality Standards

  • Minimum: 6 test classes, 15 test methods
  • Target: 8 test classes, 25+ test methods
  • Validation: 100% Python syntax + 100% target language syntax

Examples

  • TypeScript: See test_generator/specs/typescript.yaml (351 lines, 28 templates)
  • Template: See test_generator/specs/_template.yaml (fully documented starter)

Full Guide

See CONTRIBUTING_LANGUAGE_SUPPORT.md for:

  • Detailed step-by-step instructions
  • YAML specification guide
  • PR template and checklist
  • FAQ and troubleshooting

Currently seeking contributors for: 🔥 Go, Rust, C++, C#, Ruby, Kotlin

๐Ÿ“š Developer Documentation

🎯 Release Process (Maintainers)

# Automated one-command release
make release VERSION=0.13.0

# GitHub Actions will automatically:
# ✅ Run tests on Python 3.10, 3.11, 3.12
# ✅ Build and publish to PyPI
# ✅ Create GitHub Release

# See: docs/development/QUICK_START_RELEASE.md

๐Ÿ“Š Roadmap

See Strategic Roadmap for detailed plans.

Completed (v0.14.0):

  • ✅ Python, PHP, Java language support (with LoomGraph integration)
  • ✅ Single-file parse command (loose coupling with downstream tools)
  • ✅ Parser modularization (refactored from 3622 to 374 lines)
  • ✅ Windows platform compatibility (UTF-8 + path optimization)
  • ✅ Call relationship extraction (Python/Java/PHP)
  • ✅ Framework routes (ThinkPHP, Spring Boot)
  • ✅ Interactive Setup Wizard (codeindex init)

In Progress (v0.15.0):

  • ๐Ÿ”„ Template-based test generation system (Epic 18)
  • ๐Ÿ”„ Test architecture migration (Python/PHP/Java โ†’ YAML specs)

Next (v0.16.0 - v0.18.0):

  • ๐Ÿ“‹ Framework routes expansion: Express, Laravel, FastAPI, Django (v0.16.0, Epic 17)
  • ๐Ÿ“‹ Rust language support (v0.17.0, Epic 19)
  • ๐Ÿ“‹ C# language support (v0.18.0)

Not Included (Moved to LoomGraph):

  • โŒ Code similarity search โ†’ LoomGraph v0.3.0
  • โŒ Automated refactoring suggestions โ†’ LoomGraph v0.4.0
  • โŒ Team collaboration features โ†’ LoomGraph v0.5.0
  • โŒ IDE deep integration (LSP server) โ†’ LoomGraph v0.6.0

Reason: codeindex focuses on code parsing (AST โ†’ structured data), while LoomGraph focuses on AI analysis (structured data โ†’ knowledge graph โ†’ insights).
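As an illustration of the "AST → structured data" half of that split, the sketch below walks a Python syntax tree and collects symbols, inheritance edges, imports, and calls into JSON-ready data. Assumptions: it uses Python's built-in `ast` module rather than tree-sitter, handles Python only, and its output keys are invented for this example; they are not codeindex's actual output format.

```python
import ast
import json


def extract(source: str) -> dict:
    """Collect symbols, inheritance, imports and calls from Python source."""
    tree = ast.parse(source)
    data = {"classes": [], "functions": [], "inheritance": [],
            "imports": [], "calls": []}
    # ast.walk is breadth-first, so list order follows the traversal, not the source.
    for node in ast.walk(tree):
        if isinstance(node, ast.ClassDef):
            data["classes"].append(node.name)
            # Each base-class expression becomes a [child, parent] edge.
            for base in node.bases:
                data["inheritance"].append([node.name, ast.unparse(base)])
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            data["functions"].append(node.name)  # includes methods
        elif isinstance(node, ast.Import):
            data["imports"].extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            # Relative-import dots (node.level) are ignored in this sketch.
            prefix = f"{node.module}." if node.module else ""
            data["imports"].extend(prefix + alias.name for alias in node.names)
        elif isinstance(node, ast.Call):
            # Record the callee as written, e.g. "os.listdir".
            data["calls"].append(ast.unparse(node.func))
    return data


SAMPLE = '''
import os
from typing import List


class Base: ...


class Repo(Base):
    def load(self) -> List[str]:
        return os.listdir(".")
'''

if __name__ == "__main__":
    print(json.dumps(extract(SAMPLE), indent=2))
```

Structured output of this kind is what a downstream consumer such as a knowledge-graph builder would ingest; the analysis itself stays out of the parser.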


๐Ÿ“„ License

MIT License - see LICENSE file for details.


๐Ÿ™ Acknowledgments

  • tree-sitter - Fast, incremental parsing
  • Claude CLI - AI integration inspiration
  • All contributors and users

๐Ÿ“ž Support


โญ Star History

If you find codeindex useful, please star the repository to show your support!



Made with ❤️ by the codeindex team



Download files


Source Distribution

ai_codeindex-0.15.1.tar.gz (1.0 MB)

Uploaded Source

Built Distribution


ai_codeindex-0.15.1-py3-none-any.whl (145.7 kB)

Uploaded Python 3

File details

Details for the file ai_codeindex-0.15.1.tar.gz.

File metadata

  • Download URL: ai_codeindex-0.15.1.tar.gz
  • Upload date:
  • Size: 1.0 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for ai_codeindex-0.15.1.tar.gz
  • SHA256: da82c92c74418a93869e73f8b8a7c55d0690015e436348321a744c0cf011380f
  • MD5: 87e1d1b8e4f97dbedf11e5fec4fd54e2
  • BLAKE2b-256: 6bd6b0e88643eb18ce8ed9b5646f88ffdd795c538ead202f466764e5568bc0f0
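To check a downloaded archive against the SHA256 digest listed above, a short standard-library script is enough. This is a generic `hashlib` sketch, not a tool shipped with codeindex; the script name in the usage comment is hypothetical and the file path is whatever you downloaded.

```python
import hashlib
import sys

# SHA256 digest published above for ai_codeindex-0.15.1.tar.gz.
EXPECTED_SHA256 = "da82c92c74418a93869e73f8b8a7c55d0690015e436348321a744c0cf011380f"


def sha256_of(path: str, chunk_size: int = 1 << 16) -> str:
    """Hash a file in fixed-size chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected: str = EXPECTED_SHA256) -> bool:
    """True if the file's SHA256 matches the expected hex digest (case-insensitive)."""
    return sha256_of(path) == expected.strip().lower()


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python verify_sha256.py ai_codeindex-0.15.1.tar.gz
    print("OK" if verify(sys.argv[1]) else "MISMATCH")
```

For the wheel, pass the SHA256 from the wheel's own file hashes as `expected` instead of the default.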


Provenance

The following attestation bundles were made for ai_codeindex-0.15.1.tar.gz:

Publisher: publish.yml on dreamlx/codeindex

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file ai_codeindex-0.15.1-py3-none-any.whl.

File metadata

  • Download URL: ai_codeindex-0.15.1-py3-none-any.whl
  • Upload date:
  • Size: 145.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for ai_codeindex-0.15.1-py3-none-any.whl
  • SHA256: 6edeb2d36d89419d4be39da84b83929295b8abfa9ae53a0dbbeb78773414c0ce
  • MD5: 1954b83c298ff5d05d8aaf6b21d44033
  • BLAKE2b-256: 391708584fbe7388854f5092d8f6ddd6f5d59638c2333ddf0b1840a9eebb65c5


Provenance

The following attestation bundles were made for ai_codeindex-0.15.1-py3-none-any.whl:

Publisher: publish.yml on dreamlx/codeindex

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
