AI-native code indexing tool for large codebases

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

dreamlinx

These details have not been verified by PyPI

Project description

codeindex

Universal Code Parser - A multi-language AST parser for AI-assisted development.

codeindex focuses on code parsing and structured data extraction using tree-sitter. It extracts symbols, inheritance relationships, call relationships, and imports from Python, PHP, Java (and more languages coming). Perfect for feeding structured code data to AI tools, knowledge graphs, and code intelligence platforms.

🤝 For LoomGraph Developers: Looking to integrate codeindex for code parsing? Start here:

Quick Start: FOR_LOOMGRAPH.md (5 min read)

Complete Guide: docs/guides/loomgraph-integration.md (20 min, with code examples)

✨ Features

🚀 AI-Powered Documentation: Generate comprehensive README files using Claude, GPT, or any AI CLI
🌳 Tree-sitter Parsing: Accurate symbol extraction (classes, functions, methods, imports) for Python, PHP & Java
📄 Single File Parse (v0.13.0+): Parse individual files with JSON output for loose coupling with downstream tools
⚡ Parallel Scanning: Scan multiple directories concurrently for fast indexing
🎯 Smart Filtering: Include/exclude patterns with glob support
🔧 Flexible Integration: Works with any AI CLI tool via configurable commands
📊 Coverage Tracking: Check which directories have been indexed
🎨 Fallback Mode: Generate basic documentation without AI
🎯 KISS Universal Description (v0.4.0+): Language-agnostic, zero-assumption module descriptions
🏗️ Modular Architecture (v0.3.1+): Clean, maintainable 6-module CLI design
🔄 Adaptive Symbols (v0.2.0+): Dynamic symbol extraction (5-150 per file based on size)
📈 Technical Debt Analysis (v0.3.0+): Detect code quality issues and complexity metrics
🔍 Symbol Indexing (v0.1.2+): Global symbol search and project-wide navigation
🧪 Template-Based Test Generation (v0.14.0+): AI-assisted test generation with 88-91% time savings
- YAML-driven specifications: Declarative language definitions
- Jinja2 templating: Automated test code generation
- 100% quality validation: Python syntax + language syntax checks
- Community-friendly: Enable non-Python developers to contribute language support
🛣️ Framework Route Extraction (v0.5.0+): Auto-detect and extract routes from web frameworks
- ThinkPHP (v0.5.0+): Convention-based routing with line numbers and PHPDoc descriptions
- Spring Boot (v0.8.0+): @GetMapping, @PostMapping, REST controllers with path variables
- Laravel (planned for v0.16.0): Explicit route definitions (Epic 17)
- FastAPI (planned for v0.16.0): Decorator-based routes (Epic 17)
- Django (planned for v0.16.0): URL patterns (Epic 17)
- Express.js (planned for v0.16.0): TypeScript/JavaScript routes (Epic 17)
📝 AI Docstring Extraction (v0.4.0+, Epic 9): Multi-language documentation normalization
- Hybrid mode: Selective AI processing (<$1 per 250 directories)
- All-AI mode: Maximum quality for critical projects
- Language support: PHP (PHPDoc + inline comments), Python (coming soon)
- Mixed language: Normalize Chinese + English comments to clean English

📦 Installation

codeindex uses lazy loading - language parsers are only imported when needed. Install only the languages you use to keep dependencies minimal.

Basic Installation (Core Only)

# Install core only (no language parsers)
pip install ai-codeindex

Language-Specific Installation

Install only the languages you need:

# Python projects
pip install ai-codeindex[python]

# PHP projects
pip install ai-codeindex[php]

# Java projects
pip install ai-codeindex[java]

# Multiple languages
pip install ai-codeindex[python,php]

# All languages
pip install ai-codeindex[all]

Using pipx (Recommended)

# All languages
pipx install ai-codeindex[all]

# Or specific languages
pipx install ai-codeindex[python,php]

From Source

git clone https://github.com/dreamlx/codeindex.git
cd codeindex
pip install -e ".[all]"  # Development mode with all languages

🚀 Quick Start

1. Initialize Configuration

cd /your/project
codeindex init

This creates .codeindex.yaml in your project.

2. Configure AI CLI

Edit .codeindex.yaml:

# AI CLI command to use for generating documentation
ai_command: 'claude -p "{prompt}" --allowedTools "Read"'

# List of patterns to include for scanning
include:
  - src/

# List of patterns to exclude from scanning
exclude:
  - "**/test/**"
  - "**/__pycache__/**"

# Supported languages
languages:
  - python
  - php

# Output filename
output_file: "README_AI.md"

Other AI CLI examples:

# OpenAI
ai_command: 'openai chat "{prompt}" --model gpt-4'

# Gemini
ai_command: 'gemini "{prompt}"'

# Custom script
ai_command: '/path/to/my-ai-wrapper.sh "{prompt}"'
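The `{prompt}` placeholder in `ai_command` appears to mark where the generated prompt is inserted. How codeindex performs that substitution is not described here, so the following is only a minimal sketch of one safe approach a custom wrapper could take. The helper name `build_command` is hypothetical; it splits the template into arguments first and substitutes second, so quotes inside the prompt cannot break the command line.

```python
import shlex


def build_command(template: str, prompt: str) -> list[str]:
    """Expand an ai_command-style template into an argv list.

    Hypothetical helper, not part of codeindex: the template is split
    into arguments before the placeholder is replaced, so the prompt is
    always passed as exactly one argument.
    """
    args = shlex.split(template)
    return [arg.replace("{prompt}", prompt) for arg in args]


if __name__ == "__main__":
    template = 'claude -p "{prompt}" --allowedTools "Read"'
    print(build_command(template, 'Summarize the "auth" module'))
```

The resulting list can be handed to `subprocess.run` without `shell=True`, which avoids shell quoting problems altogether.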

3. Scan a Directory

# Scan single directory
codeindex scan ./src/auth

# Preview prompt without executing
codeindex scan ./src/auth --dry-run

# Generate without AI (fallback mode)
codeindex scan ./src/auth --fallback

💡 Pro Tip: When scanning web framework directories (like Application/Admin/Controller for ThinkPHP), codeindex automatically:

✅ Detects the framework
✅ Extracts routes with line numbers
✅ Includes method descriptions from PHPDoc/docstrings
✅ Generates route tables in README_AI.md

4. Batch Processing

# Scan all directories (generates SmartWriter READMEs)
codeindex scan-all

# Traditional batch processing (for AI-enhanced docs)
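# Option A: xargs with 4 parallel workers
codeindex list-dirs | xargs -P 4 -I {} codeindex scan {}
# Option B: GNU parallel (equivalent; run one or the other, not both)
codeindex list-dirs | parallel -j 4 codeindex scan {}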

Example output:

📝 Generating READMEs (SmartWriter)...
✓ Application ( 50KB)
✓ Admin ( 20KB)
✓ api ( 15KB)
→ Completed: 3/3 directories

5. Generate Structured Data (JSON)

NEW in v0.5.0: For tool integration (e.g., LoomGraph, custom scripts, CI/CD pipelines), generate machine-readable JSON output.

# Single directory
codeindex scan ./src --output json

# Entire project
codeindex scan-all --output json > parse_results.json

# View formatted JSON
codeindex scan ./src --output json | jq .

JSON Output Structure:

{
  "success": true,
  "results": [
    {
      "file": "src/parser.py",
      "symbols": [
        {
          "name": "Parser",
          "kind": "class",
          "signature": "class Parser:",
          "line_start": 15,
          "line_end": 120
        }
      ],
      "imports": [
        {"module": "pathlib", "names": ["Path"], "is_from": true}
      ],
      "error": null
    }
  ],
  "summary": {
    "total_files": 1,
    "total_symbols": 1,
    "total_imports": 1,
    "errors": 0
  }
}
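A downstream script can consume this structure with nothing but the standard library. The sketch below assumes the field names shown in the sample above (`results`, `file`, `symbols`, `name`, `kind`, `line_start`, `error`); the function name `index_symbols` and the embedded `SAMPLE` payload are illustrative only.

```python
import json

# Abbreviated copy of the sample payload shown above.
SAMPLE = """
{
  "success": true,
  "results": [
    {
      "file": "src/parser.py",
      "symbols": [
        {"name": "Parser", "kind": "class", "signature": "class Parser:",
         "line_start": 15, "line_end": 120}
      ],
      "imports": [{"module": "pathlib", "names": ["Path"], "is_from": true}],
      "error": null
    }
  ],
  "summary": {"total_files": 1, "total_symbols": 1, "total_imports": 1, "errors": 0}
}
"""


def index_symbols(payload: dict) -> dict:
    """Map each symbol name to (kind, "file:line"), skipping files with errors."""
    index = {}
    for file_result in payload["results"]:
        if file_result.get("error"):
            continue  # per-file error: nothing reliable to index
        for symbol in file_result["symbols"]:
            location = f'{file_result["file"]}:{symbol["line_start"]}'
            index[symbol["name"]] = (symbol["kind"], location)
    return index


if __name__ == "__main__":
    print(index_symbols(json.loads(SAMPLE)))
```

Keying by bare symbol name keeps the sketch short; a real index would likely need to handle the same name appearing in several files.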

Error Handling:

When errors occur, the JSON response includes structured error information:

{
  "success": false,
  "error": {
    "code": "DIRECTORY_NOT_FOUND",
    "message": "Directory does not exist: /path/to/dir",
    "detail": null
  },
  "results": [],
  "summary": {
    "total_files": 0,
    "errors": 1
  }
}
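Callers can branch on the top-level `success` flag before touching `results`. The sketch below assumes only the fields shown in the two sample payloads; `ScanError` and `load_results` are hypothetical names introduced for illustration.

```python
import json


class ScanError(RuntimeError):
    """Raised when the payload reports success == false (hypothetical type)."""


def load_results(text: str) -> list:
    """Decode a codeindex JSON payload and return its results list.

    Raises ScanError carrying the structured error code and message
    when the payload signals failure.
    """
    payload = json.loads(text)
    if not payload.get("success", False):
        error = payload.get("error") or {}
        code = error.get("code", "UNKNOWN")
        message = error.get("message", "no message")
        raise ScanError(f"{code}: {message}")
    return payload["results"]


if __name__ == "__main__":
    failed = (
        '{"success": false, "error": {"code": "DIRECTORY_NOT_FOUND", '
        '"message": "Directory does not exist: /path/to/dir", "detail": null}, '
        '"results": [], "summary": {"total_files": 0, "errors": 1}}'
    )
    try:
        load_results(failed)
    except ScanError as exc:
        print(f"scan failed: {exc}")
```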

Use Cases:

🔌 Tool Integration: Feed parse results to visualization tools like LoomGraph
🤖 CI/CD Pipelines: Validate code structure in automated workflows
📊 Analytics: Analyze codebase metrics across versions
🧪 Testing: Verify expected code structure in tests

6. Parse Single Files

NEW in v0.13.0: Parse individual source files for loose coupling with downstream tools.

💡 For LoomGraph Integration: See complete guide at docs/guides/loomgraph-integration.md

# Parse a Python file
codeindex parse src/auth/user.py

# Parse a PHP file
codeindex parse Application/Controller/User.php

# Parse a Java file
codeindex parse src/main/java/User.java

# Pretty print with jq
codeindex parse myfile.py | jq .

# Extract specific fields
codeindex parse myfile.py | jq '.symbols[] | {name, kind}'

JSON Output Structure (single file):

{
  "file_path": "src/auth/user.py",
  "language": "python",
  "symbols": [
    {
      "name": "User",
      "kind": "class",
      "signature": "class User:",
      "docstring": "User authentication model",
      "line_start": 10,
      "line_end": 50,
      "annotations": []
    }
  ],
  "imports": [
    {"module": "typing", "names": ["Dict"], "is_from": true, "alias": null}
  ],
  "namespace": "",
  "error": null
}

Exit Codes:

0: Success (includes partial parse with errors)
1: File not found or permission denied
2: Unsupported language
3: Parse error
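When calling `codeindex parse` from a script, the exit code can be translated into a readable error. This is a rough sketch that assumes the JSON is written to standard output (as the `jq` examples above suggest); `describe_exit_code` and `parse_file` are hypothetical helper names.

```python
import json
import subprocess

# Meanings copied from the exit code list above.
EXIT_MEANINGS = {
    0: "success (may include a partial parse)",
    1: "file not found or permission denied",
    2: "unsupported language",
    3: "parse error",
}


def describe_exit_code(code: int) -> str:
    """Return a human-readable meaning for a `codeindex parse` exit code."""
    return EXIT_MEANINGS.get(code, f"unexpected exit code {code}")


def parse_file(path: str) -> dict:
    """Run `codeindex parse <path>` and return the decoded JSON.

    Assumes codeindex is on PATH and prints JSON to stdout.
    """
    proc = subprocess.run(
        ["codeindex", "parse", path], capture_output=True, text=True
    )
    if proc.returncode != 0:
        raise RuntimeError(f"{path}: {describe_exit_code(proc.returncode)}")
    return json.loads(proc.stdout)
```

Because exit code 0 also covers partial parses, callers that need strictness should additionally inspect the `error` field of the returned JSON.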

Integration Example (with LoomGraph):

# Parse and pipe to downstream tool
codeindex parse myfile.py | loomgraph import --format codeindex

# Batch parse multiple files
find src/ -name "*.py" -exec codeindex parse {} \; | \
  jq -s '.' > all_symbols.json

See also:

Quick examples: examples/parse_integration_example.sh
For LoomGraph developers: See docs/guides/loomgraph-integration.md for detailed integration guide with Python/Node.js code examples

7. Check Status

codeindex status

Output:

Indexing Status
───────────────────────────────────────
✅ src/auth/
✅ src/utils/
⚠️  src/api/ (no README_AI.md)
✅ src/db/

Indexed: 3/4 (75%)
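The sample output suggests that a directory counts as indexed when it contains a `README_AI.md`. Under that assumption, the percentage can be reproduced with a few lines; `coverage` is a hypothetical helper and not the tool's own implementation.

```python
import tempfile
from pathlib import Path


def coverage(directories, index_name="README_AI.md"):
    """Return (indexed, total, percent) for the given directories.

    A directory is treated as indexed when it contains `index_name`.
    """
    directories = list(directories)
    indexed = sum(1 for d in directories if (Path(d) / index_name).is_file())
    total = len(directories)
    percent = round(100 * indexed / total) if total else 0
    return indexed, total, percent


if __name__ == "__main__":
    # Recreate the four-directory layout from the sample output.
    with tempfile.TemporaryDirectory() as root:
        dirs = [Path(root) / name for name in ("auth", "utils", "api", "db")]
        for d in dirs:
            d.mkdir()
            if d.name != "api":
                (d / "README_AI.md").write_text("# index\n")
        indexed, total, percent = coverage(dirs)
        print(f"Indexed: {indexed}/{total} ({percent}%)")
```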

8. Generate Symbol Indexes (v0.1.2+)

Global symbol index - Find any class/function across your codebase:

# Generate PROJECT_SYMBOLS.md (global symbol index)
codeindex symbols

# Generate PROJECT_INDEX.md (module overview)
codeindex index

# Analyze git changes and affected directories
codeindex affected --since HEAD~5 --until HEAD
codeindex affected --json  # For scripting/CI

What you get:

PROJECT_SYMBOLS.md provides:

Quick class/function lookup across all files
Cross-file references and imports
Symbol locations with line numbers
Grouped by directory

PROJECT_INDEX.md provides:

Module overview with descriptions
Directory structure
Entry points and CLI commands
Generated from README_AI.md files

Affected analysis helps with incremental updates:

Shows which directories changed in git commits
Suggests which README_AI.md files need regeneration
JSON output for CI/CD integration
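The idea behind affected analysis can be sketched as mapping each changed file to the deepest indexed directory that contains it. The code below is an illustration under that assumption, not the logic `codeindex affected` actually uses; `affected_dirs` is a hypothetical name, and the list of changed files could come from something like `git diff --name-only HEAD~5 HEAD`.

```python
from pathlib import PurePosixPath


def affected_dirs(changed_files, indexed_dirs):
    """Return the sorted indexed directories touched by the changed files.

    Each file is attributed to the deepest indexed directory among its
    ancestors; files outside every indexed directory are ignored.
    """
    # Deepest directories first, so the most specific match wins.
    indexed = sorted(
        (PurePosixPath(d) for d in indexed_dirs),
        key=lambda p: len(p.parts),
        reverse=True,
    )
    hits = set()
    for name in changed_files:
        parents = PurePosixPath(name).parents
        for directory in indexed:
            if directory in parents:
                hits.add(str(directory))
                break
    return sorted(hits)


if __name__ == "__main__":
    changed = ["src/auth/login.py", "src/main.py", "docs/intro.md"]
    print(affected_dirs(changed, ["src", "src/auth", "src/utils"]))
```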

9. Analyze Technical Debt (v0.3.0+)

NEW in v0.3.0: Detect code quality issues and technical debt patterns.

# Analyze directory for technical debt
codeindex tech-debt ./src

# Output formats
codeindex tech-debt ./src --format console   # Human-readable (default)
codeindex tech-debt ./src --format markdown  # Documentation
codeindex tech-debt ./src --format json      # API/scripting

# Save to file
codeindex tech-debt ./src --output debt_report.md

# Recursive analysis
codeindex tech-debt ./src --recursive

# Quiet mode (minimal output)
codeindex tech-debt ./src --quiet

What it detects:

🔴 Super large files (>5000 lines) - CRITICAL
🟡 Large files (>2000 lines) - HIGH
🔴 God Classes (>50 methods) - CRITICAL
🟡 Symbol overload (>100 symbols) - CRITICAL
🟠 High noise ratio (>50% low-quality symbols) - HIGH
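The thresholds above amount to a small rule table. The sketch below encodes them directly; only `super_large_file` is an issue name taken from the sample report, while the other issue names, the function `detect_issues`, and its parameters are assumptions made for illustration.

```python
def detect_issues(lines, max_methods_per_class, symbols, noise_ratio):
    """Return a list of (severity, issue) pairs for one file.

    Thresholds mirror the list above; `noise_ratio` is the fraction of
    low-quality symbols, between 0.0 and 1.0.
    """
    issues = []
    if lines > 5000:
        issues.append(("CRITICAL", "super_large_file"))
    elif lines > 2000:
        issues.append(("HIGH", "large_file"))
    if max_methods_per_class > 50:
        issues.append(("CRITICAL", "god_class"))
    if symbols > 100:
        issues.append(("CRITICAL", "symbol_overload"))
    if noise_ratio > 0.5:
        issues.append(("HIGH", "high_noise_ratio"))
    return issues


if __name__ == "__main__":
    for severity, issue in detect_issues(6000, 12, 40, 0.1):
        print(f"{severity} - {issue}")
```

The `elif` keeps a 6000-line file from being reported twice, once as super large and once as large.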

Example output:

══════════════════════════════════════
  Technical Debt Report
══════════════════════════════════════

Summary
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Files analyzed: 15
Issues found: 3
Quality Score: 78.3/100

Severity Breakdown
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
CRITICAL: 1
HIGH: 2
MEDIUM: 0
LOW: 0

File Details
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

📄 src/models/user.py (Quality: 70.0)
  🔴 CRITICAL - super_large_file
     File has 6000 lines (threshold: 5000)
     → Split into 3-5 smaller files

10. Generate Test Suite for New Languages (v0.14.0+)

NEW in v0.14.0: Use the template-based test generation system to quickly add language support.

cd test_generator

# Create language specification (or copy template)
cp specs/_template.yaml specs/go.yaml
# Edit go.yaml with Go code examples

# Generate tests automatically
python generator.py \
  --spec specs/go.yaml \
  --template templates/inheritance_test.py.j2 \
  --output test_go_inheritance.py

# Validate generated code
python -m py_compile test_go_inheritance.py  # Python syntax
# Review Go code syntax manually

# Output: 500-700 lines of high-quality test code in 5 minutes!

Benefits:

⏱️ 88-91% faster than manual test writing
✅ 100% syntax correctness (automated validation)
🌍 Language-agnostic (just provide code examples in YAML)
🤝 Community-friendly (non-Python developers can contribute)

Example output:

✅ Loaded spec: Go (extension: .go)
✅ Loaded template: inheritance_test.py.j2
🔧 Generating Go tests...
✅ Code validation passed

✅ Generated test file:
   File: test_go_inheritance.py
   Lines: 587
   Test classes: 7
   Test methods: 22

See CONTRIBUTING_LANGUAGE_SUPPORT.md for complete guide.

11. Framework Route Extraction (v0.5.0+)

NEW in v0.5.0: Automatically detect and extract routes from web frameworks with line numbers and descriptions.

codeindex automatically identifies web frameworks and extracts route information when scanning Controller/View directories. Routes are displayed as beautiful markdown tables in your README_AI.md files.

Supported Frameworks

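Framework	Language	Status	Features
ThinkPHP	PHP	✅ Stable (v0.5.0+)	Line numbers, PHPDoc descriptions, module-based routing
Spring Boot	Java	✅ Stable (v0.8.0+)	@GetMapping, @PostMapping, REST controllers with path variables
Laravel	PHP	🔄 Coming v0.16.0	Named routes, route groups, middleware
FastAPI	Python	🔄 Coming v0.16.0	Path operations, dependencies, tags
Django	Python	🔄 Coming v0.16.0	URL patterns, namespaces, view classes
Express.js	TypeScript/JavaScript	🔄 Coming v0.16.0	TypeScript/JavaScript routes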

Example Output

ThinkPHP Controller (Application/Admin/Controller/UserController.php):

class UserController {
    /**
     * Get user list with pagination
     */
    public function index() {
        // ...
    }

    /**
     * 创建新用户
     */
    public function create() {
        // ...
    }
}

Generated Route Table in README_AI.md:

## Routes (ThinkPHP)

| URL | Controller | Action | Location | Description |
|-----|------------|--------|----------|-------------|
| `/admin/user/index` | UserController | index | `UserController.php:12` | Get user list with pagination |
| `/admin/user/create` | UserController | create | `UserController.php:20` | 创建新用户 |

How It Works

Auto-Detection: Scans directory structure to detect web frameworks
Symbol Extraction: Parses controllers/views using tree-sitter
Route Inference: Applies framework-specific routing conventions
Documentation Extraction: Extracts docstrings/PHPDoc comments
Table Generation: Formats as markdown table in README_AI.md

Features:

✅ Line Numbers: Clickable file:line locations
✅ Descriptions: From PHPDoc/docstrings (auto-truncated to 60 chars)
✅ Multi-language: Supports Chinese and English descriptions
✅ Smart Filtering: Only public methods, excludes magic methods
✅ Zero Configuration: Just scan, routes auto-appear
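For the ThinkPHP example above, convention-based route inference can be pictured as deriving `/module/controller/action` from the controller's file path. This is a simplified sketch that only mirrors the sample table; the helper names `infer_route` and `truncate`, the lowercasing rule, and the ellipsis style are assumptions rather than codeindex's actual behaviour.

```python
from pathlib import PurePosixPath


def infer_route(controller_path: str, action: str) -> str:
    """Derive a URL from a path like Application/<Module>/Controller/<Name>Controller.php."""
    path = PurePosixPath(controller_path)
    module = path.parent.parent.name  # directory above "Controller"
    controller = path.stem.removesuffix("Controller")  # Python 3.9+
    return f"/{module.lower()}/{controller.lower()}/{action}"


def truncate(description: str, limit: int = 60) -> str:
    """Shorten a description to at most `limit` characters."""
    if len(description) <= limit:
        return description
    return description[: limit - 3].rstrip() + "..."


if __name__ == "__main__":
    path = "Application/Admin/Controller/UserController.php"
    for action in ("index", "create"):
        print(infer_route(path, action))
```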

Usage

# Routes are automatically extracted when scanning
codeindex scan-all

# Or scan specific controller directory
codeindex scan ./Application/Admin/Controller

No configuration needed! Routes are detected and extracted automatically.

For Developers

Want to add support for your favorite framework? See CLAUDE.md for the complete developer guide on creating custom route extractors.

📋 Recent Updates

Current version: v0.15.0

Key Features

🧪 Template-Based Test Generation (v0.14.0): AI-assisted test generation system
- 88-91% time savings (11-17 hours → ~2 hours per language)
- YAML-driven specifications: Declarative language definitions
- Jinja2 templating: Automated test code generation
- 100% quality validation: TypeScript tests ready (25 methods)
- Community-friendly: Enable non-Python developers to contribute
🔗 Call Relationship Extraction (v0.12.0): Function/method call graphs and dependency analysis
🛣️ Framework Route Extraction: Auto-detect routes from ThinkPHP and Spring frameworks
🤖 AI Docstring Extraction: Multi-language documentation normalization (PHP, Python)
🎯 KISS Universal Descriptions: Language-agnostic module summaries with actual symbol names
📊 Technical Debt Analysis: Detect code quality issues and complexity metrics
🚀 Automated Release Workflow: One-command releases with GitHub Actions + PyPI Trusted Publisher

Latest Improvements (v0.14.0)

✅ Interactive Setup Wizard with smart auto-detection
✅ Makefile automation for development and releases
✅ Git hooks for code quality (pre-commit, post-commit, pre-push)
✅ Modular CLI architecture (6 focused modules)
✅ Adaptive symbol extraction (5-150 symbols per file)
✅ Parallel scanning for faster indexing

See: CHANGELOG.md for complete version history

📖 Documentation

User Guides

Getting Started - Detailed installation and setup
Configuration Guide - All config options explained
Configuration Changelog - Version-by-version config changes
Advanced Usage - Parallel scanning, custom prompts
Git Hooks Integration - Automated code quality checks

Developer Guides

CONTRIBUTING.md - Development setup, TDD workflow, code style guidelines
CLAUDE.md - Quick reference for Claude Code and contributors
Design Philosophy - Core design principles and architecture
Release Automation - 5-minute automated release workflow
Multi-Language Support - Guide for adding new language support
Requirements Workflow - Planning, issues, and development process

Planning

Strategic Roadmap - Long-term vision and priorities
Changelog - Version history and breaking changes

⚙️ Configuration Reference

Complete `.codeindex.yaml`

codeindex: 1

# AI CLI command (required)
ai_command: 'claude -p "{prompt}" --allowedTools "Read"'

# Directory patterns
include:
  - src/                # Include all subdirectories recursively
  - modules/

exclude:
  - "**/test/**"
  - "**/__pycache__/**"
  - "**/node_modules/**"

# Language support
languages:
  - python
  - php

# Output settings
output_file: "README_AI.md"
parallel_workers: 8
batch_size: 50

# Smart indexing (generates tiered documentation)
indexing:
  max_readme_size: 51200
  root_level: "overview"
  module_level: "navigation"
  leaf_level: "detailed"

# Adaptive symbol extraction (v0.2.0+)
symbols:
  adaptive_symbols:
    enabled: true           # Enable dynamic symbol limits based on file size
    min_symbols: 5          # Minimum symbols for tiny files
    max_symbols: 150        # Maximum symbols for huge files
    thresholds:             # File size thresholds (lines)
      tiny: 100             # <100 lines → 5 symbols
      small: 500            # 100-500 lines → 15 symbols
      medium: 1500          # 500-1500 lines → 30 symbols
      large: 3000           # 1500-3000 lines → 50 symbols
      xlarge: 5000          # 3000-5000 lines → 80 symbols
      huge: 8000            # 5000-8000 lines → 120 symbols
      mega: null            # >8000 lines → 150 symbols
    limits:                 # Symbol limits per category
      tiny: 5
      small: 15
      medium: 30
      large: 50
      xlarge: 80
      huge: 120
      mega: 150

# Incremental updates
incremental:
  enabled: true
  thresholds:
    skip_lines: 5
    current_only: 50
    suggest_full: 200

# Git Hooks configuration (v0.7.0+, Story 6)
hooks:
  post_commit:
    mode: auto            # auto | disabled | async | sync | prompt
    max_dirs_sync: 2      # Auto mode: ≤2 dirs = sync, >2 = async
    enabled: true         # Master switch
    log_file: ~/.codeindex/hooks/post-commit.log
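The `adaptive_symbols` thresholds in the configuration above map a file's line count to a symbol limit. The lookup can be sketched as below, using the values from the sample config. How exact boundary values (for example a file of exactly 500 lines) are treated is not specified, so the sketch assumes each threshold is an exclusive upper bound; `symbol_limit` is a hypothetical name.

```python
# (category, exclusive upper bound in lines, symbol limit), from the sample config.
THRESHOLDS = [
    ("tiny", 100, 5),
    ("small", 500, 15),
    ("medium", 1500, 30),
    ("large", 3000, 50),
    ("xlarge", 5000, 80),
    ("huge", 8000, 120),
]
MEGA_LIMIT = 150  # applies to files at or above the last threshold


def symbol_limit(line_count: int) -> int:
    """Return the maximum number of symbols to extract for a file."""
    for _category, upper_bound, limit in THRESHOLDS:
        if line_count < upper_bound:
            return limit
    return MEGA_LIMIT


if __name__ == "__main__":
    for lines in (50, 300, 2000, 9000):
        print(lines, symbol_limit(lines))
```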

Hooks Modes:

auto (default): Runs synchronously when a commit affects at most max_dirs_sync directories (2 in the example above), otherwise asynchronously
disabled: Completely disabled
async: Always non-blocking (background updates)
sync: Always blocking (immediate updates)
prompt: Reminder only, no auto-execution

See Git Hooks Integration Guide for detailed configuration.
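The mode selection described above can be summarised in a few lines. This sketch follows the comments in the sample configuration (`max_dirs_sync`, the `enabled` master switch); the function `resolve_mode` is hypothetical and is not taken from the codeindex source.

```python
def resolve_mode(mode: str, changed_dirs: int,
                 max_dirs_sync: int = 2, enabled: bool = True) -> str:
    """Return the effective post-commit behaviour.

    Result is one of 'disabled', 'sync', 'async', or 'prompt'.
    """
    if not enabled or mode == "disabled":
        return "disabled"  # master switch off, or explicitly disabled
    if mode == "auto":
        # Few changed directories: update immediately; otherwise in background.
        return "sync" if changed_dirs <= max_dirs_sync else "async"
    if mode in {"sync", "async", "prompt"}:
        return mode
    raise ValueError(f"unknown hook mode: {mode!r}")


if __name__ == "__main__":
    print(resolve_mode("auto", changed_dirs=2))
    print(resolve_mode("auto", changed_dirs=5))
```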

🤖 Claude Code Integration

codeindex generates README_AI.md files that are perfect for Claude Code to understand your project architecture. By adding a CLAUDE.md file to your project, you can guide Claude Code to use these indexes effectively.

Why Use CLAUDE.md?

Without guidance, Claude Code might:

❌ Blindly search through all source files (slow and inefficient)
❌ Miss important architectural context
❌ Use Glob/Grep instead of semantic understanding

With CLAUDE.md, Claude Code will:

✅ Read README_AI.md files first (fast and structured)
✅ Understand your project architecture before diving into code
✅ Use Serena MCP tools for precise symbol navigation

Quick Setup

1. Copy the template to your project:

# After running codeindex scan-all
cp examples/CLAUDE.md.template CLAUDE.md

2. Customize the project-specific sections:

Edit the "Project Specific Configuration" section in your CLAUDE.md to document your project structure, key components, and development guidelines.

3. Commit and push:

git add CLAUDE.md README_AI.md **/README_AI.md
git commit -m "docs: add Claude Code integration"
git push

What's Included in the Template

The template includes guidance for Claude Code to:

Prioritize README_AI.md files when understanding architecture
Use Serena MCP tools (find_symbol, find_referencing_symbols) for precise navigation
Follow a structured workflow: README → find_symbol → read source → analyze dependencies
Avoid inefficient patterns like Glob/Grep searches

Example Workflow

After setup, when you ask Claude Code about your project:

❌ Without CLAUDE.md:
You: "Where is the authentication module?"
Claude: [Uses Glob to search for "auth*"]
        [Scans 50 files, wastes time]

✅ With CLAUDE.md:
You: "Where is the authentication module?"
Claude: [Reads /src/README_AI.md]
        [Reads /src/auth/README_AI.md]
        "The authentication module is in src/auth/authenticator.py:15
         with UserAuthenticator class..."

Advanced Integration: MCP Skills

codeindex also includes MCP skills for Claude Code:

Skill	Description
`/mo:arch`	Query code architecture using README_AI.md indexes
`/mo:index`	Generate repository index with codeindex

Install skills:

# Navigate to codeindex directory
cd /path/to/codeindex

# Run install script
./skills/install.sh

For Git Hooks Users (v0.5.0+)

If you're using codeindex Git Hooks, help your AI Code CLI understand how hooks work:

Method 1: Let AI Code read the guide ⭐️ (Recommended)

# In your project directory, run:
codeindex docs show-ai-guide

Then tell your AI:

User: "Read the output above and update my CLAUDE.md with Git Hooks documentation"
AI Code: [Reads the guide]
         [Understands Git Hooks]
         [Updates your CLAUDE.md/AGENTS.md]
         ✅ Done!

Method 2: Direct AI integration

User: "Help my AI CLI understand codeindex Git Hooks"
AI Code: [User runs: codeindex docs show-ai-guide]
         [AI reads output]
         [Updates CLAUDE.md with Git Hooks section]
         ✅ Done! Future AI sessions will know about hooks.

What the guide contains:

Complete Git Hooks functionality explanation
Pre-commit and post-commit behaviors
Ready-to-use section template for your CLAUDE.md
Troubleshooting and common scenarios
Expected behaviors (auto-commits are normal!)

Why this matters: Your AI CLI needs to know that post-commit will create auto-commits (normal behavior) and that lint failures will block commits (by design).

Full Documentation

User Guide: docs/guides/claude-code-integration.md
Git Hooks Guide: docs/guides/git-hooks-integration.md
AI Integration: examples/ai-integration-guide.md
Template File: examples/CLAUDE.md.template
Skills Documentation: skills/README.md

🎯 Use Cases

📚 Code Understanding

Generate comprehensive documentation for legacy codebases to help new developers onboard faster.

🔍 Codebase Navigation

Create structured overviews of large projects (10,000+ files) for efficient exploration.

🤖 AI Agent Integration

Use generated indexes with tools like Claude Code or Cursor for better code context.

📝 Living Documentation

Keep documentation up-to-date by regenerating README_AI.md files as code changes.

🛠️ How It Works

Code Parsing & Documentation

Directory → Scanner → Parser (tree-sitter) → Smart Writer → README_AI.md (≤50KB)

Scanner: Walks directories, filters by config patterns
Parser: Extracts symbols (classes, functions, imports) using tree-sitter
Smart Writer: Generates tiered documentation with size limits
Output: Optimized README_AI.md for AI consumption

Test Generation (v0.14.0+)

Language Spec (YAML) → Jinja2 Template → Python Generator → Test File (500-700 lines)
                ↓                                              ↓
         Code Examples                                  Validation (100%)
         Expected Results                               Python + Target Language

YAML Specification: Define language syntax patterns and test scenarios
Jinja2 Template: Reusable test code template
Generator: Automated test file creation with validation
Output: High-quality pytest test suite

Key Innovation: Separate test definition (YAML) from test implementation (Python), enabling non-Python developers to contribute language support.

📐 Smart Indexing Architecture

codeindex generates tiered documentation optimized for AI agents:

Project Root/
├── PROJECT_INDEX.md (~10KB)     # Overview level
│   └── Module list + descriptions
│
├── Module/
│   └── README_AI.md (~30KB)     # Navigation level
│       ├── Grouped files by type
│       └── Key classes summary
│
└── LeafDir/
    └── README_AI.md (≤50KB)     # Detailed level
        ├── Full symbol info
        └── Dependencies

Configuration

indexing:
  max_readme_size: 51200    # 50KB limit
  symbols:
    max_per_file: 15
    include_visibility: [public, protected]
    exclude_patterns: ["get*", "set*"]
  grouping:
    by: suffix
    patterns:
      Controller: "HTTP handlers"
      Service: "Business logic"
      Model: "Data models"
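To make the effect of these options concrete, here is a rough sketch of how visibility filtering, name exclusion, the per-file cap, and suffix grouping might combine. The function names, the `(name, visibility)` input shape, and the `"Other"` fallback label are assumptions for illustration; shell-style matching via `fnmatchcase` is likewise a guess at how patterns such as `get*` are interpreted.

```python
from fnmatch import fnmatchcase


def select_symbols(symbols, max_per_file=15,
                   include_visibility=("public", "protected"),
                   exclude_patterns=("get*", "set*")):
    """Filter (name, visibility) pairs and cap the result at max_per_file."""
    kept = [
        name
        for name, visibility in symbols
        if visibility in include_visibility
        and not any(fnmatchcase(name, pattern) for pattern in exclude_patterns)
    ]
    return kept[:max_per_file]


def group_label(file_stem, patterns):
    """Return the label of the first suffix the file name ends with."""
    for suffix, label in patterns.items():
        if file_stem.endswith(suffix):
            return label
    return "Other"  # hypothetical fallback for unmatched files


if __name__ == "__main__":
    symbols = [("getName", "public"), ("save", "public"),
               ("_hash", "private"), ("validate", "protected")]
    print(select_symbols(symbols))
    print(group_label("UserController", {"Controller": "HTTP handlers"}))
```

Note that a prefix pattern like `set*` also excludes names such as `settle`, which may or may not be desirable.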

🤖 AI Coder Integration

For Claude Code Users

Add this to your project's CLAUDE.md:

## Code Index

This project uses codeindex for AI-friendly documentation.

### How to Read Code Index

1. **Start with overview**: Read `PROJECT_INDEX.md` or root `README_AI.md` to understand project structure
2. **Locate module**: Find the relevant module from the module list
3. **Deep dive**: Read module's `README_AI.md` for file/symbol details
4. **Read source**: Open specific files when you need implementation details

### Index Files

- `README_AI.md` - Directory-level documentation (≤50KB each)
- Each directory with source code has its own README_AI.md

### Example Workflow

Task: "Fix user authentication bug"
1. Read root README_AI.md → Find Auth/User module
2. Read Auth/README_AI.md → Find AuthService.php
3. Read AuthService.php → Understand implementation

Usage Tips

Token efficient: Each README is ≤50KB, suitable for LLM context
Progressive loading: Start from overview, drill down as needed
Keep indexes updated: Run codeindex scan-all --fallback after major changes

CLAUDE.md Template

Copy the template to your project:

cp /path/to/codeindex/examples/CLAUDE.md.template your-project/CLAUDE.md

Or see examples/CLAUDE.md.template for the full template.

🔗 Integration with LoomGraph

codeindex and LoomGraph work together as complementary tools:

Architecture

codeindex (AST Parser)
    ↓ Structured Data (JSON)
LoomGraph (Knowledge Graph + AI)
    ↓ Insights & Analysis
Applications (IDE, CI/CD, Team Tools)

Division of Responsibilities

Tool	Focus	Key Features
codeindex	Code Parsing	AST extraction, symbol extraction, call/inheritance relationships, multi-language support
LoomGraph	AI Analysis	Knowledge graph, vector embeddings, semantic search, refactoring suggestions, team collaboration

What codeindex Provides

✅ Structured code data (symbols, calls, imports, inheritance)
✅ Multi-language support (Python, PHP, Java today; TypeScript, Go, Rust, and C# planned)
✅ Framework awareness (ThinkPHP and Spring routes today; Laravel, FastAPI, Django, and Express planned)
✅ JSON output for downstream tools (codeindex parse, codeindex scan --output json)

What LoomGraph Adds

🔍 Code similarity search (vector embeddings + semantic search)
🤖 Automated refactoring suggestions (graph analysis + AI)
👥 Team collaboration (shared knowledge graphs)
🔌 IDE integration (LSP server for real-time features)

Integration Guide

See docs/guides/loomgraph-integration.md or FOR_LOOMGRAPH.md for complete integration examples.

Quick Example:

# Parse a file and get JSON output
codeindex parse myfile.py | jq .

# Parse all files in a directory
codeindex scan ./src --output json > parse_results.json

# LoomGraph consumes this JSON to build knowledge graph

Why This Separation?

Single Responsibility: codeindex focuses on parsing, LoomGraph focuses on AI
Independent Evolution: Each tool can evolve without affecting the other
Flexible Integration: Use codeindex alone or with LoomGraph
Performance: Lightweight parser vs. heavyweight graph+AI system

🌍 Language Support

Language	Status	Version	Features
Python	✅ Supported	v0.1.0+	Classes, functions, methods, imports, docstrings, inheritance, calls
PHP	✅ Supported	v0.5.0+	Classes (extends/implements), methods (visibility, static, return types), properties, functions, inheritance, calls
Java	✅ Supported	v0.7.0+	Classes, interfaces, enums, records, annotations, Spring routes, Lombok, inheritance, calls
TypeScript/JS	🧪 Tests Ready	v0.14.0	Classes, functions, React components, JSDoc (Epic 15) - Parser implementation in progress
Go	📋 Planned	v0.15.0	Packages, interfaces, struct methods (Epic 16)
Rust	📋 Planned	v0.17.0	Structs, traits, modules (Epic 19)
C#	📋 Planned	v0.18.0	Classes, interfaces, .NET projects

🎯 Test Architecture (v0.14.0+)

codeindex uses a template-based test generation system to accelerate language support development:

YAML Language Specifications: Declarative syntax patterns and test scenarios
Jinja2 Templates: Automated Python test code generation
Quality Validation: 100% syntax correctness for both Python and target language
Time Savings: 88-91% reduction (11-17 hours → ~2 hours per language)

Current test coverage:

✅ Python: 50+ test methods (hand-written)
✅ PHP: 30+ test methods (hand-written)
✅ Java: 60+ test methods (hand-written)
✅ TypeScript: 25 test methods (template-generated, 100% quality)

Want to contribute a new language? See Contributing Language Support below.

🤝 Contributing

We welcome contributions! See CONTRIBUTING.md for guidelines.

🚀 Quick Start for Contributors

# Clone and install
git clone https://github.com/dreamlx/codeindex.git
cd codeindex

# Install with dev dependencies
make install-dev
# or: pip install -e ".[dev,all]"

# Install Git hooks (pre-push checks)
make install-hooks

# Run tests
make test
# or: pytest

# Lint and auto-fix
make lint-fix
# or: ruff check --fix src/

# See all available commands
make help

🌟 Contributing Language Support

Want to add support for Go, Rust, C++, or other languages? You don't need to know Python!

We use a template-based test generation system that lets you contribute by only knowing your target language:

Quick Start (1-3 hours total)

Create YAML specification (1-2 hours)

cd test_generator/specs
cp _template.yaml <language>.yaml
# Fill in code examples in your language

Generate tests (5 minutes)

python generator.py \
  --spec specs/<language>.yaml \
  --template templates/inheritance_test.py.j2 \
  --output test_<language>_inheritance.py

Review and submit PR (30-60 minutes)
- Verify Python syntax: python -m py_compile test_*.py
- Verify your language syntax (manual review)
- Submit PR with both YAML and generated test file

What You Need

✅ Familiarity with target language (Go/Rust/C++/etc.)
✅ Ability to write code examples in that language
✅ 1-3 hours of time
❌ NO Python knowledge required!

What You'll Create

YAML file: 20-30 code templates with expected parsing results
Test file: Auto-generated Python tests (you just review)

Quality Standards

Minimum: 6 test classes, 15 test methods
Target: 8 test classes, 25+ test methods
Validation: 100% Python syntax + 100% target language syntax

Examples

TypeScript: See test_generator/specs/typescript.yaml (351 lines, 28 templates)
Template: See test_generator/specs/_template.yaml (fully documented starter)

Full Guide

See CONTRIBUTING_LANGUAGE_SUPPORT.md for:

Detailed step-by-step instructions
YAML specification guide
PR template and checklist
FAQ and troubleshooting

Current recruitment: 🔥 Go, Rust, C++, C#, Ruby, Kotlin

📚 Developer Documentation

Quick Start Release Guide - 5-minute automated release workflow
Release Workflow - Complete release process documentation
Multi-Language Support - Guide for adding new language support
CONTRIBUTING.md - Development setup, TDD workflow, code style guidelines
Makefile - Run make help to see all available commands

🎯 Release Process (Maintainers)

# Automated one-command release
make release VERSION=0.13.0

# GitHub Actions will automatically:
# ✅ Run tests on Python 3.10, 3.11, 3.12
# ✅ Build and publish to PyPI
# ✅ Create GitHub Release

# See: docs/development/QUICK_START_RELEASE.md

📊 Roadmap

See Strategic Roadmap for detailed plans.

Completed (v0.14.0):

✅ Python, PHP, Java language support (with LoomGraph integration)
✅ Single file parse command (loose coupling with downstream tools)
✅ Parser modularization (refactored from 3622 to 374 lines)
✅ Windows platform compatibility (UTF-8 + path optimization)
✅ Call relationships extraction (Python/Java/PHP)
✅ Framework routes (ThinkPHP, Spring Boot)
✅ Interactive Setup Wizard (codeindex init)

In Progress (v0.15.0):

🔄 Template-based test generation system (Epic 18)
🔄 Test architecture migration (Python/PHP/Java → YAML specs)

Next (v0.16.0 - v0.18.0):

📋 Framework routes expansion: Express, Laravel, FastAPI, Django (v0.16.0, Epic 17)
📋 Rust language support (v0.17.0, Epic 19)
📋 C# language support (v0.18.0)

Not Included (Moved to LoomGraph):

❌ Code similarity search → LoomGraph v0.3.0
❌ Automated refactoring suggestions → LoomGraph v0.4.0
❌ Team collaboration features → LoomGraph v0.5.0
❌ IDE deep integration (LSP server) → LoomGraph v0.6.0

Reason: codeindex focuses on code parsing (AST → structured data), while LoomGraph focuses on AI analysis (structured data → knowledge graph → insights).
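
To make the "AST → structured data" half of that split concrete, here is a small illustrative sketch. It uses Python's standard-library ast module as a stand-in rather than tree-sitter, handles only top-level Python definitions, and does not reflect codeindex's actual API or JSON schema; extract_structure and the output keys are hypothetical.

```python
import ast
import json


def extract_structure(source: str) -> dict:
    """Turn Python source into plain data: classes, bases, functions, imports."""
    tree = ast.parse(source)
    data = {"classes": [], "functions": [], "imports": []}
    for node in tree.body:  # top-level statements only, to keep the sketch small
        if isinstance(node, ast.ClassDef):
            data["classes"].append({
                "name": node.name,
                # Inheritance: base classes rendered back to source text.
                "bases": [ast.unparse(base) for base in node.bases],
                "methods": [
                    item.name for item in node.body
                    if isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef))
                ],
            })
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            data["functions"].append(node.name)
        elif isinstance(node, ast.Import):
            data["imports"].extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            # Keep leading dots so relative imports stay recognizable.
            data["imports"].append("." * node.level + (node.module or ""))
    return data


SAMPLE = '''
import os
from typing import List

class Animal:
    def speak(self):
        return ""

class Dog(Animal):
    def speak(self):
        return "woof"

def main():
    print(Dog().speak())
'''

if __name__ == "__main__":
    print(json.dumps(extract_structure(SAMPLE), indent=2))
```

A downstream consumer such as a knowledge-graph builder would take this kind of JSON as input and never touch the parser itself, which is the loose coupling the separation aims for.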

📄 License

MIT License - see LICENSE file for details.

🙏 Acknowledgments

tree-sitter - Fast, incremental parsing
Claude CLI - AI integration inspiration
All contributors and users

📞 Support

Questions: GitHub Discussions
Bugs: GitHub Issues
Feature Requests: GitHub Issues

⭐ Star History

If you find codeindex useful, please star the repository to show your support!

Made with ❤️ by the codeindex team

Release history

- 0.23.2 (Apr 14, 2026)
- 0.23.1 (Mar 15, 2026)
- 0.23.0 (Mar 12, 2026)
- 0.22.2 (Mar 8, 2026)
- 0.20.0 (Feb 20, 2026)
- 0.19.0 (Feb 18, 2026)
- 0.18.0 (Feb 18, 2026)
- 0.17.3 (Feb 13, 2026)
- 0.17.2 (Feb 12, 2026)
- 0.17.1 (Feb 12, 2026)
- 0.17.0 (Feb 12, 2026)
- 0.16.1 (Feb 12, 2026)
- 0.16.0 (Feb 12, 2026)
- 0.15.1 (Feb 12, 2026)
- 0.15.0 (Feb 12, 2026), this version
- 0.14.0 (Feb 10, 2026)
- 0.12.1 (Feb 7, 2026)
- 0.12.0 (Feb 7, 2026)
- 0.11.0 (Feb 6, 2026)
- 0.10.1 (Feb 6, 2026)
- 0.10.0 (Feb 6, 2026)
- 0.9.0 (Feb 6, 2026)
- 0.7.0 (Feb 5, 2026)

Download files

Source Distribution

ai_codeindex-0.15.0.tar.gz (1.0 MB)

Uploaded Feb 12, 2026 Source

Built Distribution

ai_codeindex-0.15.0-py3-none-any.whl (144.8 kB)

Uploaded Feb 12, 2026 Python 3

File details

Details for the file ai_codeindex-0.15.0.tar.gz.

File metadata

Download URL: ai_codeindex-0.15.0.tar.gz
Upload date: Feb 12, 2026
Size: 1.0 MB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for ai_codeindex-0.15.0.tar.gz
Algorithm	Hash digest
SHA256	`b4d68be428c144d48f574cd07695d0e0b3c3d626c409409d3dce4d3f231c1497`
MD5	`7da067c221c3507e31f902bcae681894`
BLAKE2b-256	`47c48c4721ce8b1c7ca1efb12e165e7c41e97b10501c6846caed8d88dbad81dd`
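
A downloaded file can be checked against the SHA256 digest listed above. This is a minimal sketch using only the standard library; sha256_of and matches_published are hypothetical helper names, and the file path is whatever location you saved the download to.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path: str, expected_hex: str) -> bool:
    """Compare a file's digest with a published one, ignoring case and whitespace."""
    return sha256_of(path) == expected_hex.strip().lower()
```

Calling `matches_published("ai_codeindex-0.15.0.tar.gz", "<SHA256 from the table>")` returns True only when the local file is byte-for-byte identical to the one whose digest was published. The same helper applies to the wheel with its own digest.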

Provenance

The following attestation bundles were made for ai_codeindex-0.15.0.tar.gz:

Publisher: publish.yml on dreamlx/codeindex

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: ai_codeindex-0.15.0.tar.gz
- Subject digest: b4d68be428c144d48f574cd07695d0e0b3c3d626c409409d3dce4d3f231c1497
- Sigstore transparency entry: 942873318
- Sigstore integration time: Feb 12, 2026

Source repository:
- Permalink: dreamlx/codeindex@d921838d477a83f0c6eea2a8c63ae6103936e471
- Branch / Tag: refs/tags/v0.15.0
- Owner: https://github.com/dreamlx
- Access: public

Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@d921838d477a83f0c6eea2a8c63ae6103936e471
- Trigger Event: push

File details

Details for the file ai_codeindex-0.15.0-py3-none-any.whl.

File metadata

Download URL: ai_codeindex-0.15.0-py3-none-any.whl
Upload date: Feb 12, 2026
Size: 144.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for ai_codeindex-0.15.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`b5729d7b72d071db985fed72c2b70d1436319dac7615418fe6c80f7a19a38cd3`
MD5	`30746d48a3d61c84add1a4180a34624c`
BLAKE2b-256	`06dce0ce8665f6cda35f19dac802cc3b992fd3a32431b10f2cd200ddfc3f858c`

Provenance

The following attestation bundles were made for ai_codeindex-0.15.0-py3-none-any.whl:

Publisher: publish.yml on dreamlx/codeindex

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: ai_codeindex-0.15.0-py3-none-any.whl
- Subject digest: b5729d7b72d071db985fed72c2b70d1436319dac7615418fe6c80f7a19a38cd3
- Sigstore transparency entry: 942873321
- Sigstore integration time: Feb 12, 2026

Source repository:
- Permalink: dreamlx/codeindex@d921838d477a83f0c6eea2a8c63ae6103936e471
- Branch / Tag: refs/tags/v0.15.0
- Owner: https://github.com/dreamlx
- Access: public

Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@d921838d477a83f0c6eea2a8c63ae6103936e471
- Trigger Event: push
