AI-native code indexing tool for large codebases

codeindex

🇬🇧 English | 🇨🇳 中文

PyPI version Python 3.10+ License: MIT Tests

Enterprise-grade Code Intelligence Platform: make AI agents understand your codebase through semantic navigation, not grep.

codeindex generates AI-readable documentation with a two-phase pipeline: structural indexing (AST parsing via tree-sitter) plus AI-powered module descriptions. AI agents can browse the README_AI.md hierarchy, see module purposes at a glance, and navigate directly to the right code across Python, PHP, Java, TypeScript, JavaScript, Swift, and Objective-C. Designed for enterprise environments with intranet isolation.

🏢 Enterprise Ready: ✅ Intranet compatible ✅ Self-contained ✅ Version stable ✅ Data sovereignty


For LoomGraph Developers: FOR_LOOMGRAPH.md (quick start) | docs/guides/loomgraph-integration.md (full guide)


Features

Core: Code Understanding for AI Agents

  • Two-phase documentation pipeline (v0.23.0): Phase 1 builds a structural README_AI.md via SmartWriter; in Phase 2 the AI generates a one-line functional description per module. AI agents can browse the README_AI.md hierarchy and find the right module without grep.
  • Smart indexing: tiered documentation (overview → navigation → detailed) optimized for AI agents, ≤50KB per file
  • Auto-AI enrichment: when ai_command is configured, scan-all automatically enables AI module descriptions. Use --no-ai to opt out
  • Auto-update hooks: a post-commit hook automatically regenerates README_AI.md for changed directories. The hook is a thin wrapper, so a pip upgrade also updates the hook logic

Parsing & Analysis

  • Multi-language AST parsing: Python, PHP, Java, TypeScript, JavaScript, Swift, Objective-C via tree-sitter (Go, Rust, C# planned)
  • Call relationship extraction: function/method call graphs across Python, Java, PHP, TypeScript, JavaScript
  • Inheritance extraction: class hierarchy and interface relationships
  • Framework route extraction: ThinkPHP and Spring Boot route tables (more planned)
  • Technical debt analysis: detect large files, god classes, symbol overload, test smells
  • Single file parse: codeindex parse <file> with JSON output for tool integration
  • Structured JSON output: --output json for CI/CD, knowledge graphs, and downstream tools

Developer Experience

  • Adaptive symbol extraction: dynamic 5–150 symbols per file based on size
  • CLAUDE.md injection: codeindex init auto-configures Claude Code integration
  • Auto-update guide: a post-install hook automatically updates ~/.claude/CLAUDE.md after pip upgrade
  • Template-based test generation: YAML + Jinja2 for rapid language support (88–91% time savings)
  • Parallel scanning: concurrent directory processing with configurable workers

Use Cases

🏢 Enterprise Intranet (Core Scenario)

Without external tools: When Serena MCP or other cloud-based code intelligence tools are unavailable due to network isolation or security policies, codeindex becomes the primary code understanding tool.

# Enterprise developer workflow
git clone <internal-repo>
codeindex init                       # Configure project
codeindex scan-all                   # Structural + AI descriptions (auto)
# AI agent reads README_AI.md → sees module purposes → navigates directly
# No grep needed for code discovery
codeindex tech-debt src/ --output review.md  # Code quality analysis

Why enterprises choose codeindex:

  • ✅ Semantic navigation: AI agents understand module purposes from the README_AI.md hierarchy
  • ✅ Intranet compatible: no external dependencies, fully offline
  • ✅ Self-contained: no upstream MCP servers required
  • ✅ Version stable: enterprise-controlled release cycle
  • ✅ Data sovereignty: code never leaves the internal network

🕸️ Knowledge Graph Integration (LoomGraph)

For enterprise teams: codeindex serves as the core data source for LoomGraph knowledge graphs, enabling semantic code search across the organization.

# Data pipeline
codeindex scan --output json > parse_results.json
loomgraph inject parse_results.json  # Build knowledge graph
# Team can now search code using natural language

Three-repo architecture:

codeindex (Parse)  →  LoomGraph (Orchestrate)  →  LightRAG (Store)
   ↓ ParseResult         ↓ Embeddings              ↓ Semantic Search
   AST extraction        Knowledge Graph           Vector + Graph DB

Without codeindex, LoomGraph cannot function. See LoomGraph Integration Guide.


👤 Personal Developers (Complementary)

With Serena MCP: For individual developers using Claude Code + Serena MCP, codeindex provides complementary value:

  • codeindex (build-time): Semantic architecture map (README_AI.md with module descriptions) + quality analysis
  • Serena (real-time): Precise symbol navigation (find_symbol, find_referencing_symbols)

# Personal developer workflow
codeindex init                    # Setup CLAUDE.md integration
codeindex scan-all                # Structural + AI descriptions (auto)
codeindex hooks install post-commit  # Auto-update on commit
# Claude Code reads README_AI.md → understands module purpose → uses Serena for details

Relationship: codeindex provides the "map with labels," Serena provides the "GPS navigation."


Installation

codeindex uses lazy loading: language parsers are only imported when needed.

Quick Install

# All languages (recommended)
# Quotes keep shells such as zsh from treating the brackets as a glob pattern
pip install "ai-codeindex[all]"

# Or specific languages only
pip install "ai-codeindex[python]"
pip install "ai-codeindex[php]"
pip install "ai-codeindex[java]"
pip install "ai-codeindex[typescript]"
pip install "ai-codeindex[python,php]"
pip install "ai-codeindex[swift]"
pip install "ai-codeindex[ios]"          # Swift + Objective-C

Using pipx (Recommended for CLI use)

pipx install "ai-codeindex[all]"

From Source

git clone https://github.com/dreamlx/codeindex.git
cd codeindex
pip install -e ".[all]"

Quick Start

1. Initialize Your Project

cd /your/project
codeindex init

This creates:

  • .codeindex.yaml: scan configuration (languages, include/exclude patterns)
  • CLAUDE.md: injects codeindex instructions so Claude Code uses README_AI.md automatically
  • CODEINDEX.md: project-level documentation reference

2. Scan Your Codebase

# Scan all directories
# When ai_command is configured → auto Phase 1 (structural) + Phase 2 (AI descriptions)
# Without ai_command → Phase 1 only (structural)
codeindex scan-all

# Structural only (skip AI enrichment)
codeindex scan-all --no-ai

# Scan a single directory
codeindex scan ./src/auth

# Full AI-generated README for a single directory
codeindex scan ./src/auth --ai

# Preview AI prompt without executing
codeindex scan ./src/auth --ai --dry-run

3. Check Status

codeindex status
Indexing Status
───────────────────────────────
✅ src/auth/
✅ src/utils/
⚠️  src/api/ (no README_AI.md)
Indexed: 2/3 (67%)
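
The report above can be read as a per-directory check for a README_AI.md file. The following is a minimal sketch of that idea, assuming file presence is the only criterion (the real command may also apply config filters). `index_status` and `coverage_line` are hypothetical helpers for illustration, not part of codeindex.

```python
from pathlib import Path


def index_status(root: str, dirs: list[str]) -> tuple[list[str], list[str]]:
    """Split dirs into (indexed, missing) by presence of README_AI.md."""
    indexed, missing = [], []
    for d in dirs:
        target = Path(root) / d / "README_AI.md"
        (indexed if target.is_file() else missing).append(d)
    return indexed, missing


def coverage_line(indexed: list[str], missing: list[str]) -> str:
    """Format a summary line in the same shape as the report footer."""
    total = len(indexed) + len(missing)
    pct = round(100 * len(indexed) / total) if total else 0
    return f"Indexed: {len(indexed)}/{total} ({pct}%)"
```

With two of three directories indexed, `coverage_line` returns the same footer as the sample report.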

4. Generate Indexes

# Global symbol index (PROJECT_SYMBOLS.md)
codeindex symbols

# Module overview (PROJECT_INDEX.md)
codeindex index

# Git change impact analysis
codeindex affected --since HEAD~5
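
Change-driven updates, such as the post-commit hook that regenerates README_AI.md for changed directories, depend on mapping changed files to the directories that contain them. The sketch below shows one way to do that with plain git. It is an assumption about the general approach, not codeindex's implementation, and `changed_files` and `affected_dirs` are hypothetical names.

```python
import subprocess
from pathlib import PurePosixPath


def changed_files(since: str = "HEAD~5") -> list[str]:
    """List files changed between `since` and HEAD (must run inside a git repo)."""
    out = subprocess.run(
        ["git", "diff", "--name-only", since, "HEAD"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [line for line in out.splitlines() if line]


def affected_dirs(paths: list[str]) -> list[str]:
    """Map changed file paths to the sorted set of directories containing them."""
    return sorted({str(PurePosixPath(p).parent) for p in paths})
```

For example, `affected_dirs(changed_files("HEAD~5"))` yields the directories whose documentation may be stale; files at the repository root map to ".".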

More Commands

| Command | Description | Guide |
|---|---|---|
| codeindex scan --output json | JSON output for tools | JSON Output Guide |
| codeindex parse <file> | Parse single file to JSON | LoomGraph Integration |
| codeindex tech-debt ./src | Code quality analysis (debt + test smells) | Enhanced in v0.22.0 |
| codeindex debt-scan ./src | Alias for tech-debt | Backward compatibility |
| codeindex hooks install | Git hooks for auto-update | Git Hooks Guide |
| codeindex config explain <param> | Parameter help | Configuration Guide |

Claude Code Integration (Personal Developers)

For personal developers using Claude Code + Serena MCP:

Since v0.17.0, codeindex init automatically injects instructions into your project's CLAUDE.md, so Claude Code reads README_AI.md files first. No manual setup is required.

# One command sets everything up
codeindex init

# Claude Code will now:
# ✅ Read README_AI.md for architecture understanding
# ✅ Use Serena MCP tools for precise navigation (find_symbol, etc.)
# ✅ Apply tech-debt analysis for code quality checks

For enterprise users without Serena: README_AI.md and PROJECT_SYMBOLS.md become your primary code navigation tools.

For manual setup, MCP skills (/mo:arch, /mo:index), and Git hooks integration, see the Claude Code Integration Guide.


Language Support

| Language | Status | Since | Key Features |
|---|---|---|---|
| Python | ✅ Supported | v0.1.0 | Classes, functions, methods, imports, docstrings, inheritance, calls |
| PHP | ✅ Supported | v0.5.0 | Classes (extends/implements), methods, properties, PHPDoc, inheritance, calls |
| Java | ✅ Supported | v0.7.0 | Classes, interfaces, enums, records, annotations, Spring routes, Lombok, calls |
| TypeScript/JS | ✅ Supported | v0.19.0 | Classes, interfaces, enums, type aliases, arrow functions, JSX/TSX, imports/exports, calls |
| Swift | ✅ Supported | v0.21.0 | Classes, structs, enums, protocols, extensions, methods, properties |
| Objective-C | ✅ Supported | v0.21.0 | Classes, protocols, categories, properties, methods (instance/class) |
| Go | 📋 Planned | n/a | Packages, interfaces, struct methods |
| Rust | 📋 Planned | n/a | Structs, traits, modules |
| C# | 📋 Planned | n/a | Classes, interfaces, .NET projects |

Want to add a language? The template-based test system lets you contribute by writing YAML specs, with no Python knowledge required. See CONTRIBUTING.md for details.

Framework Route Extraction

| Framework | Language | Status |
|---|---|---|
| ThinkPHP | PHP | ✅ Stable (v0.5.0) |
| Spring Boot | Java | ✅ Stable (v0.8.0) |
| Laravel | PHP | 📋 Planned |
| FastAPI | Python | 📋 Planned |
| Django | Python | 📋 Planned |
| Express.js | JS/TS | 📋 Planned |

Code Quality Analysis

tech-debt: Comprehensive Quality Analysis (Enhanced in v0.22.0)

The tech-debt command provides comprehensive code quality analysis, now including test smell detection:

# JSON output (for LoomGraph integration)
codeindex tech-debt ./src --format json > debt-data.json

# Markdown report (for documentation)
codeindex tech-debt ./src --format markdown > report.md

# Console output (for quick checks)
codeindex tech-debt ./src --format console

# Alias: debt-scan also works (backward compatibility)
codeindex debt-scan ./src --format json

What it detects:

  • 🔴 Super large files (>5000 lines), Large files (>2000 lines)
  • 🔴 God Classes (>50 methods)
  • 🔴 Long methods (>80/150 lines)
  • 🟡 High coupling (>8 internal imports)
  • 🟡 Symbol overload (>100 symbols, high noise ratio)
  • 🧪 Test smells (skipped tests, giant test files), new in v0.22.0
  • 📊 Quality scoring (0-100 scale per file)

Enhanced JSON output (v0.22.0):

{
  "timestamp": "2026-03-06T13:45:39Z",
  "summary": {
    "total_files": 97,
    "giant_files": 0,
    "giant_functions": 3,
    "test_smells": 64,
    "avg_maintainability": 9.9
  },
  "total_files": 97,
  "average_quality_score": 99.4,
  "giant_files": [],
  "giant_functions": [...],
  "test_smells": [
    {
      "path": "tests/test_example.py",
      "type": "skipped_test",
      "details": "Skipped test detected: @pytest.mark.skip at line 42",
      "line_number": 42
    }
  ],
  "file_reports": [...]
}
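
A downstream tool or CI job can consume this report with a few lines of Python. The sketch below relies only on fields shown in the sample above (`summary.giant_functions`, `test_smells[].type`). The gate policy and the function names are assumptions for illustration, and the inline report is an abridged stand-in for a real debt-data.json.

```python
import json
from collections import Counter

# Abridged report containing only fields that appear in the sample output.
SAMPLE = """
{
  "summary": {"total_files": 97, "giant_files": 0, "giant_functions": 3, "test_smells": 64},
  "test_smells": [
    {"path": "tests/test_example.py", "type": "skipped_test", "line_number": 42}
  ]
}
"""


def smell_counts(report: dict) -> Counter:
    """Count test smells by their "type" field."""
    return Counter(s["type"] for s in report.get("test_smells", []))


def passes_gate(report: dict, max_giant_functions: int = 0) -> bool:
    """Hypothetical CI gate: pass only if giant functions stay within budget."""
    return report["summary"]["giant_functions"] <= max_giant_functions


report = json.loads(SAMPLE)
print(smell_counts(report))
print(passes_gate(report, max_giant_functions=5))
```

In a pipeline, `json.loads(SAMPLE)` would be replaced by loading the file written by `codeindex tech-debt ./src --format json`.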

Key features:

  • ✅ Unified command: Single entry point for all quality checks
  • ✅ Backward compatible: All existing JSON fields preserved
  • ✅ LoomGraph ready: Enhanced summary for knowledge graph integration
  • ✅ Framework-agnostic: Detects test smells across Jest, pytest, JUnit, etc.
  • ✅ KISS design: 90% code reuse, simple regex patterns for test detection

How It Works

Two-Phase Pipeline (v0.23.0)

Phase 1 (Structural):
  Directory → Scanner → Parser (tree-sitter) → SmartWriter → README_AI.md

Phase 2 (AI Enrichment, automatic when ai_command configured):
  README_AI.md → symbol names + file names → AI → one-line description → blockquote injection

Phase 1: Structural generation (always runs)

  1. Scanner: walks directories, filters by config patterns
  2. Parser: extracts symbols (classes, functions, imports, calls, inheritance) via tree-sitter
  3. SmartWriter: generates tiered documentation with size limits (≤50KB)
  4. Output: README_AI.md optimized for AI consumption, or JSON for tool integration
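
The Scanner step above can be pictured as a directory walk with include and exclude filters. The following is a rough sketch under the assumption that patterns are shell-style globs matched against paths relative to the project root. The actual pattern syntax in .codeindex.yaml may differ, and `scan_dirs` is a hypothetical name.

```python
import fnmatch
import os


def scan_dirs(root: str, include: list[str], exclude: list[str]) -> list[str]:
    """Return directories under root that match include and no exclude pattern."""
    result = []
    for dirpath, dirnames, _ in os.walk(root):
        rel = os.path.relpath(dirpath, root).replace(os.sep, "/")
        if rel == ".":
            continue
        if any(fnmatch.fnmatchcase(rel, pat) for pat in exclude):
            dirnames[:] = []  # prune: do not descend into excluded trees
            continue
        if any(fnmatch.fnmatchcase(rel, pat) for pat in include):
            result.append(rel)
    return sorted(result)
```

Pruning `dirnames` in place stops `os.walk` from descending into excluded trees, which matters for large vendored directories.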

Phase 2: AI enrichment (auto-enabled when ai_command configured)

  • Generates a one-line functional description for each non-leaf module
  • Writes it as a blockquote: > Membership tier management, points redemption, benefit vouchers
  • ~200-400 tokens per directory, 10-20x cheaper than full AI generation
  • Parent directories read child descriptions for hierarchical navigation
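
The blockquote injection step can be sketched as a small text transformation. This sketch assumes the description is placed directly after the first Markdown heading of README_AI.md and that an existing description there is replaced on re-runs. Both the placement rule and the name `inject_description` are assumptions, not confirmed codeindex behavior.

```python
def inject_description(readme: str, description: str) -> str:
    """Insert or replace a one-line blockquote right after the first heading."""
    lines = readme.splitlines()
    quote = f"> {description.strip()}"
    for i, line in enumerate(lines):
        if line.startswith("#"):
            # Find the first non-blank line after the heading.
            j = i + 1
            while j < len(lines) and not lines[j].strip():
                j += 1
            if j < len(lines) and lines[j].startswith(">"):
                lines[j] = quote  # replace the existing description
            else:
                lines[i + 1:i + 1] = ["", quote]
            break
    else:
        # No heading found: put the description at the top.
        lines[:0] = [quote, ""]
    return "\n".join(lines) + "\n"
```

Replacing an existing blockquote rather than appending keeps repeated scans from stacking descriptions.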

Before vs After: Code Navigation

Before (structural only):
  └── Application/
      ├── Vip/           - 48 files | 386 symbols     ← AI agent cannot determine purpose
      ├── Pay/           - 23 files | 178 symbols
      └── SmallProgramApi/ - 31 files | 245 symbols

After (structural + AI enrichment):
  └── Application/
      ├── Vip/           - Membership tier management, points redemption, benefit vouchers | 48 files
      ├── Pay/           - Payment gateway (Alipay/WeChat/refunds) | 23 files
      └── SmallProgramApi/ - Mini-program API (login, avatar, products) | 31 files
                             ↑ AI agent can navigate directly

Three-Repo Architecture (Enterprise Knowledge Graph)

┌────────────────────────────────────────────────────┐
│          Enterprise Intranet Environment           │
├────────────────────────────────────────────────────┤
│                                                    │
│  📦 Code Repository (Git)                          │
│       ↓                                            │
│  🔍 codeindex (Parse Layer)                        │
│       ├── scan --output json → ParseResult         │
│       ├── README_AI.md → architecture docs         │
│       └── tech-debt → comprehensive quality scan   │
│       ↓                                            │
│  🕸️ LoomGraph (Orchestration Layer)                │
│       ├── inject ParseResult                       │
│       ├── generate embeddings                      │
│       └── build knowledge graph                    │
│       ↓                                            │
│  💾 LightRAG (Storage Layer)                       │
│       ├── PostgreSQL (graph data)                  │
│       ├── Vector DB (embeddings)                   │
│       └── Query API (semantic search)              │
│       ↓                                            │
│  💬 AI Agents (Claude Code, Internal Chat)         │
│       └── Natural language code search             │
│                                                    │
└────────────────────────────────────────────────────┘

codeindex role: bottom layer (data collection and parsing). The entire system depends on codeindex providing structured ParseResult data.


Documentation

User Guides

| Guide | Description |
|---|---|
| Getting Started | Installation and first scan |
| Configuration Guide | All config options explained |
| Advanced Usage | Parallel scanning, custom prompts |
| Git Hooks Integration | Automated quality checks and doc updates |
| Claude Code Integration | AI agent setup and MCP skills |
| JSON Output Integration | Machine-readable output for tools |
| LoomGraph Integration | Knowledge graph data pipeline |

Developer Guides

| Guide | Description |
|---|---|
| CONTRIBUTING.md | Development setup, TDD workflow, code style |
| CLAUDE.md | Quick reference for Claude Code and contributors |
| Design Philosophy | Core design principles and architecture |
| Release Automation | 5-minute automated release workflow |
| Multi-Language Support | Adding new language parsers |
| Language Support Contribution | Template-based test generation for new languages |

Planning


Contributing

We welcome contributions! See CONTRIBUTING.md for guidelines.

git clone https://github.com/dreamlx/codeindex.git
cd codeindex
pip install -e ".[dev,all]"
make install-hooks
make test

Release Process (Maintainers)

make release VERSION=0.17.0
# GitHub Actions: tests → PyPI publish → GitHub Release

See Release Automation Guide for details.


Roadmap

Current version: v0.23.1

Recent milestones:

  • v0.23.0: AI-Enhanced Module Descriptions (two-phase pipeline, auto-AI enrichment, post-commit thin wrapper)
  • v0.22.2: Auto-update CLAUDE.md on pip upgrade, /codeindex-update-guide skill
  • v0.22.0: Unified tech-debt + test smells analysis
  • v0.21.0: Swift & Objective-C language support
  • v0.19.0: TypeScript/JavaScript support with call extraction

Next:

  • Framework routes expansion: Express, Laravel, FastAPI, Django (Epic 17)
  • Go, Rust, C# language support

Moved to LoomGraph:

  • Code similarity search, refactoring suggestions, team collaboration, IDE integration

See Strategic Roadmap for detailed plans.


License

MIT License. See the LICENSE file for details.

Acknowledgments

  • tree-sitter: fast, incremental parsing
  • Claude CLI: AI integration inspiration
  • All contributors and users

Support


Made with ❤️ by the codeindex team
