AI-native code indexing tool for large codebases
codeindex
🇬🇧 English | 🇨🇳 中文
Enterprise-grade Code Intelligence Platform: make AI agents understand your codebase through semantic navigation, not grep.
codeindex generates AI-readable documentation with a two-phase pipeline: structural indexing (AST parsing via tree-sitter) plus AI-powered module descriptions. AI agents can browse the README_AI.md hierarchy, see module purposes at a glance, and navigate directly to the right code across Python, PHP, Java, TypeScript, JavaScript, Swift, and Objective-C. Designed for enterprise environments with intranet isolation.
🏢 Enterprise Ready: ✅ Intranet compatible ✅ Self-contained ✅ Version stable ✅ Data sovereignty
For LoomGraph Developers:
FOR_LOOMGRAPH.md (quick start) | docs/guides/loomgraph-integration.md (full guide)
Features
Core: Code Understanding for AI Agents
- Two-phase documentation pipeline (v0.23.0): Phase 1 builds a structural README_AI.md via SmartWriter; in Phase 2, AI generates one-line functional descriptions per module. AI agents can browse the README_AI.md hierarchy and find the right module without grep.
- Smart indexing: tiered documentation (overview → navigation → detailed) optimized for AI agents, ≤50KB per file
- Auto-AI enrichment: when ai_command is configured, scan-all automatically enables AI module descriptions. Use --no-ai to opt out
- Auto-update hooks: a post-commit hook automatically regenerates README_AI.md for changed directories. Thin wrapper pattern: pip upgrade auto-updates hook logic
Parsing & Analysis
- Multi-language AST parsing: Python, PHP, Java, TypeScript, JavaScript, Swift, Objective-C via tree-sitter (Go, Rust, C# planned)
- Call relationship extraction: function/method call graphs across Python, Java, PHP, TypeScript, JavaScript
- Inheritance extraction: class hierarchy and interface relationships
- Framework route extraction: ThinkPHP and Spring Boot route tables (more planned)
- Technical debt analysis: detects large files, god classes, symbol overload, test smells
- Single file parse: codeindex parse <file> with JSON output for tool integration
- Structured JSON output: --output json for CI/CD, knowledge graphs, and downstream tools
Developer Experience
- Adaptive symbol extraction: dynamic 5–150 symbols per file based on size
- CLAUDE.md injection: codeindex init auto-configures Claude Code integration
- Auto-update guide: a post-install hook automatically updates ~/.claude/CLAUDE.md after pip upgrade
- Template-based test generation: YAML + Jinja2 for rapid language support (88–91% time savings)
- Parallel scanning: concurrent directory processing with configurable workers
Use Cases
🏢 Enterprise Intranet (Core Scenario)
Without external tools: When Serena MCP or other cloud-based code intelligence tools are unavailable due to network isolation or security policies, codeindex becomes the primary code understanding tool.
# Enterprise developer workflow
git clone <internal-repo>
codeindex init # Configure project
codeindex scan-all # Structural + AI descriptions (auto)
# AI agent reads README_AI.md → sees module purposes → navigates directly
# No grep needed for code discovery
codeindex tech-debt src/ --output review.md # Code quality analysis
Why enterprises choose codeindex:
- ✅ Semantic navigation: AI agents understand module purposes from the README_AI.md hierarchy
- ✅ Intranet compatible: no external dependencies, fully offline
- ✅ Self-contained: no upstream MCP servers required
- ✅ Version stable: enterprise-controlled release cycle
- ✅ Data sovereignty: code never leaves the internal network
🕸️ Knowledge Graph Integration (LoomGraph)
For enterprise teams: codeindex serves as the core data source for LoomGraph knowledge graphs, enabling semantic code search across the organization.
# Data pipeline
codeindex scan --output json > parse_results.json
loomgraph inject parse_results.json # Build knowledge graph
# Team can now search code using natural language
Three-repo architecture:
codeindex (Parse)  →  LoomGraph (Orchestrate)  →  LightRAG (Store)
↓ ParseResult         ↓ Embeddings                ↓ Semantic Search
AST extraction        Knowledge Graph             Vector + Graph DB
Without codeindex, LoomGraph cannot function. See LoomGraph Integration Guide.
👤 Personal Developers (Complementary)
With Serena MCP: For individual developers using Claude Code + Serena MCP, codeindex provides complementary value:
- codeindex (build-time): Semantic architecture map (README_AI.md with module descriptions) + quality analysis
- Serena (real-time): Precise symbol navigation (find_symbol, find_referencing_symbols)
# Personal developer workflow
codeindex init # Setup CLAUDE.md integration
codeindex scan-all # Structural + AI descriptions (auto)
codeindex hooks install post-commit # Auto-update on commit
# Claude Code reads README_AI.md → understands module purpose → uses Serena for details
Relationship: codeindex provides the "map with labels," Serena provides the "GPS navigation."
Installation
codeindex uses lazy loading โ language parsers are only imported when needed.
Quick Install
# All languages (recommended)
pip install ai-codeindex[all]
# Or specific languages only
pip install ai-codeindex[python]
pip install ai-codeindex[php]
pip install ai-codeindex[java]
pip install ai-codeindex[typescript]
pip install ai-codeindex[python,php]
pip install ai-codeindex[swift]
pip install ai-codeindex[ios] # Swift + Objective-C
Using pipx (Recommended for CLI use)
pipx install ai-codeindex[all]
From Source
git clone https://github.com/dreamlx/codeindex.git
cd codeindex
pip install -e ".[all]"
Quick Start
1. Initialize Your Project
cd /your/project
codeindex init
This creates:
- .codeindex.yaml: scan configuration (languages, include/exclude patterns)
- CLAUDE.md: injects codeindex instructions so Claude Code uses README_AI.md automatically
- CODEINDEX.md: project-level documentation reference
2. Scan Your Codebase
# Scan all directories
# When ai_command is configured → auto Phase 1 (structural) + Phase 2 (AI descriptions)
# Without ai_command → Phase 1 only (structural)
codeindex scan-all
# Structural only (skip AI enrichment)
codeindex scan-all --no-ai
# Scan a single directory
codeindex scan ./src/auth
# Full AI-generated README for a single directory
codeindex scan ./src/auth --ai
# Preview AI prompt without executing
codeindex scan ./src/auth --ai --dry-run
3. Check Status
codeindex status
Indexing Status
───────────────────────────────
✅ src/auth/
✅ src/utils/
⚠️ src/api/ (no README_AI.md)
Indexed: 2/3 (67%)
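The status report above comes down to asking which directories already contain a README_AI.md. As a rough sketch of that idea (not codeindex's actual implementation; the function name indexing_status and the explicit directory list are assumptions made for illustration):

```python
from pathlib import Path


def indexing_status(root, directories):
    """Split directories into indexed / missing by README_AI.md presence.

    `directories` are paths relative to `root`. The function name and the
    explicit directory list are illustrative assumptions, not codeindex API.
    """
    indexed, missing = [], []
    for rel in directories:
        target = Path(root) / rel / "README_AI.md"
        (indexed if target.is_file() else missing).append(rel)
    total = len(directories)
    percent = round(100 * len(indexed) / total) if total else 0
    return indexed, missing, percent


if __name__ == "__main__":
    done, todo, pct = indexing_status(".", ["src/auth", "src/utils", "src/api"])
    print(f"Indexed: {len(done)}/{len(done) + len(todo)} ({pct}%)")
```

With two of three directories indexed, the percentage rounds to 67, matching the sample report.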
4. Generate Indexes
# Global symbol index (PROJECT_SYMBOLS.md)
codeindex symbols
# Module overview (PROJECT_INDEX.md)
codeindex index
# Git change impact analysis
codeindex affected --since HEAD~5
More Commands
| Command | Description | Guide |
|---|---|---|
| codeindex scan --output json | JSON output for tools | JSON Output Guide |
| codeindex parse <file> | Parse single file to JSON | LoomGraph Integration |
| codeindex tech-debt ./src | Code quality analysis (debt + test smells) | Enhanced in v0.22.0 |
| codeindex debt-scan ./src | Alias for tech-debt | Backward compatibility |
| codeindex hooks install | Git hooks for auto-update | Git Hooks Guide |
| codeindex config explain <param> | Parameter help | Configuration Guide |
Claude Code Integration (Personal Developers)
For personal developers using Claude Code + Serena MCP:
v0.17.0: codeindex init automatically injects instructions into your project's CLAUDE.md, so Claude Code reads README_AI.md files first, with no manual setup required.
# One command sets everything up
codeindex init
# Claude Code will now:
# ✅ Read README_AI.md for architecture understanding
# ✅ Use Serena MCP tools for precise navigation (find_symbol, etc.)
# ✅ Apply tech-debt analysis for code quality checks
For enterprise users without Serena: README_AI.md and PROJECT_SYMBOLS.md become your primary code navigation tools.
For manual setup, MCP skills (/mo:arch, /mo:index), and Git hooks integration, see the Claude Code Integration Guide.
Language Support
| Language | Status | Since | Key Features |
|---|---|---|---|
| Python | ✅ Supported | v0.1.0 | Classes, functions, methods, imports, docstrings, inheritance, calls |
| PHP | ✅ Supported | v0.5.0 | Classes (extends/implements), methods, properties, PHPDoc, inheritance, calls |
| Java | ✅ Supported | v0.7.0 | Classes, interfaces, enums, records, annotations, Spring routes, Lombok, calls |
| TypeScript/JS | ✅ Supported | v0.19.0 | Classes, interfaces, enums, type aliases, arrow functions, JSX/TSX, imports/exports, calls |
| Swift | ✅ Supported | v0.21.0 | Classes, structs, enums, protocols, extensions, methods, properties |
| Objective-C | ✅ Supported | v0.21.0 | Classes, protocols, categories, properties, methods (instance/class) |
| Go | Planned | - | Packages, interfaces, struct methods |
| Rust | Planned | - | Structs, traits, modules |
| C# | Planned | - | Classes, interfaces, .NET projects |
Want to add a language? The template-based test system lets you contribute by writing YAML specs, with no Python knowledge required. See CONTRIBUTING.md for details.
Framework Route Extraction
| Framework | Language | Status |
|---|---|---|
| ThinkPHP | PHP | ✅ Stable (v0.5.0) |
| Spring Boot | Java | ✅ Stable (v0.8.0) |
| Laravel | PHP | Planned |
| FastAPI | Python | Planned |
| Django | Python | Planned |
| Express.js | JS/TS | Planned |
Code Quality Analysis
tech-debt: Comprehensive Quality Analysis (Enhanced in v0.22.0)
The tech-debt command provides comprehensive code quality analysis, now including test smell detection:
# JSON output (for LoomGraph integration)
codeindex tech-debt ./src --format json > debt-data.json
# Markdown report (for documentation)
codeindex tech-debt ./src --format markdown > report.md
# Console output (for quick checks)
codeindex tech-debt ./src --format console
# Alias: debt-scan also works (backward compatibility)
codeindex debt-scan ./src --format json
What it detects:
- 🔴 Super large files (>5000 lines), large files (>2000 lines)
- 🔴 God Classes (>50 methods)
- 🔴 Long methods (>80/150 lines)
- 🟡 High coupling (>8 internal imports)
- 🟡 Symbol overload (>100 symbols, high noise ratio)
- 🧪 Test smells (skipped tests, giant test files), new in v0.22.0
- Quality scoring (0-100 scale per file)
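To make the size thresholds concrete, here is a minimal sketch of how a file or class could be checked against the limits listed above. Only the numeric thresholds come from the list; the function names and labels are hypothetical, and the real tech-debt command may combine these signals differently.

```python
# Thresholds taken from the list above; names and labels are illustrative.
SUPER_LARGE_FILE_LINES = 5000
LARGE_FILE_LINES = 2000
GOD_CLASS_METHODS = 50


def classify_file_size(line_count):
    """Return a size label for a file (hypothetical labels)."""
    if line_count > SUPER_LARGE_FILE_LINES:
        return "super_large_file"
    if line_count > LARGE_FILE_LINES:
        return "large_file"
    return "ok"


def is_god_class(method_count):
    """A class with more than 50 methods counts as a God Class."""
    return method_count > GOD_CLASS_METHODS


if __name__ == "__main__":
    print(classify_file_size(6200))
    print(classify_file_size(2500))
    print(is_god_class(51))
```

The comparisons are strict, so a file of exactly 2000 lines is not flagged, which mirrors the ">" in the list.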
Enhanced JSON output (v0.22.0):
{
"timestamp": "2026-03-06T13:45:39Z",
"summary": {
"total_files": 97,
"giant_files": 0,
"giant_functions": 3,
"test_smells": 64,
"avg_maintainability": 9.9
},
"total_files": 97,
"average_quality_score": 99.4,
"giant_files": [],
"giant_functions": [...],
"test_smells": [
{
"path": "tests/test_example.py",
"type": "skipped_test",
"details": "Skipped test detected: @pytest.mark.skip at line 42",
"line_number": 42
}
],
"file_reports": [...]
}
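A downstream tool might consume this report roughly as follows. This is a sketch that relies only on the fields visible in the sample above (summary.test_smells and the path and type keys of each test_smells entry); the helper name summarize_test_smells and the "giant_test_file" type string are assumptions for illustration.

```python
from collections import Counter


def summarize_test_smells(report):
    """Count test smells by type and by file in a parsed tech-debt report."""
    smells = report.get("test_smells", [])
    by_type = Counter(smell["type"] for smell in smells)
    by_path = Counter(smell["path"] for smell in smells)
    return by_type, by_path


# Inline sample shaped like the report above, so the sketch runs without a file.
# "giant_test_file" is a hypothetical type string, not confirmed by the docs.
sample_report = {
    "summary": {"test_smells": 2},
    "test_smells": [
        {"path": "tests/test_example.py", "type": "skipped_test", "line_number": 42},
        {"path": "tests/test_big.py", "type": "giant_test_file", "line_number": 1},
    ],
}

by_type, by_path = summarize_test_smells(sample_report)
print("Total test smells:", sample_report["summary"]["test_smells"])
for smell_type, count in by_type.most_common():
    print(f"  {smell_type}: {count}")
```

For a real run, the dictionary would come from json.load on the file written by the --format json command.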
Key features:
- ✅ Unified command: Single entry point for all quality checks
- ✅ Backward compatible: All existing JSON fields preserved
- ✅ LoomGraph ready: Enhanced summary for knowledge graph integration
- ✅ Framework-agnostic: Detects test smells across Jest, pytest, JUnit, etc.
- ✅ KISS design: 90% code reuse, simple regex patterns for test detection
How It Works
Two-Phase Pipeline (v0.23.0)
Phase 1 (Structural):
Directory → Scanner → Parser (tree-sitter) → SmartWriter → README_AI.md
Phase 2 (AI Enrichment, automatic when ai_command configured):
README_AI.md → symbol names + file names → AI → one-line description → blockquote injection
Phase 1: Structural generation (always runs)
- Scanner: walks directories, filters by config patterns
- Parser: extracts symbols (classes, functions, imports, calls, inheritance) via tree-sitter
- SmartWriter: generates tiered documentation with size limits (≤50KB)
- Output: README_AI.md optimized for AI consumption, or JSON for tool integration
Phase 2: AI enrichment (auto-enabled when ai_command configured)
- Generates a one-line functional description for each non-leaf module
- Writes as blockquote: > Membership tiers, points redemption, benefit cards and vouchers
- ~200-400 tokens per directory, 10-20x cheaper than full AI generation
- Parent directories read child descriptions for hierarchical navigation
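The last point, parents reading child descriptions, can be pictured with a small sketch. It assumes the one-line description is the first Markdown blockquote line ("> text") in each child's README_AI.md; the exact layout codeindex writes may differ, and both function names are hypothetical.

```python
from pathlib import Path


def read_description(readme_path):
    """Return the first blockquote line of a README_AI.md, or None.

    Assumes the description is stored as a Markdown blockquote ("> text").
    """
    path = Path(readme_path)
    if not path.is_file():
        return None
    for line in path.read_text(encoding="utf-8").splitlines():
        stripped = line.strip()
        if stripped.startswith(">"):
            return stripped.lstrip(">").strip()
    return None


def child_descriptions(parent_dir):
    """Map each immediate subdirectory name to its description, if it has one."""
    result = {}
    for child in sorted(Path(parent_dir).iterdir()):
        if child.is_dir():
            description = read_description(child / "README_AI.md")
            if description:
                result[child.name] = description
    return result
```

Calling child_descriptions on a parent directory yields the labels a parent-level index could show next to each module, and skips children that have no description yet.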
Before vs After: Code Navigation
Before (structural only):
├── Application/
│   ├── Vip/              - 48 files | 386 symbols  ← AI agent cannot determine purpose
│   ├── Pay/              - 23 files | 178 symbols
│   └── SmallProgramApi/  - 31 files | 245 symbols
After (structural + AI enrichment):
├── Application/
│   ├── Vip/              - Membership tiers, points redemption, benefit cards and vouchers | 48 files
│   ├── Pay/              - Payment gateway (Alipay/WeChat/refunds) | 23 files
│   └── SmallProgramApi/  - Mini-program API (login, avatars, products) | 31 files
→ AI agent can navigate directly
Three-Repo Architecture (Enterprise Knowledge Graph)
Enterprise Intranet Environment:
📦 Code Repository (Git)
    ↓
codeindex (Parse Layer)
├── scan --output json → ParseResult
├── README_AI.md → architecture docs
└── tech-debt → comprehensive quality scan
    ↓
🕸️ LoomGraph (Orchestration Layer)
├── inject ParseResult
├── generate embeddings
└── build knowledge graph
    ↓
💾 LightRAG (Storage Layer)
├── PostgreSQL (graph data)
├── Vector DB (embeddings)
└── Query API (semantic search)
    ↓
💬 AI Agents (Claude Code, Internal Chat)
└── Natural language code search
codeindex role: bottom layer (data collection & parsing). The entire system depends on codeindex providing structured ParseResult data.
Documentation
User Guides
| Guide | Description |
|---|---|
| Getting Started | Installation and first scan |
| Configuration Guide | All config options explained |
| Advanced Usage | Parallel scanning, custom prompts |
| Git Hooks Integration | Automated quality checks and doc updates |
| Claude Code Integration | AI agent setup and MCP skills |
| JSON Output Integration | Machine-readable output for tools |
| LoomGraph Integration | Knowledge graph data pipeline |
Developer Guides
| Guide | Description |
|---|---|
| CONTRIBUTING.md | Development setup, TDD workflow, code style |
| CLAUDE.md | Quick reference for Claude Code and contributors |
| Design Philosophy | Core design principles and architecture |
| Release Automation | 5-minute automated release workflow |
| Multi-Language Support | Adding new language parsers |
| Language Support Contribution | Template-based test generation for new languages |
Planning
- Strategic Roadmap: long-term vision and priorities
- Changelog: version history and breaking changes
Contributing
We welcome contributions! See CONTRIBUTING.md for guidelines.
git clone https://github.com/dreamlx/codeindex.git
cd codeindex
pip install -e ".[dev,all]"
make install-hooks
make test
Release Process (Maintainers)
make release VERSION=0.17.0
# GitHub Actions: tests → PyPI publish → GitHub Release
See Release Automation Guide for details.
Roadmap
Current version: v0.23.1
Recent milestones:
- v0.23.0: AI-Enhanced Module Descriptions (two-phase pipeline, auto-AI enrichment, post-commit thin wrapper)
- v0.22.2: Auto-update CLAUDE.md on pip upgrade, /codeindex-update-guide skill
- v0.22.0: Unified tech-debt + test smells analysis
- v0.21.0: Swift & Objective-C language support
- v0.19.0: TypeScript/JavaScript support with call extraction
Next:
- Framework routes expansion: Express, Laravel, FastAPI, Django (Epic 17)
- Go, Rust, C# language support
Moved to LoomGraph:
- Code similarity search, refactoring suggestions, team collaboration, IDE integration
See Strategic Roadmap for detailed plans.
License
MIT License. See LICENSE file for details.
Acknowledgments
- tree-sitter: fast, incremental parsing
- Claude CLI: AI integration inspiration
- All contributors and users
Support
- Questions: GitHub Discussions
- Bugs: GitHub Issues
- Feature Requests: GitHub Issues
Made with ❤️ by the codeindex team