CodexA — Developer Intelligence Engine
Semantic code search · AI-assisted understanding · Agent tooling protocol
CodexA is a lightweight developer intelligence engine designed to cooperate with AI coding assistants (GitHub Copilot, Cursor, Cline, etc.) and developer tooling. It indexes codebases locally, performs semantic search, and exposes a structured tool protocol that any AI agent can call over HTTP or CLI.
Features
| Area | What you get |
|---|---|
| Code Indexing | Scan repos, extract functions/classes, generate vector embeddings (sentence-transformers + FAISS), ONNX runtime option, parallel indexing, --watch live re-indexing, .codexaignore support, --add/--inspect per-file control, model-consistency guard, Ctrl+C partial-save |
| Rust Search Engine | Native codexa-core Rust crate via PyO3 — HNSW approximate nearest-neighbour search, BM25 keyword index, tree-sitter AST chunker (10 languages), memory-mapped vector persistence, parallel file scanner, optional ONNX embedding inference, optional Tantivy full-text search |
| Multi-Mode Search | Semantic, keyword (BM25), regex, hybrid (RRF), and raw filesystem grep (ripgrep backend) with full -A/-B/-C/-w/-v/-c/-l/-L/--exclude/--no-ignore flags, --hybrid/--sem shorthands, --scores, --snippet-length, --no-snippet, JSONL streaming |
| RAG Pipeline | 4-stage Retrieval-Augmented Generation — Retrieve → Deduplicate → Re-rank → Assemble with token budget, cross-encoder re-ranking, source citations |
| Code Context | Rich context windows — imports, dependencies, AST-based call graphs, surrounding code |
| Repository Analysis | Language breakdown (codexa languages), module summaries, component detection |
| AI Agent Protocol | 13 built-in tools exposed via HTTP bridge, MCP server (13 tools with pagination/cursors), MCP-over-SSE (--mcp), codexa --serve shorthand, Claude Desktop auto-config (--claude-config), or CLI for any AI agent to invoke |
| Quality & Metrics | Complexity analysis, maintainability scoring, quality gates for CI |
| Multi-Repo Workspaces | Link multiple repos under one workspace for cross-repo search & refactoring |
| Interactive TUI | Terminal REPL with mode switching for interactive exploration |
| Streaming Responses | Token-by-token streaming for chat and investigation commands |
| Plugin System | 22 hooks for extending every layer — from indexing to tool invocation |
| VS Code Extension | 4-panel sidebar (Search, Symbols, Quality, Tools), 8 commands, CodeLens, context menus, status bar |
| Editor Plugins | Zed, JetBrains (IntelliJ/PyCharm), Neovim (telescope.nvim), Vim, Sublime Text, Emacs, Helix, Eclipse -- all sharing the same MCP/bridge protocol |
| Cross-Language Intelligence | FFI pattern detection, polyglot dependency graphs, language-aware search boosting, universal multi-language call graph |
| Multi-Agent Sessions | Concurrent AI agent sessions with shared discovery, semantic diff (rename/move/signature/body detection), RAG code generation |
Quick Start
1. Install
pip install codexa
For semantic indexing and vector search, install the ML extras:
pip install "codexa[ml]"
Or install from source:
git clone https://github.com/M9nx/CodexA.git
cd CodexA
pip install -e ".[dev]"
Alternative installation methods:
# Docker
docker build -t codexa .
docker run --rm -v /path/to/project:/workspace codexa search "auth"
# Homebrew (macOS)
brew install --formula Formula/codexa.rb
2. Initialize a Project
Navigate to any project you want to analyze and run:
cd /path/to/your-project
codexa init
CodexA auto-detects your available RAM and picks the best embedding model. Or choose a model profile explicitly:
codexa init --profile fast # mxbai-embed-xsmall — low RAM (<1 GB)
codexa init --profile balanced # MiniLM — good balance (~2 GB)
codexa init --profile precise # jina-code — best quality (~4 GB)
This creates a .codexa/ directory with configuration, index storage, and session data.
3. Index the Codebase
codexa index .
This parses all source files (Python, JS/TS, Java, Go, Rust, C#, Ruby, C++),
extracts symbols, generates embeddings, and stores them in a local FAISS index.
Semantic indexing requires codexa[ml].
If you need to keep secrets, generated files, or local config files out of the
index, add patterns to .codexaignore at the project root or configure
index.exclude_files in .codexa/config.json.
Typical .codexaignore example:
.env*
secrets/*.json
config/local-*.yml
vendor/*
The default embedding model is small, but the PyTorch backend still needs about 2 GB of available RAM. On lower-memory machines, prefer the ONNX backend.
4. Semantic Search
codexa search "jwt authentication"
codexa search "database connection pool" --json
codexa search "error handling" -k 5
5. Explore More
codexa explain MyClass # Structural explanation of a symbol
codexa context parse_config # Rich AI context window
codexa deps src/auth.py # Import / dependency map
codexa summary # Full repo summary
codexa quality src/ # Code quality analysis
codexa hotspots # High-risk code hotspots
codexa trace handle_request # Execution trace of a symbol
codexa evolve # Self-improving development loop
codexa grep "TODO|FIXME" # Raw filesystem grep (ripgrep or Python)
codexa benchmark # Performance benchmarking
Using CodexA with AI Agents (GitHub Copilot, etc.)
CodexA is designed to be called by AI coding assistants as an external tool. There are three integration modes: CLI tool mode, HTTP bridge server, and in-process Python API.
Option A — CLI Tool Mode (Recommended for Copilot Chat)
Any AI agent that can run shell commands can use CodexA directly:
# List available tools
codexa tool list --json
# Run a tool with arguments
codexa tool run semantic_search --arg query="authentication middleware" --json
codexa tool run explain_symbol --arg symbol_name="UserService" --json
codexa tool run get_call_graph --arg symbol_name="process_payment" --json
codexa tool run get_dependencies --arg file_path="src/auth.py" --json
# Get tool schema (so the agent knows what arguments to pass)
codexa tool schema semantic_search --json
The --json flag ensures machine-readable output. The --pipe flag suppresses
colors and spinners for clean piping.
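For example, an agent wrapper can shell out to the CLI and parse the JSON output. A minimal sketch — the exact shape of each tool's JSON payload varies by tool, so treat the parsing step as an assumption:

```python
import json
import subprocess

def build_tool_cmd(tool_name: str, **args: str) -> list[str]:
    """Build the `codexa tool run` command line for a tool invocation."""
    cmd = ["codexa", "tool", "run", tool_name, "--json", "--pipe"]
    for key, value in args.items():
        cmd += ["--arg", f"{key}={value}"]
    return cmd

def run_codexa_tool(tool_name: str, **args: str) -> dict:
    """Invoke a tool via the CLI; --pipe keeps stdout clean for json.loads."""
    proc = subprocess.run(build_tool_cmd(tool_name, **args),
                          capture_output=True, text=True, check=True)
    return json.loads(proc.stdout)

# Example (requires an initialized, indexed project):
# run_codexa_tool("semantic_search", query="authentication middleware")
```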
Option B — HTTP Bridge Server (For MCP / Long-Running Agents)
Start the bridge server to expose all tools over HTTP:
codexa serve --port 24842
The server runs on http://127.0.0.1:24842 and exposes:
| Method | Endpoint | Description |
|---|---|---|
| GET | /capabilities | Full capability manifest — version, tools, supported requests |
| GET | /health | Health check → {"status": "ok"} |
| GET | /tools/list | List all available tools with schemas |
| POST | /tools/invoke | Execute a tool by name with arguments |
| GET | /tools/stream | SSE stream — tool discovery + heartbeat |
| POST | /request | Dispatch any AgentRequest (12 request kinds) |
Example — invoke a tool via HTTP:
curl -X POST http://127.0.0.1:24842/tools/invoke \
-H "Content-Type: application/json" \
-d '{"tool_name": "semantic_search", "arguments": {"query": "error handling"}}'
Example — list capabilities:
curl http://127.0.0.1:24842/capabilities
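The same invocation can be made from Python with only the standard library. A sketch assuming the request/response bodies shown above and a bridge already running on the default port:

```python
import json
import urllib.request

BRIDGE_URL = "http://127.0.0.1:24842"

def build_invoke_payload(tool_name: str, arguments: dict) -> dict:
    """Request body for POST /tools/invoke, matching the curl example."""
    return {"tool_name": tool_name, "arguments": arguments}

def invoke_tool(tool_name: str, arguments: dict) -> dict:
    payload = json.dumps(build_invoke_payload(tool_name, arguments)).encode()
    req = urllib.request.Request(
        f"{BRIDGE_URL}/tools/invoke",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    # urllib sends a POST automatically when data is supplied
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# invoke_tool("semantic_search", {"query": "error handling"})
```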
Option C — Python API (In-Process)
from pathlib import Path
from semantic_code_intelligence.tools.executor import ToolExecutor
from semantic_code_intelligence.tools.protocol import ToolInvocation
executor = ToolExecutor(Path("/path/to/project"))
invocation = ToolInvocation(tool_name="semantic_search", arguments={"query": "auth"})
result = executor.execute(invocation)
print(result.success) # True
print(result.result_payload) # dict with search results
print(result.execution_time_ms) # timing in milliseconds
Setting Up with VS Code + GitHub Copilot
Step 1 — Install CodexA globally
# Clone the repo
git clone https://github.com/M9nx/CodexA.git
# Install it (makes the `codexa` command available in your active environment)
cd CodexA
pip install -e ".[dev]"
# Verify
codexa --version # → codexa, version 0.5.0
Step 2 — Initialize your target project
cd /path/to/your-project
codexa init --index # Creates .codexa/ and indexes immediately
# Or separately:
codexa init # Creates .codexa/ directory
codexa index . # Index the entire codebase
codexa doctor # Verify everything is healthy
codexa search "main" # Quick sanity check
Step 3 — Add Copilot Custom Instructions (System Prompt)
Create the file .github/copilot-instructions.md in your project root.
This file acts as a system prompt — GitHub Copilot reads it automatically
and follows the instructions in every chat and code generation session.
mkdir -p .github
Then create .github/copilot-instructions.md with this content:
# Copilot Custom Instructions
## CodexA Integration
This project uses **CodexA** — a local developer intelligence engine.
You have access to the `codexa` CLI for semantic code search, symbol
explanation, dependency analysis, and more.
### Available Commands
Before answering questions about this codebase, use CodexA to gather context:
- **Search the codebase:**
```bash
codexa search "<natural language query>" --json
```
- **Explain a symbol (function/class/method):**
```bash
codexa tool run explain_symbol --arg symbol_name="<name>" --json
```
- **Get the call graph of a function:**
```bash
codexa tool run get_call_graph --arg symbol_name="<name>" --json
```
- **Get file dependencies/imports:**
```bash
codexa tool run get_dependencies --arg file_path="<path>" --json
```
- **Find all references to a symbol:**
```bash
codexa tool run find_references --arg symbol_name="<name>" --json
```
- **Get rich context for a symbol:**
```bash
codexa tool run get_context --arg symbol_name="<name>" --json
```
- **Summarize the entire repo:**
```bash
codexa tool run summarize_repo --json
```
- **Explain all symbols in a file:**
```bash
codexa tool run explain_file --arg file_path="<path>" --json
```
### Rules
1. Always use `--json` flag for machine-readable output.
2. When asked about code structure, search with `codexa search` first.
3. When explaining a function or class, use `codexa tool run explain_symbol`.
4. When analyzing impact of changes, use `codexa impact`.
5. When reviewing code, run `codexa quality <path>` first.
6. Prefer CodexA tools over reading large files manually — they provide
structured, indexed results.
Step 4 — Configure Copilot Chat to use CodexA
In VS Code, open Settings (Ctrl+,) and search for:
| Setting | Value | Purpose |
|---|---|---|
| github.copilot.chat.codeGeneration.instructions | Add .github/copilot-instructions.md | Auto-loads custom instructions |
| chat.agent.enabled | true | Enables agent mode in Copilot Chat |
Or add this to your .vscode/settings.json:
{
"github.copilot.chat.codeGeneration.instructions": [
{ "file": ".github/copilot-instructions.md" }
]
}
Step 5 — Use Copilot Chat with CodexA
Open Copilot Chat in VS Code (Ctrl+Shift+I or the chat panel) and switch to Agent mode (the dropdown at the top). Now Copilot can run terminal commands and will automatically use CodexA per your instructions.
Example conversations:
You: What does the process_payment function do and what calls it?

Copilot runs:

codexa tool run explain_symbol --arg symbol_name="process_payment" --json
codexa tool run get_call_graph --arg symbol_name="process_payment" --json

Then gives you a structured answer with callers, callees, and explanation.

You: Find all code related to authentication

Copilot runs:

codexa search "authentication" --json

Returns ranked semantic search results across your entire codebase.

You: What would break if I change UserService?

Copilot runs:

codexa tool run find_references --arg symbol_name="UserService" --json
codexa impact

Shows blast radius and all dependents.

You: Review the code quality of src/api/

Copilot runs:

codexa quality src/api/ --json

Returns complexity scores, dead code, duplicates, and security issues.
Step 6 — Start the Bridge Server (optional, for MCP)
For persistent connections (MCP servers, custom agent frameworks):
codexa serve --port 24842
The agent can then call http://127.0.0.1:24842/tools/invoke directly.
Step 7 — Configure LLM provider (optional)
For AI-powered commands (codexa ask, codexa review, codexa chat, etc.),
edit .codexa/config.json:
{
"llm": {
"provider": "openai",
"model": "gpt-4",
"api_key": "sk-...",
"temperature": 0.2,
"max_tokens": 2048
}
}
Supported providers: openai, ollama (local), mock (testing).
All CLI Commands
CodexA provides 39 commands (plus subcommands) organized by capability:
Core
| Command | Description |
|---|---|
| codexa init [path] | Initialize project — creates .codexa/ directory (supports --index and --vscode) |
| codexa index [path] | Index codebase for semantic search |
| codexa search "<query>" | Natural-language semantic search |
| codexa explain <symbol> | Structural explanation of a symbol or file |
| codexa context <symbol> | Rich context window for AI consumption |
| codexa summary | Structured repository summary |
| codexa deps <file> | File/project dependency map |
| codexa watch | Background indexing daemon (Rust-backed native file watcher) |
| codexa grep "<pattern>" | Raw filesystem grep — no index required (ripgrep backend) |
| codexa benchmark | Performance benchmarking (indexing, search, memory) |
| codexa languages | List supported tree-sitter languages with grammar status |
AI-Powered
| Command | Description |
|---|---|
| codexa ask "<question>" | Ask a question about the codebase (LLM) |
| codexa review <file> | AI-powered code review |
| codexa refactor <file> | AI-powered refactoring suggestions |
| codexa suggest <symbol> | Intelligent improvement suggestions |
| codexa chat | Multi-turn conversation with session persistence |
| codexa investigate <goal> | Autonomous multi-step code investigation |
Quality & Metrics
| Command | Description |
|---|---|
| codexa quality [path] | Code quality analysis |
| codexa metrics | Code metrics, snapshots, and trends |
| codexa hotspots | Identify high-risk code hotspots |
| codexa gate | Enforce quality gates for CI pipelines |
| codexa impact | Blast radius analysis of code changes |
DevOps & Integration
| Command | Description |
|---|---|
| codexa serve | Start HTTP bridge server for AI agents |
| codexa tool list\|run\|schema | AI Agent Tooling Protocol commands |
| codexa pr-summary | Generate PR intelligence report |
| codexa ci-gen | Generate CI workflow templates |
| codexa web | Start web interface and REST API |
| codexa viz | Generate Mermaid visualizations |
| codexa evolve | Self-improving development loop |
Workspace & Utilities
| Command | Description |
|---|---|
| codexa workspace | Multi-repo workspace management |
| codexa cross-refactor | Cross-repository refactoring |
| codexa trace <symbol> | Trace execution relationships |
| codexa docs | Generate project documentation |
| codexa doctor | Environment health check |
| codexa plugin list\|scaffold\|discover | Plugin management |
| codexa tui | Interactive terminal REPL |
| codexa mcp | Start MCP (Model Context Protocol) server |
| codexa models list\|info\|download\|switch\|profiles\|benchmark | Manage and benchmark embedding models |
VS Code Extension
| Feature | Command / Keybinding |
|---|---|
| Multi-mode search panel (semantic/keyword/hybrid/regex) | Sidebar → Search |
| Symbol explorer (explain, call graph, deps) | Sidebar → Symbols & Graphs |
| Code quality dashboard (quality, metrics, hotspots) | Sidebar → Quality |
| Agent tool runner (doctor, index, models, 13 tools) | Sidebar → Tools |
| Search codebase | Ctrl+Shift+F5 |
| Explain symbol at cursor | Ctrl+Shift+E |
| Code quality analysis | Ctrl+Shift+Q |
| Right-click → Explain / Call Graph | Editor context menu |
Built-in Tools (AI Agent Protocol)
These tools can be invoked via CLI (codexa tool run), HTTP (POST /tools/invoke),
or Python API (ToolExecutor.execute()):
| Tool | Arguments | Description |
|---|---|---|
| semantic_search | query (string) | Search codebase by natural language |
| explain_symbol | symbol_name (string) | Structural explanation of a symbol |
| explain_file | file_path (string) | Explain all symbols in a file |
| summarize_repo | (none) | Full repository summary |
| find_references | symbol_name (string) | Find all references to a symbol |
| get_dependencies | file_path (string) | Import / dependency map for a file |
| get_call_graph | symbol_name (string) | Call graph — callers and callees |
| get_context | symbol_name (string) | Rich context window for AI tasks |
| get_file_context | file_path, line or symbol_name | Full-section surrounding code retrieval |
| get_quality_score | file_path (string, optional) | Code quality analysis — complexity, dead code, duplicates |
| find_duplicates | threshold (float, optional) | Detect near-duplicate code blocks |
| grep_files | pattern (string) | Raw filesystem regex search (ripgrep/Python) |
| list_languages | (none) | List supported tree-sitter languages and grammar status |
Additional tools can be registered via the plugin system using the
REGISTER_TOOL hook.
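The plugin API itself is not documented in this README, so the snippet below is purely illustrative — the registry dict and function names are hypothetical placeholders, not the real hook signature — but it conveys the general shape of contributing a tool at that hook:

```python
# Hypothetical sketch: the registry dict and the function names here are
# placeholders, NOT the actual CodexA plugin API.
def register_tool(registry: dict) -> dict:
    def count_todos(arguments: dict) -> dict:
        # A trivial custom tool: count TODO markers in a string argument.
        text = arguments.get("text", "")
        return {"todo_count": text.count("TODO")}
    registry["count_todos"] = count_todos
    return registry

registry = register_tool({})
print(registry["count_todos"]({"text": "TODO: a\nTODO: b"}))  # {'todo_count': 2}
```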
Architecture
┌─────────────────────────────────────────────────────┐
│ CLI Layer (click) │
│ 39 commands · --json · --pipe · --verbose │
├─────────────────────────────────────────────────────┤
│ AI Agent Tooling Protocol │
│ ToolExecutor · ToolInvocation · ToolExecutionResult │
├─────────────────────────────────────────────────────┤
│ Bridge Server (HTTP) │
│ /tools/invoke · /tools/list · /request · SSE stream │
├──────────────┬──────────────┬───────────────────────┤
│ Parsing │ Embedding │ Search │
│ tree-sitter │ sent-trans │ FAISS / Rust HNSW │
├──────────────┼──────────────┴───────────────────────┤
│ Rust Engine │ codexa-core (PyO3) │
│ (optional) │ HNSW · BM25 · AST chunk · mmap · RRF│
├──────────────┼──────────────────────────────────────┤
│ RAG Pipeline │ Retrieve → Dedup → Re-rank → Assemble│
├──────────────┼──────────────────────────────────────┤
│ Evolution │ Self-improving dev loop │
│ engine │ budget · task · patch · test · commit│
├──────────────┴──────────────────────────────────────┤
│ Plugin System (22 hooks) │
├─────────────────────────────────────────────────────┤
│ Storage (.codexa/ — config, index, cache) │
└─────────────────────────────────────────────────────┘
Configuration
After codexa init, your project has .codexa/config.json:
{
"embedding": {
"model_name": "all-MiniLM-L6-v2",
"chunk_size": 512,
"chunk_overlap": 64
},
"search": {
"top_k": 10,
"similarity_threshold": 0.3
},
"index": {
"use_incremental": true,
"extensions": [".py", ".js", ".ts", ".java", ".go", ".rs", ".rb", ".cpp", ".cs"]
},
"llm": {
"provider": "mock",
"model": "",
"api_key": "",
"temperature": 0.2,
"max_tokens": 2048
}
}
Tip: Instead of editing `model_name` manually, use `codexa init --profile fast|balanced|precise` or run `codexa models profiles` to see recommended models for your hardware.
Documentation
CodexA ships with a full VitePress documentation site.
# Install docs dependencies
npm install
# Serve locally (live-reload)
npm run docs:dev
# Build static site
npm run docs:build
# Preview the build
npm run docs:preview
Browse the docs at http://localhost:5173 after running npm run docs:dev.
Development
# Install dev dependencies
pip install -e ".[dev]"
# Run all 2657 tests
pytest
# Run with coverage (gate: 70% minimum)
pytest --cov=semantic_code_intelligence
# Run mypy strict type checking
mypy semantic_code_intelligence --exclude "tests/"
# Run specific phase tests
pytest semantic_code_intelligence/tests/test_phase23.py -v
# Run with verbose output
codexa --verbose search "query"
Tech Stack
- Python 3.11+ — No heavy frameworks, stdlib-first design
- Rust (codexa-core) — Native search engine via PyO3 — HNSW (instant-distance), BM25, tree-sitter AST chunking, mmap persistence, parallel scanning (rayon)
- click — CLI framework
- sentence-transformers — Embedding generation (all-MiniLM-L6-v2)
- faiss-cpu — Vector similarity search (with Rust HNSW acceleration)
- tree-sitter — Multi-language code parsing (Python + Rust)
- watchfiles — Rust-backed native file watching (inotify/FSEvents/ReadDirectoryChanges)
- pydantic — Configuration & data models
- rich — Terminal UI and formatting
License
MIT — see LICENSE for details.
Download files
Source Distribution
Built Distribution
File details
Details for the file codexa-0.5.0.tar.gz.
File metadata
- Download URL: codexa-0.5.0.tar.gz
- Upload date:
- Size: 383.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | addbc4785f2cb307097d46b1b5f3005de27836ac315a870ebe58bf2c3f0d57a6 |
| MD5 | 926fe362bd2861959d1fbe3a8720c561 |
| BLAKE2b-256 | 03208b8daa90620b5e0cbb3bf3dea39cfa6853a361f6ab739d27873c44bc1610 |
File details
Details for the file codexa-0.5.0-py3-none-any.whl.
File metadata
- Download URL: codexa-0.5.0-py3-none-any.whl
- Upload date:
- Size: 490.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 19e4e840df4a8930ad6ba1041b5db9624c0331d8bbee07da5f125bec11793f3e |
| MD5 | b4318aa8174e903985575fe6e01caf54 |
| BLAKE2b-256 | ab2601ec5ad887c358166164123e9332767bb19e6dced0034385b0702a8f3d95 |