ctxai
A semantic code search engine that transforms your codebase into intelligent embeddings for fast, context-aware code retrieval. ctxai uses natural language processing to find code snippets, documentation, and examples through both CLI and MCP Server interfaces.
ctxai integrates with multi-agent systems and orchestration frameworks, so agents can discover relevant code through semantic queries.
TL;DR: Intelligent semantic search across your entire codebase
Transform your code into searchable embeddings with advanced chunking and vector database indexing
Quick Start
# 1. Install ctxai
pip install ctxai
# 2. Index your codebase (uses local embeddings by default - no API key needed!)
ctxai index /path/to/your/project "my-project"
# 3. Query your codebase using natural language
ctxai query my-project "Find authentication functions"
# 4. (Optional) Start the web dashboard for interactive exploration
pip install ctxai[dashboard] # Install FastHTML first
ctxai dashboard # Open http://localhost:3000
# 5. (Optional) Configure to use OpenAI embeddings for better results
# Edit .ctxai/config.json in your project:
{
"embedding": {
"provider": "openai",
"model": "text-embedding-3-small"
}
}
# Then set: export OPENAI_API_KEY=your-api-key-here
Features
- Multiple Embedding Providers: Choose between local (default), OpenAI, or HuggingFace embeddings
- No API Key Required: Uses local sentence-transformers by default - works offline!
- MCP Server Integration: Works with any agent that supports MCP protocol (coming soon)
- Smart Code Search: Converts your code into searchable vectors using AI
- Natural Language Queries: Find code by describing what you want, not just keywords
- CLI and Agent Ready: Use from command line or integrate with AI agents
- Fast Indexing: Quickly processes large codebases with size limits and validation
- Configurable: Customize embedding providers, chunk sizes, and project limits
Usage
Prerequisites
No API key needed for default local embeddings!
For OpenAI embeddings (optional, better quality):
export OPENAI_API_KEY=your-api-key-here
Or configure in .ctxai/config.json:
{
"embedding": {
"provider": "openai",
"api_key": "your-api-key-here"
}
}
Indexing Your Codebase
Index your project to enable semantic search:
# Basic usage
ctxai index /path/to/codebase "index_name"
# With Python module
python -m ctxai index /path/to/codebase "index_name"
# Include only specific file patterns
ctxai index /path/to/codebase "my-index" --include "*.py" --include "*.js"
# Exclude additional patterns beyond .gitignore
ctxai index /path/to/codebase "my-index" --exclude "*.test.js" --exclude "migrations/*"
# Don't follow .gitignore
ctxai index /path/to/codebase "my-index" --no-follow-gitignore
The indexing process will:
- Traverse your codebase recursively (respecting .gitignore by default)
- Parse code using tree-sitter for semantic understanding
- Chunk code intelligently (functions, classes, etc.)
- Generate embeddings with the configured provider (local sentence-transformers by default; OpenAI or HuggingFace if configured)
- Store them in a local ChromaDB vector database (.ctxai directory)
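The include/exclude filtering in the traversal step can be pictured with a short sketch. This is an illustration only: `iter_source_files` is a hypothetical helper, the glob semantics are those of Python's `fnmatch` (which may differ from ctxai's own matching), and .gitignore handling is left out.

```python
import fnmatch
import os
from pathlib import Path


def iter_source_files(root, include=None, exclude=None):
    """Yield relative paths under root that pass the include/exclude globs.

    Hypothetical helper: it mirrors the idea behind --include/--exclude,
    not ctxai's real matching rules, and it does not read .gitignore.
    """
    root = Path(root)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            rel = (Path(dirpath) / name).relative_to(root).as_posix()

            def matches(patterns):
                # A pattern may match the bare file name or the relative path.
                return any(
                    fnmatch.fnmatch(name, pat) or fnmatch.fnmatch(rel, pat)
                    for pat in patterns
                )

            # With include patterns, a file must match at least one of them.
            if include and not matches(include):
                continue
            # Any matching exclude pattern drops the file.
            if exclude and matches(exclude):
                continue
            yield rel
```

Calling `sorted(iter_source_files(".", include=["*.py"], exclude=["migrations/*"]))` lists the Python files outside a migrations directory, which is roughly what the equivalent CLI flags are described as selecting.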
CLI Commands
View all available commands:
ctxai --help
Available commands:
- index - Index a codebase for semantic search
- query - Query an indexed codebase using natural language
- dashboard - Start the web dashboard for browsing and querying
- server - Start the MCP server for AI agents
Querying Your Codebase
Once you've indexed a codebase, you can query it using natural language:
# Basic query
ctxai query my-project "Find authentication functions"
# Limit number of results
ctxai query my-project "How to connect to database" --n-results 3
# Show only metadata (no code content)
ctxai query my-project "Find error handling code" --no-content
The query command will:
- Generate an embedding for your query
- Search the vector database for similar code
- Display results with:
- File paths and line numbers
- Chunk types (function, class, etc.)
- Similarity scores
- Syntax-highlighted code previews
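As a rough picture of the "search the vector database for similar code" step, here is a minimal sketch that ranks chunks by cosine similarity in plain Python. It is not ctxai's code: the real lookup is delegated to ChromaDB, the distance metric that collection uses is not stated here, and `top_matches` with its toy two-dimensional vectors is hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_matches(query_vec, chunks, n_results=3):
    """Rank (label, vector) pairs by similarity to query_vec, best first.

    Hypothetical stand-in for a vector database query; n_results plays
    the same role as the --n-results flag.
    """
    scored = [(cosine_similarity(query_vec, vec), label) for label, vec in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:n_results]


# Toy embeddings; real ones have hundreds of dimensions.
chunks = [
    ("auth.py:login", [1.0, 0.0]),
    ("db.py:connect", [0.0, 1.0]),
    ("auth.py:logout", [0.9, 0.1]),
]
best = top_matches([1.0, 0.0], chunks, n_results=2)
```

A vector database does the same ranking with an approximate index instead of a full scan, which is what keeps queries fast on large codebases.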
Web Dashboard
Start the interactive web dashboard to manage your indexes:
# Start dashboard (default port 3000)
ctxai dashboard
# Use custom port
ctxai dashboard --port 8080
The dashboard provides:
- View all indexes with statistics (chunk count, size, timestamps)
- Query interface with natural language search
- Browse all chunks with metadata
- View configuration and CTXAI_HOME settings
- Dark-themed UI
Open your browser to http://localhost:3000 to access the dashboard.
Note: Dashboard requires FastHTML. Install it with:
pip install ctxai[dashboard]
# Or install all optional dependencies
pip install ctxai[all]
MCP Server for AI Agents
Start the MCP server to expose ctxai functionality to AI agents like Claude:
# Start MCP server
ctxai server
# With custom project path
ctxai server --project-path /path/to/project
The MCP server provides tools for LLMs to:
- List available indexes
- Index new codebases
- Query code with natural language
- Get index statistics
Claude Desktop Configuration:
Add to your Claude Desktop config file:
{
"mcpServers": {
"ctxai": {
"command": "ctxai",
"args": ["server"]
}
}
}
Then you can ask Claude:
- "List all available code indexes"
- "Index my project at /path/to/project"
- "Search the project index for authentication code"
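If you would rather not edit the Claude Desktop config by hand, the entry shown above can be merged in with a few lines of Python. This is a sketch under assumptions: `add_mcp_server` is a hypothetical helper, and the location of the config file is something you pass in yourself, since it depends on your platform.

```python
import json
from pathlib import Path


def add_mcp_server(config_path, name="ctxai", command="ctxai", args=("server",)):
    """Add or update one entry under "mcpServers" without touching the others.

    Hypothetical helper; config_path is wherever your Claude Desktop
    config file lives.
    """
    path = Path(config_path)
    # Start from the existing config if there is one, else from an empty object.
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    servers = config.setdefault("mcpServers", {})
    servers[name] = {"command": command, "args": list(args)}
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

Existing servers in the file are preserved, so running the helper twice is harmless.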
Note: MCP server requires the MCP package. Install it with:
pip install ctxai[mcp]
# Or install all optional dependencies
pip install ctxai[all]
See docs/MCP_SERVER.md for complete documentation.
Configuration
ctxai stores configuration in .ctxai/config.json. By default, this is in your project directory, but you can customize the location using the CTXAI_HOME environment variable.
CTXAI_HOME Environment Variable
Control where ctxai stores its configuration and indexes:
# Use a global .ctxai directory (shared across all projects)
export CTXAI_HOME=~/.ctxai
# Or use a custom location
export CTXAI_HOME=/path/to/my/.ctxai
# Default (no env var): uses project_directory/.ctxai
Benefits of CTXAI_HOME:
- Share configuration across multiple projects
- Centralize all indexes in one location
- Easier backup and management
- Consistent settings everywhere
Priority:
- CTXAI_HOME environment variable (if set)
- Project directory .ctxai (default)
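The priority order above amounts to a two-step lookup. A minimal sketch, assuming a hypothetical `resolve_ctxai_home` function (ctxai's internal name and exact behaviour, such as `~` expansion, are not documented here):

```python
import os
from pathlib import Path


def resolve_ctxai_home(project_dir, environ=None):
    """Return the directory that would hold config.json and the indexes.

    Hypothetical helper: CTXAI_HOME wins when it is set and non-empty,
    otherwise the project's own .ctxai directory is used.
    """
    environ = os.environ if environ is None else environ
    override = environ.get("CTXAI_HOME")
    if override:
        return Path(override).expanduser()
    return Path(project_dir) / ".ctxai"
```

Passing `environ` explicitly keeps the function easy to test without touching the real process environment.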
Embedding Providers
Local (Default - No API Key Required)
{
"embedding": {
"provider": "local",
"model": "all-MiniLM-L6-v2"
}
}
OpenAI (Better Quality)
{
"embedding": {
"provider": "openai",
"model": "text-embedding-3-small",
"api_key": "sk-..."
}
}
HuggingFace
{
"embedding": {
"provider": "huggingface",
"model": "sentence-transformers/all-MiniLM-L6-v2",
"api_key": "hf_..."
}
}
Project Size Limits
Prevent indexing overly large projects:
{
"indexing": {
"max_files": 10000,
"max_total_size_mb": 500,
"max_file_size_mb": 5,
"chunk_size": 1000,
"chunk_overlap": 100
}
}
These limits help:
- Prevent accidentally indexing huge projects
- Control embedding costs (for cloud providers)
- Ensure reasonable performance
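To make the three size limits concrete, here is a small sketch that checks a set of files against them. `check_limits` is hypothetical, and whether ctxai aborts, skips oversized files, or only warns is not specified in this README; the sketch simply reports every violation.

```python
def check_limits(file_sizes, max_files=10000, max_total_size_mb=500, max_file_size_mb=5):
    """Return a list of human-readable limit violations (empty if all is fine).

    file_sizes maps a path to its size in bytes. Hypothetical helper whose
    defaults copy the values from the "indexing" config section.
    """
    mb = 1024 * 1024
    problems = []
    if len(file_sizes) > max_files:
        problems.append(f"too many files: {len(file_sizes)} > {max_files}")
    total_bytes = sum(file_sizes.values())
    if total_bytes > max_total_size_mb * mb:
        problems.append(f"total size {total_bytes / mb:.1f} MB > {max_total_size_mb} MB")
    for path in sorted(file_sizes):
        if file_sizes[path] > max_file_size_mb * mb:
            problems.append(f"{path} is larger than {max_file_size_mb} MB")
    return problems
```

Running a check like this before generating embeddings is what keeps a stray build directory from turning into an unexpectedly large bill with a cloud provider.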
MCP Server Configuration
Configure the MCP server by creating an mcp.json file:
{
"inputs": [],
"servers": {
"ctxai": {
"command": "python",
"args": ["-m", "ctxai.server", "--index", "index_name"]
}
}
}
Querying with GitHub Copilot
Use natural language queries through GitHub Copilot's Agent mode:
@ctxai find code for updating profile images
Installation
Prerequisites:
- Python 3.10+
- (Optional) OpenAI API key for better embeddings - local embeddings work without it!
# Basic installation (includes local embeddings)
pip install ctxai
# With OpenAI support
pip install ctxai[openai]
# With HuggingFace support
pip install ctxai[huggingface]
# With all providers
pip install ctxai[all]
# OR using uv
uv pip install ctxai
# OR run directly with uvx
uvx ctxai
First Time Setup
On first run, ctxai creates a .ctxai/config.json file with default settings:
{
"version": "1.0",
"embedding": {
"provider": "local",
"model": null,
"api_key": null,
"batch_size": 100,
"max_tokens": null
},
"indexing": {
"max_files": 10000,
"max_total_size_mb": 500,
"max_file_size_mb": 5,
"chunk_size": 1000,
"chunk_overlap": 100
}
}
You can edit this file to customize embedding providers and project limits.
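The Quick Start edits only a couple of keys, which suggests that a partial config is layered over the defaults. A minimal sketch of that idea, assuming a recursive merge (`merge_config` is hypothetical and ctxai's actual loading logic may differ):

```python
import copy

# Defaults as listed in the generated config.json above.
DEFAULT_CONFIG = {
    "version": "1.0",
    "embedding": {
        "provider": "local",
        "model": None,
        "api_key": None,
        "batch_size": 100,
        "max_tokens": None,
    },
    "indexing": {
        "max_files": 10000,
        "max_total_size_mb": 500,
        "max_file_size_mb": 5,
        "chunk_size": 1000,
        "chunk_overlap": 100,
    },
}


def merge_config(defaults, overrides):
    """Return defaults with overrides applied, recursing into nested sections.

    Hypothetical helper: neither input is modified.
    """
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


user_config = {"embedding": {"provider": "openai", "model": "text-embedding-3-small"}}
effective = merge_config(DEFAULT_CONFIG, user_config)
```

With this approach the user file only needs the keys that differ, and settings such as batch_size keep their default values.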
Running
# Run with uv
uv run ctxai index /path/to/codebase "index-name"
# Or install and run directly
pip install ctxai
ctxai --help
Architecture
ctxai uses a multi-stage pipeline to transform your codebase into searchable vectors:
- Traversal: Recursively walks through your codebase, respecting .gitignore patterns and custom include/exclude rules
- Parsing: Uses tree-sitter to parse code and understand its structure (functions, classes, methods, etc.)
- Chunking: Splits code into semantic chunks while preserving context and meaning
- Embedding: Generates vector embeddings with the configured provider (local by default, or OpenAI/HuggingFace)
- Storage: Stores embeddings in a local ChromaDB vector database (in the .ctxai directory)
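The chunk_size and chunk_overlap settings are easiest to understand with a sliding-window sketch. This is an assumption-laden illustration: it measures sizes in characters (the README does not say whether the unit is characters or tokens), it ignores the tree-sitter structure that the real chunker uses, and `chunk_text` is a hypothetical name.

```python
def chunk_text(text, chunk_size=1000, chunk_overlap=100):
    """Split text into windows of chunk_size that share chunk_overlap characters.

    Hypothetical helper showing how the two config values interact.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
        start += step
    return chunks


pieces = chunk_text("abcdefghij", chunk_size=4, chunk_overlap=1)
```

The overlap means each boundary region appears in two chunks, so a statement that straddles a boundary is still retrievable from at least one of them.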
Components
- traversal.py: File system traversal with gitignore support
- chunking.py: Tree-sitter based code chunking
- embeddings.py: Embedding generation
- vector_store.py: ChromaDB vector database management
- commands/index_command.py: Orchestrates the indexing pipeline
Storage
Indexed codebases are stored locally in the .ctxai/indexes/<index-name> directory within your project. This directory contains:
- ChromaDB vector database
- Chunk metadata and embeddings
- Index configuration
Development
Setup Development Environment
# Clone the repository
git clone https://github.com/vs4vijay/ctxai.git
cd ctxai
# Install dependencies with uv
uv sync
# Or with pip
pip install -e ".[dev]"
Running Tests
# Run all tests
pytest
# Run specific test file
pytest tests/test_indexing.py
# Run with coverage
pytest --cov=ctxai
Code Quality
# Run linter
ruff check src/
# Format code
ruff format src/
# Type checking (if mypy is added)
mypy src/
uv version --bump patch
Project Structure
ctxai/
├── src/ctxai/
│   ├── __init__.py
│   ├── __main__.py
│   ├── app.py              # Typer CLI app
│   ├── chunking.py         # Code chunking logic
│   ├── embeddings.py       # Embedding generation
│   ├── traversal.py        # File system traversal
│   ├── vector_store.py     # Vector DB management
│   ├── server.py           # MCP server (coming soon)
│   └── commands/
│       ├── __init__.py
│       └── index_command.py
├── tests/
│   ├── __init__.py
│   ├── test_server.py
│   └── test_indexing.py
├── examples/
│   └── example_usage.py
├── pyproject.toml
└── README.md
Releasing
- Bump the version in pyproject.toml and push to main
- Create a new release with a tag matching the pattern vx.y.z, e.g. v0.0.1
- This creates a release on GitHub and triggers a GitHub Action that publishes to PyPI
Troubleshooting
Embedding Provider Issues
Local embeddings (default)
- First run downloads the model (~80MB) - this is normal
- No internet required after first download
- Slower than cloud APIs but free and private
OpenAI API Key Error
If you configured OpenAI but get an API key error:
export OPENAI_API_KEY=your-api-key-here # Linux/Mac
set OPENAI_API_KEY=your-api-key-here # Windows CMD
$env:OPENAI_API_KEY="your-api-key-here" # Windows PowerShell
Or add to .ctxai/config.json:
{
"embedding": {
"provider": "openai",
"api_key": "sk-..."
}
}
Switching Providers
Edit .ctxai/config.json and set "provider" to "local", "openai", or "huggingface":
{
"embedding": {
"provider": "local"
}
}
Project Size Errors
If you get "project too large" errors:
- Use include patterns to filter files:
ctxai index ./project "index" --include "*.py" --include "*.js"
- Increase limits in .ctxai/config.json:
{
"indexing": {
"max_files": 20000,
"max_total_size_mb": 1000
}
}
- Exclude large directories:
ctxai index ./project "index" --exclude "node_modules/*" --exclude "dist/*"
No Files Found to Index
If the indexing process finds no files:
- Check your include/exclude patterns
- Verify the path is correct
- Use --no-follow-gitignore if files are being ignored
- Check that files are not binary
Tree-sitter Parse Errors
If you see warnings about parsing errors:
- These are usually non-critical
- The tool will fall back to simple text chunking
- Only affects the semantic understanding, not the search capability
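The fallback behaviour described above follows a common pattern: try the structural parse, and on failure index the raw text instead. The sketch below shows that pattern with Python's built-in `ast` module standing in for tree-sitter, so it only handles Python source and is not ctxai's chunker; `chunk_python_source` is a hypothetical name.

```python
import ast


def chunk_python_source(source):
    """Return (kind, name, text) chunks for top-level functions and classes.

    Falls back to a single plain-text chunk when the source does not parse
    or contains no top-level definitions. ast is used here as a stand-in
    for tree-sitter.
    """
    try:
        tree = ast.parse(source)
    except SyntaxError:
        # Parse failure is non-critical: the text can still be embedded as-is.
        return [("text", None, source)]
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            kind = "function"
        elif isinstance(node, ast.ClassDef):
            kind = "class"
        else:
            continue
        chunks.append((kind, node.name, ast.get_source_segment(source, node)))
    return chunks or [("text", None, source)]
```

Either way the file ends up in the index; what is lost on fallback is only the function and class boundaries that make results more precise.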
Memory Issues with Large Codebases
For very large codebases:
- Index in smaller batches using include patterns
- Reduce max_chunk_size in the chunker
- Monitor the .ctxai directory size
Contributing
We welcome all contributions to the project! Before submitting your pull request, please ensure you have run the tests and linters locally. This helps us maintain the quality of the project and makes the review process faster for everyone.
All contributions should adhere to the project's code of conduct. Let's work together to create a welcoming and inclusive environment for everyone.