ctxai

A semantic code search engine that transforms your codebase into intelligent embeddings for fast, context-aware code retrieval. ctxai uses natural language processing to find code snippets, documentation, and examples through both CLI and MCP Server interfaces.

ctxai integrates with multi-agent systems and orchestration frameworks, so agents can discover relevant code through semantic queries.

TL;DR: Intelligent semantic search across your entire codebase

Transform your code into searchable embeddings with advanced chunking and vector database indexing

Quick Start

# 1. Install ctxai
pip install ctxai

# 2. Index your codebase (uses local embeddings by default - no API key needed!)
ctxai index /path/to/your/project "my-project"

# 3. Query your codebase using natural language
ctxai query my-project "Find authentication functions"

# 4. (Optional) Start the web dashboard for interactive exploration
pip install ctxai[dashboard]  # Install FastHTML first
ctxai dashboard  # Open http://localhost:3000

# 5. (Optional) Configure to use OpenAI embeddings for better results
# Edit .ctxai/config.json in your project:
{
  "embedding": {
    "provider": "openai",
    "model": "text-embedding-3-small"
  }
}
# Then set: export OPENAI_API_KEY=your-api-key-here

Features

  • Multiple Embedding Providers: Choose between local (default), OpenAI, or HuggingFace embeddings
  • No API Key Required: Uses local sentence-transformers by default - works offline!
  • MCP Server Integration: Works with any agent that supports MCP protocol (coming soon)
  • Smart Code Search: Converts your code into searchable vectors using AI
  • Natural Language Queries: Find code by describing what you want, not just keywords
  • CLI and Agent Ready: Use from command line or integrate with AI agents
  • Fast Indexing: Quickly processes large codebases with size limits and validation
  • Configurable: Customize embedding providers, chunk sizes, and project limits

Usage

Prerequisites

No API key needed for default local embeddings!

For OpenAI embeddings (optional, better quality):

export OPENAI_API_KEY=your-api-key-here

Or configure in .ctxai/config.json:

{
  "embedding": {
    "provider": "openai",
    "api_key": "your-api-key-here"
  }
}

Indexing Your Codebase

Index your project to enable semantic search:

# Basic usage
ctxai index /path/to/codebase "index_name"

# With Python module
python -m ctxai index /path/to/codebase "index_name"

# Include only specific file patterns
ctxai index /path/to/codebase "my-index" --include "*.py" --include "*.js"

# Exclude additional patterns beyond .gitignore
ctxai index /path/to/codebase "my-index" --exclude "*.test.js" --exclude "migrations/*"

# Don't follow .gitignore
ctxai index /path/to/codebase "my-index" --no-follow-gitignore

The indexing process will:

  1. Traverse your codebase recursively (respecting .gitignore by default)
  2. Parse code using tree-sitter for semantic understanding
  3. Chunk code intelligently (functions, classes, etc.)
  4. Generate embeddings with the configured provider (local sentence-transformers by default)
  5. Store in a local ChromaDB vector database (.ctxai directory)
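
As a rough illustration of step 1, the sketch below walks a directory and applies include/exclude patterns similar to the `--include` and `--exclude` flags shown above. The helper name `iter_source_files` is hypothetical and not part of ctxai, and .gitignore handling is left out for brevity; ctxai's real matching rules may differ.

```python
import fnmatch
import os
from pathlib import Path


def iter_source_files(root, include=None, exclude=None):
    """Yield relative paths under root that match an include pattern
    and no exclude pattern.

    Hypothetical helper for illustration only: ctxai's own traversal
    also honours .gitignore, which this sketch ignores.
    """
    include = include or ["*"]
    exclude = exclude or []
    root = Path(root)

    def matches(rel, name, patterns):
        # Test each pattern against both the relative path and the bare name,
        # so "*.py" and "migrations/*" both behave as expected.
        return any(
            fnmatch.fnmatch(rel, pat) or fnmatch.fnmatch(name, pat)
            for pat in patterns
        )

    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            rel = (Path(dirpath) / name).relative_to(root).as_posix()
            if matches(rel, name, exclude):
                continue
            if matches(rel, name, include):
                yield rel
```

Running a helper like this before indexing is one way to preview which files a given pattern set would select.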

CLI Commands

View all available commands:

ctxai --help

Available commands:

  • index - Index a codebase for semantic search
  • query - Query an indexed codebase using natural language
  • dashboard - Start the web dashboard for browsing and querying
  • server - Start the MCP server for AI agents

Querying Your Codebase

Once you've indexed a codebase, you can query it using natural language:

# Basic query
ctxai query my-project "Find authentication functions"

# Limit number of results
ctxai query my-project "How to connect to database" --n-results 3

# Show only metadata (no code content)
ctxai query my-project "Find error handling code" --no-content

The query command will:

  1. Generate an embedding for your query
  2. Search the vector database for similar code
  3. Display results with:
    • File paths and line numbers
    • Chunk types (function, class, etc.)
    • Similarity scores
    • Syntax-highlighted code previews
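
To make steps 1 and 2 concrete, here is a minimal sketch of similarity search over embeddings, written in plain Python. It is a brute-force illustration of the idea, not how ChromaDB searches internally, and the function names and the `vector` key are assumptions made for this example.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_matches(query_vector, chunks, n_results=5):
    """Return the n_results best (score, chunk) pairs, highest score first.

    Each chunk is a dict with a "vector" entry plus any metadata
    (file path, line numbers, chunk type). Hypothetical structure.
    """
    scored = [(cosine_similarity(query_vector, c["vector"]), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:n_results]
```

The `n_results` parameter plays the same role as the `--n-results` flag: it caps how many of the closest chunks are returned.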

Web Dashboard

Start the interactive web dashboard to manage your indexes:

# Start dashboard (default port 3000)
ctxai dashboard

# Use custom port
ctxai dashboard --port 8080

The dashboard provides:

  • 📊 View all indexes with statistics (chunk count, size, timestamps)
  • 🔍 Query interface with natural language search
  • 📄 Browse all chunks with metadata
  • ⚙️ View configuration and CTXAI_HOME settings
  • 🎨 Beautiful, dark-themed UI

Open your browser to http://localhost:3000 to access the dashboard.

Note: Dashboard requires FastHTML. Install it with:

pip install ctxai[dashboard]
# Or install all optional dependencies
pip install ctxai[all]

MCP Server for AI Agents

Start the MCP server to expose ctxai functionality to AI agents like Claude:

# Start MCP server
ctxai server

# With custom project path
ctxai server --project-path /path/to/project

The MCP server provides tools for LLMs to:

  • 📋 List available indexes
  • 📊 Index new codebases
  • 🔍 Query code with natural language
  • 📈 Get index statistics

Claude Desktop Configuration:

Add to your Claude Desktop config file:

{
  "mcpServers": {
    "ctxai": {
      "command": "ctxai",
      "args": ["server"]
    }
  }
}

Then you can ask Claude:

  • "List all available code indexes"
  • "Index my project at /path/to/project"
  • "Search the project index for authentication code"

Note: MCP server requires the MCP package. Install it with:

pip install ctxai[mcp]
# Or install all optional dependencies
pip install ctxai[all]

See docs/MCP_SERVER.md for complete documentation.

Configuration

ctxai stores configuration in .ctxai/config.json. By default, this is in your project directory, but you can customize the location using the CTXAI_HOME environment variable.

CTXAI_HOME Environment Variable

Control where ctxai stores its configuration and indexes:

# Use a global .ctxai directory (shared across all projects)
export CTXAI_HOME=~/.ctxai

# Or use a custom location
export CTXAI_HOME=/path/to/my/.ctxai

# Default (no env var): uses project_directory/.ctxai

Benefits of CTXAI_HOME:

  • 🌍 Share configuration across multiple projects
  • 📦 Centralize all indexes in one location
  • 🔧 Easier backup and management
  • 🚀 Consistent settings everywhere

Priority:

  1. CTXAI_HOME environment variable (if set)
  2. Project directory .ctxai (default)
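
The priority order above can be expressed in a few lines. This is a sketch under the assumption that the resolution is a simple environment lookup with a project-local fallback; `resolve_ctxai_home` is a hypothetical name, not a function exported by ctxai.

```python
import os
from pathlib import Path


def resolve_ctxai_home(project_dir, environ=None):
    """Return the directory ctxai would use for config and indexes.

    Mirrors the documented priority: CTXAI_HOME wins if set,
    otherwise <project_dir>/.ctxai. Hypothetical helper.
    """
    environ = os.environ if environ is None else environ
    override = environ.get("CTXAI_HOME")
    if override:
        # Expand "~" so values like "~/.ctxai" resolve to the home directory.
        return Path(override).expanduser()
    return Path(project_dir) / ".ctxai"
```

Passing `environ` explicitly keeps the function easy to test without touching the real process environment.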

Embedding Providers

Local (Default - No API Key Required)

{
  "embedding": {
    "provider": "local",
    "model": "all-MiniLM-L6-v2"
  }
}

OpenAI (Better Quality)

{
  "embedding": {
    "provider": "openai",
    "model": "text-embedding-3-small",
    "api_key": "sk-..."
  }
}

HuggingFace

{
  "embedding": {
    "provider": "huggingface",
    "model": "sentence-transformers/all-MiniLM-L6-v2",
    "api_key": "hf_..."
  }
}

Project Size Limits

Prevent indexing overly large projects:

{
  "indexing": {
    "max_files": 10000,
    "max_total_size_mb": 500,
    "max_file_size_mb": 5,
    "chunk_size": 1000,
    "chunk_overlap": 100
  }
}

These limits help:

  • Prevent accidentally indexing huge projects
  • Control embedding costs (for cloud providers)
  • Ensure reasonable performance
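
A minimal sketch of how limits like these might be enforced before any embeddings are generated. The class and function names are hypothetical, and the policy shown (skip individual oversized files, reject the whole project when a project-level limit is exceeded) is an assumption; ctxai's actual behavior may differ.

```python
from dataclasses import dataclass

MB = 1024 * 1024


@dataclass
class IndexingLimits:
    # Defaults mirror the "indexing" section of the config shown above.
    max_files: int = 10000
    max_total_size_mb: float = 500
    max_file_size_mb: float = 5


def check_limits(file_sizes, limits=None):
    """Return (accepted_paths, skipped_paths) or raise ValueError.

    file_sizes maps a path to its size in bytes. Skipping oversized files
    while rejecting oversized projects is an assumption of this sketch.
    """
    limits = limits or IndexingLimits()
    accepted = {
        path: size
        for path, size in file_sizes.items()
        if size <= limits.max_file_size_mb * MB
    }
    skipped = sorted(set(file_sizes) - set(accepted))
    if len(accepted) > limits.max_files:
        raise ValueError(
            f"project too large: {len(accepted)} files > {limits.max_files}"
        )
    total_mb = sum(accepted.values()) / MB
    if total_mb > limits.max_total_size_mb:
        raise ValueError(
            f"project too large: {total_mb:.1f} MB > {limits.max_total_size_mb} MB"
        )
    return sorted(accepted), skipped
```

Checking sizes up front means a misconfigured run fails quickly instead of after a long (and, for cloud providers, billable) embedding pass.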

MCP Server Configuration

Configure the MCP server by creating an mcp.json file:

{
  "inputs": [],
  "servers": {
    "ctxai": {
      "command": "python",
      "args": ["-m", "ctxai.server", "--index", "index_name"]
    }
  }
}

Querying with GitHub Copilot

Use natural language queries through GitHub Copilot's Agent mode:

@ctxai find code for updating profile images

Installation

Prerequisites:

  • Python 3.10+
  • (Optional) OpenAI API key for better embeddings; local embeddings work without it.

# Basic installation (includes local embeddings)
pip install ctxai

# With OpenAI support
pip install ctxai[openai]

# With HuggingFace support  
pip install ctxai[huggingface]

# With all providers
pip install ctxai[all]

# OR using uv
uv pip install ctxai

# OR run directly with uvx
uvx ctxai

First Time Setup

On first run, ctxai creates a .ctxai/config.json file with default settings:

{
  "version": "1.0",
  "embedding": {
    "provider": "local",
    "model": null,
    "api_key": null,
    "batch_size": 100,
    "max_tokens": null
  },
  "indexing": {
    "max_files": 10000,
    "max_total_size_mb": 500,
    "max_file_size_mb": 5,
    "chunk_size": 1000,
    "chunk_overlap": 100
  }
}

You can edit this file to customize embedding providers and project limits.
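
Several examples in this README show partial config files that set only one or two keys. The sketch below shows one way such a partial file could be layered over the defaults. Whether ctxai merges configs exactly this way is an assumption, and `merge_config` is a hypothetical helper.

```python
from copy import deepcopy

# Defaults copied from the generated config.json shown above.
DEFAULT_CONFIG = {
    "version": "1.0",
    "embedding": {
        "provider": "local",
        "model": None,
        "api_key": None,
        "batch_size": 100,
        "max_tokens": None,
    },
    "indexing": {
        "max_files": 10000,
        "max_total_size_mb": 500,
        "max_file_size_mb": 5,
        "chunk_size": 1000,
        "chunk_overlap": 100,
    },
}


def merge_config(defaults, overrides):
    """Recursively overlay overrides on defaults without mutating either."""
    merged = deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = value
    return merged
```

With this approach, a file containing only `{"embedding": {"provider": "openai"}}` changes the provider while every other setting keeps its default value.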

Running

# Run with uv
uv run ctxai index /path/to/codebase "index-name"

# Or install and run directly
pip install ctxai
ctxai --help

Architecture

ctxai uses a multi-stage pipeline to transform your codebase into searchable vectors:

  1. Traversal: Recursively walks through your codebase, respecting .gitignore patterns and custom include/exclude rules
  2. Parsing: Uses tree-sitter to parse code and understand its structure (functions, classes, methods, etc.)
  3. Chunking: Intelligently splits code into semantic chunks while preserving context and meaning
  4. Embedding: Generates vector embeddings with the configured provider (local sentence-transformers by default, or OpenAI/HuggingFace)
  5. Storage: Stores embeddings in a local ChromaDB vector database (in .ctxai directory)
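
To illustrate the parsing and chunking stages, the sketch below splits a Python source string into one chunk per top-level function or class. It uses the standard-library `ast` module as a stand-in for tree-sitter, so it only handles Python; the function name and the chunk dictionary layout are assumptions made for this example.

```python
import ast


def chunk_python_source(source):
    """Return one chunk per top-level function or class.

    Stand-in for tree-sitter based chunking: uses Python's ast module,
    so it works for Python source only. Hypothetical helper.
    """
    tree = ast.parse(source)
    lines = source.splitlines()
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            kind = "class" if isinstance(node, ast.ClassDef) else "function"
            chunks.append(
                {
                    "type": kind,
                    "name": node.name,
                    "start_line": node.lineno,
                    "end_line": node.end_lineno,
                    # Line numbers are 1-based, list indices are 0-based.
                    "text": "\n".join(lines[node.lineno - 1 : node.end_lineno]),
                }
            )
    return chunks
```

Chunking along syntactic boundaries keeps each embedded unit self-contained, which is why query results can report a chunk type and a line range.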

Components

  • traversal.py: File system traversal with gitignore support
  • chunking.py: Tree-sitter based intelligent code chunking
  • embeddings.py: Embedding generation (local, OpenAI, or HuggingFace providers)
  • vector_store.py: ChromaDB vector database management
  • commands/index_command.py: Orchestrates the indexing pipeline

Storage

Indexed codebases are stored locally in the .ctxai/indexes/<index-name> directory within your project (or under the directory given by CTXAI_HOME, if set). This directory contains:

  • ChromaDB vector database
  • Chunk metadata and embeddings
  • Index configuration

Development

Setup Development Environment

# Clone the repository
git clone https://github.com/vs4vijay/ctxai.git
cd ctxai

# Install dependencies with uv
uv sync

# Or with pip
pip install -e ".[dev]"

Running Tests

# Run all tests
pytest

# Run specific test file
pytest tests/test_indexing.py

# Run with coverage
pytest --cov=ctxai

Code Quality

# Run linter
ruff check src/

# Format code
ruff format src/

# Type checking (if mypy is added)
mypy src/


# Bump the patch version
uv version --bump patch

Project Structure

ctxai/
├── src/ctxai/
│   ├── __init__.py
│   ├── __main__.py
│   ├── app.py              # Typer CLI app
│   ├── chunking.py         # Code chunking logic
│   ├── embeddings.py       # Embedding generation
│   ├── traversal.py        # File system traversal
│   ├── vector_store.py     # Vector DB management
│   ├── server.py           # MCP server (coming soon)
│   └── commands/
│       ├── __init__.py
│       └── index_command.py
├── tests/
│   ├── __init__.py
│   ├── test_server.py
│   └── test_indexing.py
├── examples/
│   └── example_usage.py
├── pyproject.toml
└── README.md

Releasing

  • Bump the version in pyproject.toml and push to main.
  • Create a new release with a tag matching the pattern vX.Y.Z (e.g. v0.0.1).
  • Creating the release triggers a GitHub Action that publishes the package to PyPI.

Troubleshooting

Embedding Provider Issues

Local embeddings (default)

  • First run downloads the model (~80MB) - this is normal
  • No internet required after first download
  • Slower than cloud APIs but free and private

OpenAI API Key Error

If you configured OpenAI but get an API key error:

export OPENAI_API_KEY=your-api-key-here  # Linux/Mac
set OPENAI_API_KEY=your-api-key-here     # Windows CMD
$env:OPENAI_API_KEY="your-api-key-here"  # Windows PowerShell

Or add to .ctxai/config.json:

{
  "embedding": {
    "provider": "openai",
    "api_key": "sk-..."
  }
}

Switching Providers

Edit .ctxai/config.json and set provider to "local", "openai", or "huggingface":

{
  "embedding": {
    "provider": "local"
  }
}

Project Size Errors

If you get "project too large" errors:

  1. Use include patterns to filter files:

    ctxai index ./project "index" --include "*.py" --include "*.js"
    
  2. Increase limits in .ctxai/config.json:

    {
      "indexing": {
        "max_files": 20000,
        "max_total_size_mb": 1000
      }
    }
    
  3. Exclude large directories:

    ctxai index ./project "index" --exclude "node_modules/*" --exclude "dist/*"
    

No Files Found to Index

If the indexing process finds no files:

  • Check your include/exclude patterns
  • Verify the path is correct
  • Use --no-follow-gitignore if files are being ignored
  • Check that files are not binary
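
To check whether a file is likely to be treated as binary, a common heuristic is to look for a NUL byte near the start of the file. The sketch below implements that heuristic; it is not necessarily the test ctxai applies, and `looks_binary` is a hypothetical helper.

```python
def looks_binary(path, sample_size=8192):
    """Return True if the first sample_size bytes contain a NUL byte.

    A common heuristic for spotting binary files; ctxai's own check
    may differ. Hypothetical helper.
    """
    with open(path, "rb") as handle:
        sample = handle.read(sample_size)
    return b"\x00" in sample
```

Text files in encodings such as UTF-16 contain NUL bytes and would be flagged by this check, so treat the result as a hint rather than a verdict.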

Tree-sitter Parse Errors

If you see warnings about parsing errors:

  • These are usually non-critical
  • The tool will fall back to simple text chunking
  • Only affects the semantic understanding, not the search capability
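
The fallback to simple text chunking can be pictured as a sliding window controlled by the `chunk_size` and `chunk_overlap` settings. The sketch below assumes both values are measured in characters, which the configuration does not state; `chunk_text` is a hypothetical name.

```python
def chunk_text(text, chunk_size=1000, chunk_overlap=100):
    """Split text into windows of chunk_size characters that overlap
    by chunk_overlap characters.

    Assumes character units; ctxai may count tokens or lines instead.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start : start + chunk_size])
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks
```

The overlap means text near a window boundary appears in both neighboring chunks, so a match that straddles a boundary is still retrievable.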

Memory Issues with Large Codebases

For very large codebases:

  • Index in smaller batches using include patterns
  • Reduce chunk_size in the indexing section of .ctxai/config.json
  • Monitor the .ctxai directory size
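
One simple way to monitor the .ctxai directory is to sum the sizes of the files it contains. This is a small standard-library sketch; `directory_size_mb` is a hypothetical helper, not a ctxai command.

```python
from pathlib import Path


def directory_size_mb(path):
    """Return the total size of all regular files under path, in MiB."""
    total_bytes = sum(
        entry.stat().st_size for entry in Path(path).rglob("*") if entry.is_file()
    )
    return total_bytes / (1024 * 1024)
```

For example, `directory_size_mb(".ctxai")` run from the project root reports how much disk space the indexes and configuration currently occupy.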

Contributing

We welcome all contributions to the project! Before submitting your pull request, please ensure you have run the tests and linters locally. This helps us maintain the quality of the project and makes the review process faster for everyone.

All contributions should adhere to the project's code of conduct. Let's work together to create a welcoming and inclusive environment for everyone.
