ctxai

A semantic code search engine that transforms your codebase into intelligent embeddings for fast, context-aware code retrieval. ctxai uses natural language processing to find code snippets, documentation, and examples through both CLI and MCP Server interfaces.

ctxai integrates with multi-agent systems and orchestration frameworks, so agents can discover relevant code through semantic queries.

TL;DR: Intelligent semantic search across your entire codebase

Transform your code into searchable embeddings with advanced chunking and vector database indexing

Quick Start

# 1. Install ctxai
pip install ctxai

# 2. Index your codebase (uses local embeddings by default - no API key needed!)
ctxai index /path/to/your/project "my-project"

# 3. Query your codebase using natural language
ctxai query my-project "Find authentication functions"

# 4. (Optional) Start the web dashboard for interactive exploration
pip install "ctxai[dashboard]"  # Install FastHTML first (quotes keep shells like zsh from expanding the brackets)
ctxai dashboard  # Open http://localhost:3000

# 5. (Optional) Configure to use OpenAI embeddings for better results
# Edit .ctxai/config.json in your project:
{
  "embedding": {
    "provider": "openai",
    "model": "text-embedding-3-small"
  }
}
# Then set: export OPENAI_API_KEY=your-api-key-here

Features

Multiple Embedding Providers: Choose between local (default), OpenAI, or HuggingFace embeddings
No API Key Required: Uses local sentence-transformers by default - works offline!
MCP Server Integration: Works with any agent that supports MCP protocol (coming soon)
Smart Code Search: Converts your code into searchable vectors using AI
Natural Language Queries: Find code by describing what you want, not just keywords
CLI and Agent Ready: Use from command line or integrate with AI agents
Fast Indexing: Quickly processes large codebases with size limits and validation
Configurable: Customize embedding providers, chunk sizes, and project limits

Usage

Prerequisites

No API key needed for default local embeddings!

For OpenAI embeddings (optional, better quality):

export OPENAI_API_KEY=your-api-key-here

Or configure in .ctxai/config.json:

{
  "embedding": {
    "provider": "openai",
    "api_key": "your-api-key-here"
  }
}

Indexing Your Codebase

Index your project to enable semantic search:

# Basic usage
ctxai index /path/to/codebase "index_name"

# With Python module
python -m ctxai index /path/to/codebase "index_name"

# Include only specific file patterns
ctxai index /path/to/codebase "my-index" --include "*.py" --include "*.js"

# Exclude additional patterns beyond .gitignore
ctxai index /path/to/codebase "my-index" --exclude "*.test.js" --exclude "migrations/*"

# Don't follow .gitignore
ctxai index /path/to/codebase "my-index" --no-follow-gitignore

The indexing process will:

Traverse your codebase recursively (respecting .gitignore by default)
Parse code using tree-sitter for semantic understanding
Chunk code intelligently (functions, classes, etc.)
Generate embeddings using the configured provider (local sentence-transformers by default; OpenAI or HuggingFace if configured)
Store in a local ChromaDB vector database (.ctxai directory)
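
The include/exclude filtering in the traversal step can be pictured with a small sketch. This is not ctxai's code: `select_files` is a hypothetical helper, and matching with Python's `fnmatch` is an assumption about how the `--include` and `--exclude` patterns behave.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath


def select_files(paths, include=(), exclude=()):
    """Return the relative paths that pass the include/exclude glob filters."""
    selected = []
    for path in paths:
        name = PurePosixPath(path).name
        # With no include patterns, every file is a candidate.
        if include and not any(fnmatch(name, pat) for pat in include):
            continue
        # Exclude patterns are tried against the full relative path and the file name.
        if any(fnmatch(path, pat) or fnmatch(name, pat) for pat in exclude):
            continue
        selected.append(path)
    return selected


files = ["src/app.py", "src/app.test.js", "migrations/001_init.py", "web/index.js"]
print(select_files(files, include=["*.py", "*.js"], exclude=["*.test.js", "migrations/*"]))
# ['src/app.py', 'web/index.js']
```

The real tool also honors .gitignore rules, which this sketch leaves out.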

CLI Commands

View all available commands:

ctxai --help

Available commands:

index - Index a codebase for semantic search
query - Query an indexed codebase using natural language
dashboard - Start the web dashboard for browsing and querying
server - Start the MCP server for AI agents

Querying Your Codebase

Once you've indexed a codebase, you can query it using natural language:

# Basic query
ctxai query my-project "Find authentication functions"

# Limit number of results
ctxai query my-project "How to connect to database" --n-results 3

# Show only metadata (no code content)
ctxai query my-project "Find error handling code" --no-content

The query command will:

Generate an embedding for your query
Search the vector database for similar code
Display results with:
- File paths and line numbers
- Chunk types (function, class, etc.)
- Similarity scores
- Syntax-highlighted code previews
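
To make the "search for similar code" step concrete, here is a minimal sketch of ranking chunks by similarity to a query embedding. Cosine similarity, the helper names, and the three-dimensional toy vectors are all assumptions for illustration; the source does not say which distance metric the vector database uses.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_matches(query_vec, chunks, n_results=5):
    """Return (score, chunk) pairs, best match first, limited to n_results."""
    scored = [(cosine_similarity(query_vec, c["embedding"]), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:n_results]


# Toy index: real embeddings have hundreds of dimensions.
chunks = [
    {"file": "auth.py", "type": "function", "embedding": [1.0, 0.0, 0.0]},
    {"file": "db.py", "type": "class", "embedding": [0.0, 1.0, 0.0]},
]
for score, chunk in top_matches([0.9, 0.1, 0.0], chunks, n_results=1):
    print(chunk["file"], chunk["type"], round(score, 3))
```

The `n_results` parameter mirrors the `--n-results` option shown above.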

Web Dashboard

Start the interactive web dashboard to manage your indexes:

# Start dashboard (default port 3000)
ctxai dashboard

# Use custom port
ctxai dashboard --port 8080

The dashboard provides:

📊 View all indexes with statistics (chunk count, size, timestamps)
🔍 Query interface with natural language search
📄 Browse all chunks with metadata
⚙️ View configuration and CTXAI_HOME settings
🎨 Beautiful, dark-themed UI

Open your browser to http://localhost:3000 to access the dashboard.

Note: Dashboard requires FastHTML. Install it with:

pip install "ctxai[dashboard]"
# Or install all optional dependencies
pip install "ctxai[all]"

MCP Server for AI Agents

Start the MCP server to expose ctxai functionality to AI agents like Claude:

# Start MCP server
ctxai server

# With custom project path
ctxai server --project-path /path/to/project

The MCP server provides tools for LLMs to:

📋 List available indexes
📊 Index new codebases
🔍 Query code with natural language
📈 Get index statistics

Claude Desktop Configuration:

Add to your Claude Desktop config file:

{
  "mcpServers": {
    "ctxai": {
      "command": "ctxai",
      "args": ["server"]
    }
  }
}

Then you can ask Claude:

"List all available code indexes"
"Index my project at /path/to/project"
"Search the project index for authentication code"

Note: MCP server requires the MCP package. Install it with:

pip install "ctxai[mcp]"
# Or install all optional dependencies
pip install "ctxai[all]"

See docs/MCP_SERVER.md for complete documentation.

Configuration

ctxai stores configuration in .ctxai/config.json. By default, this is in your project directory, but you can customize the location using the CTXAI_HOME environment variable.

CTXAI_HOME Environment Variable

Control where ctxai stores its configuration and indexes:

# Use a global .ctxai directory (shared across all projects)
export CTXAI_HOME=~/.ctxai

# Or use a custom location
export CTXAI_HOME=/path/to/my/.ctxai

# Default (no env var): uses project_directory/.ctxai

Benefits of CTXAI_HOME:

🌍 Share configuration across multiple projects
📦 Centralize all indexes in one location
🔧 Easier backup and management
🚀 Consistent settings everywhere

Priority:

CTXAI_HOME environment variable (if set)
Project directory .ctxai (default)
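
The priority order above amounts to a two-step lookup. A minimal sketch, assuming a hypothetical helper named `resolve_ctxai_home` (the real function name inside ctxai is not given in this document):

```python
import os
from pathlib import Path


def resolve_ctxai_home(project_dir, environ=None):
    """Return the directory that would hold config.json and the indexes."""
    environ = os.environ if environ is None else environ
    override = environ.get("CTXAI_HOME")
    if override:
        # 1. CTXAI_HOME wins when it is set (a leading ~ is expanded).
        return Path(override).expanduser()
    # 2. Otherwise fall back to <project_dir>/.ctxai.
    return Path(project_dir) / ".ctxai"


print(resolve_ctxai_home("/path/to/project", environ={}))
print(resolve_ctxai_home("/path/to/project", environ={"CTXAI_HOME": "/path/to/my/.ctxai"}))
```

Passing `environ` explicitly keeps the function easy to test without touching the real environment.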

Embedding Providers

Local (Default - No API Key Required)

{
  "embedding": {
    "provider": "local",
    "model": "all-MiniLM-L6-v2"
  }
}

OpenAI (Better Quality)

{
  "embedding": {
    "provider": "openai",
    "model": "text-embedding-3-small",
    "api_key": "sk-..."
  }
}

HuggingFace

{
  "embedding": {
    "provider": "huggingface",
    "model": "sentence-transformers/all-MiniLM-L6-v2",
    "api_key": "hf_..."
  }
}

Project Size Limits

Prevent indexing overly large projects:

{
  "indexing": {
    "max_files": 10000,
    "max_total_size_mb": 500,
    "max_file_size_mb": 5,
    "chunk_size": 1000,
    "chunk_overlap": 100
  }
}

These limits help:

Prevent accidentally indexing huge projects
Control embedding costs (for cloud providers)
Ensure reasonable performance
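
As a rough illustration of how such limits could be applied before any embedding work starts, consider the sketch below. The helper `enforce_limits` is hypothetical, and the behavior (skip oversized files, raise an error when project-level limits are exceeded) is an assumption; the source only says that limits exist and that "project too large" errors can occur.

```python
MB = 1024 * 1024


def enforce_limits(file_sizes, limits):
    """file_sizes maps path -> size in bytes; returns the files that may be indexed."""
    max_file_bytes = limits["max_file_size_mb"] * MB
    # Assumption: individual files above the per-file limit are skipped.
    kept = {path: size for path, size in file_sizes.items() if size <= max_file_bytes}
    if len(kept) > limits["max_files"]:
        raise ValueError(f"project too large: {len(kept)} files > {limits['max_files']}")
    total = sum(kept.values())
    if total > limits["max_total_size_mb"] * MB:
        raise ValueError(f"project too large: {total / MB:.1f} MB > {limits['max_total_size_mb']} MB")
    return kept


limits = {"max_files": 10000, "max_total_size_mb": 500, "max_file_size_mb": 5}
sizes = {"src/app.py": 12_000, "data/dump.sql": 6 * MB}
print(sorted(enforce_limits(sizes, limits)))
```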

MCP Server Configuration

Configure the MCP server by creating an mcp.json file:

{
  "inputs": [],
  "servers": {
    "ctxai": {
      "command": "python",
      "args": ["-m", "ctxai.server", "--index", "index_name"]
    }
  }
}

Querying with GitHub Copilot

Use natural language queries through GitHub Copilot's Agent mode:

@ctxai find code for updating profile images

Installation

Prerequisites:

Python 3.10+
(Optional) OpenAI API key for better embeddings - local embeddings work without it!

# Basic installation (includes local embeddings)
pip install ctxai

# With OpenAI support
pip install "ctxai[openai]"

# With HuggingFace support
pip install "ctxai[huggingface]"

# With all providers
pip install "ctxai[all]"

# OR using uv
uv pip install ctxai

# OR run directly with uvx
uvx ctxai

First Time Setup

On first run, ctxai creates a .ctxai/config.json file with default settings:

{
  "version": "1.0",
  "embedding": {
    "provider": "local",
    "model": null,
    "api_key": null,
    "batch_size": 100,
    "max_tokens": null
  },
  "indexing": {
    "max_files": 10000,
    "max_total_size_mb": 500,
    "max_file_size_mb": 5,
    "chunk_size": 1000,
    "chunk_overlap": 100
  }
}

You can edit this file to customize embedding providers and project limits.
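
Because the examples elsewhere in this document show partial config files (for instance, only an embedding provider), it helps to picture how a partial file might combine with the defaults above. The sketch below assumes a per-section merge; `merge_with_defaults` is a hypothetical name, and ctxai's actual merge behavior is not documented here.

```python
import copy
import json

# Defaults copied from the generated config.json shown above.
DEFAULT_CONFIG = {
    "version": "1.0",
    "embedding": {
        "provider": "local",
        "model": None,
        "api_key": None,
        "batch_size": 100,
        "max_tokens": None,
    },
    "indexing": {
        "max_files": 10000,
        "max_total_size_mb": 500,
        "max_file_size_mb": 5,
        "chunk_size": 1000,
        "chunk_overlap": 100,
    },
}


def merge_with_defaults(user_config):
    """Overlay user settings on the defaults, one section at a time."""
    config = copy.deepcopy(DEFAULT_CONFIG)
    for section, values in user_config.items():
        if isinstance(values, dict) and isinstance(config.get(section), dict):
            config[section].update(values)
        else:
            config[section] = values
    return config


user = json.loads('{"embedding": {"provider": "openai", "model": "text-embedding-3-small"}}')
print(json.dumps(merge_with_defaults(user)["embedding"], indent=2))
```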

Running

# Run with uv
uv run ctxai index /path/to/codebase "index-name"

# Or install and run directly
pip install ctxai
ctxai --help

Architecture

ctxai uses a multi-stage pipeline to transform your codebase into searchable vectors:

Traversal: Recursively walks through your codebase, respecting .gitignore patterns and custom include/exclude rules
Parsing: Uses tree-sitter to parse code and understand its structure (functions, classes, methods, etc.)
Chunking: Intelligently splits code into semantic chunks while preserving context and meaning
Embedding: Generates vector embeddings using the configured provider (local sentence-transformers by default; OpenAI or HuggingFace if configured)
Storage: Stores embeddings in a local ChromaDB vector database (in .ctxai directory)
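
The parsing and chunking stages can be illustrated with a simplified stand-in. ctxai uses tree-sitter; the sketch below swaps in Python's built-in `ast` module instead, so it only handles Python source and is not the project's implementation. The function name and chunk fields are illustrative.

```python
import ast


def chunk_python_source(source):
    """Split Python source into one chunk per top-level function or class."""
    tree = ast.parse(source)
    lines = source.splitlines()
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            kind = "class" if isinstance(node, ast.ClassDef) else "function"
            chunks.append({
                "name": node.name,
                "type": kind,
                "start_line": node.lineno,
                "end_line": node.end_lineno,
                # Line numbers are 1-based, so shift the start by one for slicing.
                "text": "\n".join(lines[node.lineno - 1:node.end_lineno]),
            })
    return chunks


SAMPLE = "import os\n\ndef login(user):\n    return user\n\nclass Session:\n    pass\n"
for chunk in chunk_python_source(SAMPLE):
    print(chunk["type"], chunk["name"], chunk["start_line"], chunk["end_line"])
# function login 3 4
# class Session 6 7
```

Each chunk carries the file position and chunk type that the query command later reports alongside its results.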

Components

traversal.py: File system traversal with gitignore support
chunking.py: Tree-sitter based intelligent code chunking
embeddings.py: Embedding generation for the configured provider
vector_store.py: ChromaDB vector database management
commands/index_command.py: Orchestrates the indexing pipeline

Storage

Indexed codebases are stored locally in the .ctxai/indexes/<index-name> directory within your project (or under the directory named by CTXAI_HOME, if that variable is set). This directory contains:

ChromaDB vector database
Chunk metadata and embeddings
Index configuration

Development

Setup Development Environment

# Clone the repository
git clone https://github.com/vs4vijay/ctxai.git
cd ctxai

# Install dependencies with uv
uv sync

# Or with pip
pip install -e ".[dev]"

Running Tests

# Run all tests
pytest

# Run specific test file
pytest tests/test_indexing.py

# Run with coverage
pytest --cov=ctxai

Code Quality

# Run linter
ruff check src/

# Format code
ruff format src/

# Type checking (if mypy is added)
mypy src/

# Bump the patch version in pyproject.toml
uv version --bump patch

Project Structure

ctxai/
├── src/ctxai/
│   ├── __init__.py
│   ├── __main__.py
│   ├── app.py              # Typer CLI app
│   ├── chunking.py         # Code chunking logic
│   ├── embeddings.py       # Embedding generation
│   ├── traversal.py        # File system traversal
│   ├── vector_store.py     # Vector DB management
│   ├── server.py           # MCP server (coming soon)
│   └── commands/
│       ├── __init__.py
│       └── index_command.py
├── tests/
│   ├── __init__.py
│   ├── test_server.py
│   └── test_indexing.py
├── examples/
│   └── example_usage.py
├── pyproject.toml
└── README.md

Releasing

Bump the version in pyproject.toml and push to main.
Create a new release with a tag matching the pattern vx.y.z, e.g. v0.0.1.
This creates a release on GitHub and starts a GitHub Action that publishes the package to PyPI.

Troubleshooting

Embedding Provider Issues

Local embeddings (default)

First run downloads the model (~80MB) - this is normal
No internet required after first download
Slower than cloud APIs but free and private

OpenAI API Key Error

If you configured OpenAI but get an API key error:

export OPENAI_API_KEY=your-api-key-here  # Linux/Mac
set OPENAI_API_KEY=your-api-key-here     # Windows CMD
$env:OPENAI_API_KEY="your-api-key-here"  # Windows PowerShell

Or add to .ctxai/config.json:

{
  "embedding": {
    "provider": "openai",
    "api_key": "sk-..."
  }
}

Switching Providers

Edit .ctxai/config.json to change providers. Set provider to "local", "openai", or "huggingface" (JSON does not allow comments, so keep the file free of them):

{
  "embedding": {
    "provider": "local"
  }
}

Project Size Errors

If you get "project too large" errors:

Use include patterns to filter files:

ctxai index ./project "index" --include "*.py" --include "*.js"

Increase limits in .ctxai/config.json:

{
  "indexing": {
    "max_files": 20000,
    "max_total_size_mb": 1000
  }
}

Exclude large directories:

ctxai index ./project "index" --exclude "node_modules/*" --exclude "dist/*"

No Files Found to Index

If the indexing process finds no files:

Check your include/exclude patterns
Verify the path is correct
Use --no-follow-gitignore if files are being ignored
Check that files are not binary

Tree-sitter Parse Errors

If you see warnings about parsing errors:

These are usually non-critical
The tool will fall back to simple text chunking
Only affects the semantic understanding, not the search capability
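
The "simple text chunking" fallback can be pictured as a sliding window with overlap. This is a minimal sketch, assuming that chunk_size and chunk_overlap from the config are measured in characters (the source does not state the unit) and using a hypothetical function name.

```python
def chunk_text(text, chunk_size=1000, chunk_overlap=100):
    """Split text into windows of chunk_size characters that overlap by chunk_overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, chunk_overlap=1))
# ['abcd', 'defg', 'ghij']
```

The overlap means text that straddles a window boundary still appears whole in at least one chunk, as long as it is no longer than the overlap.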

Memory Issues with Large Codebases

For very large codebases:

Index in smaller batches using include patterns
Reduce max_chunk_size in the chunker (or chunk_size under indexing in .ctxai/config.json)
Monitor the .ctxai directory size

Contributing

We welcome all contributions to the project! Before submitting your pull request, please ensure you have run the tests and linters locally. This helps us maintain the quality of the project and makes the review process faster for everyone.

All contributions should adhere to the project's code of conduct. Let's work together to create a welcoming and inclusive environment for everyone.

Project details

These details have not been verified by PyPI

Release history

0.0.2 (this version): Oct 6, 2025
0.0.1: Oct 5, 2025

Download files

Source Distribution

ctxai-0.0.2.tar.gz (750.7 kB), uploaded Oct 6, 2025, Source

Built Distribution

ctxai-0.0.2-py3-none-any.whl (42.5 kB), uploaded Oct 6, 2025, Python 3

File details

Details for the file ctxai-0.0.2.tar.gz.

File metadata

Download URL: ctxai-0.0.2.tar.gz
Upload date: Oct 6, 2025
Size: 750.7 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for ctxai-0.0.2.tar.gz
Algorithm	Hash digest
SHA256	`7d0e9401a92ec2da68b691a185239dc9028a685b9dd5c8cf9d2315af77c820eb`
MD5	`0b1f0d3321e8fe5aa83cf05a134961fe`
BLAKE2b-256	`13b8a6885a2ea34c6ade37314568ccaacb0cd3d00da682d51154067becf46a82`
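
A downloaded archive can be checked against the SHA256 digest listed above with the standard library alone. The sketch below assumes the file has been saved in the current directory under its published name; `sha256_of_file` is an illustrative helper.

```python
import hashlib


def sha256_of_file(path, block_size=65536):
    """Return the hex SHA256 digest of a file, reading it in blocks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


EXPECTED = "7d0e9401a92ec2da68b691a185239dc9028a685b9dd5c8cf9d2315af77c820eb"

if __name__ == "__main__":
    # Assumes ctxai-0.0.2.tar.gz was downloaded to the current directory.
    actual = sha256_of_file("ctxai-0.0.2.tar.gz")
    print("OK" if actual == EXPECTED else f"MISMATCH: {actual}")
```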

Provenance

The following attestation bundles were made for ctxai-0.0.2.tar.gz:

Publisher: release.yml on vs4vijay/ctxai

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: ctxai-0.0.2.tar.gz
- Subject digest: 7d0e9401a92ec2da68b691a185239dc9028a685b9dd5c8cf9d2315af77c820eb
- Sigstore transparency entry: 585680200
- Sigstore integration time: Oct 6, 2025
Source repository:
- Permalink: vs4vijay/ctxai@3fc8479ce3fd6677bed21bbe78eef11dc7a956f8
- Branch / Tag: refs/tags/v0.0.2
- Owner: https://github.com/vs4vijay
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@3fc8479ce3fd6677bed21bbe78eef11dc7a956f8
- Trigger Event: push

File details

Details for the file ctxai-0.0.2-py3-none-any.whl.

File metadata

Download URL: ctxai-0.0.2-py3-none-any.whl
Upload date: Oct 6, 2025
Size: 42.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for ctxai-0.0.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`a21b4ca07e17559f2e537e2b5fdb86e6538adaad45ec62791349339281af234f`
MD5	`c7e22472de940092109b59a58e60ceeb`
BLAKE2b-256	`45f7a131c12457e4bd253212269e0b40429bf2a30353b153915d0305c59bb6c1`

Provenance

The following attestation bundles were made for ctxai-0.0.2-py3-none-any.whl:

Publisher: release.yml on vs4vijay/ctxai

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: ctxai-0.0.2-py3-none-any.whl
- Subject digest: a21b4ca07e17559f2e537e2b5fdb86e6538adaad45ec62791349339281af234f
- Sigstore transparency entry: 585680225
- Sigstore integration time: Oct 6, 2025
Source repository:
- Permalink: vs4vijay/ctxai@3fc8479ce3fd6677bed21bbe78eef11dc7a956f8
- Branch / Tag: refs/tags/v0.0.2
- Owner: https://github.com/vs4vijay
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@3fc8479ce3fd6677bed21bbe78eef11dc7a956f8
- Trigger Event: push
