MCP server for indexing and querying codebases using CocoIndex

cocoindex code

A lightweight MCP for code that just works

A super lightweight, effective, embedded (AST-based) MCP server that understands and searches your codebase, and just works! It is built on CocoIndex, a Rust-based, ultra-performant data transformation engine. No black box. Works with Claude, Codex, Cursor, or any other coding agent.

  • Instant token savings of 70%.
  • 1-minute setup: a single claude mcp add or codex mcp add is all it takes!


🌟 Please help star CocoIndex if you like this project!


Get Started - zero config, let's go!!

Using pipx:

pipx install cocoindex-code       # first install
pipx upgrade cocoindex-code       # upgrade

Using uv:

uv tool install --upgrade cocoindex-code --prerelease explicit --with "cocoindex>=1.0.0a24"

Claude

claude mcp add cocoindex-code -- cocoindex-code

Codex

codex mcp add cocoindex-code -- cocoindex-code

OpenCode

opencode mcp add

Enter MCP server name: cocoindex-code
Select MCP server type: local
Enter command to run: cocoindex-code

Or use opencode.json:

{
  "$schema": "https://opencode.ai/config.json",
  "mcp": {
    "cocoindex-code": {
      "type": "local",
      "command": [
        "cocoindex-code"
      ]
    }
  }
}

Build the Index

For large codebases, we recommend running the indexer once before using the MCP so you can see the progress:

cocoindex-code index

This lets you monitor the indexing process and ensure everything is ready. After the initial build, the MCP server will automatically keep the index up-to-date in the background as files change.

For small projects you can skip this step — the MCP server will build the index automatically on first use.

When Is the MCP Triggered?

Once configured, your coding agent (Claude Code, Codex, Cursor, etc.) automatically decides when semantic code search is helpful — especially for finding code by description, exploring unfamiliar codebases, fuzzy/conceptual matches, or locating implementations without knowing exact names.

You can also nudge the agent explicitly, e.g. "Use the cocoindex-code MCP to find how user sessions are managed." For persistent instructions, add guidance to your project's AGENTS.md or CLAUDE.md:

Use the cocoindex-code MCP server for semantic code search when:
- Searching for code by meaning or description rather than exact text
- Exploring unfamiliar parts of the codebase
- Looking for implementations without knowing exact names
- Finding similar code patterns or related functionality

Features

  • Semantic Code Search: Find relevant code using natural language queries when grep doesn't work well, and save tokens immediately.
  • Ultra Performant on Code Changes: ⚡ Built on top of an ultra-performant Rust indexing engine. Only changed files are re-indexed, so updates are fast.
  • Multi-Language Support: Python, JavaScript/TypeScript, Rust, Go, Java, C/C++, C#, SQL, Shell
  • Embedded: Portable and just works, no database setup required!
  • Flexible Embeddings: By default, no API key is required, thanks to local SentenceTransformers models. Totally free! You can also switch to any of 100+ cloud providers.

Configuration

Variable | Description | Default
COCOINDEX_CODE_ROOT_PATH | Root path of the codebase | Auto-discovered (see below)
COCOINDEX_CODE_EMBEDDING_MODEL | Embedding model (see below) | sbert/sentence-transformers/all-MiniLM-L6-v2
COCOINDEX_CODE_BATCH_SIZE | Max batch size for local embedding model | 16
COCOINDEX_CODE_EXTRA_EXTENSIONS | Additional file extensions to index (comma-separated, e.g. "inc:php,yaml,toml"; use ext:lang to override language detection) | (none)
COCOINDEX_CODE_EXCLUDED_PATTERNS | Additional glob patterns to exclude from indexing, as a JSON array (e.g. '["**/migration.sql", "{**/*.md,**/*.txt}"]') | (none)
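The two list-valued variables use different formats: COCOINDEX_CODE_EXTRA_EXTENSIONS is a comma-separated string with optional ext:lang pairs, while COCOINDEX_CODE_EXCLUDED_PATTERNS is a JSON array. The sketch below shows one way such values could be interpreted. The function names are hypothetical and not part of cocoindex-code, and the real parser may differ in details such as whitespace or leading dots.

```python
import json
from typing import Dict, List, Optional


def parse_extra_extensions(raw: str) -> Dict[str, Optional[str]]:
    """Map each extension to its language override, or None if there is none.

    Hypothetical helper, for illustration only.
    """
    result: Dict[str, Optional[str]] = {}
    for item in raw.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate empty entries such as a trailing comma
        ext, sep, lang = item.partition(":")
        # "inc:php" carries an override; a bare "yaml" does not.
        result[ext.strip()] = lang.strip() if sep else None
    return result


def parse_excluded_patterns(raw: str) -> List[str]:
    """Parse a JSON array of glob patterns, rejecting any other JSON value."""
    patterns = json.loads(raw)
    if not isinstance(patterns, list) or not all(isinstance(p, str) for p in patterns):
        raise ValueError("expected a JSON array of strings")
    return patterns
```

Because the excluded patterns are JSON, the whole value usually needs single quotes in a shell so that the inner double quotes survive.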

Root Path Discovery

If COCOINDEX_CODE_ROOT_PATH is not set, the codebase root is discovered by:

  1. Finding the nearest parent directory containing .cocoindex_code/
  2. Finding the nearest parent directory containing .git/
  3. Falling back to the current working directory
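The lookup order above can be sketched in a few lines of Python. This is an illustration of the documented rules, not the project's actual code: the function names are hypothetical, and it assumes the search starts at the working directory itself before walking up to its parents.

```python
import os
from pathlib import Path
from typing import Optional


def _nearest_dir_with(start: Path, marker: str) -> Optional[Path]:
    """Return the closest directory (start or an ancestor) containing `marker`."""
    for candidate in (start, *start.parents):
        if (candidate / marker).is_dir():
            return candidate
    return None


def discover_root(cwd: Optional[Path] = None) -> Path:
    """Resolve the codebase root following the documented precedence."""
    explicit = os.environ.get("COCOINDEX_CODE_ROOT_PATH")
    if explicit:
        return Path(explicit).resolve()
    start = (cwd or Path.cwd()).resolve()
    # .cocoindex_code is checked across all ancestors before .git is considered.
    for marker in (".cocoindex_code", ".git"):
        found = _nearest_dir_with(start, marker)
        if found is not None:
            return found
    return start  # fall back to the working directory
```

One consequence of this order is that a .cocoindex_code/ directory anywhere up the tree takes precedence over a closer .git/ directory.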

Embedding model

By default, this project uses a local SentenceTransformers model (sentence-transformers/all-MiniLM-L6-v2). No API key is required, and it is completely free!

A code-specific embedding model can give better semantic understanding and better results. This project supports all models on Ollama and 100+ cloud providers.

Set COCOINDEX_CODE_EMBEDDING_MODEL to any LiteLLM-supported model, along with the provider's API key:

Ollama (Local)

claude mcp add cocoindex-code \
  -e COCOINDEX_CODE_EMBEDDING_MODEL=ollama/nomic-embed-text \
  -- cocoindex-code

Set OLLAMA_API_BASE if your Ollama server is not at http://localhost:11434.

OpenAI

claude mcp add cocoindex-code \
  -e COCOINDEX_CODE_EMBEDDING_MODEL=text-embedding-3-small \
  -e OPENAI_API_KEY=your-api-key \
  -- cocoindex-code

Azure OpenAI

claude mcp add cocoindex-code \
  -e COCOINDEX_CODE_EMBEDDING_MODEL=azure/your-deployment-name \
  -e AZURE_API_KEY=your-api-key \
  -e AZURE_API_BASE=https://your-resource.openai.azure.com \
  -e AZURE_API_VERSION=2024-06-01 \
  -- cocoindex-code

Gemini

claude mcp add cocoindex-code \
  -e COCOINDEX_CODE_EMBEDDING_MODEL=gemini/text-embedding-004 \
  -e GEMINI_API_KEY=your-api-key \
  -- cocoindex-code

Mistral

claude mcp add cocoindex-code \
  -e COCOINDEX_CODE_EMBEDDING_MODEL=mistral/mistral-embed \
  -e MISTRAL_API_KEY=your-api-key \
  -- cocoindex-code

Voyage (Code-Optimized)

claude mcp add cocoindex-code \
  -e COCOINDEX_CODE_EMBEDDING_MODEL=voyage/voyage-code-3 \
  -e VOYAGE_API_KEY=your-api-key \
  -- cocoindex-code

Cohere

claude mcp add cocoindex-code \
  -e COCOINDEX_CODE_EMBEDDING_MODEL=cohere/embed-english-v3.0 \
  -e COHERE_API_KEY=your-api-key \
  -- cocoindex-code

AWS Bedrock

claude mcp add cocoindex-code \
  -e COCOINDEX_CODE_EMBEDDING_MODEL=bedrock/amazon.titan-embed-text-v2:0 \
  -e AWS_ACCESS_KEY_ID=your-access-key \
  -e AWS_SECRET_ACCESS_KEY=your-secret-key \
  -e AWS_REGION_NAME=us-east-1 \
  -- cocoindex-code

Nebius

claude mcp add cocoindex-code \
  -e COCOINDEX_CODE_EMBEDDING_MODEL=nebius/BAAI/bge-en-icl \
  -e NEBIUS_API_KEY=your-api-key \
  -- cocoindex-code

Any model supported by LiteLLM works — see the full list of embedding providers.

Local SentenceTransformers models

Use the sbert/ prefix to load any SentenceTransformers model locally (no API key required).

Example — general purpose text model:

claude mcp add cocoindex-code \
  -e COCOINDEX_CODE_EMBEDDING_MODEL=sbert/nomic-ai/nomic-embed-text-v1 \
  -- cocoindex-code

GPU-optimised code retrieval:

nomic-ai/CodeRankEmbed delivers significantly better code retrieval than the default model. It has 137M parameters, requires ~1 GB of VRAM, and has an 8192-token context window.

claude mcp add cocoindex-code \
  -e COCOINDEX_CODE_EMBEDDING_MODEL=sbert/nomic-ai/CodeRankEmbed \
  -e COCOINDEX_CODE_BATCH_SIZE=16 \
  -- cocoindex-code

Note: Switching models requires re-indexing your codebase (the vector dimensions differ).
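The model names in this section follow one naming convention: values starting with sbert/ refer to a local SentenceTransformers model, and anything else is treated as a LiteLLM model identifier. The sketch below illustrates that convention only. The function name and the backend labels are hypothetical, and cocoindex-code may resolve the value differently internally.

```python
from typing import Tuple

SBERT_PREFIX = "sbert/"
DEFAULT_MODEL = "sbert/sentence-transformers/all-MiniLM-L6-v2"


def split_model_spec(spec: str = DEFAULT_MODEL) -> Tuple[str, str]:
    """Return (backend, model_name) for a COCOINDEX_CODE_EMBEDDING_MODEL value.

    Hypothetical helper: the backend labels are illustrative names only.
    """
    if spec.startswith(SBERT_PREFIX):
        # Strip the prefix; the remainder is the SentenceTransformers model id.
        return "sentence-transformers", spec[len(SBERT_PREFIX):]
    # Everything else is passed through unchanged as a LiteLLM model string.
    return "litellm", spec


print(split_model_spec("sbert/nomic-ai/CodeRankEmbed"))
# ('sentence-transformers', 'nomic-ai/CodeRankEmbed')
print(split_model_spec("voyage/voyage-code-3"))
# ('litellm', 'voyage/voyage-code-3')
```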

MCP Tools

search

Search the codebase using semantic similarity.

search(
    query: str,               # Natural language query or code snippet
    limit: int = 10,          # Maximum results (1-100)
    offset: int = 0,          # Pagination offset
    refresh_index: bool = True  # Refresh index before querying
)

The refresh_index parameter controls whether the index is refreshed before searching:

  • True (default): Refreshes the index to include any recent changes
  • False: Skips the refresh for faster consecutive queries

Returns matching code chunks with:

  • File path
  • Language
  • Code content
  • Line numbers (start/end)
  • Similarity score
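To show how limit, offset, and refresh_index work together, here is a small paging sketch. It is an illustration under stated assumptions: the search argument stands in for whatever your MCP client uses to call the tool, and SearchHit and its field names are hypothetical, since the source lists what a result contains but not its exact schema.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, List


@dataclass
class SearchHit:
    # Illustrative field names; the real result schema may differ.
    file_path: str
    language: str
    content: str
    start_line: int
    end_line: int
    score: float


def iter_hits(
    search: Callable[..., List[SearchHit]],
    query: str,
    page_size: int = 10,
    max_hits: int = 100,
) -> Iterator[SearchHit]:
    """Page through results, refreshing the index only for the first page."""
    if not 1 <= page_size <= 100:
        raise ValueError("limit must be between 1 and 100")
    offset = 0
    while offset < max_hits:
        limit = min(page_size, max_hits - offset)
        page = search(
            query=query,
            limit=limit,
            offset=offset,
            refresh_index=(offset == 0),  # later pages skip the refresh
        )
        yield from page
        if len(page) < limit:
            break  # a short page means there are no more results
        offset += len(page)
```

Refreshing only on the first call keeps follow-up pages fast, which is the use case the False setting is described for.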

Supported Languages

Language | Aliases | File Extensions
c | | .c
cpp | c++ | .cpp, .cc, .cxx, .h, .hpp
csharp | csharp, cs | .cs
css | | .css, .scss
dtd | | .dtd
fortran | f, f90, f95, f03 | .f, .f90, .f95, .f03
go | golang | .go
html | | .html, .htm
java | | .java
javascript | js | .js
json | | .json
kotlin | | .kt, .kts
lua | | .lua
markdown | md | .md, .mdx
pascal | pas, dpr, delphi | .pas, .dpr
php | | .php
python | | .py
r | | .r
ruby | | .rb
rust | rs | .rs
scala | | .scala
solidity | | .sol
sql | | .sql
swift | | .swift
toml | | .toml
tsx | | .tsx
typescript | ts | .ts
xml | | .xml
yaml | | .yaml, .yml

Common generated directories are automatically excluded:

  • __pycache__/
  • node_modules/
  • target/
  • dist/
  • vendor/ (Go vendored dependencies, matched by domain-based child paths)

Troubleshooting

sqlite3.Connection object has no attribute enable_load_extension

Some Python installations (e.g. the one pre-installed on macOS) ship with a SQLite library that doesn't enable extensions.
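To check whether a given interpreter is affected before reinstalling anything, a quick diagnostic like the following may help. It is a sketch using only the standard library; the function name is hypothetical. Run it with the same Python that cocoindex-code is installed under.

```python
import sqlite3


def sqlite_supports_extensions() -> bool:
    """Return True if this interpreter's sqlite3 module can load extensions."""
    conn = sqlite3.connect(":memory:")
    try:
        # The method is absent when Python was built without extension loading.
        return hasattr(conn, "enable_load_extension")
    finally:
        conn.close()


if __name__ == "__main__":
    print("SQLite library version:", sqlite3.sqlite_version)
    print("extension loading available:", sqlite_supports_extensions())
```

If this reports that extension loading is unavailable, switching to a different Python build is the likely fix.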

macOS fix: Install Python through Homebrew:

brew install python3

Then re-install cocoindex-code (see Get Started for install options):

Using pipx:

pipx install cocoindex-code       # first install
pipx upgrade cocoindex-code       # upgrade

Using uv (install or upgrade):

uv tool install --upgrade cocoindex-code --prerelease explicit --with "cocoindex>=1.0.0a24"

Large codebase / Enterprise

CocoIndex is an ultra-efficient indexing engine that also works on large codebases at scale, on XXX G, for enterprises. In enterprise scenarios, sharing an index with teammates is much more efficient when repositories are large or numerous. We also offer advanced features, such as branch deduplication, designed for enterprise users.

If you need help with remote setup, please email our maintainer at linghua@cocoindex.io. We are happy to help!

License

Apache-2.0
