
Embeds your codebase and makes it available for quick LLM lookups via MCP.

Reason this release was yanked:

CLI crashed on import when optional OpenAI dependencies not installed. Fixed in v0.1.1.


CodeEmbed

Embeds your codebase into a local vector database and exposes it as an MCP tool, giving AI assistants like Claude Code fast semantic search over your code.

Particularly useful for questions like:

  • How is X implemented in this repo?
  • Where is X defined or used?
  • Does this repo already have X?

For other questions, the agent will fall back to normal lookups. CodeEmbed can improve lookup speed and accuracy, especially for finding existing implementations before writing new ones. Note that the biggest bottleneck in coding agents is LLM thinking and token generation — solid prompts and follow-up questions still matter.

Uses ChromaDB for local vector storage and either Ollama or OpenAI (including OpenAI models via Azure AI Foundry) for LLM analysis.

Prerequisites

  • Python 3.11+
  • uv
  • One of:
    • Ollama running locally, or
    • An OpenAI API key or Azure OpenAI endpoint

Installation

With Ollama:

uv tool install codeembed

With OpenAI / Azure OpenAI:

uv tool install 'codeembed[openai]'

Supply chain safety: To reduce the risk of newly-published malicious packages, consider adding exclude-newer = "7 days" to your global uv.toml. This prevents uv from installing packages published in the last 7 days.

Manual installation (from source)

If you cannot install CodeEmbed from PyPI, install it directly from source:

git clone https://github.com/robino16/codeembed
cd codeembed

# With Ollama
uv tool install .

# With OpenAI support
uv tool install '.[openai]'

Then run codeembed init inside your target repository.

Upgrading

uv tool upgrade codeembed

Usage

CodeEmbed is intended to be used within a single project — run all commands from your project root. Each project gets its own local vector database stored in .codeembed/.

Supported file types: .py, .md, .ts, .tsx, .js, .jsx.

1. Initialize (run once in your project root):

codeembed init

Creates a codeembed.toml config and configures your .gitignore. You'll be prompted to select a provider (Ollama or OpenAI) and a model. You'll also be offered the option to automatically configure Claude Code and/or GitHub Copilot.

2. Pre-populate the index:

codeembed embed

Run this before starting the server to pre-populate the index. Searches will return empty results until the first file has been embedded.

CodeEmbed respects your project's .gitignore and also excludes typical environment directories and files (.env, venv, node_modules, etc.) by default.
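The file selection described above can be pictured with a small sketch. This is not CodeEmbed's actual implementation: the exclusion set and the function names are hypothetical, and the sketch does not evaluate .gitignore rules at all. It only filters on the supported extensions and on a fixed list of excluded path components.

```python
from pathlib import Path

# Extensions listed under "Supported file types" in this README.
SUPPORTED_EXTENSIONS = {".py", ".md", ".ts", ".tsx", ".js", ".jsx"}

# Hypothetical default exclusions; the real list used by CodeEmbed may differ.
EXCLUDED_NAMES = {".env", "venv", ".venv", "node_modules", ".git", ".codeembed"}


def is_candidate(path: Path, root: Path) -> bool:
    """Return True if `path` has a supported extension and no excluded component."""
    relative = path.relative_to(root)
    if any(part in EXCLUDED_NAMES for part in relative.parts):
        return False
    return path.suffix in SUPPORTED_EXTENSIONS


def iter_candidates(root: Path):
    """Yield candidate files under `root` (does NOT apply .gitignore rules)."""
    for path in sorted(root.rglob("*")):
        if path.is_file() and is_candidate(path, root):
            yield path
```

Calling `list(iter_candidates(Path(".")))` from a project root gives a rough preview of which files such a filter would keep.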

3. Start the MCP server:

codeembed serve

Starts the MCP server. If you have added the server to Claude Code or GitHub Copilot, you do not need to run this yourself, because the client launches it using the configured command.

The serve command embeds your codebase in the background. By default, it scans for changes every 60 seconds.
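One common way to implement this kind of periodic scan is to compare content hashes between polls. The sketch below illustrates that general technique under that assumption; how CodeEmbed actually detects changes is not stated in this README, and `snapshot`, `diff_snapshots`, and `watch` are hypothetical names.

```python
import hashlib
import time
from pathlib import Path


def snapshot(paths):
    """Map each path to a SHA-256 digest of its current contents."""
    return {str(p): hashlib.sha256(Path(p).read_bytes()).hexdigest() for p in paths}


def diff_snapshots(old, new):
    """Return (changed_or_added, removed) path lists between two snapshots."""
    changed = sorted(p for p, digest in new.items() if old.get(p) != digest)
    removed = sorted(p for p in old if p not in new)
    return changed, removed


def watch(list_files, on_change, interval_seconds=60.0, max_cycles=None):
    """Poll `list_files()` every `interval_seconds` and report differences."""
    previous = snapshot(list_files())
    cycles = 0
    while max_cycles is None or cycles < max_cycles:
        time.sleep(interval_seconds)
        current = snapshot(list_files())
        changed, removed = diff_snapshots(previous, current)
        if changed or removed:
            on_change(changed, removed)  # e.g. re-embed changed, drop removed
        previous = current
        cycles += 1
```

Hashing contents rather than comparing modification times avoids re-embedding files that were touched but not actually changed, at the cost of reading every file on each scan.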

Configuring OpenAI

If you use the OpenAI provider, credentials are read from environment variables. The recommended approach is a .env file. codeembed init will ask for its path and store it in codeembed.toml, so that codeembed serve and codeembed embed load the .env file automatically.

Standard OpenAI

OPENAI_API_KEY=...

Optionally override the endpoint (for compatible APIs like vLLM, LM Studio, OpenRouter):

OPENAI_API_KEY=...
OPENAI_BASE_URL=...

Azure OpenAI — API key

AZURE_OPENAI_ENDPOINT=https://<your-resource>.openai.azure.com/openai/v1/
AZURE_OPENAI_API_KEY=...

Azure OpenAI — RBAC / Entra ID (keyless)

Set only the endpoint. CodeEmbed will use DefaultAzureCredential, which tries multiple credential sources in order: service principals (via env vars), workload identity, managed identity, VS Code Azure sign-in, az login, Azure PowerShell, and azd auth login. If none of these yields a credential, it falls back to an interactive browser window:

AZURE_OPENAI_ENDPOINT=https://<your-resource>.openai.azure.com/openai/v1/
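The four configurations above differ only in which environment variables are set. The sketch below shows one way that selection could be expressed. It is an illustration, not CodeEmbed's code: the function name, the returned labels, and the precedence given to Azure when both Azure and OpenAI variables are present are all assumptions.

```python
import os


def resolve_auth(env):
    """Return a label for the auth mode implied by an environment mapping."""
    if env.get("AZURE_OPENAI_ENDPOINT"):
        # Assumption: an Azure endpoint takes precedence over OPENAI_* variables.
        if env.get("AZURE_OPENAI_API_KEY"):
            return "azure-api-key"
        # Endpoint only: keyless auth (DefaultAzureCredential, per this README).
        return "azure-entra-id"
    if env.get("OPENAI_API_KEY"):
        # OPENAI_BASE_URL is the optional override for compatible APIs.
        return "openai-compatible" if env.get("OPENAI_BASE_URL") else "openai"
    return "unconfigured"


if __name__ == "__main__":
    print(resolve_auth(os.environ))
```

Running the script after loading your .env file into the shell is a quick way to check which variables are actually visible to a process.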

Add to Claude Code or GitHub Copilot

codeembed init will offer to configure these automatically. If you prefer to do it manually:

Claude Code — add to .mcp.json in your project root:

{
  "mcpServers": {
    "codeembed": {
      "command": "codeembed",
      "args": ["serve"]
    }
  }
}

And add to .claude/settings.local.json to enable and pre-approve the tool:

{
  "enabledMcpjsonServers": ["codeembed"],
  "permissions": {
    "allow": ["mcp__codeembed__search"]
  }
}
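If .mcp.json already lists other servers, the codeembed entry should be merged in rather than overwriting the file. A minimal sketch of such a merge follows. The helper name is hypothetical, and codeembed init may handle this differently.

```python
import json
from pathlib import Path


def add_codeembed_server(config_path: Path) -> dict:
    """Merge the codeembed entry into an MCP config file, keeping other servers."""
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    servers = config.setdefault("mcpServers", {})
    # Same entry as shown in the manual .mcp.json example above.
    servers["codeembed"] = {"command": "codeembed", "args": ["serve"]}
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config


if __name__ == "__main__":
    add_codeembed_server(Path(".mcp.json"))
```

The same approach would apply to .vscode/mcp.json, with the top-level key changed from "mcpServers" to "servers".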

GitHub Copilot — add to .vscode/mcp.json:

{
  "servers": {
    "codeembed": {
      "command": "codeembed",
      "args": ["serve"]
    }
  }
}

The MCP server exposes a single search(query) tool for semantic search over your codebase.
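To give an intuition for what semantic search means here, the toy sketch below ranks stored vectors by cosine similarity to a query vector. It is a conceptual illustration only. In CodeEmbed the embeddings come from the configured provider and are stored in ChromaDB, and the real tool takes a text query. The two-dimensional vectors, the file names, and the `rank_chunks` helper are made up for this example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_chunks(query_vector, index, top_k=3):
    """Return the `top_k` (name, score) pairs most similar to the query vector."""
    scored = [(name, cosine_similarity(query_vector, vec)) for name, vec in index.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:top_k]


# Toy index: in practice each vector would be an embedding of a code chunk.
toy_index = {"auth.py": [1.0, 0.0], "db.py": [0.0, 1.0], "login.py": [0.9, 0.1]}
top_matches = rank_chunks([1.0, 0.0], toy_index, top_k=2)
```

Because ranking depends on vector similarity rather than exact keywords, a query about "user sign-in" can surface a file named login.py even when the words do not match.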

Contributing

Clone this repo with:

git clone git@github.com:robino16/codeembed.git
cd codeembed
uv sync

Check for dependency conflicts with:

uv pip check

Check for package vulnerabilities with:

uv run pip-audit

(Optional) Add Ruff pre-commit with:

pre-commit install

Update init files:

uv run --no-sync scripts/generate_init_files.py

Run linter:

ruff check . --fix

Run formatter:

ruff format .

Run tests:

uv run --no-sync pytest

Build with:

uv build

Validate build with:

uv run twine check dist/*

--no-sync is required for local dev commands when the MCP server is running, as uv holds a lock that blocks sync operations.
