Semantic code search for your repo, as a CLI and an MCP server. Bring any OpenAI-compatible embedding model. Zero dependencies.

codeseek

Semantic search over your codebase — as a CLI and an MCP server. Zero dependencies.

codeseek indexes a repository into a local vector store and lets you search it by meaning, not just by string match. Use it from the terminal, or run it as an MCP server so an AI coding assistant or editor can ask your codebase questions directly.

It brings no embedding model of its own: point it at the OpenAI API, or at any OpenAI-compatible endpoint such as a local llama.cpp server, so private code can be embedded without leaving your machine. Storage is plain SQLite; search is brute-force cosine. The whole thing is the Python standard library and nothing else.
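As a rough illustration of what "plain SQLite" storage can look like with only the standard library, the sketch below packs a vector into a BLOB column and reads it back. The table layout, the column names, and the `pack`/`unpack` helpers are assumptions made for this example, not codeseek's actual schema.

```python
import sqlite3
from array import array


def pack(vec):
    """Serialise a list of floats to float32 bytes for a BLOB column."""
    return array("f", vec).tobytes()


def unpack(blob):
    """Inverse of pack(): turn float32 bytes back into a list of floats."""
    out = array("f")
    out.frombytes(blob)
    return list(out)


# Hypothetical schema: one row per chunk, with its vector stored as a BLOB.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE chunks ("
    "path TEXT, start_line INTEGER, end_line INTEGER, text TEXT, vec BLOB)"
)
conn.execute(
    "INSERT INTO chunks VALUES (?, ?, ?, ?, ?)",
    ("src/auth/token.py", 40, 70, "def verify_token(raw): ...",
     pack([0.5, 0.25, -1.0])),
)

row = conn.execute("SELECT path, vec FROM chunks").fetchone()
restored = unpack(row[1])
```

Storing vectors as float32 halves the size of the database compared with Python's native doubles, at a precision cost that rarely matters for similarity ranking.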

Install

pip install codeseek
# or:
pipx install codeseek

Requires Python 3.8+.

Quick start

export OPENAI_API_KEY=sk-...

# 1. Index the current repository
codeseek index .

# 2. Search it
codeseek search "where do we validate the auth token?"
codeseek search "retry with backoff" -k 3

Results come back as markdown, each with a file path, line range, and similarity score:

### src/auth/token.py:40-70  (score 0.812)
​```
def verify_token(raw: str) -> Claims:
    ...
​```

Use a local or alternative provider

codeseek index . --base-url http://localhost:8080/v1 --model nomic-embed-text
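For readers wondering what "OpenAI-compatible" implies on the wire, here is a minimal sketch of the request such an endpoint expects: a JSON POST to `<base-url>/embeddings` with `model` and `input` fields, answered by a `data` list of `embedding` entries. The helper names are hypothetical and this is not codeseek's own client code; the request is only built, never sent.

```python
import json
import urllib.request


def build_embeddings_request(base_url, model, texts, api_key=""):
    """Build (but do not send) a POST request for an OpenAI-style endpoint."""
    url = base_url.rstrip("/") + "/embeddings"
    body = json.dumps({"model": model, "input": texts}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Local servers often need no key, so the header is optional here.
        headers["Authorization"] = "Bearer " + api_key
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def parse_embeddings_response(payload):
    """Return vectors in input order from a decoded JSON response body."""
    items = sorted(payload["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


req = build_embeddings_request(
    "http://localhost:8080/v1", "nomic-embed-text", ["def f(): pass"]
)
```

Passing `req` to `urllib.request.urlopen` would perform the call; the decoded JSON body can then go through `parse_embeddings_response`.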

As an MCP server

codeseek serve speaks MCP over stdio and exposes one tool, search_code.

After indexing a repo, register it with your MCP client. A typical mcpServers configuration looks like this:

{
  "mcpServers": {
    "codeseek": {
      "command": "codeseek",
      "args": ["serve", "--db", "/path/to/your/repo/.codeseek.db"],
      "env": { "OPENAI_API_KEY": "sk-..." }
    }
  }
}

The assistant can then call search_code to pull relevant code into its context on demand, instead of you pasting files by hand.
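To make the exchange concrete, the sketch below builds the kind of JSON-RPC 2.0 message an MCP client writes to the server's stdin when it invokes a tool. The `tools/call` method and the `name`/`arguments` envelope come from the MCP specification; the argument names `query` and `k` are assumptions for illustration, since the tool's input schema is not documented here.

```python
import json


def tool_call_message(request_id, query, k=5):
    """Serialise a tools/call request as one newline-terminated JSON line."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "search_code",
            # Hypothetical argument names; check the tool's advertised schema.
            "arguments": {"query": query, "k": k},
        },
    }
    # The stdio transport sends one JSON message per line.
    return json.dumps(message) + "\n"


line = tool_call_message(1, "where do we validate the auth token?", k=3)
```

A real client performs the MCP `initialize` handshake before any tool call; MCP client libraries and editors handle that step for you.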

Commands

| Command | What it does |
| --- | --- |
| codeseek index [PATH] | Index a directory (default .) into --db. |
| codeseek search QUERY | Search the index; -k sets result count. |
| codeseek serve | Run the MCP server over stdio. |

Shared options: --db, --model, --base-url, --api-key (each with an environment-variable default).
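One common way to give a flag an environment-variable default is to read the environment when the parser is built, as in the sketch below. Only `OPENAI_API_KEY` appears in this README; the `CODESEEK_*` variable names and the fallback values are assumptions for illustration and may not match what codeseek actually reads.

```python
import argparse
import os


def build_parser(env=None):
    """Build a parser whose defaults fall back to environment variables."""
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser(prog="codeseek-sketch")
    # Hypothetical variable names and defaults, except OPENAI_API_KEY.
    parser.add_argument("--db", default=env.get("CODESEEK_DB", ".codeseek.db"))
    parser.add_argument(
        "--model", default=env.get("CODESEEK_MODEL", "text-embedding-3-small")
    )
    parser.add_argument(
        "--base-url", default=env.get("CODESEEK_BASE_URL", "https://api.openai.com/v1")
    )
    parser.add_argument("--api-key", default=env.get("OPENAI_API_KEY", ""))
    return parser


# An explicit flag wins over the environment; the environment wins over the fallback.
args = build_parser({"CODESEEK_MODEL": "nomic-embed-text"}).parse_args(
    ["--db", "/tmp/example.db"]
)
```

Accepting the environment as a parameter keeps the precedence rules easy to test without touching the real process environment.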

How it works

  1. Source — files are walked and read (sensible code/text extensions, common build and vendor directories skipped).
  2. Chunking — each file is split into overlapping line windows.
  3. Embedding — chunks are embedded in batches via your provider.
  4. Storage — vectors land in a local SQLite database.
  5. Search — your query is embedded and compared against every chunk by cosine similarity; the top matches are returned.
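The chunking step can be sketched as follows. The window size of 40 lines and overlap of 10 are assumptions chosen for the example, and `chunk_lines` is a hypothetical helper, not codeseek's internal function.

```python
def chunk_lines(text, window=40, overlap=10):
    """Split text into overlapping windows of lines.

    Returns (start_line, end_line, chunk_text) tuples with 1-based,
    inclusive line numbers.
    """
    if not 0 <= overlap < window:
        raise ValueError("overlap must be non-negative and smaller than window")
    lines = text.splitlines()
    step = window - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(lines), step):
        end = min(start + window, len(lines))
        chunks.append((start + 1, end, "\n".join(lines[start:end])))
        if end == len(lines):
            break  # the last window already reaches the end of the file
    return chunks


sample = "\n".join("line %d" % i for i in range(1, 11))
spans = [(s, e) for s, e, _ in chunk_lines(sample, window=4, overlap=1)]
# spans == [(1, 4), (4, 7), (7, 10)]
```

Overlap means a function that straddles a window boundary still appears whole in at least one chunk, as long as it is shorter than the overlap plus one step.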

The document source is pluggable: the engine only consumes Document objects, so the same indexing and search machinery can be pointed at things other than code.
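A pluggable source might look like the sketch below, in which a generator yields documents built from something other than files. The `Document` fields (`path`, `text`) and the stand-in `index` function are hypothetical; the real class may carry different attributes.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator


@dataclass
class Document:
    # Hypothetical fields; the engine's real Document may differ.
    path: str
    text: str


def notes_source(notes: Dict[str, str]) -> Iterator[Document]:
    """Yield one Document per note instead of walking a source tree."""
    for title, body in sorted(notes.items()):
        yield Document(path="notes/" + title, text=body)


def index(documents: Iterable[Document]) -> Dict[str, int]:
    """Stand-in for the engine: it only ever touches .path and .text."""
    return {doc.path: len(doc.text.splitlines()) for doc in documents}


summary = index(notes_source({"standup": "auth bug\nretry plan", "ideas": "cache"}))
```

Because the consumer depends only on the document interface, swapping the file walker for a notes, wiki, or ticket source would not require changes to chunking, embedding, or search.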

Privacy note

Indexing sends file contents to whichever embeddings provider you configure. For private code, prefer a self-hosted model via --base-url.

Scope

Search is a linear scan, which is fast enough for a single repository of a few thousand chunks. Very large monorepos would need an approximate nearest-neighbor index; that is a natural next step, not a current goal.
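A linear scan of this kind fits in a few lines of standard-library Python, as sketched below: score every stored vector against the query and keep the best `k`. The function names and the `(label, vector)` input shape are assumptions for the example.

```python
import heapq
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, chunks, k=5):
    """Score every (label, vector) pair and return the k best as (score, label)."""
    scored = ((cosine(query_vec, vec), label) for label, vec in chunks)
    return heapq.nlargest(k, scored)


chunks = [("a", [1.0, 0.0]), ("b", [0.0, 1.0]), ("c", [1.0, 1.0])]
best = top_k([1.0, 0.0], chunks, k=2)
# best[0] is (1.0, "a"); "c" comes second with a score of about 0.707
```

The cost grows linearly with the number of chunks and with vector length, which is why the approach suits one repository but not a very large monorepo.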

Development

pip install -e ".[test]"
python -m pytest

All tests run offline; the embedding and HTTP layers accept injectable fakes.
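The idea of an injectable fake can be illustrated with a deterministic embedder that derives vectors from a hash, so a test needs no network. `fake_embed` and `embed_chunks` are hypothetical names for this sketch; the project's own test doubles may be shaped differently.

```python
import hashlib


def fake_embed(texts, dim=8):
    """Deterministic stand-in for a provider: same text, same vector.

    dim must be at most 32, the length of a SHA-256 digest.
    """
    vectors = []
    for text in texts:
        digest = hashlib.sha256(text.encode("utf-8")).digest()
        vectors.append([byte / 255.0 for byte in digest[:dim]])
    return vectors


def embed_chunks(chunks, embed):
    """Code under test takes the embedder as a parameter, so tests can pass a fake."""
    return list(zip(chunks, embed(chunks)))


pairs = embed_chunks(["def a(): pass", "def b(): pass"], fake_embed)
```

The vectors carry no semantic meaning, so a fake like this is suited to checking plumbing (batching, storage, ordering) rather than search quality.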

License

MIT — see LICENSE.
