Semantic code search for your repo, as a CLI and an MCP server. Bring any OpenAI-compatible embedding model. Zero dependencies.
codeseek
Semantic search over your codebase — as a CLI and an MCP server. Zero dependencies.
codeseek indexes a repository into a local vector store and lets you search it
by meaning, not just by string match. Use it from the terminal, or run it as an
MCP server so an AI coding assistant or editor
can ask your codebase questions directly.
It brings no embedding model of its own: point it at the OpenAI API, or at any
OpenAI-compatible endpoint such as a local llama.cpp server, so private code
can be embedded without leaving your machine. Storage is plain SQLite; search is
brute-force cosine. The whole thing is the Python standard library and nothing
else.
Install
pip install codeseek
# or:
pipx install codeseek
Requires Python 3.8+.
Quick start
export OPENAI_API_KEY=sk-...
# 1. Index the current repository
codeseek index .
# 2. Search it
codeseek search "where do we validate the auth token?"
codeseek search "retry with backoff" -k 3
Results come back as markdown, each with a file path, line range, and similarity score:
### src/auth/token.py:40-70 (score 0.812)
```
def verify_token(raw: str) -> Claims:
...
```
Use a local or alternative provider
codeseek index . --base-url http://localhost:8080/v1 --model nomic-embed-text
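For orientation, here is a minimal sketch of the request an OpenAI-compatible embeddings endpoint expects: a POST to `{base_url}/embeddings` with a `model` and a list of `input` strings, answered with a `data` array of embeddings. The function names are illustrative assumptions, not codeseek's internals, and it sticks to the standard library.

```python
import json
import urllib.request
from typing import List, Optional


def build_embedding_request(base_url: str, model: str, texts: List[str],
                            api_key: Optional[str] = None) -> urllib.request.Request:
    """Build a POST to {base_url}/embeddings in the OpenAI wire format."""
    headers = {"Content-Type": "application/json"}
    if api_key:  # local servers often need no key at all
        headers["Authorization"] = "Bearer " + api_key
    body = json.dumps({"model": model, "input": texts}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/embeddings", data=body, headers=headers, method="POST"
    )


def parse_embedding_response(payload: bytes) -> List[List[float]]:
    """Return one vector per input, ordered by each item's 'index' field."""
    data = json.loads(payload)["data"]
    return [item["embedding"] for item in sorted(data, key=lambda d: d["index"])]


def embed(base_url, model, texts, api_key=None, opener=urllib.request.urlopen):
    """Embed one batch; `opener` is injectable so this can be exercised offline."""
    request = build_embedding_request(base_url, model, texts, api_key)
    with opener(request) as response:
        return parse_embedding_response(response.read())
```

Calling `embed("http://localhost:8080/v1", "nomic-embed-text", ["some code"])` would send the text only to the local server, which is the point of `--base-url` for private code.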
As an MCP server
codeseek serve speaks MCP over stdio and exposes one tool, search_code.
After indexing a repo, register it with your MCP client. A typical mcpServers
configuration looks like this:
{
  "mcpServers": {
    "codeseek": {
      "command": "codeseek",
      "args": ["serve", "--db", "/path/to/your/repo/.codeseek.db"],
      "env": { "OPENAI_API_KEY": "sk-..." }
    }
  }
}
The assistant can then call search_code to pull relevant code into its context
on demand, instead of you pasting files by hand.
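To make the exchange concrete, the sketch below builds the kind of message an MCP client writes to the server's stdin when it calls a tool: a JSON-RPC 2.0 `tools/call` request on a single line. The argument names `query` and `k` are assumptions chosen to mirror the CLI; the actual input schema of search_code is whatever the server advertises via `tools/list`.

```python
import json
from typing import Any, Dict


def tools_call_message(request_id: int, tool: str, arguments: Dict[str, Any]) -> str:
    """Serialize a JSON-RPC 2.0 'tools/call' request as one line of JSON."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    # json.dumps without `indent` emits no raw newlines, so the result
    # fits the newline-delimited framing of the stdio transport.
    return json.dumps(message)


# "query" and "k" are assumed argument names, not confirmed by codeseek.
line = tools_call_message(1, "search_code", {"query": "retry with backoff", "k": 3})
print(line)
```

A real client performs the MCP `initialize` handshake before sending any tool call; MCP-aware assistants and editors handle that for you.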
Commands
| Command | What it does |
|---|---|
| codeseek index [PATH] | Index a directory (default .) into --db. |
| codeseek search QUERY | Search the index; -k sets result count. |
| codeseek serve | Run the MCP server over stdio. |
Shared options: --db, --model, --base-url, --api-key (each with an
environment-variable default).
How it works
- Source — files are walked and read (sensible code/text extensions, common build and vendor directories skipped).
- Chunking — each file is split into overlapping line windows.
- Embedding — chunks are embedded in batches via your provider.
- Storage — vectors land in a local SQLite database.
- Search — your query is embedded and compared against every chunk by cosine similarity; the top matches are returned.
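The chunking and storage steps above can be sketched with the standard library alone. This is an illustration under assumptions, not codeseek's actual code: the window size, overlap, table layout, and function names are all hypothetical, and vectors are packed as float32 bytes into a BLOB column.

```python
import sqlite3
from array import array
from typing import Iterator, List, Tuple


def line_windows(text: str, size: int = 40,
                 overlap: int = 10) -> Iterator[Tuple[int, int, str]]:
    """Yield (start_line, end_line, chunk) with 1-based inclusive line numbers."""
    lines = text.splitlines()
    step = size - overlap  # requires overlap < size
    for start in range(0, len(lines), step):
        window = lines[start:start + size]
        yield start + 1, start + len(window), "\n".join(window)
        if start + size >= len(lines):
            break  # this window already reached the end of the file


def store_chunk(db: sqlite3.Connection, path: str, start: int, end: int,
                text: str, vector: List[float]) -> None:
    """Persist one chunk; the vector is packed as float32 bytes."""
    db.execute(
        "INSERT INTO chunks (path, start_line, end_line, text, vector) "
        "VALUES (?, ?, ?, ?, ?)",
        (path, start, end, text, array("f", vector).tobytes()),
    )


def load_vector(blob: bytes) -> List[float]:
    """Unpack a float32 BLOB back into a list of floats."""
    vec = array("f")
    vec.frombytes(blob)
    return vec.tolist()


# Hypothetical schema: one row per chunk.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE chunks (path TEXT, start_line INTEGER, "
    "end_line INTEGER, text TEXT, vector BLOB)"
)
```

Overlapping windows mean a function that straddles a boundary still appears whole in at least one chunk, at the cost of embedding some lines twice.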
The document source is pluggable: the engine only consumes Document objects, so
the same indexing and search machinery can be pointed at things other than code.
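As an illustration of what a non-code source might look like, here is a sketch in which `Document` is a hypothetical stand-in with assumed `path` and `text` fields (the real class may carry different attributes), and `notes_source` yields documents from an in-memory collection of notes rather than from a file walk.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator


@dataclass(frozen=True)
class Document:
    """Hypothetical stand-in: codeseek's real Document fields may differ."""
    path: str  # identifier shown in search results
    text: str  # content to be chunked and embedded


def notes_source(notes: Dict[str, str]) -> Iterator[Document]:
    """A non-code source: yield one Document per non-empty note."""
    for name, body in sorted(notes.items()):
        if body.strip():
            yield Document(path="notes/" + name, text=body)


def count_lines(documents: Iterable[Document]) -> int:
    """Stand-in for the engine: it reads only Document attributes."""
    return sum(len(doc.text.splitlines()) for doc in documents)


docs = list(notes_source({"b.md": "two\nlines", "a.md": "one", "empty.md": "  "}))
print([doc.path for doc in docs], count_lines(docs))
```

Because the consumer depends only on the `Document` shape, swapping the file walker for any other iterable of documents requires no change to the indexing or search code.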
Privacy note
Indexing sends file contents to whichever embeddings provider you configure. For
private code, prefer a self-hosted model via --base-url.
Scope
Search is a linear scan: every query is compared against every stored vector. That is fast enough for a single repository of a few thousand chunks. Very large monorepos would need an approximate nearest-neighbor index, which is a natural next step but not a current goal.
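A brute-force cosine search is short enough to show in full. The sketch below is a generic version of the technique, with illustrative names rather than codeseek's own: it scores every stored vector against the query and keeps the k best, so the cost grows linearly with the number of chunks.

```python
import heapq
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; defined as 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: Sequence[float],
          rows: List[Tuple[str, Sequence[float]]],
          k: int = 5) -> List[Tuple[float, str]]:
    """Score every (label, vector) row and return the k best, highest first."""
    scored = ((cosine(query, vec), label) for label, vec in rows)
    return heapq.nlargest(k, scored)


rows = [("a.py:1-40", [1.0, 0.0]), ("b.py:1-40", [0.0, 1.0]), ("c.py:1-40", [1.0, 1.0])]
for score, label in top_k([1.0, 0.0], rows, k=2):
    print("%s (score %.3f)" % (label, score))
```

`heapq.nlargest` avoids sorting the full result list, but every vector is still visited once per query, which is the cost an approximate index would remove.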
Development
pip install -e ".[test]"
python -m pytest
All tests run offline; the embedding and HTTP layers accept injectable fakes.
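The general pattern behind offline tests like these is to pass the embedder in as a parameter and substitute a deterministic fake. The sketch below shows that pattern only; `fake_embed` and `index_texts` are hypothetical names, not codeseek's test helpers.

```python
import hashlib
from typing import Callable, List

Embedder = Callable[[List[str]], List[List[float]]]


def fake_embed(texts: List[str], dims: int = 8) -> List[List[float]]:
    """Deterministic bag-of-words vectors: no network and no model."""
    vectors = []
    for text in texts:
        vec = [0.0] * dims
        for token in text.lower().split():
            # hashlib is stable across runs, unlike the built-in hash().
            digest = hashlib.sha256(token.encode("utf-8")).hexdigest()
            vec[int(digest, 16) % dims] += 1.0
        vectors.append(vec)
    return vectors


def index_texts(texts: List[str], embed: Embedder) -> List[List[float]]:
    """Hypothetical indexing step that receives its embedder as an argument."""
    return embed(texts)


vectors = index_texts(["retry with backoff", "verify token"], fake_embed)
print(len(vectors), len(vectors[0]))
```

Because the fake returns the same vector for the same text on every run, assertions about ranking stay stable without an API key or a network connection.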
License
MIT — see LICENSE.