Codemine
A Python project to embed Git repositories into a Pinecone vector database for semantic search.
Features
- Embeds code chunks from Git repositories.
- Supports incremental updates (remove outdated chunks).
- Uses Tree-sitter for language-aware code splitting.
- Search embedded code chunks using natural language queries.
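The incremental-update feature can be pictured as a set difference between the chunk IDs already in the vector store and the chunk IDs produced from the current checkout. The following is a minimal sketch, not Codemine's actual implementation: the `chunk_id` and `plan_update` helpers and the hash-based ID scheme are assumptions made for illustration.

```python
import hashlib


def chunk_id(path: str, text: str) -> str:
    """Derive a stable ID from a chunk's file path and content (assumed scheme)."""
    digest = hashlib.sha256(f"{path}\n{text}".encode("utf-8")).hexdigest()
    return digest[:16]


def plan_update(stored_ids: set[str], current_chunks: list[tuple[str, str]]):
    """Return (ids_to_delete, chunks_to_embed) for an incremental update."""
    current = {chunk_id(path, text): (path, text) for path, text in current_chunks}
    # IDs in the store that the current checkout no longer produces are outdated.
    to_delete = stored_ids - set(current)
    # Chunks whose IDs are not stored yet still need to be embedded.
    to_embed = [chunk for cid, chunk in current.items() if cid not in stored_ids]
    return to_delete, to_embed
```

Because the ID in this sketch depends on the chunk content, an edited chunk shows up as one deletion plus one new embedding, and unchanged chunks are skipped.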
Prerequisites
- Python >= 3.12
- uv (recommended) or pip
Setup
- Clone the repository:
    git clone <repository_url>
    cd codemine
- Install dependencies:
  Using uv:
    uv sync
  Using pip:
    pip install -e .
- Configure environment variables:
  Create a .env file in the root directory with the following variables:
    PINECONE_API_KEY=your_pinecone_api_key
    GITHUB_TOKEN=your_github_token
    OPENAI_API_KEY=your_openai_api_key
    # Optional: Defaults to "https://openrouter.ai/api/v1"
    # OPENAI_BASE_URL=https://api.openai.com/v1
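As an illustration of how these variables might be consumed, here is a small sketch that reads them from a mapping and applies the documented default for `OPENAI_BASE_URL`. The `Settings` class and `load_settings` function are hypothetical names, and treating the three keys as required is an assumption. The sketch does not parse the .env file itself; it assumes the variables are already present in the environment.

```python
import os
from dataclasses import dataclass

DEFAULT_BASE_URL = "https://openrouter.ai/api/v1"  # default stated in the README


@dataclass(frozen=True)
class Settings:
    pinecone_api_key: str
    github_token: str
    openai_api_key: str
    openai_base_url: str = DEFAULT_BASE_URL


def load_settings(env=None) -> Settings:
    """Build Settings from a mapping of environment variables."""
    env = os.environ if env is None else env
    # Assumption: all three keys are mandatory; the README does not say so.
    required = ("PINECONE_API_KEY", "GITHUB_TOKEN", "OPENAI_API_KEY")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return Settings(
        pinecone_api_key=env["PINECONE_API_KEY"],
        github_token=env["GITHUB_TOKEN"],
        openai_api_key=env["OPENAI_API_KEY"],
        openai_base_url=env.get("OPENAI_BASE_URL", DEFAULT_BASE_URL),
    )
```

Failing early with the list of missing names tends to be easier to debug than an authentication error raised later by a remote API.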
Usage
The project installs a codemine CLI command.
Embed a Repository
To embed a repository into the vector store:
codemine embed-repo \
--repo-owner <owner> \
--repo-name <repo> \
[--create-index] \
[--remove-outdated-chunks] \
[--ignore-glob "**/tests/**"]
Options:
- --repo-owner: The owner of the GitHub repository (required).
- --repo-name: The name of the GitHub repository (required).
- --create-index: Create a new Pinecone index if it doesn't exist.
- --remove-outdated-chunks: Remove chunks that are no longer in the repository.
- --ignore-glob: Glob pattern to ignore files (can be used multiple times).
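To make the option semantics concrete, the sketch below mirrors the documented flags with the standard-library `argparse` module. This is an assumption for illustration only: the real codemine CLI may be built with a different framework, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the documented embed-repo options."""
    parser = argparse.ArgumentParser(prog="codemine")
    sub = parser.add_subparsers(dest="command", required=True)
    embed = sub.add_parser("embed-repo")
    embed.add_argument("--repo-owner", required=True)
    embed.add_argument("--repo-name", required=True)
    # Boolean switches default to False when omitted.
    embed.add_argument("--create-index", action="store_true")
    embed.add_argument("--remove-outdated-chunks", action="store_true")
    # "append" collects every occurrence, so the flag can be repeated.
    embed.add_argument("--ignore-glob", action="append", default=[])
    return parser
```

With this layout, passing `--ignore-glob` twice yields a list with both patterns in the order given.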
Example:
codemine embed-repo \
--repo-owner jackharrington \
--repo-name codebase-embedding \
--create-index \
--ignore-glob "**/__pycache__/**"
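The README does not specify which glob dialect `--ignore-glob` uses. As a rough sketch of how such patterns could filter repository paths, the example below uses the standard-library `fnmatch` module; `is_ignored` is a hypothetical helper, and the real matching rules may differ.

```python
from fnmatch import fnmatchcase


def is_ignored(path: str, patterns: list[str]) -> bool:
    """Return True if a repository-relative POSIX path matches any ignore glob."""
    # fnmatch lets "*" match any characters, including "/", so "**" behaves
    # like a single "*" here. Prefixing "/" lets "**/name/**" also match a
    # directory that sits at the repository root.
    candidate = "/" + path.lstrip("/")
    return any(fnmatchcase(candidate, pattern) for pattern in patterns)
```

Under these assumptions, `is_ignored("src/__pycache__/mod.pyc", ["**/__pycache__/**"])` is true, while `is_ignored("src/codemine/cli.py", ["**/__pycache__/**"])` is false.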
Search Chunks
To search for code chunks:
codemine search-chunks --query "How does the embedding client work?"
Options:
- --query: The search query (required).
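Conceptually, a semantic search embeds the query and returns the stored chunks whose vectors are most similar to it. In Codemine that ranking happens inside Pinecone; the sketch below only illustrates the idea in memory. Cosine similarity, the hand-written vectors, and the `top_k` helper are assumptions for illustration, since the README does not state which metric the index uses.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, chunks, k=3):
    """Rank (chunk_text, vector) pairs by similarity to the query vector."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]
```

A real query vector would come from the same embedding model that produced the stored chunk vectors; mixing models would make the similarity scores meaningless.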
File details
Details for the file codemine-0.1.0.tar.gz.
File metadata
- Download URL: codemine-0.1.0.tar.gz
- Upload date:
- Size: 13.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.9.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 24962552d87cdb3ffbe4aef1633c01746b1df0eec9c488ab717b406ea8b02f1d |
| MD5 | f6eb97b714c6aa24a2b30cdbac7d06f2 |
| BLAKE2b-256 | e9a663d68b179f308b62671bd912853a95faedfe7da5688b19445b0bd37c843f |
File details
Details for the file codemine-0.1.0-py3-none-any.whl.
File metadata
- Download URL: codemine-0.1.0-py3-none-any.whl
- Upload date:
- Size: 18.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.9.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f0a95247ebb203bff7fd7369757059fbe380914b97a39297d212414676354a1b |
| MD5 | 58c2820aa69648089e7341bab99a048d |
| BLAKE2b-256 | e8674b6f11352520f380af25526488315b42e3fc3b1dcf6e68f7e00a948f4d01 |