Codemine

A Python project to embed Git repositories into a Pinecone vector database for semantic search.

Features

  • Embeds code chunks from Git repositories.
  • Supports incremental updates, optionally removing chunks that are no longer in the repository.
  • Uses Tree-sitter for language-aware code splitting.
  • Searches embedded code chunks using natural-language queries.
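
One way to picture the incremental-update feature is as a set difference between the chunk IDs already stored and the chunk IDs produced from the current checkout. The sketch below is only an illustration under stated assumptions: the `chunk_id` helper, the hashing scheme, and `plan_update` are hypothetical names, not codemine's actual implementation.

```python
import hashlib


def chunk_id(path: str, text: str) -> str:
    """Hypothetical chunk ID: a hash of the file path plus the chunk text."""
    return hashlib.sha256(f"{path}\n{text}".encode("utf-8")).hexdigest()


def plan_update(stored_ids: set[str], current_chunks: list[tuple[str, str]]):
    """Return (ids_to_embed, ids_to_remove) for an incremental update."""
    current_ids = {chunk_id(path, text) for path, text in current_chunks}
    to_embed = current_ids - stored_ids   # new or changed chunks
    to_remove = stored_ids - current_ids  # chunks no longer in the repository
    return to_embed, to_remove


# Toy data: b.py changed between two runs, a.py did not.
old_chunks = [("a.py", "def f(): pass"), ("b.py", "x = 1")]
new_chunks = [("a.py", "def f(): pass"), ("b.py", "x = 2")]

stored = {chunk_id(path, text) for path, text in old_chunks}
to_embed, to_remove = plan_update(stored, new_chunks)
```

Under this assumed scheme, an unchanged chunk keeps its ID and is skipped, while an edited chunk shows up once as an ID to embed and once as an outdated ID to remove.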

Prerequisites

  • Python >= 3.12
  • uv (recommended) or pip

Setup

  1. Clone the repository:

    git clone <repository_url>
    cd codemine
    
  2. Install dependencies:

    Using uv:

    uv sync
    

    Using pip:

    pip install -e .
    
  3. Configure environment variables:

    Create a .env file in the project root with the following variables:

    PINECONE_API_KEY=your_pinecone_api_key
    GITHUB_TOKEN=your_github_token
    OPENAI_API_KEY=your_openai_api_key
    # Optional: overrides the default base URL, "https://openrouter.ai/api/v1"
    # OPENAI_BASE_URL=https://api.openai.com/v1
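
As a rough illustration of how these variables might be consumed, the sketch below validates the three required keys and falls back to the documented default base URL. It assumes the values are already present in the process environment (for example, exported by the shell or loaded from the .env file by some other means); `load_settings` and `REQUIRED` are hypothetical names, not part of codemine.

```python
import os

# Default taken from the comment in the .env example above.
DEFAULT_BASE_URL = "https://openrouter.ai/api/v1"
REQUIRED = ("PINECONE_API_KEY", "GITHUB_TOKEN", "OPENAI_API_KEY")


def load_settings(env=None) -> dict[str, str]:
    """Collect required settings and apply the base URL default (hypothetical helper)."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    settings = {name: env[name] for name in REQUIRED}
    # An unset or empty OPENAI_BASE_URL falls back to the default.
    settings["OPENAI_BASE_URL"] = env.get("OPENAI_BASE_URL") or DEFAULT_BASE_URL
    return settings


# Example with an explicit mapping instead of the real environment.
example = load_settings(
    {"PINECONE_API_KEY": "pc-key", "GITHUB_TOKEN": "gh-token", "OPENAI_API_KEY": "oa-key"}
)
```

Failing early on a missing key tends to give a clearer error than a rejected API call later in the run.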
    

Usage

The project installs a codemine CLI command.

Embed a Repository

To embed a repository into the vector store:

codemine embed-repo \
  --repo-owner <owner> \
  --repo-name <repo> \
  [--create-index] \
  [--remove-outdated-chunks] \
  [--ignore-glob "**/tests/**"]

Options:

  • --repo-owner: The owner of the GitHub repository (required).
  • --repo-name: The name of the GitHub repository (required).
  • --create-index: Create the Pinecone index if it doesn't already exist.
  • --remove-outdated-chunks: Remove stored chunks that are no longer in the repository.
  • --ignore-glob: Glob pattern for files to ignore (can be passed multiple times).
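
To show how several --ignore-glob patterns could combine, here is a minimal sketch that drops any path matching at least one pattern. It uses the standard library's `fnmatch`, which is an assumption: codemine's real matcher may follow different glob rules, and `is_ignored` is a hypothetical helper.

```python
from fnmatch import fnmatchcase


def is_ignored(path: str, ignore_globs: list[str]) -> bool:
    """True if the path matches any of the ignore patterns (hypothetical helper)."""
    return any(fnmatchcase(path, pattern) for pattern in ignore_globs)


# Illustrative repository-relative paths, not taken from a real repository.
paths = [
    "src/app.py",
    "src/tests/test_app.py",
    "src/__pycache__/app.cpython-312.pyc",
]
patterns = ["**/tests/**", "**/__pycache__/**"]

kept = [p for p in paths if not is_ignored(p, patterns)]
```

Note that with `fnmatch` the `*` wildcard also matches `/`, and a top-level path such as `tests/test_app.py` would not match `**/tests/**` because the pattern requires a `/` before `tests`. Check how the tool itself treats such cases before relying on a pattern.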

Example:

codemine embed-repo \
  --repo-owner jackharrington \
  --repo-name codebase-embedding \
  --create-index \
  --ignore-glob "**/__pycache__/**"

Search Chunks

To search for code chunks:

codemine search-chunks --query "How does the embedding client work?"

Options:

  • --query: The search query (required).
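
Conceptually, a semantic search embeds the query and ranks stored chunks by vector similarity; in codemine that work is delegated to the embedding model and Pinecone. The toy sketch below only illustrates the ranking step with cosine similarity. The vectors and file names are made up, and cosine is an assumed metric, not necessarily the one the index uses.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], chunks: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the names of the k chunks most similar to the query vector."""
    ranked = sorted(chunks.items(), key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [name for name, _ in ranked[:k]]


# Made-up three-dimensional "embeddings" for hypothetical files.
chunks = {
    "embedding_client.py": [0.9, 0.1, 0.0],
    "cli.py": [0.1, 0.9, 0.0],
    "splitter.py": [0.0, 0.2, 0.9],
}
query = [1.0, 0.2, 0.0]
results = top_k(query, chunks)
```

Real embeddings have hundreds or thousands of dimensions, but the ranking idea is the same: the chunk whose vector points in the direction closest to the query's comes first.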
