A useful module for handling Git data.

Git2Vec

Git2Vec is a text analysis tool that loads files from a Git repository, processes and embeds the text using OpenAI, and stores the embeddings in a Pinecone index for efficient retrieval and analysis.

Dependencies

The following packages are required to use Git2Vec:

  • langchain
  • pinecone-client
  • tiktoken
  • gitpython
  • turbo_docs
  • toml
  • setuptools
  • wheel
  • twine

Install them with the following commands:

pip install -r requirements.txt
pip install -r requirements.dev.txt

Repo Structure

  • exclude.toml: Configuration file listing the files that the turbo_docs package should exclude.
  • requirements.dev.txt: Development dependencies.
  • requirements.txt: Main dependencies.
  • git2vec/loader.py: Functions for loading documents from a Git repository using the TurboGitLoader class.
  • git2vec/vectordb.py: Functions for creating a Pinecone index, embedding documents, and upserting them into the index.

Usage

  1. First, create a Pinecone API key and save it as the PINECONE_API_KEY environment variable in your .env file.
  2. Create an OpenAI API key and save it as the OPENAI_API_KEY environment variable in your .env file.
  3. Load the Git repository data:
from git2vec.loader import load

repo_data = load(repo="https://github.com/your/repository.git", branch="main")
  4. Create and insert embeddings into the Pinecone index:
from git2vec.vectordb import create_vectorstore

create_vectorstore(repo_data)
  5. Get the Pinecone vector store instance for text retrieval:
from git2vec.vectordb import get_vectorstore

vector_store = get_vectorstore()
  6. Use the vector store for information retrieval and analysis.
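
Because the loading and embedding steps depend on both API keys, it can help to confirm they are set before running anything. The following is a minimal sketch using only the standard library. The helper names read_env_file and missing_keys are hypothetical and not part of Git2Vec, and the sketch assumes a simple KEY=VALUE .env layout; how Git2Vec itself reads the .env file is not documented here.

```python
import os
from pathlib import Path

# The two variable names come from the Usage steps above.
REQUIRED_KEYS = ("PINECONE_API_KEY", "OPENAI_API_KEY")


def read_env_file(path=".env"):
    """Parse simple KEY=VALUE lines from a .env file into a dict.

    Hypothetical helper: blank lines, comments, and lines without '='
    are skipped; surrounding quotes on values are removed.
    """
    values = {}
    env_path = Path(path)
    if not env_path.is_file():
        return values
    for raw_line in env_path.read_text().splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(path=".env", environ=None):
    """Return the required keys set neither in the environment nor in the file."""
    environ = os.environ if environ is None else environ
    file_values = read_env_file(path)
    return [
        key
        for key in REQUIRED_KEYS
        if not (environ.get(key) or file_values.get(key))
    ]


if __name__ == "__main__":
    missing = missing_keys()
    if missing:
        print("Missing API keys: " + ", ".join(missing))
    else:
        print("Both API keys are set.")
```

Running the check before step 3 reports a missing or empty key by name, so a misconfigured .env file is caught before any calls are made to Pinecone or OpenAI.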
