A useful module for handling Git data.

Project description

Git2Vec

Git2Vec is a text analysis tool that loads files from a Git repository, processes and embeds the text using OpenAI, and stores the embeddings in a Pinecone index for efficient retrieval and analysis.

Dependencies

The following packages are required to use Git2Vec:

  • langchain
  • pinecone-client
  • tiktoken
  • gitpython
  • turbo_docs
  • toml
  • setuptools
  • wheel
  • twine

Install them with the following commands:

pip install -r requirements.txt
pip install -r requirements.dev.txt

Repo Structure

  • exclude.toml: Configuration file listing the files that the turbo_docs package should exclude.
  • requirements.dev.txt: Development dependencies.
  • requirements.txt: Main dependencies.
  • git2vec/loader.py: Functions for loading documents from a Git repository using the TurboGitLoader class.
  • git2vec/vectordb.py: Functions for creating a Pinecone index, embedding documents, and upserting them into the index.
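
The embed-and-upsert flow described for git2vec/vectordb.py can be pictured with the minimal sketch below. It is not the package's actual code: `toy_embed`, `build_records` and `batched` are hypothetical names, and the hash-based embedder is only a deterministic stand-in for a real OpenAI embedding call so the example runs offline. It shows one plausible way to turn (path, text) pairs into (id, vector, metadata) records and group them into batches before sending them to an index.

```python
import hashlib
from typing import Callable, Iterator, List, Sequence, Tuple

Vector = List[float]
Record = Tuple[str, Vector, dict]  # (id, embedding, metadata)


def toy_embed(text: str, dim: int = 8) -> Vector:
    """Hypothetical stand-in for an embedding model: maps text to `dim` floats in [0, 1]."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [byte / 255.0 for byte in digest[:dim]]


def build_records(
    docs: Sequence[Tuple[str, str]],
    embed: Callable[[str], Vector] = toy_embed,
) -> List[Record]:
    """Turn (path, text) pairs into records keyed by a stable hash of the path."""
    records = []
    for path, text in docs:
        doc_id = hashlib.sha1(path.encode("utf-8")).hexdigest()
        records.append((doc_id, embed(text), {"source": path}))
    return records


def batched(items: Sequence, size: int) -> Iterator[Sequence]:
    """Yield consecutive slices of at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]


if __name__ == "__main__":
    docs = [("README.md", "hello"), ("setup.py", "world"), ("a.py", "x = 1")]
    for batch in batched(build_records(docs), size=2):
        # A real pipeline would upsert each batch into the index here.
        print(len(batch), [record[2]["source"] for record in batch])
```

Batching keeps each request to the index small; the batch size of 2 is chosen only to make the output easy to read.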

Usage

  1. First, create a Pinecone API key and save it as the PINECONE_API_KEY environment variable in your .env file.
  2. Create an OpenAI API key and save it as the OPENAI_API_KEY environment variable in your .env file.
  3. Load the Git repository data:

     from git2vec.loader import load

     repo_data = load(repo="https://github.com/your/repository.git", branch="main")

  4. Create and insert embeddings into the Pinecone index:

     from git2vec.vectordb import create_vectorstore

     create_vectorstore(repo_data)

  5. Get the Pinecone vector store instance for text retrieval:

     from git2vec.vectordb import get_vectorstore

     vector_store = get_vectorstore()

  6. Use the vector store for information retrieval and analysis.
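
As a rough illustration of the first and last steps, the sketch below checks that both API keys are present and wraps a similarity query. It rests on two assumptions not confirmed by this README: that the keys from .env have already been loaded into the process environment, and that get_vectorstore() returns a LangChain vector store exposing `similarity_search(query, k=...)`. The helper names `missing_keys` and `top_matches` are hypothetical.

```python
import os
from typing import Any, List, Mapping, Tuple

REQUIRED_KEYS = ("PINECONE_API_KEY", "OPENAI_API_KEY")


def missing_keys(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required key names that are unset or empty in `env`."""
    return [name for name in REQUIRED_KEYS if not env.get(name)]


def top_matches(vector_store: Any, query: str, k: int = 4) -> List[Tuple[str, dict]]:
    """Query the store and return (text, metadata) pairs for the k closest documents.

    Assumes a LangChain-style store whose results expose `page_content` and `metadata`.
    """
    docs = vector_store.similarity_search(query, k=k)
    return [(doc.page_content, doc.metadata) for doc in docs]


if __name__ == "__main__":
    absent = missing_keys()
    if absent:
        print("Set these variables before using Git2Vec:", ", ".join(absent))
```

Under those assumptions, calling `top_matches(vector_store, "How is the index created?")` with the store from step 5 would return the stored text chunks most similar to the question.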

Download files

Source Distribution

git2vec-0.1.1.tar.gz (4.9 kB)

Uploaded Source

Built Distribution

git2vec-0.1.1-py3-none-any.whl (5.4 kB)

Uploaded Python 3

File details

Details for the file git2vec-0.1.1.tar.gz.

File metadata

  • Download URL: git2vec-0.1.1.tar.gz
  • Upload date:
  • Size: 4.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.1

File hashes

Hashes for git2vec-0.1.1.tar.gz
Algorithm     Hash digest
SHA256        e6a53f00ef5fd9d0f6372f1675598c68f0f35d5c7c3c0a9cb60bd03c710d30dc
MD5           3c442e841c25ee8a10efffc87c6618fd
BLAKE2b-256   e752b0455b515e6dfb97af6617b68f8d9a019ddcfb06e128db0ef9296bff605f
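
A downloaded archive can be checked against the SHA256 digest listed above with the standard library alone. The following is a minimal sketch; the helper names `sha256_of` and `matches_expected` are hypothetical, and the local file path is an assumption about where the download was saved.

```python
import hashlib
from pathlib import Path
from typing import Union

# SHA256 digest listed for git2vec-0.1.1.tar.gz.
EXPECTED_SHA256 = "e6a53f00ef5fd9d0f6372f1675598c68f0f35d5c7c3c0a9cb60bd03c710d30dc"


def sha256_of(path: Union[str, Path], chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    hasher = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def matches_expected(path: Union[str, Path], expected: str = EXPECTED_SHA256) -> bool:
    """Compare the file's digest with the expected one, ignoring letter case."""
    return sha256_of(path) == expected.lower()


if __name__ == "__main__":
    archive = Path("git2vec-0.1.1.tar.gz")  # assumed download location
    if archive.exists():
        print("digest OK" if matches_expected(archive) else "digest mismatch")
```

The same function works for the wheel by passing its listed SHA256 digest as `expected`.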

File details

Details for the file git2vec-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: git2vec-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 5.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.1

File hashes

Hashes for git2vec-0.1.1-py3-none-any.whl
Algorithm     Hash digest
SHA256        c6146a7f64cfff6e6bea2052c03b456a24d0ecf5074a3e9e2a92b25a6a2870e0
MD5           8e7ed540e7e8ef6cae44d7df92f5271a
BLAKE2b-256   1fd3ab8a9c1fce77fd740b7383f5d0fdbd9f16b6be0eec28ddae70cc2c2baef2
