A useful module for handling Git data.
Git2Vec
Git2Vec is a text analysis tool that loads files from a Git repository, processes and embeds the text using OpenAI, and stores the embeddings in a Pinecone index for efficient retrieval and analysis.
Dependencies
The following packages are required to use Git2Vec:
- langchain
- pinecone-client
- tiktoken
- gitpython
- turbo_docs
- toml
- setuptools
- wheel
- twine
Install them with the following commands:
pip install -r requirements.txt
pip install -r requirements.dev.txt
Repo Structure
- exclude.toml: Configuration file for excluding specific files for the turbo_docs package.
- requirements.dev.txt: Development dependencies.
- requirements.txt: Main dependencies.
- git2vec/loader.py: Functions for loading documents from a Git repository using the TurboGitLoader class.
- git2vec/vectordb.py: Functions for creating a Pinecone index, embedding documents, and upserting them into the index.
Usage
- First, create a Pinecone API key and save it as the PINECONE_API_KEY environment variable in your .env file.
- Create an OpenAI API key and save it as the OPENAI_API_KEY environment variable in your .env file.
- Load the Git repository data:
from git2vec.loader import load
repo_data = load(repo="https://github.com/your/repository.git", branch="main")
- Create and insert embeddings into the Pinecone index:
from git2vec.vectordb import create_vectorstore
create_vectorstore(repo_data)
- Get the Pinecone vector store instance for text retrieval:
from git2vec.vectordb import get_vectorstore
vector_store = get_vectorstore()
- Use the vector store for information retrieval and analysis.
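The first and last steps above can be sketched with two small helpers. This is only an illustration and not part of Git2Vec: the names `missing_keys` and `search` are hypothetical. It also assumes that the object returned by `get_vectorstore()` follows the LangChain vector store interface, with a `similarity_search(query, k=...)` method returning documents that have `page_content` and `metadata` attributes, and that the metadata may carry a `source` key.

```python
import os

# Key names taken from the usage steps above.
REQUIRED_KEYS = ("PINECONE_API_KEY", "OPENAI_API_KEY")


def missing_keys(environ=os.environ):
    """Return the required API key names that are unset or empty."""
    return [name for name in REQUIRED_KEYS if not environ.get(name)]


def search(vector_store, query, k=4):
    """Run a similarity search and return (source, snippet) pairs.

    Assumption: vector_store exposes similarity_search(query, k=...)
    in the style of a LangChain vector store, and each result has
    page_content (str) and metadata (dict) attributes.
    """
    docs = vector_store.similarity_search(query, k=k)
    return [
        # "source" is an assumed metadata key; fall back if it is absent.
        (doc.metadata.get("source", "<unknown>"), doc.page_content[:80])
        for doc in docs
    ]
```

Under those assumptions, a script could first check that `missing_keys()` returns an empty list, then call `search(get_vectorstore(), "your question")` to get up to `k` matching snippets together with the files they came from.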