A useful module for handling Git data.
Git2Vec Repo
Git2Vec is a Python module that allows you to load text files from a Git repository and create a searchable vector database using Langchain, Pinecone, and OpenAI.
Installation
Install from PyPI (https://pypi.org/project/git2vec/) using pip:
pip install git2vec
If you cloned the repository instead, install its dependencies:
pip install -r requirements.txt
If you are a developer, also install the development requirements:
pip install -r requirements.dev.txt
Usage
Loading a Git repository
To load a Git repository, use the git2vec.loader.load() function:
from git2vec import loader
repo_name = "https://github.com/voynow/turbo-docs"
# Returns a list of Document objects
repo_data = loader.load(repo_name)
# Or return a string of all the raw text
raw_repo = loader.load(repo_name, return_str=True)
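As a rough sketch of what can be done with the returned list, the snippet below groups documents by file extension and joins their text into one string. It does not call git2vec: it uses a stand-in `Document` dataclass with `page_content` and `metadata` fields, modeled on Langchain's document type. The `source` metadata key and the helper names `group_by_extension` and `join_text` are assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from pathlib import PurePosixPath


@dataclass
class Document:
    """Stand-in for a Langchain-style document (assumed shape)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def group_by_extension(docs):
    """Map each file extension to the documents that carry it."""
    groups = defaultdict(list)
    for doc in docs:
        # "source" is an assumed metadata key holding the file path.
        path = PurePosixPath(doc.metadata.get("source", ""))
        groups[path.suffix or "(none)"].append(doc)
    return dict(groups)


def join_text(docs, separator="\n\n"):
    """Concatenate the raw text of all documents into one string."""
    return separator.join(doc.page_content for doc in docs)


# Hypothetical sample standing in for the output of loader.load(repo_name).
sample_docs = [
    Document("print('hello')", {"source": "src/main.py"}),
    Document("# Title", {"source": "README.md"}),
    Document("import os", {"source": "src/util.py"}),
]
groups = group_by_extension(sample_docs)
raw_text = join_text(sample_docs)
```

If the real documents store the file path under a different metadata key, only the lookup inside `group_by_extension` needs to change.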
Creating and managing a vector database
To create a vector database from the loaded Git repository, use the following functions:
from git2vec import vectordb
# Create a vector store from the Git repo (repo_name is the URL defined in the loading example)
vectorstore = vectordb.create_vectorstore(repo_name)
# Retrieve the existing vector store from the Pinecone index
vectorstore = vectordb.get_vectorstore()
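To make "searchable" concrete, here is a toy, in-memory illustration of the nearest-neighbour lookup that a vector store performs. It is not git2vec's or Pinecone's implementation: the three-dimensional vectors are made up, and `cosine_similarity` and `search` are hypothetical helpers. In practice the vectors would come from an embedding model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # A zero vector has no direction to compare.
    return dot / (norm_a * norm_b)


def search(index, query_vector, top_k=2):
    """Return the names of the top_k entries most similar to the query."""
    scored = [
        (cosine_similarity(vector, query_vector), name)
        for name, vector in index.items()
    ]
    scored.sort(reverse=True)  # Highest similarity first.
    return [name for _, name in scored[:top_k]]


# Made-up embeddings keyed by file name, for illustration only.
toy_index = {
    "loader.py": [0.9, 0.1, 0.0],
    "vectordb.py": [0.1, 0.9, 0.2],
    "README.md": [0.5, 0.5, 0.1],
}
results = search(toy_index, [1.0, 0.0, 0.0], top_k=2)
```

A hosted index does the same ranking at scale, typically with approximate rather than exhaustive search.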
Modules
loader.py
The loader.py module contains the TurboGitLoader class, which can be used to load text files from a Git repository. The load() function takes a repository URL and returns either a list of Document objects or a string containing all the raw text.
vectordb.py
The vectordb.py module provides functions to create and manage a vector database using Langchain, Pinecone, and OpenAI. It contains functions for initializing Pinecone, upserting data, processing data, embedding and upserting, creating a vectorstore, and retrieving a vectorstore.
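The processing and upserting steps listed above usually involve splitting text into chunks and sending them in batches. The following is a minimal sketch of that general pattern in plain Python, not the code in vectordb.py: `split_text`, `batched`, and the chunk and batch sizes are hypothetical choices made for illustration.

```python
def split_text(text, chunk_size=1000, overlap=100):
    """Split text into chunks of at most chunk_size characters,
    with consecutive chunks sharing `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # This chunk already reaches the end of the text.
        start += step
    return chunks


def batched(items, batch_size):
    """Yield successive slices of items with at most batch_size elements."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]


chunks = split_text("abcdefghij", chunk_size=4, overlap=1)
batches = list(batched(chunks, batch_size=2))
```

The overlap keeps some shared context between neighbouring chunks, and batching limits how much is sent per request when embedding and upserting.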
Contributing
If you find any issues or have any suggestions, feel free to open an issue or submit a pull request. We welcome any contributions!
License
This project is licensed under the MIT License.
File details
Details for the file git2vec-0.1.4.tar.gz.
File metadata
- Download URL: git2vec-0.1.4.tar.gz
- Upload date:
- Size: 6.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.1
File hashes
Algorithm | Hash digest
---|---
SHA256 | bc827f4f0e6c27ac3042ab60e853ee260fe374c92cf8534a213574de69f4e5a6
MD5 | 600b3a33b385a864a1bcb667c4598f12
BLAKE2b-256 | d10e6c060f8c873cc3aeba979ad220ce3e88fffff8bfddc73f72a4e3653977a4
File details
Details for the file git2vec-0.1.4-py3-none-any.whl.
File metadata
- Download URL: git2vec-0.1.4-py3-none-any.whl
- Upload date:
- Size: 6.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.1
File hashes
Algorithm | Hash digest
---|---
SHA256 | d65c34423894dd9553394b3705f8189fa8a96d50365d650380422d3a0bb891c8
MD5 | 36fbfd6085ddebf439028e1473d0f20b
BLAKE2b-256 | be4ba4ea4d786bb9f249c8ce8f144045c929548462d38ef2753a7675b52ee194