A useful module for handling Git data.

Git2Vec Repo

Git2Vec is a Python module that loads text files from a Git repository and builds a searchable vector database from them using LangChain, Pinecone, and OpenAI.

Installation

Install from PyPI (https://pypi.org/project/git2vec/) using pip:

pip install git2vec
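
To confirm that the install succeeded, you can ask Python for the installed version. This is a minimal sketch that uses only the standard library; the helper name installed_version is hypothetical and is not part of git2vec.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("git2vec")
    print(f"git2vec version: {found}" if found else "git2vec is not installed")
```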

If you cloned the repository instead, install its dependencies:

pip install -r requirements.txt

If you plan to develop Git2Vec itself, also install the development requirements:

pip install -r requirements.dev.txt

Usage

Loading a Git repository

To load a Git repository, use the git2vec.loader.load() function:

from git2vec import loader

repo_name = "https://github.com/voynow/turbo-docs"

# Returns a list of Document objects
repo_data = loader.load(repo_name)

# Or return a string of all the raw text
raw_repo = loader.load(repo_name, return_str=True)

Creating and managing a vector database

To create a vector database from the loaded Git repository, use the following functions:

from git2vec import vectordb

# Create a vector store from the Git repo
vectorstore = vectordb.create_vectorstore(repo_name)

# Retrieve an existing vector store from the Pinecone index
vectorstore = vectordb.get_vectorstore()

Modules

loader.py

The loader.py module contains the TurboGitLoader class, which loads text files from a Git repository. The load() function takes a repository URL and returns a list of Document objects by default, or a single string containing all the raw text when called with return_str=True.
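
To make the idea of loading text files from a repository concrete, here is a rough sketch that walks an already-cloned local directory using only the standard library. It is not the TurboGitLoader implementation: the names collect_text_files, as_raw_string, and TEXT_SUFFIXES are hypothetical, and the suffix allow-list is an assumption chosen for illustration.

```python
from pathlib import Path

# Hypothetical allow-list of text file extensions; not taken from git2vec.
TEXT_SUFFIXES = {".py", ".md", ".txt", ".toml", ".yaml", ".yml", ".json"}


def collect_text_files(root, suffixes=TEXT_SUFFIXES):
    """Return (relative_path, text) pairs for text files under root, sorted by path."""
    root = Path(root)
    records = []
    for path in sorted(root.rglob("*")):
        rel = path.relative_to(root)
        # Skip Git's internal directory and anything that is not a regular file.
        if ".git" in rel.parts or not path.is_file():
            continue
        if path.suffix.lower() not in suffixes:
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except UnicodeDecodeError:
            continue  # not valid UTF-8, so treat it as binary and skip it
        records.append((rel.as_posix(), text))
    return records


def as_raw_string(records):
    """Join all file contents into one string, similar in spirit to return_str=True."""
    return "\n".join(text for _, text in records)
```

The list of pairs plays the role of the Document list, and as_raw_string shows how the same data collapses into a single string.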

vectordb.py

The vectordb.py module provides functions to create and manage a vector database using LangChain, Pinecone, and OpenAI. It covers initializing Pinecone, processing data, embedding and upserting it, creating a vectorstore, and retrieving an existing vectorstore.
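
The process, embed, upsert, and retrieve steps can be illustrated with a small in-memory stand-in. This sketch does not call Pinecone or OpenAI and is not how vectordb.py is written: toy_embed is a hashed bag-of-words substitute for a real embedding model, and chunk_text, toy_embed, and InMemoryIndex are hypothetical names introduced only for this example.

```python
import hashlib
import math


def chunk_text(text, size=200, overlap=50):
    """Split text into overlapping character windows of at most `size` characters."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]


def toy_embed(text, dim=64):
    """Hash lowercase words into `dim` buckets and scale the counts to unit length."""
    vec = [0.0] * dim
    for word in text.lower().split():
        digest = hashlib.sha256(word.encode("utf-8")).hexdigest()
        vec[int(digest, 16) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


class InMemoryIndex:
    """Minimal vector index: upsert by id, query by cosine similarity."""

    def __init__(self):
        self._items = {}  # id -> (vector, original text)

    def upsert(self, item_id, text):
        # Inserting an existing id overwrites the previous entry.
        self._items[item_id] = (toy_embed(text), text)

    def query(self, text, top_k=3):
        q = toy_embed(text)
        # Vectors are unit length, so the dot product is the cosine similarity.
        scored = [
            (sum(a * b for a, b in zip(q, vec)), item_id, stored)
            for item_id, (vec, stored) in self._items.items()
        ]
        scored.sort(key=lambda s: s[0], reverse=True)
        return [(item_id, stored) for _, item_id, stored in scored[:top_k]]


if __name__ == "__main__":
    index = InMemoryIndex()
    for n, chunk in enumerate(chunk_text("Git2Vec loads text files from a Git repository.", size=30, overlap=10)):
        index.upsert(f"chunk-{n}", chunk)
    print(index.query("text files", top_k=2))
```

A hosted vector database plays the role of InMemoryIndex in a real deployment, and a learned embedding model replaces toy_embed; the flow of chunking, embedding, upserting, and querying stays the same.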

Contributing

If you find a bug or have a suggestion, feel free to open an issue or submit a pull request. Contributions are welcome!

License

This project is licensed under the MIT License.

