A simple package for generating and querying vector databases for generative AI, as well as any other use case.

VDBforGenAI

VDBforGenAI is a Python package for building vector databases of text for use in natural language processing applications.

Usage

To use VDBforGenAI, first install the package and its dependencies:

pip install git+https://github.com/JakubJDolezal/VDBforGenAI.git

Next, create an instance of the VectorDatabase class by passing in a list of strings, which represent the context you care about. Each string can contain multiple sentences.
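For example, a minimal sketch (this assumes the list of strings is accepted as the first positional argument of the constructor; the exact signature may differ):

from VDBforGenAI.VectorDatabase import VectorDatabase

# Hypothetical corpus; each string may contain several sentences.
texts = [
    "Parma ham goes well with melon. It is also served with grissini.",
    "The index is built with FAISS for fast similarity search.",
]
vdb = VectorDatabase(texts)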

Minimal example

You instantiate a database and then tell it which directory to load:

from VDBforGenAI.VectorDatabase import VectorDatabase

vdb = VectorDatabase(splitting_choice="length")
vdb.load_all_in_directory('./ExampleFolder')

Once you have a VectorDatabase instance, you can use the get_context_from_entire_database method to retrieve the context that is most similar to a given input text.

context = vdb.get_context_from_entire_database('What does parma ham go well with?')

print(context)

This retrieves the piece of text most similar to "What does parma ham go well with?" from your indexed directory. You can also specify which level and which directory on that level you wish to search:

context_selection = vdb.get_index_and_context_from_selection('Who made this?', 2, 'SubfolderOfLies')

The directory level and value structure is stored in vdb.dlv:

print(vdb.dlv)

Dependencies

VDBforGenAI has the following dependencies:

        "faiss-cpu",
        "transformers",
        "torch",
        "numpy","PyPDF2",'docx','python-docx

Contributions are welcome! If you have any suggestions or issues, please create an issue or pull request on the GitHub repository.

License

VDBforGenAI is licensed under the MIT License.

More Usage

How to add new strings

Passing an encoder and tokenizer from Hugging Face's Transformers library:

from transformers import AutoTokenizer, AutoModel
from VDBforGenAI.VectorDatabase import VectorDatabase

# Initialize the tokenizer and encoder
tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
encoder = AutoModel.from_pretrained('bert-base-uncased')

# Initialize the VectorDatabase with the custom encoder and tokenizer
vdb = VectorDatabase(encoder=encoder, tokenizer=tokenizer)

Similarly, you can pass your own encoder as a PyTorch model, provided you also supply a tokenizer and the model's 0th output is the encoding.
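
A minimal sketch of such a custom encoder (ToyEncoder is hypothetical; it assumes the 0th output is read like a Hugging Face last_hidden_state of shape (batch, sequence, hidden)):

import torch.nn as nn
from transformers import AutoTokenizer
from VDBforGenAI.VectorDatabase import VectorDatabase

# Hypothetical toy encoder whose forward pass returns a tuple; element 0 is
# the token-level encoding, mimicking a Hugging Face model's output.
class ToyEncoder(nn.Module):
    def __init__(self, vocab_size=30522, hidden_size=384):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)

    def forward(self, input_ids, attention_mask=None, **kwargs):
        return (self.embed(input_ids),)

tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
vdb = VectorDatabase(encoder=ToyEncoder(), tokenizer=tokenizer)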

