A simple package for generating and querying vector databases for generative AI, as well as for other use cases
VDBforGenAI
VDBforGenAI is a Python package for building vector databases of text for use in natural language processing applications.
Usage
To use VDBforGenAI, first install the package and its dependencies:
pip install git+https://github.com/JakubJDolezal/VDBforGenAI.git
Next, create an instance of the VectorDatabase class by passing in a list of strings, which represent the context you care about. Each string can contain multiple sentences.
Minimal example
Instantiate a database, then tell it which directory to load:
from VDBforGenAI.VectorDatabase import VectorDatabase
vdb = VectorDatabase(splitting_choice="length")
vdb.load_all_in_directory('./ExampleFolder')
Once you have a VectorDatabase instance, you can use the get_context_from_entire_database method to retrieve the context most similar to a given input text:
context = vdb.get_context_from_entire_database('What does parma ham go well with?')
print(context)
This retrieves the piece of text most similar to "What does parma ham go well with?" from your indexed directory. You can also specify which level and which directory on that level you wish to search:
context_selection = vdb.get_index_and_context_from_selection('Who made this?', 2, 'SubfolderOfLies')
The directory levels and the values available at each level are saved in vdb.dlv:
print(vdb.dlv)
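The package does not spell out the exact shape of vdb.dlv here, so as a rough sketch (the dict layout and directory names below are assumptions for illustration, not the library's documented format), a mapping from depth level to the directory names seen at that depth could look like this:

```python
# Hypothetical illustration of a directory level/value structure:
# depth level -> directory names found at that depth.
dlv = {
    1: ["ExampleFolder"],
    2: ["SubfolderOfTruths", "SubfolderOfLies"],
}

def directories_at_level(dlv, level):
    """Return the directory names recorded at a given depth (empty if none)."""
    return dlv.get(level, [])

print(directories_at_level(dlv, 2))
```

Inspecting this structure tells you which (level, directory) pairs are valid arguments for a selection search such as get_index_and_context_from_selection.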
Dependencies
VDBforGenAI has the following dependencies:
"faiss-cpu",
"transformers",
"torch",
"numpy","PyPDF2",'docx','python-docx
Contributions are welcome! If you have any suggestions or issues, please open an issue or pull request on the GitHub repository.
License
VDBforGenAI is licensed under the MIT License.
More Usage
Using a custom encoder and tokenizer
Passing an encoder and tokenizer from Hugging Face's Transformers library:
from transformers import AutoTokenizer, AutoModel
from VDBforGenAI.VectorDatabase import VectorDatabase
# Initialize the tokenizer and encoder
tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
encoder = AutoModel.from_pretrained('bert-base-uncased')
# Initialize the VectorDatabase with the custom encoder and tokenizer
vdb = VectorDatabase(encoder=encoder, tokenizer=tokenizer)
Similarly, you can pass your own encoder as any torch model, provided you also supply a tokenizer and the model's 0th output is the encoding.
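A minimal sketch of such a custom encoder, assuming only the constraint stated above (that the 0th element of the forward output is the encoding); the class name, vocabulary size, and embedding dimension below are illustrative, not part of the package:

```python
import torch
import torch.nn as nn

class TinyEncoder(nn.Module):
    """Toy encoder: embeds token ids and returns a tuple whose 0th
    element is the token-level encoding, matching the convention
    VDBforGenAI expects from a custom torch model."""

    def __init__(self, vocab_size=30522, dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)

    def forward(self, input_ids, attention_mask=None, **kwargs):
        enc = self.embed(input_ids)  # shape: (batch, seq_len, dim)
        return (enc,)

encoder = TinyEncoder()
ids = torch.randint(0, 30522, (1, 8))  # a batch of 8 fake token ids
out = encoder(ids)
print(out[0].shape)  # torch.Size([1, 8, 64])
```

You would still pair such a model with a matching tokenizer (e.g. one from Hugging Face) when constructing the VectorDatabase, since the tokenizer produces the input_ids the encoder consumes.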