A simple package for generating and querying vector databases for generative AI, as well as for other use cases
VDBforGenAI
VDBforGenAI is a Python package for building vector databases of text for use in natural language processing applications.
Usage
To use VDBforGenAI, first install the package and its dependencies:
pip install git+https://github.com/JakubJDolezal/VDBforGenAI.git
Next, create an instance of the VectorDatabase class by passing in a list of strings, which represent the context you care about. Each string can contain multiple sentences.
Minimal example
Instantiate a database, then tell it which directory to load:
from VDBforGenAI.VectorDatabase import VectorDatabase
vdb = VectorDatabase(splitting_choice="length")
vdb.load_all_in_directory('./ExampleFolder')
Once you have a VectorDatabase instance, you can use the get_context_from_entire_database method to retrieve the context most similar to a given input text:
context = vdb.get_context_from_entire_database('What does parma ham go well with?')
print(context)
This retrieves the piece of text most similar to "What does parma ham go well with?" from your indexed directory. You can also specify which level and which directory on that level you wish to search:
context_selection = vdb.get_index_and_context_from_selection('Who made this?', 2, 'SubfolderOfLies')
The directory levels and the values available at each level are saved in vdb.dlv:
print(vdb.dlv)
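The package does not spell out the exact shape of vdb.dlv here, so as a rough sketch (the dict layout and directory names below are assumptions for illustration, not the library's documented format), a mapping from depth level to the directory names seen at that depth could look like this:

```python
# Hypothetical illustration of a directory level/value structure:
# depth level -> directory names found at that depth.
dlv = {
    1: ["ExampleFolder"],
    2: ["SubfolderOfTruths", "SubfolderOfLies"],
}

def directories_at_level(dlv, level):
    """Return the directory names recorded at a given depth (empty if none)."""
    return dlv.get(level, [])

print(directories_at_level(dlv, 2))
```

Inspecting this structure tells you which (level, directory) pairs are valid arguments for a selection search such as get_index_and_context_from_selection.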
Dependencies
VDBforGenAI has the following dependencies:
"faiss-cpu",
"transformers",
"torch",
"numpy","PyPDF2",'docx','python-docx
Contributions are welcome! If you have any suggestions or issues, please open an issue or pull request on the GitHub repository.
License
VDBforGenAI is licensed under the MIT License.
More Usage
Using a custom encoder and tokenizer
Passing an encoder and tokenizer from Hugging Face's Transformers library:
from transformers import AutoTokenizer, AutoModel
from VDBforGenAI.VectorDatabase import VectorDatabase
# Initialize the tokenizer and encoder
tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
encoder = AutoModel.from_pretrained('bert-base-uncased')
# Initialize the VectorDatabase with the custom encoder and tokenizer
vdb = VectorDatabase(encoder=encoder, tokenizer=tokenizer)
Similarly, you can pass your own encoder as any torch model, provided you also supply a tokenizer and the model's 0th output is the encoding.
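A minimal sketch of such a custom encoder, assuming only the constraint stated above (that the 0th element of the forward output is the encoding); the class name, vocabulary size, and embedding dimension below are illustrative, not part of the package:

```python
import torch
import torch.nn as nn

class TinyEncoder(nn.Module):
    """Toy encoder: embeds token ids and returns a tuple whose 0th
    element is the token-level encoding, matching the convention
    VDBforGenAI expects from a custom torch model."""

    def __init__(self, vocab_size=30522, dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)

    def forward(self, input_ids, attention_mask=None, **kwargs):
        enc = self.embed(input_ids)  # shape: (batch, seq_len, dim)
        return (enc,)

encoder = TinyEncoder()
ids = torch.randint(0, 30522, (1, 8))  # a batch of 8 fake token ids
out = encoder(ids)
print(out[0].shape)  # torch.Size([1, 8, 64])
```

You would still pair such a model with a matching tokenizer (e.g. one from Hugging Face) when constructing the VectorDatabase, since the tokenizer produces the input_ids the encoder consumes.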