Vector Vault: Customize ChatGPT and unleash the full potential of generative AI with Vector Vault
By combining vector similarity search with generative AI chat, new possibilities for conversation emerge. For example, product information can be added to the Vault, and when a customer asks a product question, the right product information can be instantly retrieved and seamlessly used in conversation by ChatGPT for an accurate response. This capability allows for informed conversations that range from AI-automated customer support, to new ways to get news, to AI code reviews that reference source documentation, to AI domain experts for specific knowledge sets, and much more. Vector Vault was built to let you tap into this potential.
Vector Vault is a vector database cloud service built to make generative AI chat quick and easy. It allows you to seamlessly vectorize data and access it from the cloud. It's scalable to both small projects and large applications with millions of users. Vector Vault has been designed with a user-friendly code interface to make working with vector search easy and let you focus on what matters. Vector Vault ensures secure and isolated data handling and enables you to create and interact with vector databases - aka "vaults" - with sub-second response times, from our serverless cloud architecture backed by Google.
We've integrated all the chat options people like to use with LangChain. By combining vector databases with OpenAI's chat in the vectorvault package, we've been able to hide a lot of the complexity in the background and make it easy to build the custom chat experience you want.
With Vector Vault, integrating vector search results into your chat app is not only easy, it's the default. If you have been looking for an easy and reliable way to use vector databases with ChatGPT, then Vector Vault is for you. You will need an API key in order to access the Vault Cloud. If you don't already have one, you can sign up for a free account at VectorVault.io
Full Python API:
pip install vector-vault
: install
from vectorvault import Vault
: import
v = Vault(user='your_email', api_key='your_api_key')
: Open a Vault instance
v.add(text, meta=None, name='', split=False, split_size=1000)
: Loads data to be added to the Vault, with automatic text splitting for long texts. `text` is a text string. `meta` is a dictionary. `split=True` will split your text input based on your `split_size`, which sets the approximate size of each split; a new item is automatically created for each split. `name` is a shortcut for adding a name field to the meta without creating a dictionary. If you don't create a dictionary, one with generic information will be created, and if you don't assign a name, a generic one will be assigned. `text` is the only required input.
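For example, here's a minimal sketch of adding a long document with splitting enabled - the file name and split size are illustrative assumptions:
# illustrative values: split a long document into ~500-character items
with open('user_manual.txt', 'r') as f:
    v.add(f.read(), name='user_manual', split=True, split_size=500)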
v.get_vectors()
: Retrieves vector embeddings for all loaded data. (No parameters)
v.save()
: Saves all loaded data with embeddings to the Vault (cloud), along with any metadata. (No parameters)
v.delete()
: Deletes the current Vault and all contents. (No parameters)
v.get_vaults()
: Retrieves a list of Vaults within the current Vault directory. (No parameters)
v.get_similar(text, n)
: Retrieves similar texts from the Vault for a given input text - processes vectors in the cloud. `text` is required. `n` is optional (default = 4).
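An illustrative call (the query string is made up):
# retrieve the three most similar items to a query
results = v.get_similar('how do I reset my password?', n=3)
for result in results:
    print(result['data'])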
v.get_similar_local(text, n)
: Retrieves similar texts from the Vault for a given input text - processes vectors locally. `text` is required. `n` is optional (default = 4). Use this local version for speed-optimized production.
v.get_total_items()
: Returns the total number of items in the Vault. (No parameters)
v.clear_cache()
: Clears the cache for all loaded items. (`add()` loads an item)
v.get_items_by_vector(vector, n)
: Returns items similar to the input `vector`. `n` is the number of items you want returned (default = 4).
v.get_distance(id1, id2)
: For getting the vector distance between two items, `id1` and `id2`, in the Vault.
Items can be retrieved from the Vault with a nearest neighbor search using `get_similar()`, and the item_ids can be found in the metadata. Item_ids are numeric and sequential, so all items in the Vault can be accessed by iterating from beginning to end - e.g. `for i in range(vault.get_total_items()):` (see the sketch after `get_items()` below).
v.get_item_vector(id)
: Returns the vector for item `id` in the Vault.
v.get_items(ids)
: Returns a list containing your item(s). `ids` is a list of one or many item ids.
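To tie the item utilities together, here's an illustrative sketch that walks the Vault by item id - it assumes the Vault already holds at least three items:
# item ids are numeric and sequential, starting at 0
for i in range(v.get_total_items()):
    vector = v.get_item_vector(i)   # the stored embedding for item i
print(v.get_distance(0, 1))         # vector distance between the first two items
print(v.get_items([0, 1, 2]))       # fetch several items in one call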
v.cloud_stream(function)
: For cloud applications yielding the chat stream, like a Flask app. Called like `v.cloud_stream(v.get_chat_stream('some_text'))` in the return of a Flask app.
v.print_stream(function)
: For locally printing the chat stream. Called like `v.print_stream(v.get_chat_stream('some_text'))`. You can also assign it to a variable, like `reply = v.print_stream(v.get_chat_stream('some_text'))` - it still streams to the console, but the final complete text will also be available in the `reply` variable.
v.get_chat()
: Retrieves a response from ChatGPT, with parameters for handling conversation history, summarizing responses, and retrieving context-based responses that reference similar data in the Vault. (See the dedicated section below on using this function and its parameters)
v.get_chat_stream()
: Retrieves a response from ChatGPT in stream format, with parameters for handling conversation history, summarizing responses, and retrieving context-based responses that reference similar data in the Vault. (See the dedicated section below on using this function and its parameters)
get_vectors() utilizes OpenAI's embeddings API, internally batches vector embeddings with the text-embedding-ada-002 model, and comes with automatic rate limiting and concurrent requests for maximum processing speed.
Access The Vault:
Install Vector Vault:
pip install vector-vault
Build The Vault:
Set your OpenAI key as an environment variable:
import os
os.environ['OPENAI_API_KEY'] = 'your_openai_api_key'
- Create a Vault instance
- Gather some text data we want to store
- Add the data to the Vault
- Get vectors embeddings
- Save to the Vault Cloud
from vectorvault import Vault
vault = Vault(user='YOUR_EMAIL', api_key='YOUR_API_KEY', vault='NAME_OF_VAULT')
# a new vault will be created if the name does not already exist
# so you can create a Vault or connect to an existing Vault
# by calling this Vault instance
text_data = 'some data'
vault.add(text_data)
vault.get_vectors()
vault.save()
vault.add() is very versatile. You can add any length of text, even a full book, and it will all be automatically split and processed. vault.get_vectors() is also extremely flexible. You can vault.add() as much as you want, and then when you're done, process all the vectors at once with a single vault.get_vectors() call - which internally batches vector embeddings with OpenAI's text-embedding-ada-002 and comes with auto rate-limiting and concurrent requests for maximum processing speed.
vault.add(very_large_text)
vault.get_vectors()
vault.save()
# these three lines execute fast and can be called mid-conversation before a reply
Small save loads are usually finished in less than a second. Large loads depend on total data size.
A test was done adding the full text of 37 books at once. The get_vectors() call took 8 minutes and 56 seconds. (For comparison, processing one text at a time via OpenAI's embedding function would take roughly two days.)
Use The Vault:
From command line:
curl -X POST "https://api.vectorvault.io/get_similar" \
-H "Content-Type: application/json" \
-d '{
"user": "your_username",
"api_key": "your_api_key",
"vault": "your_vault_name",
"text": "Your text input"
}'
[{"data":"NASA Mars Exploration... (shortend for brevity)","metadata":{"created_at":"2023-05-29T19:21:20.846023","item_id":0,"name":"webdump-0","updated_at":"2023-05-29T19:21:20.846028"}}]
In Python:
# The same exact call, but in Python:
similar_data = vault.get_similar("Your text input")
for result in similar_data:
    print(result['data'])
NASA Mars Exploration... NASA To Host Briefing... Program studies Mars... A Look at a Steep North Polar...
The metadata:
print(similar_data[0]['metadata']) # printing from only the first result
{"created_at":"2023-05-29T19:21:20.846023","item_id":0,"name":"webdump-0","updated_at":"2023-05-29T19:21:20.846028"}
Printing the data and metadata together:
for result in similar_data:
    print(result['data'])
    print(result['metadata'])
NASA Mars Exploration... {"created_at":"2023-05-29T19...} NASA To Host Briefing... {"created_at":"2023-05-29T19...} Program studies Mars... {"created_at":"2023-05-29T19...} A Look at a Steep North Polar... {"created_at":"2023-05-29T19...}
Metadata Made Easy
# To add metadata to your vault, just include the meta as a parameter in `add()`. Meta is always a dict, and you can add any fields you want.
meta = {
'name': 'Lifestyle in LA',
'country': 'United States',
'city': 'LA'
}
vault.add(text, meta)
vault.get_vectors()
vault.save()
# To add just the 'name' field to the metadata:
vault.add(text, name='Lifestyle in LA')
vault.get_vectors()
vault.save()
# To find the name later:
similar_data = vault.get_similar("Your text input")
print(similar_data[0]['metadata']['name'])
Lifestyle in LA
Any Fields:
# Add any fields you want to the metadata:
with open('1984.txt', 'r') as file:
    text = file.read()
book_metadata = {
'title': '1984',
'author': 'George Orwell',
'genre': 'Dystopian',
'publication_year': 1949,
'publisher': 'Secker & Warburg',
'ISBN': '978-0451524935',
'language': 'English',
'page_count': 328
}
vault.add(text, book_metadata)
vault.get_vectors()
vault.save()
# Later you can get all those fields
similar_data = vault.get_similar("How will the government control you in the future?")
for result in similar_data:
    print(result['metadata']['title'])
    print(result['metadata']['author'])
    print(result['metadata']['genre'])
1984 George Orwell Dystopian 1984 George Orwell Dystopian 1984 George Orwell Dystopian 1984 George Orwell Dystopian
# a list is always returned, so you can use a for loop like above or index numerically like this
similar_data = vault.get_similar("How will the government control you in the future?")
print(similar_data[0]['metadata']['title'])
1984
Change Vaults
# print the list of vaults inside the current vault directory
science_vault = Vault(user='your_email', api_key='your_api_key', vault='science')
print(science_vault.get_vaults())
['biology', 'physics', 'chemistry']
Access vaults within vaults by using a path-style vault name:
# biology vault within science vault
biology_vault = Vault(user='YOUR_EMAIL', api_key='YOUR_API_KEY', vault='science/biology')
# chemistry vault within science vault
chemistry_vault = Vault(user='YOUR_EMAIL', api_key='YOUR_API_KEY', vault='science/chemistry')
print(chemistry_vault.get_vaults())
['reactions', 'formulas', 'lab notes']
# lab notes vault within chemistry vault
lab_notes_vault = Vault(user='YOUR_EMAIL', api_key='YOUR_API_KEY', vault='science/chemistry/lab notes')
Use get_chat() with get_context=True to get a response from ChatGPT that references vault data:
question = "Should I use Vector Vault for my next generative ai application?"
answer = vault.get_chat(question, get_context=True)
print(answer)
Vector Vault makes building generative ai easy, so you should consider using Vector Vault for your next generative ai project. Additionally, it is important to keep in mind your specific use cases and the other technologies you are working with. However, given the fact that Vector Vault can be integrated in any work flow and be isolated in a cloud environment, it is an ideal package to integrate into any application that you want to utilize generative ai with. To do so, just send the text inputs to your Vector Vault implementation and return the response. With this in mind, it is likely that Vector Vault would make building your next generative ai application both faster and easier.
To integrate vault data in the response, you need to pass get_context=True
# this will get context from the vault, then ask chatgpt the question
answer = vault.get_chat(question, get_context=True)
# this will send to chatgpt only and not interact with the Vault in any way
answer = vault.get_chat(question)
ChatGPT
Use ChatGPT with get_chat()
Get a chat response from OpenAI's ChatGPT. Rate limiting, auto retries, and chat history slicing are built in, so you can create complex chat capability without getting complicated. Enter your text, optionally add chat history, and optionally choose a summary response instead (default: summary=False).
The get_chat() function:
get_chat(self, text: str, history: str = None, summary: bool = False, get_context = False, n_context = 4, return_context = False, history_search = False, model='gpt-3.5-turbo', include_context_meta=False, custom_prompt=False)
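As one illustrative use of the optional parameters (the question text is made up), you can widen the retrieved context with n_context:
# pull 8 context items instead of the default 4
response = vault.get_chat('What does the documentation say about streaming?', get_context=True, n_context=8)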
-
Example Single Usage:
response = vault.get_chat(text)
-
Example Chat:
response = vault.get_chat(text, chat_history)
-
Example Summary:
summary = vault.get_chat(text, summary=True)
-
Example Context-Based Response:
response = vault.get_chat(text, get_context=True)
-
Example Context-Based Response w/ Chat History:
response = vault.get_chat(text, chat_history, get_context=True)
-
Example Context-Response with Context Samples Returned:
vault_response = vault.get_chat(text, get_context=True, return_context=True)
The response is a string, unless return_context=True, in which case the response will be a dictionary that also contains the context items used.
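A sketch of handling the dictionary response - the key names ('response' and 'context') are assumptions for illustration, so inspect the returned dictionary in your own environment:
vault_response = vault.get_chat(text, get_context=True, return_context=True)
print(vault_response['response'])        # the generated answer (key name assumed)
for item in vault_response['context']:   # the context items used (key name assumed)
    print(item['data'])
-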
Example Custom Prompt:
response = vault.get_chat(text, chat_history, get_context=True, custom_prompt=my_prompt)
`custom_prompt` overrides the stock prompt we provide. Check ai.py to see the originals we provide. The `llm` and `llm_stream` models manage history internally, so `content` is the only variable to be included and formattable in the prompt.
Example WITHOUT Vault Context:
my_prompt = """Answer this question as if you were a financial advisor: "{content}". """
response = vault.get_chat(text, chat_history, custom_prompt=my_prompt)
Getting context from the Vault is usually the goal when customizing text generation, and doing that requires additional prompt variables. The `llm_w_context` and `llm_w_context_stream` models inject the history, context, and user input all in one prompt. In this case, your custom prompt needs to have `history`, `context`, and `question` formattable in the prompt, like so:
Example WITH Vault Context:
custom_prompt = """
Use the following Context to answer the Question at the end.
Answer as if you were the modern voice of the context, without referencing the context or mentioning the fact that any context has been given. Make sure not to just repeat what is referenced. Don't preface or give any warnings at the end.
Chat History (if any): {history}
Additional Context: {context}
Question: {question}
(Respond to the Question directly. Be the voice of the context, and most importantly: be interesting, engaging, and helpful)
Answer:
"""
response = vault.get_chat(text, chat_history, get_context=True, custom_prompt=custom_prompt)
Normal Usage:
# connect to the vault you want to use
vault = Vault(user='YOUR_EMAIL', api_key='YOUR_API_KEY', vault='philosophy')
# text input
question = "How do you find happiness?"
# get response
answer = vault.get_chat(question, get_context=True)
print(answer)
The answer to finding happiness is not one-size-fits-all, as it can mean different things to different people. However, it has been found that happiness comes from living and working in line with your values and virtues, and finding pleasure in the actions that accord with them. Additionally, having good friends who share your values and provide support and companionship enhances happiness. It is important to remember that happiness cannot be solely dependent on external factors such as material possessions or fleeting pleasures, as they are subject to change and instability. Rather, true happiness may come from an inner sense of mastery and control over yourself and your actions, as well as a sense of purpose and meaning in life.
Summarize Anything:
You can summarize any text, no matter how large - even an entire book all at once. Long texts are split into the largest possible chunk sizes and a summary is generated for each chunk. When all summaries are finished, they are concatenated and returned as one.
# get summary, no matter how large the input text
summary = vault.get_chat(text, summary=True)
Want to make it a certain length?
# make a summary under a length of 1000 characters
summary = vault.get_chat(text, summary=True)
while len(summary) > 1000:
    summary = vault.get_chat(summary, summary=True)
Streaming:
Use the built-in streaming functionality to get interactive chat streaming. (We've built demo apps showcasing what you can do with Vector Vault.)
get_chat_stream():
See it in action: check our examples folder, which has Colab notebooks you can be running in the browser seconds from now.
The get_chat() function returns the whole message at once, whereas get_chat_stream() yields each word as it's received. Other than that, they are nearly identical and take nearly the same input parameters. Streaming is a much better experience and the preferred option for user-facing front-end applications.
## get_chat()
print(vault.get_chat(text, history))
## get_chat_stream()
for word in vault.get_chat_stream(text, history):
    print(word)
# But it's best to use the built-in print function: print_stream()
vault.print_stream(vault.get_chat_stream(text, history))
# With print_stream(), the final answer is returned after streaming completes, so you can assign it to a variable
answer = vault.print_stream(vault.get_chat_stream(text, history))
The get_chat_stream() function:
get_chat_stream(self, text: str, history: str = None, summary: bool = False, get_context = False, n_context = 4, return_context = False, history_search = False, model='gpt-3.5-turbo', include_context_meta=False, metatag=False, metatag_prefixes=False, metatag_suffixes=False, custom_prompt=False)
Always use get_chat_stream() wrapped by either print_stream() or cloud_stream(). cloud_stream() is for cloud functions, like a Flask app serving a front end elsewhere; print_stream() is for local console printing.
-
Example Single Usage:
response = vault.print_stream(vault.get_chat_stream(text))
-
Example Chat:
response = vault.print_stream(vault.get_chat_stream(text, chat_history))
-
Example Summary:
summary = vault.print_stream(vault.get_chat_stream(text, summary=True))
-
Example Context-Based Response:
response = vault.print_stream(vault.get_chat_stream(text, get_context = True))
-
Example Context-Based Response w/ Chat History:
response = vault.print_stream(vault.get_chat_stream(text, chat_history, get_context = True))
-
Example Context-Response with Context Samples Returned:
vault_response = vault.print_stream(vault.get_chat_stream(text, get_context = True, return_context = True))
-
Example Custom Prompt:
response = vault.print_stream(vault.get_chat_stream(text, chat_history, get_context=True, custom_prompt=my_prompt))
`custom_prompt` overrides the stock prompt we provide. Check ai.py to see the originals we provide. The `llm` and `llm_stream` models manage history internally, so `content` is the only variable to be included and formattable in the prompt. Visit the get_chat_stream() function in vault.py for more information on metatags, or check out the streaming tutorial in our examples folder.
Example WITHOUT Vault Context:
my_prompt = """Answer this question as if you were a financial advisor: "{content}". """
response = vault.print_stream(vault.get_chat_stream(text, chat_history, custom_prompt=my_prompt))
Getting context from the Vault is usually the goal when customizing text generation, and doing that requires additional prompt variables. The `llm_w_context` and `llm_w_context_stream` models inject the history, context, and user input all in one prompt. In this case, your custom prompt needs to have `history`, `context`, and `question` formattable in the prompt, like so:
Example WITH Vault Context:
custom_prompt = """
Use the following Context to answer the Question at the end.
Answer as if you were the modern voice of the context, without referencing the context or mentioning the fact that any context has been given. Make sure not to just repeat what is referenced. Don't preface or give any warnings at the end.
Chat History (if any): {history}
Additional Context: {context}
Question: {question}
(Respond to the Question directly. Be the voice of the context, and most importantly: be interesting, engaging, and helpful)
Answer:
"""
response = vault.print_stream(vault.get_chat_stream(text, chat_history, get_context=True, custom_prompt=custom_prompt))
Streaming is key for front-end applications, so we also built a cloud_stream function to make cloud streaming to your front-end app easy. In a Flask app, all you need to do is receive the customer text, then call the vault in the return, like this:
# Stream from a flask app in one line
return Response(vault.cloud_stream(vault.get_chat_stream(text, history, get_context=True)), mimetype='text/event-stream')
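For a fuller picture, here is a minimal sketch of a Flask route built around that one-line return - the route path, request fields, and app wiring are illustrative assumptions, not part of the vectorvault API:
# a minimal Flask sketch; route name and request fields are illustrative
from flask import Flask, Response, request
from vectorvault import Vault

app = Flask(__name__)
vault = Vault(user='YOUR_EMAIL', api_key='YOUR_API_KEY', vault='NAME_OF_VAULT')

@app.route('/chat', methods=['POST'])
def chat():
    data = request.get_json()
    text = data['text']                  # customer message (field name assumed)
    history = data.get('history', '')    # optional running chat history
    return Response(vault.cloud_stream(vault.get_chat_stream(text, history, get_context=True)),
                    mimetype='text/event-stream')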
This makes going live with a high-level app extremely fast and easy, plus your infrastructure will be scalable and robust. Now you can build impressive applications in record time! If you have any questions, message us in Discord. Check out the Colab notebooks in our examples folder - you can run them in the browser right now.
Build an AI Customer Service Chat Bot
In the following code, we will add all of a company's past support conversations to a cloud Vault. (We load the company support texts from a .txt file, vectorize them, then add them to the Vault.) As new people message in, we will vector-search the Vault for similar questions and answers. We take the past answers returned from the Vault and instruct ChatGPT to use those previous answers to answer the new question. (NOTE: This will also work with a customer FAQ or customer support response templates.)
Create the Customer Service Vault
import os
from vectorvault import Vault
os.environ['OPENAI_API_KEY'] = 'your_openai_api_key'
vault = Vault(user='your_email', api_key='your_api_key', vault='Customer Service')
with open('customer_service.txt', 'r') as f:
    vault.add(f.read())
vault.get_vectors()
vault.save()
And just like that, in only a few lines of code we created a customer service vault. Now, whenever you want to use it in production, just call get_chat() with get_context=True, which will take the customer's question, search the vault to find the most similar questions and answers, then have ChatGPT reply to the customer using that information.
customer_question = "I just bought your XD2000 remote and I'm having trouble syncing it to my tv"
chatbot_answer = vault.get_chat(customer_question, get_context=True)
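To extend this into a running support loop that carries conversation history forward, here's a minimal sketch - the console I/O and history format are illustrative assumptions:
# illustrative console loop; the history string format is an assumption
history = ''
while True:
    customer_question = input('Customer: ')
    answer = vault.get_chat(customer_question, history, get_context=True)
    print('Support:', answer)
    history += f"\nCustomer: {customer_question}\nSupport: {answer}"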
That's all it takes to create an AI customer service chatbot that responds as well as any human support rep!
Getting Started:
Open the examples folder and try out the Google Colab tutorials we have! They will show you a lot, and since they're in Google Colab, there's no local setup required - just open them up and press play.
Contact:
If you have any questions, drop a message in the Vector Vault Discord channel - we're happy to help.
Happy coding!