Vexpresso
Vexpresso is a simple and scalable multi-modal vector database built with Daft
Features
🍵 Simple: Vexpresso is lightweight and easy to get started with.
🔌 Flexible: Unlike many other vector databases, Vexpresso supports arbitrary datatypes. This means that you can query multi-modal objects (images, audio, video, etc.).
🌐 Scalable: Because Vexpresso uses Daft, it can be scaled with Ray to multi-GPU / CPU clusters.
📚 Persistent: Vexpresso has easily accessible functions for saving to and loading from Hugging Face datasets.
Installation
To install from PyPI:
pip install vexpresso
To install from source:
git clone git@github.com:shyamsn97/vexpresso.git
cd vexpresso
pip install -e .
Usage
🔥 Check out our Showcase notebook for a more detailed walkthrough!
In this example, we create a small collection and embed its documents with a Hugging Face sentence-transformers model.
from typing import List, Any
import vexpresso
# import embedding functions from vexpresso
import vexpresso.embedding_functions as ef
# creating a collection object!
collection = vexpresso.create(
    data={
        "documents": [
            "This is document1",
            "This is document2",
            "This is document3",
            "This is document4",
            "This is document5",
            "This is document6",
        ],
        "source": ["notion", "google-docs", "google-docs", "notion", "google-docs", "google-docs"],
        "num_lines": [10, 20, 30, 40, 50, 60],
    },
    # backend="ray"  # uncomment to start / connect to a Ray cluster
)
# create a simple embedding function from sentence_transformers
def hf_embed_fn(content: List[Any]):
    from sentence_transformers import SentenceTransformer
    model = SentenceTransformer("sentence-transformers/all-mpnet-base-v2")
    return model.encode(content, convert_to_tensor=True).detach().cpu().numpy()

# or use a langchain embedding function
def langchain_embed_fn(content: List[Any]):
    from langchain.embeddings import OpenAIEmbeddings
    embeddings_model = OpenAIEmbeddings()
    return embeddings_model.embed_documents(content)
# embed() adds a column of embeddings to the collection. There can be more than one embedding column!
# Execution is lazy until .execute() is called.
collection = collection.embed(
    "documents",
    embedding_fn=hf_embed_fn,
    to="document_embeddings",
    # lazy=False  # if lazy is False, .execute() doesn't need to be called
).execute()
# create a queried collection containing the rows closest to the query
queried_collection = collection.query(
    "document_embeddings",
    query="query document6",
    k=4,  # return the 4 closest
    lazy=False,
    # query_embedding=[query1, query2, ...]
    # filter_conditions={"metadata_field": {"operator, ex: 'eq'": "value"}}  # optional metadata filter
)
# batch query: returns a list of collections
# batch_queried_collection = collection.batch_query(
#     "document_embeddings",
#     queries=["doc1", "doc2"],
#     k=2
# )
# filter collection for documents with num_lines less than or equal to 30
filtered_collection = queried_collection.filter(
    {
        "num_lines": {"lte": 30}
    }
).execute()
# show dataframe
filtered_collection.show()
# convert to dictionary
filtered_dict = filtered_collection.to_dict()
documents = filtered_dict["documents"]
# add new entries!
collection = collection.add(
    [
        {"documents": "new documents 1", "source": "notion", "num_lines": 2},
        {"documents": "new documents 2", "source": "google-docs", "num_lines": 40},
    ]
)
collection = collection.execute()
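To build some intuition for what the `query` and `filter` steps above compute, here is a minimal, library-free sketch of a top-k similarity search followed by a metadata filter. It is not Vexpresso's implementation: the helper names (`cosine`, `top_k`, `apply_filter`), the `emb` column, the hand-written two-dimensional vectors, and the choice of cosine similarity are all assumptions made for illustration. Only the `eq` and `lte` operator names appear in the usage example; the others in the table are guesses at a natural naming scheme.

```python
import math
import operator
from typing import Any, Dict, List

# Hypothetical operator table. Only "eq" and "lte" are taken from the usage
# example above; the remaining names are assumptions.
OPS = {
    "eq": operator.eq,
    "lt": operator.lt,
    "lte": operator.le,
    "gt": operator.gt,
    "gte": operator.ge,
}


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(rows: List[Dict[str, Any]], column: str,
          query_vec: List[float], k: int) -> List[Dict[str, Any]]:
    """Return the k rows whose `column` vector is most similar to query_vec."""
    ranked = sorted(rows, key=lambda r: cosine(r[column], query_vec), reverse=True)
    return ranked[:k]


def apply_filter(rows: List[Dict[str, Any]],
                 conditions: Dict[str, Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep rows that satisfy every {field: {operator_name: value}} condition."""
    return [
        row for row in rows
        if all(
            OPS[op](row[field], value)
            for field, ops in conditions.items()
            for op, value in ops.items()
        )
    ]


# Toy data: made-up 2-d "embeddings" stand in for real model output.
rows = [
    {"documents": "This is document1", "num_lines": 10, "emb": [1.0, 0.0]},
    {"documents": "This is document2", "num_lines": 20, "emb": [0.9, 0.1]},
    {"documents": "This is document3", "num_lines": 30, "emb": [0.0, 1.0]},
    {"documents": "This is document4", "num_lines": 40, "emb": [0.8, 0.2]},
]

nearest = top_k(rows, "emb", query_vec=[1.0, 0.0], k=3)
filtered = apply_filter(nearest, {"num_lines": {"lte": 30}})
```

Here `nearest` holds documents 1, 2 and 4 (document 3 points in an orthogonal direction), and the filter then drops document 4 because its `num_lines` exceeds 30. The order matters: filtering after the top-k search can return fewer than k rows, which is also what happens when `filter` is chained onto a queried collection as in the usage example.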
Resources
Contributing
Feel free to make a pull request or open an issue for a feature!