
Project description

Vexpresso

Vexpresso is a simple and scalable multi-modal vector database built with Daft.

Querying Pokemon with images and text

Features

🍵 Simple: Vexpresso is lightweight and easy to get started with!

🔌 Flexible: Unlike many other vector databases, Vexpresso supports arbitrary data types, so you can query multi-modal objects (images, audio, video, etc.).

🌐 Scalable: Because Vexpresso is built on Daft, it can scale to multi-GPU / multi-CPU clusters using Ray.

📚 Persistent: Vexpresso provides easy-to-use functions for saving collections to, and loading them from, Hugging Face datasets.
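Conceptually, querying a vector database means comparing a query embedding against every stored embedding and keeping the k most similar entries. The minimal sketch below illustrates that idea with a brute-force cosine-similarity search in plain Python. It is not Vexpresso's implementation (which runs on Daft); the helper names `cosine_similarity` and `top_k`, the toy 3-dimensional vectors, and the choice of cosine similarity as the metric are all assumptions made for illustration. Because the search only ever sees vectors, the stored objects can be of any type (text, images, audio) as long as some embedding function maps them to vectors.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float], embeddings: Dict[str, Sequence[float]], k: int
) -> List[Tuple[str, float]]:
    """Hypothetical brute-force search: score every stored vector, keep the k best."""
    scored = [(key, cosine_similarity(query, vec)) for key, vec in embeddings.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional "embeddings" (assumed values). A real embedding model,
# such as a sentence-transformers model, produces much longer vectors.
store = {
    "document1": [1.0, 0.0, 0.0],
    "document2": [0.9, 0.1, 0.0],
    "document6": [0.0, 0.0, 1.0],
}

print([key for key, _ in top_k([0.0, 0.1, 1.0], store, k=2)])  # ['document6', 'document2']
```

Brute-force scoring touches every stored vector, so its cost grows linearly with the size of the collection.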

Installation

To install from PyPI:

pip install vexpresso

To install from source:

git clone git@github.com:shyamsn97/vexpresso.git
cd vexpresso
pip install -e .

Usage

🔥 Check out our Showcase notebook for a more detailed walkthrough!

In this simple example, we create a collection and embed its documents using a Hugging Face sentence-transformers model.

from typing import List, Any
import vexpresso
# import embedding functions from vexpresso
import vexpresso.embedding_functions as ef

# creating a collection object!
collection = vexpresso.create(
    data = {
        "documents":[
            "This is document1",
            "This is document2",
            "This is document3",
            "This is document4",
            "This is document5",
            "This is document6"
        ],
        "source":["notion", "google-docs", "google-docs", "notion", "google-docs", "google-docs"],
        "num_lines":[10, 20, 30, 40, 50, 60]
    },
    # backend="ray",  # uncomment to start / connect to a Ray cluster!
)

# create a simple embedding function from sentence_transformers
def hf_embed_fn(content: List[Any]):
    from sentence_transformers import SentenceTransformer
    model = SentenceTransformer("sentence-transformers/all-mpnet-base-v2")
    return model.encode(content, convert_to_tensor=True).detach().cpu().numpy()

# or use a langchain embedding function
def langchain_embed_fn(content: List[Any]):
    from langchain.embeddings import OpenAIEmbeddings
    embeddings_model = OpenAIEmbeddings()
    return embeddings_model.embed_documents(content)

# embed function creates a column in the collection with embeddings. There can be more than one embedding column!
# lazy execution until .execute is called
collection = collection.embed(
    "documents",
    embedding_fn=hf_embed_fn,
    to="document_embeddings",
    # lazy=False # if this is false, execute doesn't need to be called
).execute()

# creating a queried collection with a subset of content closest to query
queried_collection = collection.query(
    "document_embeddings",
    query="query document6",
    k=4,  # return the 4 closest entries
    lazy=False,
    # query_embedding=[query1, query2, ...]
    # filter_conditions={"metadata_field":{"operator, ex: 'eq'":"value"}} # optional metadata filter
)

# batch query -- return a list of collections
# batch_queried_collection = collection.batch_query(
#     "document_embeddings",
#     queries=["doc1", "doc2"],
#     k = 2
# )

# filter collection for documents with num_lines less than or equal to 30
filtered_collection = queried_collection.filter(
    {
        "num_lines": {"lte":30}
    }
).execute()

# show dataframe
filtered_collection.show()

# convert to dictionary
filtered_dict = filtered_collection.to_dict()
documents = filtered_dict["documents"]

# add an entry!
collection = collection.add(
    [
        {"documents":"new documents 1", "source":"notion", "num_lines":2},
        {"documents":"new documents 2", "source":"google-docs", "num_lines":40}
    ]
)
collection = collection.execute()
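The filter conditions used above (and in the commented-out `filter_conditions` argument of `query`) take the shape `{field: {operator: value}}`. The sketch below is a hypothetical, pure-Python evaluator for that shape, meant only to make the semantics concrete; it is not Vexpresso's code. Only the `eq` and `lte` operator names appear in this README; `neq`, `lt`, `gt`, and `gte`, as well as the rule that multiple conditions are combined with a logical AND, are assumptions.

```python
import operator
from typing import Any, Callable, Dict, List

# Hypothetical operator table: "eq" and "lte" come from the README example,
# the other names are assumed for illustration.
OPERATORS: Dict[str, Callable[[Any, Any], bool]] = {
    "eq": operator.eq,
    "neq": operator.ne,
    "lt": operator.lt,
    "lte": operator.le,
    "gt": operator.gt,
    "gte": operator.ge,
}


def matches(row: Dict[str, Any], conditions: Dict[str, Dict[str, Any]]) -> bool:
    """True if the row satisfies every {field: {op: value}} clause (assumed AND)."""
    for field, clause in conditions.items():
        for op_name, value in clause.items():
            if not OPERATORS[op_name](row[field], value):
                return False
    return True


def filter_rows(
    rows: List[Dict[str, Any]], conditions: Dict[str, Dict[str, Any]]
) -> List[Dict[str, Any]]:
    """Keep only the rows that match all conditions."""
    return [row for row in rows if matches(row, conditions)]


rows = [
    {"documents": "This is document1", "source": "notion", "num_lines": 10},
    {"documents": "This is document4", "source": "notion", "num_lines": 40},
    {"documents": "This is document3", "source": "google-docs", "num_lines": 30},
]

kept = filter_rows(rows, {"num_lines": {"lte": 30}, "source": {"eq": "notion"}})
print([row["documents"] for row in kept])  # ['This is document1']
```

Representing conditions as plain dictionaries keeps them easy to build programmatically and to pass alongside a query.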

Contributing

Feel free to open a pull request, or open an issue to request a feature!


Download files

Source Distribution

vexpresso-0.0.2.tar.gz (21.3 kB view details)

Uploaded Source

Built Distribution

vexpresso-0.0.2-py3-none-any.whl (23.7 kB view details)

Uploaded Python 3

File details

Details for the file vexpresso-0.0.2.tar.gz.

File metadata

  • Download URL: vexpresso-0.0.2.tar.gz
  • Upload date:
  • Size: 21.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.17

File hashes

Hashes for vexpresso-0.0.2.tar.gz
Algorithm Hash digest
SHA256 731cfe20fb46de7982e2fdfc3d3dd69bd14832de473d70a94f71828af14ad4b1
MD5 b6e2e697be4b5368a327905f17df7f3d
BLAKE2b-256 7812de71f35240e77b4e1876c0c09a89fabaa5f0c3ae260d8ed3f23bf34cfac8

File details

Details for the file vexpresso-0.0.2-py3-none-any.whl.

File metadata

  • Download URL: vexpresso-0.0.2-py3-none-any.whl
  • Upload date:
  • Size: 23.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.17

File hashes

Hashes for vexpresso-0.0.2-py3-none-any.whl
Algorithm Hash digest
SHA256 d1d7b95afd90fc7f5b0a4ab21a5847906c969cdac9f43dc39a108725b8d9c4cc
MD5 6d5c036174386135784f293aef331e79
BLAKE2b-256 9bbbe2c2292b4b9f0eae1411832f904488bee582f9cd1622424d68cdd7cf4510
