A simple adapter connection that lets any LLM-powered Streamlit app use the ChromaDB vector database.

License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3

Project description

📂 ChromaDBConnection

ChromadbConnection, a connection for the Chroma vector database, has been released. It makes it easy to connect any LLM-powered Streamlit app to Chroma.

With st.connection(), connecting to a Chroma vector database becomes just a few lines of code:

import streamlit as st
from streamlit_chromadb_connection.chromadb_connection import ChromadbConnection

configuration = {
    "client": "PersistentClient",
    "path": "/tmp/.chroma"
}

collection_name = "documents_collection"

conn = st.connection("chromadb",
                    type=ChromadbConnection,
                    **configuration)
documents_collection_df = conn.get_collection_data(collection_name)
st.dataframe(documents_collection_df)

📑 ChromaDBConnection API

_connect()

There are 2 ways to connect to a Chroma client:

PersistentClient: Data is persisted on the local machine, under the directory given by path

import streamlit as st
from streamlit_chromadb_connection.chromadb_connection import ChromadbConnection

configuration = {
    "client": "PersistentClient",
    "path": "/tmp/.chroma"
}

conn = st.connection(name="persistent_chromadb",
                    type=ChromadbConnection,
                    **configuration)

HttpClient: Data is persisted on the server where Chroma is running, reached through host and port

import streamlit as st
from streamlit_chromadb_connection.chromadb_connection import ChromadbConnection

configuration = {
    "client": "HttpClient",
    "host": "localhost",
    "port": 8000,
}

conn = st.connection(name="http_connection",
                     type=ChromadbConnection,
                     **configuration)
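
The two configurations above differ only in the keys they carry. As a minimal sketch, the helper below picks one or the other depending on whether a host is supplied, so the same app can run against a local directory or a Chroma server. The name build_chroma_config is hypothetical and not part of the package; the returned dictionary is meant to be unpacked into st.connection() as in the snippets above.

```python
from typing import Optional


def build_chroma_config(path: str = "/tmp/.chroma",
                        host: Optional[str] = None,
                        port: int = 8000) -> dict:
    """Return a configuration dict shaped like the examples above.

    Hypothetical helper for illustration; not part of the package.
    """
    if host:
        # A host was supplied: talk to a running Chroma server.
        return {"client": "HttpClient", "host": host, "port": port}
    # No host: persist data on the local machine.
    return {"client": "PersistentClient", "path": path}


local_config = build_chroma_config()
remote_config = build_chroma_config(host="localhost")
print(local_config)
print(remote_config)
```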

create_collection()

To create a Chroma collection, supply a collection_name, an embedding_function_name, an embedding_config and, optionally, metadata.

The currently supported options for embedding_function_name are:

DefaultEmbeddingFunction
SentenceTransformerEmbeddingFunction
OpenAIEmbeddingFunction
CohereEmbeddingFunction
GooglePalmEmbeddingFunction
GoogleVertexEmbeddingFunction
HuggingFaceEmbeddingFunction
InstructorEmbeddingFunction
Text2VecEmbeddingFunction
ONNXMiniLM_L6_V2

For DefaultEmbeddingFunction, the embedding_config argument can be left as an empty string. However, for other embedding functions such as OpenAIEmbeddingFunction, one needs to provide configuration such as:

embedding_config = {
    "api_key": "{OPENAI_API_KEY}",   # replace with your OpenAI API key
    "model_name": "{OPENAI_MODEL}",  # replace with the embedding model name
}
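
Hard-coding an API key in source is easy to leak, so one option is to read it from the environment. The sketch below builds the same two-key dictionary from environment variables. The function name, the variable names OPENAI_API_KEY and OPENAI_MODEL, and the fallback model are assumptions made for illustration, not something the package prescribes.

```python
import os
from typing import Mapping


def openai_embedding_config(env: Mapping[str, str] = os.environ,
                            default_model: str = "text-embedding-ada-002") -> dict:
    """Build an embedding_config dict with api_key and model_name.

    The environment variable names are assumptions for this sketch.
    """
    api_key = env.get("OPENAI_API_KEY")
    if not api_key:
        # Fail early instead of creating a collection that cannot embed.
        raise RuntimeError("Set OPENAI_API_KEY before creating the collection")
    return {
        "api_key": api_key,
        "model_name": env.get("OPENAI_MODEL", default_model),
    }


# Demonstration with an explicit mapping instead of the real environment.
example = openai_embedding_config(env={"OPENAI_API_KEY": "sk-placeholder"})
print(example["model_name"])
```

In an app, calling openai_embedding_config() with no arguments reads from the real environment, and the result can be passed as embedding_config.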

One can also change the distance function by changing the metadata argument, such as:

metadata = {"hnsw:space": "l2"} # Squared L2 norm
metadata = {"hnsw:space": "cosine"} # Cosine similarity
metadata = {"hnsw:space": "ip"} # Inner product
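
To make the three options concrete, here is a plain-Python sketch of the distances they correspond to. It follows the formulas given in Chroma's documentation as best understood at the time of writing (squared L2, one minus cosine similarity, one minus inner product); treat the exact definitions as an assumption and check them against your Chroma version. In all three cases a smaller value means a closer match.

```python
import math


def squared_l2(a, b):
    """Squared Euclidean distance ("l2")."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cosine_distance(a, b):
    """One minus cosine similarity ("cosine")."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def inner_product_distance(a, b):
    """One minus the inner product ("ip")."""
    return 1.0 - sum(x * y for x, y in zip(a, b))


# Two orthogonal unit vectors.
u = [1.0, 0.0]
v = [0.0, 1.0]
print(squared_l2(u, v), cosine_distance(u, v), inner_product_distance(u, v))
```

Cosine distance ignores vector length, while the other two do not, which is why it is a common choice for text embeddings that are not normalized.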

Sample code to create a collection:

collection_name = "documents_collection"
embedding_function_name = "DefaultEmbeddingFunction"
conn.create_collection(collection_name=collection_name,
                       embedding_function_name=embedding_function_name,
                       embedding_config={},
                       metadata = {"hnsw:space": "cosine"})

get_collection_data()

This method returns a dataframe that consists of the embeddings and documents of a collection. The attributes argument is a list of attributes to be included in the DataFrame. The following code snippet will return all data in a collection in the form of a DataFrame, with 2 columns: documents and embeddings.

collection_name = "documents_collection"
conn.get_collection_data(collection_name=collection_name,
                        attributes= ["documents", "embeddings"])

delete_collection()

This method deletes the collection with the given name.

collection_name = "documents_collection"
conn.delete_collection(collection_name=collection_name)

upload_document()

This method uploads documents to a collection. If embeddings are not provided, the method will embed the documents using the embedding function specified in the collection.

collection_name = "documents_collection"
conn.upload_document(collection_name=collection_name,
                     documents=["lorem ipsum", "doc2", "doc3"],
                     metadatas=[{"chapter": "3", "verse": "16"}, {"chapter": "3", "verse": "5"}, {"chapter": "29", "verse": "11"}],
                     ids=["id1", "id2", "id3"],
                     embeddings=None)
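
The documents, metadatas and ids arguments are parallel lists: the entry at position i in each list describes the same document. A small pre-flight check can catch mismatched lengths or repeated ids before anything is sent to the database. The helper below is a hypothetical sketch and not part of the package; whether the package performs similar checks itself is not stated here.

```python
def validate_upload_batch(documents, metadatas, ids):
    """Check that the three parallel lists line up; return the batch size.

    Hypothetical helper for illustration; not part of the package.
    """
    if not (len(documents) == len(metadatas) == len(ids)):
        raise ValueError(
            f"Length mismatch: {len(documents)} documents, "
            f"{len(metadatas)} metadatas, {len(ids)} ids"
        )
    if len(set(ids)) != len(ids):
        raise ValueError("ids must be unique within a batch")
    return len(ids)


batch_size = validate_upload_batch(
    documents=["lorem ipsum", "doc2", "doc3"],
    metadatas=[{"chapter": "3", "verse": "16"},
               {"chapter": "3", "verse": "5"},
               {"chapter": "29", "verse": "11"}],
    ids=["id1", "id2", "id3"],
)
print(batch_size)
```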

query()

This method retrieves the top k most relevant documents for each query in a supplied list of queries. The result is a dataframe in which each row shows the top k relevant documents for one query.

collection_name = "documents_collection"
conn.upload_document(collection_name=collection_name,
                     documents=["lorem ipsum", "doc2", "doc3"],
                     metadatas=[{"chapter": "3", "verse": "16"}, {"chapter": "3", "verse": "5"}, {"chapter": "29", "verse": "11"}],
                     ids=["id1", "id2", "id3"],
                     embeddings=None)

queried_data = conn.query(collection_name=collection_name,
                          query=["random_query1", "random_query2"],
                          num_results_limit=10,
                          attributes=["documents", "embeddings", "metadatas", "data"])

Metadata and document filters can be passed through the where_metadata_filter and where_document_filter arguments respectively to narrow the search. For details on how where filters work, refer to: https://docs.trychroma.com/usage-guide#using-where-filters

queried_data = conn.query(collection_name=collection_name,
                         query=["this is"],
                         num_results_limit=10,
                         attributes=["documents", "embeddings", "metadatas", "data"],
                         where_metadata_filter={"chapter": "3"})
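
Filters with more than one condition are nested dictionaries, which are easy to mistype by hand. The sketch below builds them with two small helpers. The operators $eq, $and and $contains come from Chroma's where-filter syntax; the helper names are hypothetical, and the assumption that where_metadata_filter and where_document_filter forward these dictionaries to Chroma unchanged should be checked against the package source.

```python
def metadata_equals(**fields):
    """Build a metadata filter that requires every given field to match."""
    if not fields:
        raise ValueError("Provide at least one field to filter on")
    clauses = [{key: {"$eq": value}} for key, value in fields.items()]
    # A single condition needs no $and wrapper.
    return clauses[0] if len(clauses) == 1 else {"$and": clauses}


def document_contains(text):
    """Build a document filter that requires the text to appear."""
    return {"$contains": text}


chapter_and_verse = metadata_equals(chapter="3", verse="16")
mentions_lorem = document_contains("lorem")
print(chapter_and_verse)
print(mentions_lorem)
```

Under that assumption, chapter_and_verse would be passed as where_metadata_filter and mentions_lorem as where_document_filter.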

🎉 That's it! ChromaDBConnection is ready to be used with st.connection(). 🎉

Contribution 🔥

author={Vu Quang Minh},
github={Dev317},
year={2023}

Release history

1.0.5: Jul 12, 2024
1.0.4: Jul 10, 2024
1.0.3: Dec 5, 2023
1.0.2: Dec 4, 2023
1.0.0: Dec 4, 2023 (this version)
0.0.5: Dec 4, 2023

Download files

Source Distribution

streamlit_chromadb_connection-1.0.0.tar.gz (9.9 kB)

Uploaded Dec 4, 2023 Source

Built Distribution

streamlit_chromadb_connection-1.0.0-py3-none-any.whl (7.8 kB)

Uploaded Dec 4, 2023 Python 3

File details

Details for the file streamlit_chromadb_connection-1.0.0.tar.gz.

File metadata

Download URL: streamlit_chromadb_connection-1.0.0.tar.gz
Upload date: Dec 4, 2023
Size: 9.9 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/4.0.2 CPython/3.11.6

File hashes

Hashes for streamlit_chromadb_connection-1.0.0.tar.gz
Algorithm	Hash digest
SHA256	`15eb85834e17451fbbb9b9640273afccfbd524a7e068e7c5d49ed3d047adf1ea`
MD5	`ffd71397071900e16557e0774a1d4e88`
BLAKE2b-256	`ea3eca094958339e2cf270f50576ff7688a9c2338b72c8d2ed87c07cfa700c73`

File details

Details for the file streamlit_chromadb_connection-1.0.0-py3-none-any.whl.

File metadata

Download URL: streamlit_chromadb_connection-1.0.0-py3-none-any.whl
Upload date: Dec 4, 2023
Size: 7.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/4.0.2 CPython/3.11.6

File hashes

Hashes for streamlit_chromadb_connection-1.0.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`b5cc11f62fa9fe93af69037946e2eb2952609868588ece747d5d6d3b7be99efb`
MD5	`11dddc9c3c4e59acd964376d9144ef2b`
BLAKE2b-256	`cdc52e1cd5c53297068cad005e5398b1de18f761e6122c99557d7ad65c1359fb`
