A simple adapter connection that lets any Streamlit LLM-powered app use the ChromaDB vector database.
📂 ChromaDBConnection
ChromaDBConnection, a connection for the Chroma vector database, has been released. It makes it easy to connect any Streamlit LLM-powered app to Chroma.
With st.connection(), connecting to a Chroma vector database takes just a few lines of code:
import streamlit as st
from streamlit_chromadb_connection.chromadb_connection import ChromadbConnection

# Use a local, persistent Chroma client that stores its data under /tmp/.chroma
configuration = {
    "client": "PersistentClient",
    "path": "/tmp/.chroma"
}
collection_name = "documents_collection"

# The class name must match the imported ChromadbConnection
conn = st.connection("chromadb",
                     type=ChromadbConnection,
                     **configuration)

# Load the collection into a dataframe and display it
documents_collection_df = conn.get_collection_data(collection_name)
st.dataframe(documents_collection_df)
📑 ChromaDBConnection API
_connect()
There are 2 ways to connect to a Chroma client:
- PersistentClient: Data is persisted on the local machine.
  import streamlit as st
  from streamlit_chromadb_connection.chromadb_connection import ChromadbConnection

  configuration = {
      "client": "PersistentClient",
      "path": "/tmp/.chroma"
  }
  conn = st.connection(name="persistent_chromadb",
                       type=ChromadbConnection,
                       **configuration)
- HttpClient: Data is persisted on the server where Chroma resides, such as a cloud server.
  import streamlit as st
  from streamlit_chromadb_connection.chromadb_connection import ChromadbConnection

  configuration = {
      "client": "HttpClient",
      "host": "localhost",
      "port": 8000,
  }
  conn = st.connection(name="http_connection",
                       type=ChromadbConnection,
                       **configuration)
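The two configuration shapes above can be checked before they are handed to st.connection(). The following is a minimal sketch and not part of the package: validate_chroma_config is a hypothetical helper, and the required keys per client are inferred only from the two examples above.

```python
# Hypothetical helper, not part of streamlit_chromadb_connection.
# The required keys per client are inferred from the two examples above.
REQUIRED_KEYS = {
    "PersistentClient": {"path"},
    "HttpClient": {"host", "port"},
}


def validate_chroma_config(configuration: dict) -> dict:
    """Return the configuration unchanged, or raise ValueError if it is malformed."""
    client = configuration.get("client")
    if client not in REQUIRED_KEYS:
        raise ValueError(
            f"Unknown client {client!r}; expected one of {sorted(REQUIRED_KEYS)}"
        )
    missing = REQUIRED_KEYS[client] - set(configuration)
    if missing:
        raise ValueError(f"{client} configuration is missing: {sorted(missing)}")
    return configuration


persistent = validate_chroma_config(
    {"client": "PersistentClient", "path": "/tmp/.chroma"}
)
http = validate_chroma_config(
    {"client": "HttpClient", "host": "localhost", "port": 8000}
)
```

Failing early with a clear message is usually easier to debug than an error raised from inside the connection itself.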
create_collection()
To create a Chroma collection, supply a collection_name, an embedding_function_name, an embedding_config, and (optionally) metadata.
The currently supported options for embedding_function_name are:
- DefaultEmbeddingFunction
- SentenceTransformerEmbeddingFunction
- OpenAIEmbeddingFunction
- CohereEmbeddingFunction
- GooglePalmEmbeddingFunction
- GoogleVertexEmbeddingFunction
- HuggingFaceEmbeddingFunction
- InstructorEmbeddingFunction
- Text2VecEmbeddingFunction
- ONNXMiniLM_L6_V2
For DefaultEmbeddingFunction, the embedding_config argument can be left empty. However, for other embedding functions such as OpenAIEmbeddingFunction, one needs to provide configuration such as:
embedding_config = {
    "api_key": "{OPENAI_API_KEY}",
    "model_name": "{OPENAI_MODEL}",
}
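Rather than hard-coding the key, the configuration can be assembled from environment variables. This is only a sketch: build_openai_embedding_config is a hypothetical helper, the variable names OPENAI_API_KEY and OPENAI_MODEL simply mirror the placeholders above, and the values in the example call are dummies.

```python
import os


# Hypothetical helper, not part of the package.
# The environment variable names mirror the placeholders used above.
def build_openai_embedding_config(env=os.environ) -> dict:
    """Build the embedding_config dict for OpenAIEmbeddingFunction."""
    try:
        return {
            "api_key": env["OPENAI_API_KEY"],
            "model_name": env["OPENAI_MODEL"],
        }
    except KeyError as missing:
        raise RuntimeError(f"Environment variable {missing} is not set") from None


# Example with an explicit mapping instead of the real environment (dummy values):
embedding_config = build_openai_embedding_config(
    {"OPENAI_API_KEY": "sk-example", "OPENAI_MODEL": "text-embedding-ada-002"}
)
```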
One can also change the distance function through the metadata argument, such as:
metadata = {"hnsw:space": "l2"} # Squared L2 norm
metadata = {"hnsw:space": "cosine"} # Cosine similarity
metadata = {"hnsw:space": "ip"} # Inner product
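For intuition, the three hnsw:space values correspond to the distance computations below, following the formulas given in the Chroma documentation at the time of writing (a smaller value means a closer match). This plain-Python sketch is for illustration only; the function names are made up, and Chroma computes these inside its index rather than like this.

```python
import math


def squared_l2(a, b):
    """'l2': sum of squared coordinate differences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def inner_product_distance(a, b):
    """'ip': one minus the dot product."""
    return 1.0 - sum(x * y for x, y in zip(a, b))


def cosine_distance(a, b):
    """'cosine': one minus the cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


a, b = [1.0, 0.0], [0.0, 1.0]  # two orthogonal unit vectors
l2_value = squared_l2(a, b)
ip_value = inner_product_distance(a, b)
cosine_value = cosine_distance(a, b)
```

Note that cosine distance ignores vector length while the other two do not, which matters when embeddings are not normalized.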
Sample code to create a collection:
collection_name = "documents_collection"
embedding_function_name = "DefaultEmbeddingFunction"
conn.create_collection(collection_name=collection_name,
                       embedding_function_name=embedding_function_name,
                       embedding_config={},
                       metadata={"hnsw:space": "cosine"})
get_collection_data()
This method returns a DataFrame containing the embeddings and documents of a collection.
The attributes argument is a list of the attributes to include in the DataFrame.
The following code snippet returns all data in a collection as a DataFrame with 2 columns, documents and embeddings:
collection_name = "documents_collection"
conn.get_collection_data(collection_name=collection_name,
                         attributes=["documents", "embeddings"])
delete_collection()
This method deletes the collection with the given name.
collection_name = "documents_collection"
conn.delete_collection(collection_name=collection_name)
upload_documents()
This method uploads documents to a collection. If embeddings are not provided, the method will embed the documents using the embedding function specified in the collection.
collection_name = "documents_collection"
conn.upload_documents(collection_name=collection_name,
                      documents=["lorem ipsum", "doc2", "doc3"],
                      metadatas=[{"chapter": "3", "verse": "16"}, {"chapter": "3", "verse": "5"}, {"chapter": "29", "verse": "11"}],
                      ids=["id1", "id2", "id3"])
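The fallback described above (documents are embedded only when no embeddings are supplied) can be pictured as follows. This is a sketch of the behaviour, not the package's code: resolve_embeddings and toy_embed are hypothetical names, and the toy function stands in for the collection's real embedding function.

```python
from typing import Callable, List, Optional, Sequence

Vector = List[float]


def toy_embed(documents: Sequence[str]) -> List[Vector]:
    """Stand-in embedding: [character count, word count] per document."""
    return [[float(len(doc)), float(len(doc.split()))] for doc in documents]


def resolve_embeddings(
    documents: Sequence[str],
    ids: Sequence[str],
    embeddings: Optional[Sequence[Vector]] = None,
    embed: Callable[[Sequence[str]], List[Vector]] = toy_embed,
) -> List[Vector]:
    """Use the supplied embeddings if present, otherwise embed the documents."""
    if len(documents) != len(ids):
        raise ValueError("documents and ids must have the same length")
    if embeddings is None:
        return embed(documents)
    if len(embeddings) != len(documents):
        raise ValueError("embeddings and documents must have the same length")
    return list(embeddings)


docs = ["lorem ipsum", "doc2", "doc3"]
ids = ["id1", "id2", "id3"]
computed = resolve_embeddings(docs, ids)
supplied = resolve_embeddings(
    docs, ids, embeddings=[[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
)
```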
update_collection_data()
This method updates documents in a collection based on their ids.
collection_name = "documents_collection"
# Upload the initial documents
conn.upload_documents(collection_name=collection_name,
                      documents=["this is a", "this is b", "this is c"],
                      metadatas=[{"chapter": "3", "verse": "16"}, {"chapter": "3", "verse": "5"}, {"chapter": "29", "verse": "11"}],
                      ids=["id1", "id2", "id3"])
# Replace the documents stored under the same ids
conn.update_collection_data(collection_name=collection_name,
                            documents=["this is b", "this is c", "this is d"],
                            ids=["id1", "id2", "id3"])
query()
This method retrieves the top k most relevant documents for each query in the supplied list. The result is a dataframe in which each row shows the top k relevant documents for one query.
collection_name = "documents_collection"
conn.upload_documents(collection_name=collection_name,
                      documents=["lorem ipsum", "doc2", "doc3"],
                      metadatas=[{"chapter": "3", "verse": "16"}, {"chapter": "3", "verse": "5"}, {"chapter": "29", "verse": "11"}],
                      ids=["id1", "id2", "id3"],
                      embeddings=None)
queried_data = conn.query(collection_name=collection_name,
                          query=["random_query1", "random_query2"],
                          num_results_limit=10,
                          attributes=["documents", "embeddings", "metadatas", "data"])
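Conceptually, a query ranks the stored documents against each query and keeps the closest num_results_limit entries, giving one result row per query. The brute-force sketch below rests on stated assumptions: the 2-dimensional embeddings are made up, squared L2 is used as the distance, and top_k is a hypothetical function. Chroma itself searches an HNSW index instead of scanning every document.

```python
def squared_l2(a, b):
    """Sum of squared coordinate differences between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def top_k(query_embeddings, stored, k):
    """Return one row per query: the ids of the k closest stored embeddings.

    `stored` maps a document id to its embedding. If fewer than k documents
    are stored, each row simply contains all of them.
    """
    rows = []
    for query in query_embeddings:
        ranked = sorted(stored, key=lambda doc_id: squared_l2(query, stored[doc_id]))
        rows.append(ranked[:k])
    return rows


# Made-up embeddings for three stored documents
stored = {"id1": [0.0, 0.0], "id2": [1.0, 1.0], "id3": [5.0, 5.0]}
rows = top_k([[0.9, 0.9], [4.0, 4.0]], stored, k=2)
```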
Metadata and document filters can also be supplied through the where_metadata_filter and where_document_filter arguments, respectively, for a more relevant search. For more on using where filters, please refer to: https://docs.trychroma.com/usage-guide#using-where-filters
queried_data = conn.query(collection_name=collection_name,
                          query=["this is"],
                          num_results_limit=10,
                          attributes=["documents", "embeddings", "metadatas", "data"],
                          where_metadata_filter={"chapter": "3"})
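The effect of a metadata filter such as {"chapter": "3"} on the sample data can be illustrated in plain Python. Only exact-match filtering is sketched here; Chroma's where filters also support operators, which this hypothetical matches_where helper does not implement.

```python
def matches_where(metadata: dict, where: dict) -> bool:
    """True if every key in `where` is present in `metadata` with an equal value."""
    return all(key in metadata and metadata[key] == value for key, value in where.items())


ids = ["id1", "id2", "id3"]
metadatas = [
    {"chapter": "3", "verse": "16"},
    {"chapter": "3", "verse": "5"},
    {"chapter": "29", "verse": "11"},
]

# Only documents whose metadata passes the filter are candidates for the search
matching_ids = [
    doc_id
    for doc_id, metadata in zip(ids, metadatas)
    if matches_where(metadata, {"chapter": "3"})
]
```

In this sketch the comparison is type-sensitive: the sample metadata stores "3" as a string, so the integer 3 would not match.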
🎉 That's it! ChromaDBConnection is ready to be used with st.connection(). 🎉
Contribution 🔥
author={Vu Quang Minh},
github={Dev317},
year={2023}