A simple adapter connection for any Streamlit LLM-powered app to use ChromaDB vector database.
Project description
📂 ChromaDBConnection
Connection for Chroma vector database, ChromaDBConnection
, has been released which makes it easy to connect any Streamlit LLM-powered app to.
With st.connection()
, connecting to a Chroma vector database becomes just a few lines of code:
import streamlit as st
from streamlit_chromadb_connection.chromadb_connection import ChromadbConnection
configuration = {
"client": "PersistentClient",
"path": "/tmp/.chroma"
}
collection_name = "documents_collection"
conn = st.connection("chromadb",
type=ChromaDBConnection,
**configuration)
documents_collection_df = conn.get_collection_data(collection_name)
st.dataframe(documents_collection_df)
📑 ChromaDBConnection API
_connect()
There are 2 ways to connect to a Chroma client:
-
PersistentClient: Data will be persisted to a local machine
import streamlit as st from streamlit_chromadb_connection.chromadb_connection import ChromadbConnection configuration = { "client": "PersistentClient", "path": "/tmp/.chroma" } conn = st.connection(name="persistent_chromadb", type=ChromadbConnection, **configuration)
-
HttpClient: Data will be persisted to a cloud server where Chroma resides
import streamlit as st from streamlit_chromadb_connection.chromadb_connection import ChromadbConnection configuration = { "client": "HttpClient", "host": "localhost", "port": 8000, } conn = st.connection(name="http_connection", type=ChromadbConnection, **configuration)
create_collection()
In order to create a Chroma collection, one needs to supply a collection_name
and embedding_function_name
, embedding_config
and (optional) metadata
.
There are current possible options for embedding_function_name
:
- DefaultEmbeddingFunction
- SentenceTransformerEmbeddingFunction
- OpenAIEmbeddingFunction
- CohereEmbeddingFunction
- GooglePalmEmbeddingFunction
- GoogleVertexEmbeddingFunction
- HuggingFaceEmbeddingFunction
- InstructorEmbeddingFunction
- Text2VecEmbeddingFunction
- ONNXMiniLM_L6_V2
For DefaultEmbeddingFunction
, the embedding_config
argument can be left as an empty string. However, for other embedding functions such as OpenAIEmbeddingFunction
, one needs to provide configuration such as:
embedding_config = {
api_key: "{OPENAI_API_KEY}",
model_name: "{OPENAI_MODEL}",
}
One can also change the distance function by changing the metadata
argument, such as:
metadata = {"hnsw:space": "l2"} # Squared L2 norm
metadata = {"hnsw:space": "cosine"} # Cosine similarity
metadata = {"hnsw:space": "ip"} # Inner product
Sample code to create connection:
collection_name = "documents_collection"
embedding_function_name = "DefaultEmbeddingFunction"
conn.create_collection(collection_name=collection_name,
embedding_function_name=embedding_function_name,
embedding_config={},
metadata = {"hnsw:space": "cosine"})
get_collection_data()
This method returns a dataframe that consists of the embeddings and documents of a collection.
The attributes
argument is a list of attributes to be included in the DataFrame.
The following code snippet will return all data in a collection in the form of a DataFrame, with 2 columns: documents
and embeddings
.
collection_name = "documents_collection"
conn.get_collection_data(collection_name=collection_name,
attributes= ["documents", "embeddings"])
delete_collection()
This method deletes the stated collection name.
collection_name = "documents_collection"
conn.delete_collection(collection_name=collection_name)
upload_document()
This method uploads documents to a collection. If embeddings are not provided, the method will embed the documents using the embedding function specified in the collection.
collection_name = "documents_collection"
conn.upload_document(collection_name=collection_name,
documents=["lorem ipsum", "doc2", "doc3"],
metadatas=[{"chapter": "3", "verse": "16"}, {"chapter": "3", "verse": "5"}, {"chapter": "29", "verse": "11"}],
ids=["id1", "id2", "id3"],
embeddings=None)
query()
This method retrieves top k relevant document based on a list of queries supplied. The result will be in a dataframe where each row will shows the top k relevant documents of each query.
collection_name = "documents_collection"
conn.upload_document(collection_name=collection_name,
documents=["lorem ipsum", "doc2", "doc3"],
metadatas=[{"chapter": "3", "verse": "16"}, {"chapter": "3", "verse": "5"}, {"chapter": "29", "verse": "11"}],
ids=["id1", "id2", "id3"],
embeddings=None)
queried_data = conn.query(collection_name=collection_name,
query=["random_query1", "random_query2"],
num_results_limit=10,
attributes=["documents", "embeddings", "metadatas", "data"])
Metadata and document filters are also provided in where_metadata_filter
and where_document_filter
arguments respectively for more relevant search. For better understanding on the usage of where filters, please refer to: https://docs.trychroma.com/usage-guide#using-where-filters
queried_data = conn.query(collection_name=collection_name,
query=["this is"],
num_results_limit=10,
attributes=["documents", "embeddings", "metadatas", "data"],
where_metadata_filter={"chapter": "3"})
🎉 That's it! ChromaDBConnection
is ready to be used with st.connection()
. 🎉
Contribution 🔥
author={Vu Quang Minh},
github={Dev317},
year={2023}
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Hashes for streamlit_chromadb_connection-1.0.2.tar.gz
Algorithm | Hash digest | |
---|---|---|
SHA256 | bee2c5f9df73bc1190ef544dfe9a6e660c9bdc9f93548c7286fcf7a0466d0a90 |
|
MD5 | 868296935ef257c72ac0992b888ea406 |
|
BLAKE2b-256 | 7d330eb50ab085182d45412e06b43b332234ef2ab98e21270e141d57b79dd123 |
Hashes for streamlit_chromadb_connection-1.0.2-py3-none-any.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 4b3757f5426ceab1915e62b6a391f7d607ca64b785a0f543e4e4fb2513ad6396 |
|
MD5 | 53daa23cedcce8678c93423d53e8d76b |
|
BLAKE2b-256 | a2a88c8cd5add8094c5f5272834938d3438a40ed27185d77ef585478d4a83665 |