A simple adapter that lets any Streamlit LLM-powered app use the ChromaDB vector database.
📂 ChromaDBConnection
`ChromaDBConnection`, a connection for the Chroma vector database, has been released, making it easy to hook any Streamlit LLM-powered app up to Chroma.
With `st.connection()`, connecting to a Chroma vector database takes just a few lines of code:
```python
import streamlit as st
from streamlit_chromadb_connection.chromadb_connection import ChromadbConnection

configuration = {
    "client": "PersistentClient",
    "path": "/tmp/.chroma"
}

collection_name = "documents_collection"

conn = st.connection("chromadb",
                     type=ChromadbConnection,
                     **configuration)

documents_collection_df = conn.get_collection_data(collection_name)
st.dataframe(documents_collection_df)
```
📑 ChromaDBConnection API
_connect()
There are two ways to connect to a Chroma client:

- `PersistentClient`: data is persisted on the local machine.

  ```python
  import streamlit as st
  from streamlit_chromadb_connection.chromadb_connection import ChromadbConnection

  configuration = {
      "client": "PersistentClient",
      "path": "/tmp/.chroma"
  }

  conn = st.connection(name="persistent_chromadb",
                       type=ChromadbConnection,
                       **configuration)
  ```

- `HttpClient`: data is persisted on the server where Chroma is hosted.

  ```python
  import streamlit as st
  from streamlit_chromadb_connection.chromadb_connection import ChromadbConnection

  configuration = {
      "client": "HttpClient",
      "host": "localhost",
      "port": 8000,
  }

  conn = st.connection(name="http_connection",
                       type=ChromadbConnection,
                       **configuration)
  ```
create_collection()
To create a Chroma collection, one needs to supply a `collection_name`, an `embedding_function_name`, an `embedding_config`, and (optionally) `metadata`.
The currently supported options for `embedding_function_name` are:
- DefaultEmbeddingFunction
- SentenceTransformerEmbeddingFunction
- OpenAIEmbeddingFunction
- CohereEmbeddingFunction
- GooglePalmEmbeddingFunction
- GoogleVertexEmbeddingFunction
- HuggingFaceEmbeddingFunction
- InstructorEmbeddingFunction
- Text2VecEmbeddingFunction
- ONNXMiniLM_L6_V2
For `DefaultEmbeddingFunction`, the `embedding_config` argument can be left empty. However, other embedding functions such as `OpenAIEmbeddingFunction` need configuration such as:

```python
embedding_config = {
    "api_key": "{OPENAI_API_KEY}",
    "model_name": "{OPENAI_MODEL}",
}
```
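As a further illustration, a plausible configuration for `SentenceTransformerEmbeddingFunction` might look like the sketch below; the `model_name` key is an assumption based on the constructor arguments of the underlying chromadb embedding function, not something stated in this README:

```python
# Hypothetical sketch: the "model_name" key is assumed to mirror the
# model_name argument of chromadb's SentenceTransformerEmbeddingFunction.
embedding_config = {
    "model_name": "all-MiniLM-L6-v2",
}
```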
One can also change the distance function via the `metadata` argument, for example:

```python
metadata = {"hnsw:space": "l2"}      # Squared L2 norm
metadata = {"hnsw:space": "cosine"}  # Cosine similarity
metadata = {"hnsw:space": "ip"}      # Inner product
```
Sample code to create a collection:

```python
collection_name = "documents_collection"
embedding_function_name = "DefaultEmbeddingFunction"

conn.create_collection(collection_name=collection_name,
                       embedding_function_name=embedding_function_name,
                       embedding_config={},
                       metadata={"hnsw:space": "cosine"})
```
get_collection_data()
This method returns a DataFrame consisting of the embeddings and documents of a collection.
The `attributes` argument is a list of attributes to be included in the DataFrame.
The following snippet returns all data in a collection as a DataFrame with two columns, `documents` and `embeddings`:

```python
collection_name = "documents_collection"

conn.get_collection_data(collection_name=collection_name,
                         attributes=["documents", "embeddings"])
```
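As a usage illustration, the returned DataFrame can be shown directly in a Streamlit app. The sketch below also requests the `metadatas` attribute, an attribute name borrowed from the `query()` example further down:

```python
import streamlit as st

# Minimal sketch: read a collection into a DataFrame and display it.
# The "metadatas" attribute name is taken from the query() example below.
df = conn.get_collection_data(collection_name="documents_collection",
                              attributes=["documents", "metadatas"])

st.write(f"{len(df)} rows in the collection")
st.dataframe(df)
```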
delete_collection()
This method deletes the named collection.

```python
collection_name = "documents_collection"

conn.delete_collection(collection_name=collection_name)
```
upload_documents()
This method uploads documents to a collection. If embeddings are not provided, the method will embed the documents using the embedding function specified in the collection.
```python
collection_name = "documents_collection"
embedding_function_name = "DefaultEmbeddingFunction"
embedding_config = {}

conn.upload_documents(collection_name=collection_name,
                      documents=["lorem ipsum", "doc2", "doc3"],
                      metadatas=[{"chapter": "3", "verse": "16"},
                                 {"chapter": "3", "verse": "5"},
                                 {"chapter": "29", "verse": "11"}],
                      embeeding_function_name=embedding_function_name,
                      embedding_config=embedding_config,
                      ids=["id1", "id2", "id3"])
```
update_collection_data()
This method updates documents in a collection based on their ids.
```python
embedding_function_name = "DefaultEmbeddingFunction"
embedding_config = {}

conn.upload_documents(collection_name=collection_name,
                      documents=["this is a", "this is b", "this is c"],
                      metadatas=[{"chapter": "3", "verse": "16"},
                                 {"chapter": "3", "verse": "5"},
                                 {"chapter": "29", "verse": "11"}],
                      embeeding_function_name=embedding_function_name,
                      embedding_config=embedding_config,
                      ids=["id1", "id2", "id3"])

conn.update_collection_data(collection_name=collection_name,
                            documents=["this is b", "this is c", "this is d"],
                            embeeding_function_name=embedding_function_name,
                            embedding_config=embedding_config,
                            ids=["id1", "id2", "id3"])
```
query()
This method retrieves the top-k relevant documents for each query in the supplied list. The result is a DataFrame in which each row shows the top-k relevant documents for one query.
```python
collection_name = "documents_collection"
embedding_function_name = "DefaultEmbeddingFunction"
embedding_config = {}

conn.upload_documents(collection_name=collection_name,
                      documents=["lorem ipsum", "doc2", "doc3"],
                      metadatas=[{"chapter": "3", "verse": "16"},
                                 {"chapter": "3", "verse": "5"},
                                 {"chapter": "29", "verse": "11"}],
                      ids=["id1", "id2", "id3"],
                      embeeding_function_name=embedding_function_name,
                      embedding_config=embedding_config,
                      embeddings=None)

queried_data = conn.query(collection_name=collection_name,
                          query=["random_query1", "random_query2"],
                          num_results_limit=10,
                          attributes=["documents", "embeddings", "metadatas", "data"])
```
Metadata and document filters are also available through the `where_metadata_filter` and `where_document_filter` arguments for more targeted search. For details on how where filters work, see https://docs.trychroma.com/usage-guide#using-where-filters.
```python
queried_data = conn.query(collection_name=collection_name,
                          query=["this is"],
                          num_results_limit=10,
                          attributes=["documents", "embeddings", "metadatas", "data"],
                          where_metadata_filter={"chapter": "3"})
```
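A document filter can be supplied in the same way through `where_document_filter`. The sketch below assumes Chroma's standard `$contains` operator, as described in the usage guide linked above:

```python
# Hypothetical sketch: keep only results whose documents contain "this",
# using Chroma's $contains operator via where_document_filter.
queried_data = conn.query(collection_name=collection_name,
                          query=["this is"],
                          num_results_limit=10,
                          attributes=["documents", "metadatas"],
                          where_document_filter={"$contains": "this"})
```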
🎉 That's it! `ChromaDBConnection` is ready to be used with `st.connection()`. 🎉
Contribution 🔥

```
author={Vu Quang Minh},
github={Dev317},
year={2023}
```