chatdbt
What is this?
chatdbt is an OpenAI-based dbt documentation robot. Describe your data query requirements in natural language, and chatdbt will help you select the dbt model you need or generate SQL against those dbt models. You need to set up your dbt documentation for chatdbt in advance.
Quick Install
pip install chatdbt
Package extras:
- nomic: use Nomic Atlas as the vector storage backend
- pgvector: use pgvector as the vector storage backend
Internals
chatdbt uses OpenAI's text-embedding-ada-002 model to embed your dbt documentation and saves the vectors to the vector storage you provide. When you ask chatdbt a question, it retrieves the models (metrics are still a to-do 😊) that are semantically similar to your question. Based on the retrieved content and your question, it then uses OpenAI's gpt-3.5-turbo model to produce an answer. The approach is similar to langchain or llama_index.
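The retrieval step described above can be sketched with plain cosine similarity. This is a minimal illustration, not chatdbt's actual code: the function names (`cosine_similarity`, `top_k`, `build_prompt`), the model descriptions, and the three-dimensional toy vectors are all assumptions made for the example. Real embeddings from text-embedding-ada-002 have 1536 dimensions and would come from the OpenAI API, and a backend such as pgvector would do the similarity search in the database.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    question_vec: List[float], doc_vecs: Dict[str, List[float]], k: int = 2
) -> List[Tuple[str, float]]:
    """Return the k documents most similar to the question, best first."""
    scored = [
        (name, cosine_similarity(question_vec, vec))
        for name, vec in doc_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def build_prompt(question: str, model_docs: Dict[str, str], names: List[str]) -> str:
    """Combine the retrieved model descriptions and the question into one prompt."""
    context = "\n".join(f"- {name}: {model_docs[name]}" for name in names)
    return f"dbt models:\n{context}\n\nQuestion: {question}"


# Hypothetical dbt model descriptions and made-up toy embeddings.
model_docs = {
    "orders": "One row per order, with user_id and order status.",
    "users": "One row per registered user.",
    "web_sessions": "One row per website session.",
}
doc_vecs = {
    "orders": [0.9, 0.1, 0.0],
    "users": [0.7, 0.7, 0.0],
    "web_sessions": [0.0, 0.1, 0.9],
}
question = "query the number of users who have purchased a product"
question_vec = [0.8, 0.3, 0.0]  # stand-in for the embedded question

matches = top_k(question_vec, doc_vecs, k=2)
matched_names = [name for name, _ in matches]
prompt = build_prompt(question, model_docs, matched_names)
print(prompt)
```

In a real pipeline the resulting prompt would be sent to the chat model; here it is only printed so the example runs without network access or an API key.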
How does chatdbt integrate with my dbt doc, and where is my embedding data stored?
There are several interfaces within chatdbt:
- VectorStorage is responsible for storing embedding vectors. Currently supported:
  - atlas: set up your api_key and project_name to use Nomic Atlas for storing and retrieving the vector data.
  - pgvector: set up your connect_string and table_name to use pgvector for storing and retrieving the vector data.
- DBTDocResolver is responsible for providing dbt manifest and catalog data. Currently supported:
  - localfs: set up manifest_json_path and catalog_json_path, and chatdbt will read the dbt manifest and catalog from the local file system.
- TikTokenProvider is responsible for estimating the number of tokens consumed by OpenAI. Currently supported:
  - tiktoken_http_server: set up a tiktoken-http-server api_base (example: http://localhost:8080) to use tiktoken-http-server for estimating the number of tokens consumed by OpenAI.
You can also implement the above interfaces yourself and integrate them into your own system.
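To give a feel for what a doc resolver has to supply, here is a rough sketch that reads a dbt manifest and catalog from disk and merges model descriptions with column types. The function name `load_model_docs` and the shape of its return value are hypothetical; chatdbt's real DBTDocResolver interface is not shown in this document, so check the package source for the methods you actually need to implement. The sketch relies only on the standard layout of dbt artifacts (`nodes` keyed by unique ID in both files), and the sample data is invented.

```python
import json
import os
import tempfile
from typing import Dict, List


def load_model_docs(manifest_json_path: str, catalog_json_path: str) -> List[Dict]:
    """Collect name, description, and typed columns for every dbt model."""
    with open(manifest_json_path, encoding="utf-8") as f:
        manifest = json.load(f)
    with open(catalog_json_path, encoding="utf-8") as f:
        catalog = json.load(f)

    docs = []
    for unique_id, node in manifest.get("nodes", {}).items():
        # The manifest also lists tests, seeds, and so on; keep models only.
        if node.get("resource_type") != "model":
            continue
        catalog_columns = (
            catalog.get("nodes", {}).get(unique_id, {}).get("columns", {})
        )
        columns = {}
        for col_name, col in node.get("columns", {}).items():
            columns[col_name] = {
                # Descriptions come from the manifest, types from the catalog.
                "description": col.get("description", ""),
                "type": catalog_columns.get(col_name, {}).get("type"),
            }
        docs.append(
            {
                "unique_id": unique_id,
                "name": node.get("name"),
                "description": node.get("description", ""),
                "columns": columns,
            }
        )
    return docs


# Tiny invented artifacts so the sketch runs without a dbt project.
sample_manifest = {
    "nodes": {
        "model.shop.orders": {
            "resource_type": "model",
            "name": "orders",
            "description": "One row per order.",
            "columns": {"user_id": {"name": "user_id", "description": "Buyer."}},
        },
        "test.shop.not_null_orders_user_id": {
            "resource_type": "test",
            "name": "not_null_orders_user_id",
        },
    }
}
sample_catalog = {
    "nodes": {
        "model.shop.orders": {
            "columns": {"user_id": {"name": "user_id", "type": "bigint"}}
        }
    }
}

with tempfile.TemporaryDirectory() as tmp:
    manifest_path = os.path.join(tmp, "manifest.json")
    catalog_path = os.path.join(tmp, "catalog.json")
    with open(manifest_path, "w", encoding="utf-8") as f:
        json.dump(sample_manifest, f)
    with open(catalog_path, "w", encoding="utf-8") as f:
        json.dump(sample_catalog, f)
    model_docs = load_model_docs(manifest_path, catalog_path)

print(model_docs)
```

A resolver for another source (object storage, dbt Cloud, a database) would load the same two JSON documents from elsewhere and could reuse the merging logic unchanged.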
Quick Start
You can initialize a chatdbt instance manually:
import os

from chatdbt import ChatBot
from chatdbt.dbt_doc_resolver.localfs import LocalfsDBTDocResolver
from chatdbt.vector_storage.pgvector import PGVectorStorage

# Replace these placeholder values with your own settings.
your_pgvector_connect_string = "postgresql+psycopg://postgres:foobar@localhost:5432/chatdbt"
your_pgvector_table_name = "chatdbt"
your_manifest_json_path = "data/manifest.json"
your_catalog_json_path = "data/catalog.json"
your_openai_key = "sk-foobar"

# The OpenAI key is read from the environment.
os.environ["OPENAI_API_KEY"] = your_openai_key

# Where the embedding vectors are stored.
vector_storage = PGVectorStorage(connect_string=your_pgvector_connect_string, table_name=your_pgvector_table_name)
# Where the dbt manifest and catalog are read from.
dbt_doc_resolver = LocalfsDBTDocResolver(manifest_json_path=your_manifest_json_path, catalog_json_path=your_catalog_json_path)
bot = ChatBot(doc_resolver=dbt_doc_resolver, vector_storage=vector_storage, tiktoken_provider=None)

# Ask which dbt models fit the request, or ask for SQL built on those models.
bot.suggest_table("query the number of users who have purchased a product")
bot.suggest_sql("query the number of users who have purchased a product")
Or initialize a chatdbt instance with environment variables (the your_* values are the ones defined in the previous example):
import os

# Language of the responses.
os.environ["CHATDBT_I18N"] = "zh-cn"

# Vector storage backend and its settings.
os.environ["CHATDBT_VECTOR_STORAGE_TYPE"] = "pgvector"
os.environ["CHATDBT_VECTOR_STORAGE_CONFIG_CONNECT_STRING"] = your_pgvector_connect_string
os.environ["CHATDBT_VECTOR_STORAGE_CONFIG_TABLE_NAME"] = your_pgvector_table_name

# dbt doc resolver and its settings.
os.environ["CHATDBT_DBT_DOC_RESOLVER_TYPE"] = "localfs"
os.environ["CHATDBT_DBT_DOC_RESOLVER_CONFIG_MANIFEST_JSON_PATH"] = your_manifest_json_path
os.environ["CHATDBT_DBT_DOC_RESOLVER_CONFIG_CATALOG_JSON_PATH"] = your_catalog_json_path

os.environ["OPENAI_API_KEY"] = your_openai_key

# Set the variables before importing chatdbt, as shown here.
import chatdbt

chatdbt.suggest_table("query the number of users who have purchased a product")
chatdbt.suggest_sql("query the number of users who have purchased a product")
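The variable names above follow a visible pattern: `CHATDBT_<COMPONENT>_TYPE` selects a backend, and each `CHATDBT_<COMPONENT>_CONFIG_<KEY>` supplies one of its settings. The sketch below shows one way such a pattern could be turned into constructor keyword arguments. It is an assumption about the convention, not chatdbt's own parsing code, and the helper name `component_config` is hypothetical.

```python
from typing import Dict, Mapping, Tuple


def component_config(environ: Mapping[str, str], component: str) -> Tuple[str, Dict[str, str]]:
    """Return (backend type, keyword arguments) for one component.

    For component "VECTOR_STORAGE", the type is read from
    CHATDBT_VECTOR_STORAGE_TYPE, and every variable starting with
    CHATDBT_VECTOR_STORAGE_CONFIG_ becomes a lower-cased keyword argument.
    """
    prefix = f"CHATDBT_{component}_"
    config_prefix = prefix + "CONFIG_"
    backend_type = environ.get(prefix + "TYPE", "")
    kwargs = {
        key[len(config_prefix):].lower(): value
        for key, value in environ.items()
        if key.startswith(config_prefix)
    }
    return backend_type, kwargs


# A plain dict stands in for os.environ so the example has no side effects.
example_env = {
    "CHATDBT_VECTOR_STORAGE_TYPE": "pgvector",
    "CHATDBT_VECTOR_STORAGE_CONFIG_CONNECT_STRING": "postgresql+psycopg://postgres:foobar@localhost:5432/chatdbt",
    "CHATDBT_VECTOR_STORAGE_CONFIG_TABLE_NAME": "chatdbt",
    "CHATDBT_DBT_DOC_RESOLVER_TYPE": "localfs",
    "CHATDBT_DBT_DOC_RESOLVER_CONFIG_MANIFEST_JSON_PATH": "data/manifest.json",
    "CHATDBT_DBT_DOC_RESOLVER_CONFIG_CATALOG_JSON_PATH": "data/catalog.json",
}

storage_type, storage_kwargs = component_config(example_env, "VECTOR_STORAGE")
resolver_type, resolver_kwargs = component_config(example_env, "DBT_DOC_RESOLVER")
print(storage_type, storage_kwargs)
print(resolver_type, resolver_kwargs)
```

Under this reading, the resulting keyword arguments line up with the constructor parameters used in the manual example (`connect_string`, `table_name`, `manifest_json_path`, `catalog_json_path`).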