

dbt-llm-tools

LLM based tools for dbt projects. Answer data questions, generate documentation and more.

Currently Includes:

  • Chatbot: ask questions about data and get answers based on your dbt model documentation
  • Documentation Generator: generate documentation for dbt models based on the model and upstream model definitions.

Get Started

Installation

dbt-llm-tools can be installed via pip.

pip install dbt-llm-tools

Basic Usage - Chatbot

How to load your dbt project into the Chatbot and ask questions about your data.

from dbt_llm_tools import Chatbot

# Instantiate a chatbot object
chatbot = Chatbot(
    dbt_project_root='/path/to/dbt/project',
    openai_api_key='YOUR_OPENAI_API_KEY',
)

# Step 1. Load model information from your dbt YMLs into a local vector store
chatbot.load_models()

# Step 2. Ask the chatbot a question
response = chatbot.ask_question(
    'How can I obtain the number of customers who upgraded to a paid plan in the last 3 months?'
)
print(response)

Note: dbt-llm-tools currently only supports OpenAI models for generating embeddings and query responses.

How it works

The Chatbot is based on Retrieval Augmented Generation (RAG) and works as follows (a standalone sketch of the flow appears after this list):

  • When you call the chatbot.load_models() method, the bot scans the folders you specify for dbt YML files.
  • Each model is converted into a text description, which is stored as an embedding in a vector database. ChromaDB is currently the only supported vector database; it is persisted in a file on your local machine.
  • When you ask a question, the bot fetches the three models whose descriptions are most relevant to it.
  • These models are then fed to ChatGPT in a prompt, along with some basic instructions and your question.
  • The response is returned to you as a string.
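
For illustration only, here is a minimal standalone sketch of that flow using the openai and chromadb clients directly. It is not the library's internal code; the model description, collection name and question are made up.

from openai import OpenAI
import chromadb

openai_client = OpenAI(api_key="YOUR_OPENAI_API_KEY")
chroma_client = chromadb.PersistentClient(path="./chroma.db")
collection = chroma_client.get_or_create_collection("dbt_models")

def embed(text: str) -> list[float]:
    # One embedding vector per input string
    response = openai_client.embeddings.create(
        model="text-embedding-3-large", input=[text]
    )
    return response.data[0].embedding

# Step 1. Store a text description of each model (normally parsed from your YML files)
description = "Model customers: one row per customer, with plan type and upgrade date."
collection.upsert(ids=["customers"], embeddings=[embed(description)], documents=[description])

# Step 2. Retrieve the most relevant model descriptions for a question
question = "How many customers upgraded to a paid plan in the last 3 months?"
hits = collection.query(query_embeddings=[embed(question)], n_results=3)
context = "\n".join(hits["documents"][0])

# Step 3. Feed the retrieved models, basic instructions and the question to ChatGPT
completion = openai_client.chat.completions.create(
    model="gpt-4-turbo-preview",
    messages=[
        {"role": "system", "content": "Answer using only the dbt models provided."},
        {"role": "user", "content": f"{context}\n\nQuestion: {question}"},
    ],
)
print(completion.choices[0].message.content)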

Basic Usage - Documentation Generator

How to load your dbt project into the Documentation Generator and have it write documentation for your models.

from dbt_llm_tools import DocumentationGenerator

# Instantiate a Documentation Generator object
doc_gen = DocumentationGenerator(
    dbt_project_root="YOUR_DBT_PROJECT_PATH",
    openai_api_key="YOUR_OPENAI_API_KEY",
)

# Generate documentation for a model and all its upstream models
doc_gen.generate_documentation(
    model_name="dbt_model_name",
    write_documentation_to_yaml=False
)
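
As its name suggests, the write_documentation_to_yaml flag controls whether the generated documentation is written back into your project's YML files; with False (as above), your project files are left untouched.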

Advanced Usage

You can control the behavior of some class methods in more detail, or inspect the underlying classes for additional functionality.

The Chatbot is composed of two classes:

  • Vector Store
  • DBT Project
    • Composed of DBT Model

Here are the classes and methods they expose:

Chatbot

A class representing a chatbot that allows users to ask questions about dbt models.

Attributes:
    project (DbtProject): The dbt project being used by the chatbot.
    store (VectorStore): The vector store being used by the chatbot.

Methods:
    set_embedding_model: Set the embedding model for the vector store.
    set_chatbot_model: Set the chatbot model for the chatbot.
    get_instructions: Get the instructions for the chatbot.
    set_instructions: Set the instructions for the chatbot.
    load_models: Load the models into the vector store.
    reset_model_db: Reset the model vector store.
    ask_question: Ask the chatbot a question and get a response.

Methods

__init__

Initializes a chatbot object along with a default set of instructions.

    Args:
        dbt_project_root (str): The absolute path to the root of the dbt project.
        openai_api_key (str): Your OpenAI API key.

        embedding_model (str, optional): The name of the OpenAI embedding model to be used.
            Defaults to "text-embedding-3-large".

        chatbot_model (str, optional): The name of the OpenAI chatbot model to be used.
            Defaults to "gpt-4-turbo-preview".

        db_persist_path (str, optional): The path to the persistent database file.
            Defaults to "./chroma.db".

    Returns:
        None
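
For example, a chatbot with all of the optional arguments overridden (the values below are illustrative):

chatbot = Chatbot(
    dbt_project_root="/path/to/dbt/project",
    openai_api_key="YOUR_OPENAI_API_KEY",
    embedding_model="text-embedding-3-small",  # defaults to "text-embedding-3-large"
    chatbot_model="gpt-4-turbo-preview",       # the default chatbot model
    db_persist_path="./my_project_chroma.db",  # defaults to "./chroma.db"
)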

load_models

Upsert the set of models that will be available to your chatbot into a vector store. The chatbot will only be able to use these models to answer questions and nothing else.

The default behavior is to load all models in the dbt project, but you can specify a subset of models, included folders or excluded folders to customize the set of models that will be available to the chatbot.

    Args:
        models (list[str], optional): A list of model names to load into the vector store.

        included_folders (list[str], optional): A list of paths to all folders that should be included
            in the model search. Paths are relative to the dbt project root.

        exclude_folders (list[str], optional): A list of paths to all folders that should be excluded
            from the model search. Paths are relative to the dbt project root.

    Returns:
        None
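
For example, to index only part of a project (the folder paths below are hypothetical):

# Only index models under models/marts, skipping the legacy subfolder
chatbot.load_models(
    included_folders=["models/marts"],
    exclude_folders=["models/marts/legacy"],
)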

ask_question

Ask the chatbot a question about your dbt models and get a response. The chatbot looks up the dbt models most similar to your query and uses them to answer the question.

    Args:
        query (str): The question you want to ask the chatbot.

    Returns:
        str: The chatbot's response to your question.

reset_model_db

This will reset and remove all the models from the vector store. You'll need to load the models again using the load_models method if you want to use the chatbot.

    Returns:
        None
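
A typical use is re-indexing after your model YMLs have changed:

# Clear the vector store, then index the models again
chatbot.reset_model_db()
chatbot.load_models()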

get_instructions

Get the instructions being used to tune the chatbot.

    Returns:
        list[str]: A list of instructions being used to tune the chatbot.

set_instructions

Set the instructions for the chatbot.

    Args:
        instructions (list[str]): A list of instructions for the chatbot.

    Returns:
        None
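
For example (the instruction strings below are illustrative, not the defaults):

# Inspect the current instructions, then replace them
print(chatbot.get_instructions())
chatbot.set_instructions([
    "Only answer questions using the dbt models provided to you.",
    "If the models do not contain the answer, say that you do not know.",
])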

set_embedding_model

Set the embedding model for the vector store.

    Args:
        model (str): The name of the OpenAI embedding model to be used.

    Returns:
        None

set_chatbot_model

Set the chatbot model for the chatbot.

    Args:
        model (str): The name of the OpenAI chatbot model to be used.

    Returns:
        None
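
For example (the model names below are illustrative):

chatbot.set_embedding_model("text-embedding-3-small")
chatbot.set_chatbot_model("gpt-4o")

If you change the embedding model after calling load_models, you will likely need to reset and reload the vector store so that stored and query embeddings come from the same model.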

Appendices

These are the underlying classes that are used to compose the functionality of the chatbot.

Vector Store

A class representing a vector store for dbt models.

Methods:
    get_client: Returns the client object for the vector store.
    upsert_models: Upsert the models into the vector store.
    reset_collection: Clear the collection of all documents.

DBT Project

A class representing a dbt project YAML parser.

Attributes:
    project_root (str): Absolute path to the root of the dbt project being parsed

DBT Model

A class representing a dbt model.

Attributes:
    name (str): The name of the model.
    description (str, optional): The description of the model.
    columns (list[DbtModelColumn], optional): A list of columns contained in the model.
        May or may not be exhaustive.

