Run Python scripts from any dbt project.

Welcome to dbt-fal 👋 do more with dbt

dbt-fal adapter is the ✨easiest✨ way to run your dbt Python models.

Starting with dbt v1.3, you can build your dbt models in Python. This opens up some cool use cases that were once difficult to build with SQL alone. Some examples are:

Using Python stats libraries to calculate stats
Building forecasts
Building other predictive models such as classification and clustering
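
As a rough illustration of the first use case, a dbt Python model that computes a rolling statistic might look like the sketch below. The upstream model name `orders_daily` is borrowed from the environment example in this README; the `order_count` column and the `_StubDbt` helper are hypothetical and exist only so the function can be tried without a warehouse.

```python
import pandas as pd


def model(dbt, fal):
    # dbt.ref returns the upstream model as a pandas DataFrame
    orders: pd.DataFrame = dbt.ref("orders_daily")
    result = orders.copy()
    # 3-day rolling mean; `order_count` is an assumed column name
    result["rolling_mean_3d"] = (
        result["order_count"].rolling(window=3, min_periods=1).mean()
    )
    return result


class _StubDbt:
    """Hypothetical stand-in for the `dbt` argument, for local experiments only."""

    def __init__(self, tables):
        self._tables = tables

    def ref(self, name):
        return self._tables[name]


sample = pd.DataFrame({"order_count": [10, 20, 30, 40]})
preview = model(_StubDbt({"orders_daily": sample}), fal=None)
```

Copying the input before adding a column keeps the stubbed upstream table unchanged, which makes the function easier to experiment with locally.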

This is fantastic! But there is still one issue: the developer experience with Snowflake and BigQuery is not great, and there is no Python support for Redshift and Postgres.

dbt-fal provides the best environment to run your Python models that works with all other data warehouses! With dbt-fal, you can:

Build and test your models locally
Isolate each model to run in its own environment with its own dependencies
[Coming Soon] Run your Python models in the ☁️ cloud ☁️ with elastically scaling Python environments.
[Coming Soon] Even add GPUs to your models for some heavier workloads such as training ML models.

Getting Started

1. Install dbt-fal

$ pip install dbt-fal[bigquery,snowflake]
# Add your current warehouse here

2. Update your `profiles.yml` and add the fal adapter

jaffle_shop:
  target: dev_with_fal
  outputs:
    dev_with_fal:
      type: fal
      db_profile: dev_bigquery # This points to your main adapter
    dev_bigquery:
      type: bigquery
      ...

Don't forget to point to your main adapter with the db_profile attribute. This is how the fal adapter knows how to connect to your data warehouse.
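
To make the relationship between the two outputs concrete, here is a minimal sketch of resolving `db_profile` on an already-parsed profile. It is not how the fal adapter is implemented; the function name `resolve_db_profile` and the error messages are made up for illustration, and the profile is written as a plain dict to avoid depending on a YAML parser.

```python
def resolve_db_profile(profile: dict, target: str = "") -> dict:
    """Return the warehouse output that a `type: fal` output points to."""
    outputs = profile["outputs"]
    name = target or profile["target"]
    output = outputs[name]
    if output.get("type") != "fal":
        # Not a fal output: it is already the warehouse connection
        return output
    db_profile = output.get("db_profile")
    if db_profile is None:
        raise ValueError(f"output '{name}' has type fal but no db_profile")
    if db_profile not in outputs:
        raise ValueError(f"db_profile '{db_profile}' is not defined under outputs")
    return outputs[db_profile]


# Same shape as the jaffle_shop profile above
jaffle_shop = {
    "target": "dev_with_fal",
    "outputs": {
        "dev_with_fal": {"type": "fal", "db_profile": "dev_bigquery"},
        "dev_bigquery": {"type": "bigquery"},
    },
}
warehouse = resolve_db_profile(jaffle_shop)
```

A check like this catches the two most likely mistakes early: a fal output with no `db_profile`, and a `db_profile` that names an output that does not exist.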

3. `dbt run`!

That is it! It is really that simple 😊

4. [🚨 Cool Feature Alert 🚨] Environment management with dbt-fal

If you want some help with environment management (vs sticking to the default env that the dbt process runs in), you can create a fal_project.yml in the same folder as your dbt project and have “named environments”:

In your dbt project folder:

$ touch fal_project.yml

# Paste the config below
environments:
  - name: ml
    type: venv
    requirements:
      - prophet

and then in your dbt model:

$ vi models/orders_forecast.py

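import pandas as pd

def model(dbt, fal):
    dbt.config(fal_environment="ml") # Add this line

    df: pd.DataFrame = dbt.ref("orders_daily")
    # Your forecasting logic goes here; a dbt Python model returns a DataFrame
    return df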

The dbt.config(fal_environment="ml") call gives you an isolated, clean environment to run things in, so you don't pollute your package space.

5. [Coming Soon™️] Move your compute to the Cloud!

Let us know if you are interested in this. We are looking for beta users.

`dbt-fal` command line tool

With the dbt-fal CLI, you can:

Send Slack notifications upon dbt model success or failure.
Load data from external data sources before a model starts running.
Download dbt models into a Python context with a familiar syntax: ref('my_dbt_model') using FalDbt
Programmatically access rich metadata about your dbt project.

Head to our documentation site for a deeper dive or play with in-depth examples to see how fal can help you get more done with dbt.

1. Install `dbt-fal`

$ pip install dbt-fal[postgres]

2. Go to your dbt project directory

$ cd ~/src/my_dbt_project

3. Create a Python script: `send_slack_message.py`

import os
from slack_sdk import WebClient
from slack_sdk.errors import SlackApiError

# Channel and token are read from the environment
CHANNEL_ID = os.getenv("SLACK_BOT_CHANNEL")
SLACK_TOKEN = os.getenv("SLACK_BOT_TOKEN")

client = WebClient(token=SLACK_TOKEN)
# `context` is provided by fal when it runs the script; it is not imported
message_text = f"Model: {context.current_model.name}. Status: {context.current_model.status}."

try:
    response = client.chat_postMessage(
        channel=CHANNEL_ID,
        text=message_text
    )
except SlackApiError as e:
    # On failure, the Slack response carries an "error" field
    assert e.response["error"]

4. Add a `meta` section in your `schema.yml`

models:
  - name: historical_ozone_levels
    description: Ozone levels
    config:
      materialized: table
    columns:
      - name: ozone_level
        description: Ozone level
      - name: ds
        description: Date
    meta:
      fal:
        scripts:
          - send_slack_message.py

5. Run `dbt-fal flow run`

$ dbt-fal flow run
# both your dbt models and python scripts are run

6. Alternatively run `dbt` and `fal` consecutively

$ dbt run
# Your dbt models are run

$ dbt-fal run
# Your python scripts are run

Running scripts before dbt runs

Run scripts before the model runs by using the pre-hook: configuration option.

Given the following schema.yml:

models:
  - name: boston
    description: Ozone levels
    config:
      materialized: table
    meta:
      owner: "@meder"
      fal:
        pre-hook:
          - fal_scripts/trigger_fivetran.py
        post-hook:
          - fal_scripts/slack.py

dbt-fal flow run will run fal_scripts/trigger_fivetran.py, then the boston dbt model, and finally fal_scripts/slack.py. If a model is selected with a selection flag (e.g. --select boston), the hooks associated with the model will always run with it.

$ dbt-fal flow run --select boston
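
The ordering described above can be sketched as follows. This is only an illustration of the sequence, not fal's scheduler; `execution_plan` is a hypothetical helper, and the model entry is a plain dict mirroring the schema.yml above.

```python
def execution_plan(model: dict) -> list:
    """List the steps for one model: pre-hooks, the model itself, then post-hooks."""
    fal_meta = model.get("meta", {}).get("fal", {})
    steps = [("script", path) for path in fal_meta.get("pre-hook", [])]
    steps.append(("model", model["name"]))
    steps += [("script", path) for path in fal_meta.get("post-hook", [])]
    return steps


boston = {
    "name": "boston",
    "meta": {
        "owner": "@meder",
        "fal": {
            "pre-hook": ["fal_scripts/trigger_fivetran.py"],
            "post-hook": ["fal_scripts/slack.py"],
        },
    },
}
plan = execution_plan(boston)
```

A model with no fal entry under meta yields a plan containing only the model itself.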

Concepts

profiles.yml and Credentials

fal integrates with dbt's profiles.yml file to access and read data from the data warehouse. Once you set up credentials in profiles.yml for your existing dbt workflows, fal authenticates with those credentials any time you use ref or source to create a dataframe.

`meta` Syntax

models:
  - name: historical_ozone_levels
    ...
    meta:
      owner: "@me"
      fal:
        post-hook:
          - send_slack_message.py
          - another_python_script.py

Use the fal and post-hook keys underneath the meta config to let fal CLI know where to look for the Python scripts. You can pass a list of scripts as shown above to run one or more scripts as a post-hook operation after a dbt run.

Variables and functions

Inside a Python script, you get access to some useful variables and functions.

Variables

A context object with information relevant to the model through which the script was run. For the meta Syntax example, we would get the following:

context.current_model.name
#= historical_ozone_levels

context.current_model.meta
#= {'owner': '@me'}

context.current_model.meta.get("owner")
#= '@me'

context.current_model.status
# Could be one of
#= 'success'
#= 'error'
#= 'skipped'

The context object also has access to test information related to the current model. If the previous dbt command was either test or build, the context.current_model.tests property is populated with a list of tests:

context.current_model.tests
#= [CurrentTest(name='not_null', modelname='historical_ozone_levels', column='ds', status='Pass')]
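
A post-hook could use this list to decide whether to raise an alert. The sketch below assumes each test exposes `name`, `column`, and `status` attributes as in the output above, and that a passing test has status 'Pass'. The helper `failed_tests` and the 'Fail' value in the sample data are assumptions, and `SimpleNamespace` objects stand in for the real test objects.

```python
from types import SimpleNamespace


def failed_tests(tests):
    """Return 'name(column)' labels for every test whose status is not 'Pass'."""
    return [f"{t.name}({t.column})" for t in tests if t.status != "Pass"]


# Stand-ins for context.current_model.tests
tests = [
    SimpleNamespace(name="not_null", column="ds", status="Pass"),
    SimpleNamespace(name="unique", column="ds", status="Fail"),
]
failures = failed_tests(tests)
summary = "all tests passed" if not failures else "failed: " + ", ".join(failures)
```

Inside a real fal script, the same function could be called with context.current_model.tests instead of the sample list.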

`ref` and `source` functions

Some familiar functions from dbt are also available:

# Refer to a dbt model or source by name and get it back as a `pandas.DataFrame`
ref('model_name')
source('source_name', 'table_name')

# You can use it to get the running model data
ref(context.current_model.name)

`write_to_model` function

❗️ We recommend using the dbt-fal adapter for writing data back to your data-warehouse.

It is also possible to send data back to your data-warehouse. This makes it easy to get the data, process it and upload it back into dbt territory.

This function is available in fal Python models only, that is, Python scripts inside a fal_models directory. To enable them, add fal-models-paths to the vars in your dbt_project.yml:

name: "jaffle_shop"
# ...
model-paths: ["models"]
# ...

vars:
  # Add this to your dbt_project.yml
  fal-models-paths: ["fal_models"]

Once added, it will automatically be run by fal without having to add any extra configurations in the schema.yml.

source_df = source('source_name', 'table_name')
ref_df = ref('a_model')

# Your code here
df = ...

# Upload a `pandas.DataFrame` back to the datawarehouse
write_to_model(df)

write_to_model also accepts an optional dtype argument, which lets you specify the data types of columns. It works the same way as the dtype argument of the DataFrame.to_sql function.

from sqlalchemy.types import Integer
# Upload but specifically create the `value` column with type `integer`
# Can be useful if data has `None` values
write_to_model(df, dtype={'value': Integer()})

Importing `fal` as a Python package

You may be interested in accessing dbt models and sources easily from a Jupyter Notebook or another Python script. For that, just import the fal package and instantiate a FalDbt project:

from fal.dbt import FalDbt
faldbt = FalDbt(profiles_dir="~/.dbt", project_dir="../my_project")

faldbt.list_sources()
# [
#    DbtSource(name='results' ...),
#    DbtSource(name='ticket_data_sentiment_analysis' ...)
#    ...
# ]

faldbt.list_models()
# [
#    DbtModel(name='zendesk_ticket_data' ...),
#    DbtModel(name='agent_wait_time' ...)
#    ...
# ]


sentiments = faldbt.source('results', 'ticket_data_sentiment_analysis')
# pandas.DataFrame
tickets = faldbt.ref('stg_zendesk_ticket_data')
# pandas.DataFrame

Why are we building this?

We think dbt is great because it empowers data people to get more done with the tools that they are already familiar with.

This library will form the basis of our attempt to more comprehensively enable data science workloads downstream of dbt. And because having reliable data pipelines is the most important ingredient in building predictive analytics, we are building a library that integrates well with dbt.

Have feedback or need help?

Join us in fal on Discord
Join the dbt Community and go into our #tools-fal channel

Release history

Version	Release date
1.5.9 (this version)	Sep 4, 2023
1.5.8	Aug 21, 2023
1.5.7	Jul 17, 2023
1.5.6	Jul 17, 2023
1.5.5	Jul 17, 2023
1.5.4	Jun 28, 2023
1.5.3	Jun 12, 2023
1.5.2	Jun 1, 2023
1.5.1	May 17, 2023
1.5.0	Apr 28, 2023
1.4.9	Jun 1, 2023
1.4.8	Apr 27, 2023
1.4.7	Apr 13, 2023
1.4.6	Apr 6, 2023
1.4.5	Mar 30, 2023
1.4.4	Mar 14, 2023
1.4.3	Mar 13, 2023
1.4.2	Feb 17, 2023
1.4.1	Feb 14, 2023
1.4.0	Feb 2, 2023
1.3.16	Mar 23, 2023
1.3.15	Mar 20, 2023
1.3.14	Mar 20, 2023
1.3.13	Feb 6, 2023
1.3.12	Jan 20, 2023
1.3.11	Jan 12, 2023
1.3.10	Dec 26, 2022
1.3.9	Dec 22, 2022
1.3.8	Dec 16, 2022
1.3.7	Dec 6, 2022
1.3.6	Nov 18, 2022
1.3.5	Nov 11, 2022
1.3.4	Nov 10, 2022
1.3.3	Nov 9, 2022
1.3.2	Nov 7, 2022
1.3.1	Nov 7, 2022
1.3.0	Nov 4, 2022
1.3.0rc1 (pre-release)	Nov 1, 2022

Download files

Source Distribution

dbt_fal-1.5.9.tar.gz (96.6 kB)

Uploaded Sep 4, 2023 Source

Built Distribution

dbt_fal-1.5.9-py3-none-any.whl (123.5 kB)

Uploaded Sep 4, 2023 Python 3

File details

Details for the file dbt_fal-1.5.9.tar.gz.

File metadata

Download URL: dbt_fal-1.5.9.tar.gz
Upload date: Sep 4, 2023
Size: 96.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: poetry/1.4.2 CPython/3.8.17 Linux/5.15.0-1041-azure

File hashes

Hashes for dbt_fal-1.5.9.tar.gz
Algorithm	Hash digest
SHA256	`50f13af042fcaf91285fb8a58e494841de996d6ec9d614d93d8d167cae168077`
MD5	`8f5418ca93dc169eac12e81cc42ff063`
BLAKE2b-256	`01234f20974fe054fb7b864da982d56b370ca7fb094c62dd72e5e8d11d194c2d`

File details

Details for the file dbt_fal-1.5.9-py3-none-any.whl.

File metadata

Download URL: dbt_fal-1.5.9-py3-none-any.whl
Upload date: Sep 4, 2023
Size: 123.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: poetry/1.4.2 CPython/3.8.17 Linux/5.15.0-1041-azure

File hashes

Hashes for dbt_fal-1.5.9-py3-none-any.whl
Algorithm	Hash digest
SHA256	`81de0baf3d3e19dddbf51a7007fd70c4b0976f1d516ee983fa77493c3ff311fa`
MD5	`f1a3397b2cf6484aec8e912991fae07d`
BLAKE2b-256	`284dbd038a65e5ef70218951e94582296dbcfba16c282863bf0d54437b3df270`
