This package serves to cascadingly populate column level documentation, build & conform schema files, and audit coverage.

Project description

dbt-osmosis

PyPI Downloads License: Apache 2.0 black

Primary Objectives

Hello and welcome to the project! dbt-osmosis 🌊 serves to enhance the developer experience significantly. We do this in three ways: we automate most of the management of schema yml files; we synchronize inheritable column level documentation, which permits a write-it-once principle in a DAG oriented way; and we expose a workbench which allows you to develop interactively in dbt. The workbench allows you to develop and instantly compile models side by side (extremely performant compilation), document model columns, test the query against your data warehouse, inspect row level diffs and diff metrics as you modify SQL, run tests, and more.

Workbench

The workbench is under active development. Feel free to open issues or discuss additions. There is still a lot on the roadmap regarding the robustness of diffs (currently we only see rows added/removed). We also do not catch errors yet: they don't break anything, but they are piped to the app and displayed as-is, which is pretty to a developer but not to an end user, so we should use st.warning and st.error for pretty notifications and fork to the right logical path.

✔️ dbt Model Editor

✔️ Materialize Active Model in Warehouse

Not handling building of upstream dependencies if they are not materialized

✔️ Query Tester

✔️ SQL Model Data Diffs

Adding pandas engine and support for MODIFIED rows in addition to ADDED and REMOVED
Adding scorecards which show the sum of each of the 3 diff categories

✔️ Data Profiler (leverages pandas-profiling)

Need to expose a config option for a detailed or basic report to account for variable dataset size

⚠️ Doc Editor

View only, modifications aren't committed yet

❗ Test Runner (not implemented yet)

✔️ Manifest View

The editor is able to compile models with control+enter or as you type. It's speedy!

Select a target. Models can also be materialized by executing the SQL against the target using dbt as a wrapper.

See when there are uncommitted changes and commit them to file when ready, or revert to initial state.

Test dbt models as you work against whatever profile you have selected and inspect the results.

As you develop and modify a model with uncommitted changes, you can calculate the diff. This gives you instant feedback on whether the changes you make are safe.
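
To make the ADDED/REMOVED idea concrete, here is a minimal sketch of a row-level diff between two query results. It is not the workbench's actual implementation; the function name `diff_rows` and the sample rows are hypothetical, and rows are assumed to be hashable tuples.

```python
from collections import Counter


def diff_rows(before, after):
    """Return rows ADDED to and REMOVED from a result set, respecting duplicates."""
    before_counts = Counter(before)
    after_counts = Counter(after)
    # Counter subtraction keeps only positive counts, so each side holds
    # the rows that exist (or exist more often) on that side only.
    added = list((after_counts - before_counts).elements())
    removed = list((before_counts - after_counts).elements())
    return {"ADDED": added, "REMOVED": removed}


# Hypothetical query results before and after editing a model's SQL.
before = [(1, "open"), (2, "closed"), (3, "open")]
after = [(1, "open"), (2, "shipped"), (3, "open"), (4, "open")]

result = diff_rows(before, after)

# A simple scorecard: the number of rows in each diff category.
scorecard = {category: len(rows) for category, rows in result.items()}
```

Note that a changed row shows up here as one REMOVED row plus one ADDED row, which matches the current added/removed-only behaviour described above. Detecting MODIFIED rows would need a key to pair rows on.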

CLI

dbt-osmosis is ready to use as-is. To get familiar, you should run it on a fresh branch and ensure everything is backed up in source control. Enjoy!

You should set a base config in your dbt_project.yml and ensure any models within the scope of your execution plan will inherit a config/preference. Example below.

models:

    your_dbt_project:

        # This config will apply to your whole project
        +dbt-osmosis: "schema/model.yml"

        staging:

            # This config will apply to your staging directory
            +dbt-osmosis: "folder.yml"

            +tags: 
                - "staged"

            +materialized: view

            monday:
                intermediate:
                    +materialized: ephemeral

        marts:

            +tags: 
                - "mart"

            supply_chain:

To use dbt-osmosis, simply run the following:

# Install
pip install dbt-osmosis
# Alternatively
pipx install dbt-osmosis


# This command executes all tasks in preferred order and is usually all you need

dbt-osmosis run --project-dir /path/to/dbt/project --target prod


# Inherit documentation in staging/salesforce/ & sync 
# schema yaml columns with database columns

dbt-osmosis document --project-dir /path/to/dbt/project --target prod --fqn staging.salesforce


# Reorganize marts/operations/ & inject undocumented models 
# into schema files or create new schema files as needed

dbt-osmosis compose --project-dir /path/to/dbt/project --target prod --fqn marts.operations


# Open the dbt-osmosis workbench

dbt-osmosis workbench

Roadmap

These features are being actively developed and will be merged into the next few minor releases.

Complete build out of sources tools.
Add --min-cov flag to audit task and to workbench
Add an interactive documentation flag that prompts the user to document ONLY progenitors and novel columns for a subset of models (the most optimized path to full documentation coverage feasible)
Add impact command that allows us to leverage our resolved column level progenitors for ad hoc impact analysis

Features

Standardize organization of schema files (and provide ability to define and conform with code)

Config can be set on a per-directory basis, if desired, in dbt_project.yml. Every model that is processed requires a direct or inherited +dbt-osmosis: config. If even one directory is missing the config, we exit gracefully and inform the user to update dbt_project.yml. There are no assumed defaults. Placing our config under your dbt project name in models: is enough to set a default for the project, since the config applies to all subdirectories.

Note: You can change these configs as often as you like or try them all. dbt-osmosis will take care of restructuring your project schema files, with no human effort required.
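
The inheritance rule above can be sketched as follows. This is an illustration only, not dbt's or dbt-osmosis's real resolution code: the function `resolve_osmosis_config` is hypothetical, and the nested dict stands in for the `models:` block of dbt_project.yml after it has been parsed.

```python
def resolve_osmosis_config(models_config, model_dir_parts):
    """Walk from the project root to the model's directory.

    The deepest `+dbt-osmosis` value found along the way wins, which mimics
    a config being overridden by a narrower scope.
    """
    resolved = models_config.get("+dbt-osmosis")
    node = models_config
    for part in model_dir_parts:
        node = node.get(part)
        if not isinstance(node, dict):
            break  # no config declared for this directory or anything below it
        resolved = node.get("+dbt-osmosis", resolved)
    if resolved is None:
        # No assumed defaults: a missing config is an error the user must fix.
        raise ValueError(
            "No +dbt-osmosis config for " + "/".join(model_dir_parts)
            + "; please update dbt_project.yml"
        )
    return resolved


# Stand-in for the parsed `models:` block of a dbt_project.yml.
models = {
    "your_dbt_project": {
        "+dbt-osmosis": "schema/model.yml",
        "staging": {
            "+dbt-osmosis": "folder.yml",
            "monday": {"intermediate": {"+materialized": "ephemeral"}},
        },
        "marts": {"supply_chain": {}},
    }
}

staging_style = resolve_osmosis_config(
    models, ["your_dbt_project", "staging", "monday", "intermediate"]
)
marts_style = resolve_osmosis_config(
    models, ["your_dbt_project", "marts", "supply_chain"]
)
```

Here the staging models pick up "folder.yml" from their own directory, while the marts models fall back to the project-wide "schema/model.yml".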

A directory can be configured to conform to any one of the following standards:
- One schema file per model file, sharing the same name and directory, i.e.
```
  staging/
      stg_order.sql
      stg_order.yml
      stg_customer.sql
      stg_customer.yml
```
  - +dbt-osmosis: "model.yml"
- One schema file per directory wherever model files reside, named schema.yml, i.e.
```
  staging/
      stg_order.sql
      stg_customer.sql
      schema.yml
```
  - +dbt-osmosis: "schema.yml"
- One schema file per directory wherever model files reside, named after its containing folder, i.e.
```
  staging/
      stg_order.sql
      stg_customer.sql
      staging.yml
```
  - +dbt-osmosis: "folder.yml"
- One schema file per model file, sharing the same name and nested in a schema subdirectory wherever model files reside, i.e.
```
  staging/
      stg_order.sql
      stg_customer.sql
      schema/
          stg_order.yml
          stg_customer.yml
```
  - +dbt-osmosis: "schema/model.yml"
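
The four layouts above boil down to a mapping from a model file and a config value to a target schema file. A minimal sketch of that mapping, assuming a hypothetical helper named `schema_file_for` (not part of the dbt-osmosis API):

```python
from pathlib import PurePosixPath


def schema_file_for(model_path, convention):
    """Return the schema file a model belongs in under a given convention."""
    model = PurePosixPath(model_path)
    directory = model.parent
    if convention == "model.yml":
        # One yml per model, next to the model file.
        return directory / f"{model.stem}.yml"
    if convention == "schema.yml":
        # One yml per directory, always named schema.yml.
        return directory / "schema.yml"
    if convention == "folder.yml":
        # One yml per directory, named after the directory itself.
        return directory / f"{directory.name}.yml"
    if convention == "schema/model.yml":
        # One yml per model, nested in a schema/ subdirectory.
        return directory / "schema" / f"{model.stem}.yml"
    raise ValueError(f"Unknown +dbt-osmosis convention: {convention!r}")


targets = {
    convention: str(schema_file_for("staging/stg_order.sql", convention))
    for convention in ("model.yml", "schema.yml", "folder.yml", "schema/model.yml")
}
```

Because the target path is a pure function of the model location and the config, switching a directory from one convention to another is a matter of recomputing targets and moving the yaml entries.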

Build and Inject Non-documented models

Injected models will automatically conform to the above config per directory, based on the location of the model file.
This means you can focus fully on modelling. Documentation, including yaml updates or creation, will automatically follow at any time with a simple invocation of dbt-osmosis.

Propagate existing column level documentation downward to children

Build a column level knowledge graph, accumulated and updated from the furthest identifiable origin (ancestors) to immediate parents.
Undocumented columns of the same name are automatically populated with passed-down knowledge accumulated within the context of the model's upstream dependency tree.
This means you can freely generate models, and all columns you pull into the model's SQL that have already been documented will be automatically learned/propagated. Again, the focus for analysts is almost fully on modelling, and yaml work is an afterthought / less heavy of a manual lift.
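
The propagation described above can be sketched with a topological walk over the model DAG. This is a simplified illustration, not the package's implementation: `propagate_docs` and the sample models are hypothetical, an empty string stands for an undocumented column, and when several parents document the same column the first listed parent is assumed to win.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def propagate_docs(parents, columns):
    """Fill undocumented columns from knowledge accumulated upstream.

    parents: model name -> list of parent model names
    columns: model name -> {column name: description ("" if undocumented)}
    """
    knowledge = {}  # model -> every column description known at that model
    result = {}
    # static_order() yields ancestors before their descendants.
    for model in TopologicalSorter(parents).static_order():
        inherited = {}
        for parent in parents.get(model, []):
            for column, description in knowledge.get(parent, {}).items():
                inherited.setdefault(column, description)
        # Keep existing descriptions; fill gaps with same-name inherited ones.
        docs = {
            column: description or inherited.get(column, "")
            for column, description in columns.get(model, {}).items()
        }
        result[model] = docs
        # Pass on inherited knowledge, overridden by this model's own docs.
        own = {column: text for column, text in docs.items() if text}
        knowledge[model] = {**inherited, **own}
    return result


parents = {"stg_orders": ["raw_orders"], "fct_orders": ["stg_orders"]}
columns = {
    "raw_orders": {"order_id": "Primary key of the order.", "price": ""},
    "stg_orders": {"order_id": "", "price": "Order price in USD."},
    "fct_orders": {"order_id": "", "price": "", "margin": ""},
}

documented = propagate_docs(parents, columns)
```

In this example `fct_orders` ends up with descriptions for `order_id` (from `raw_orders`) and `price` (from `stg_orders`), while the novel column `margin` stays undocumented and still needs a human.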

Order Matters

In a full run [ dbt-osmosis run ] we will:

- Conform dbt project
  - Configuration lives in dbt_project.yml, and we require our config to run. It can be set at the root level of models: to apply a default convention to a project, or folder by folder. It follows dbt config resolution, where config is overridden by scope. The config is called +dbt-osmosis: "folder.yml" | "schema.yml" | "model.yml" | "schema/model.yml"
- Bootstrap models to ensure all models exist
- Recompile Manifest
- Propagate definitions downstream to undocumented models solely within the context of each model's dependency tree

Here are some of the original foundational pillars:

First and foremost, we want dbt documentation to retain a DRY principle. Every time we repeat ourselves, we waste our time. 80% of documentation is often a matter of inheritance and continued passing down of columns from parent models to children. They need not be redocumented if there has been no mutation.

Second, we want to standardize ways that we all organize our schema files which hold the fruits of our documentation. We should be able to enforce a standard on a per directory basis and jump between layouts at will as certain folders scale up the number of models or scale down.

Lastly, and tangential to the first objective, we want to understand column level lineage, streamline impact analysis, and audit our documentation.

New workflows enabled!

- Build one dbt model or a bunch of them without documenting anything (gasp)
  - Run dbt-osmosis run or dbt-osmosis compose && dbt-osmosis document
  - Sit back and watch as:
    - Automatically constructed/updated schema yamls are built with as many of the definitions pre-populated as possible from upstream dependencies
    - Schema yaml(s) are automatically organized in exactly the right directories / style, conforming to the easily configurable standard upheld and enforced across your dbt project on a directory by directory basis
    - boom, mic drop
- Problem reported by stakeholder with data (WIP)
  - Identify column
  - Run dbt-osmosis impact --model orders --column price
  - Find the originating model and action
- Need to score our documentation (WIP)
  - Run dbt-osmosis audit --docs --min-cov 80
  - Get a curated list of all the documentation to update in your pre-bootstrapped dbt project
  - Sip coffee and engage in documentation
- Add dbt-osmosis to a pre-commit hook to ensure all your analysts are passing down column level documentation & reaching your designated min-coverage
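
As a rough picture of what a minimum-coverage check could compute, here is a small sketch. The audit task is marked WIP above, so this does not reflect its real behaviour; the function names, the sample models, and the rule that an empty description counts as undocumented are all assumptions.

```python
def documentation_coverage(columns):
    """Percentage of columns, across all models, with a non-empty description."""
    total = sum(len(cols) for cols in columns.values())
    documented = sum(
        1 for cols in columns.values() for description in cols.values() if description
    )
    # An empty project is treated as fully covered to avoid dividing by zero.
    return 100.0 * documented / total if total else 100.0


def undocumented_columns(columns):
    """Sorted (model, column) pairs that still need a description."""
    return sorted(
        (model, column)
        for model, cols in columns.items()
        for column, description in cols.items()
        if not description
    )


def meets_min_coverage(columns, min_cov):
    """True when coverage reaches the threshold, e.g. min_cov=80."""
    return documentation_coverage(columns) >= min_cov


# Hypothetical project state: 4 of 5 columns are documented.
project_columns = {
    "stg_orders": {"order_id": "Primary key of the order.", "price": "Order price."},
    "fct_orders": {"order_id": "Primary key of the order.", "price": "Order price.", "margin": ""},
}

coverage = documentation_coverage(project_columns)
todo = undocumented_columns(project_columns)
```

A pre-commit hook built on this idea would exit with a non-zero status when `meets_min_coverage` returns False and print the `todo` list for the analyst.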

Source Distribution

dbt-osmosis-0.3.1.tar.gz (32.1 kB)

Uploaded Oct 20, 2021 Source

Built Distribution

dbt_osmosis-0.3.1-py3-none-any.whl (28.7 kB)

Uploaded Oct 20, 2021 Python 3

Hashes for dbt-osmosis-0.3.1.tar.gz
Algorithm	Hash digest
SHA256	`698793506cf045c01ce90790029c441d84bf65e7b2c1f6e76b587e5dfac622ad`
MD5	`5dc547595f2edd91fe526b30a56f50d8`
BLAKE2b-256	`7781010004ad45c2aff692341f65565441d1949ec6d9efb777f94a40ec3bcb8e`

Hashes for dbt_osmosis-0.3.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`919dcd661db7dda684c1ab24761df4e4501d0e9ef862a1a19884c9bfc6bb05aa`
MD5	`27dbc7cabda9f22f1f73385705c6ca34`
BLAKE2b-256	`624b943d552cee3492a938a7ac0b4e5f690f300b4df9a55aa955653ba43a8483`