
Data Platform Toolkit (DPT): SQL magic for Jupyter notebooks with Snowflake, and a dbt compile CLI.


Data Platform Toolbox (DPT)

This package was created to make working with the Data Platform more efficient. It contains four tools:

  • CLI - A command-line tool that compiles any dbt model for any environment
  • notebook_sql - A package that lets you execute SQL queries inside a Python (Jupyter) notebook
  • validation:snowflake - A package that uses Snowpark code to compare two tables
  • validation:report - A package that uses pandas code to compare two DataFrames (and can generate an Excel report)
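The comparison logic of the validation packages is not documented here. As a rough illustration of what a keyed table comparison reports (rows missing on either side, and rows whose values differ), here is a minimal sketch in plain Python. It deliberately uses only the standard library instead of pandas or Snowpark so it runs anywhere; the function name compare_tables and its result shape are hypothetical, not the package's API.

```python
from typing import Any, Dict, Iterable, List, Tuple

Row = Dict[str, Any]


def compare_tables(left: Iterable[Row], right: Iterable[Row], key: str) -> Dict[str, List[Any]]:
    """Compare two tables (lists of row dicts) on a key column.

    Hypothetical helper, not part of dpt. If a key occurs more than once
    on one side, the last row with that key wins.
    """
    left_by_key = {row[key]: row for row in left}
    right_by_key = {row[key]: row for row in right}

    # Keys that exist on only one side of the comparison
    only_left = sorted(left_by_key.keys() - right_by_key.keys())
    only_right = sorted(right_by_key.keys() - left_by_key.keys())

    # For shared keys, record every column whose values differ
    mismatched: List[Tuple[Any, str, Any, Any]] = []
    for k in sorted(left_by_key.keys() & right_by_key.keys()):
        l_row, r_row = left_by_key[k], right_by_key[k]
        for col in sorted(set(l_row) | set(r_row)):
            if l_row.get(col) != r_row.get(col):
                mismatched.append((k, col, l_row.get(col), r_row.get(col)))

    return {"only_left": only_left, "only_right": only_right, "mismatched": mismatched}


# Example: the same table before and after a change
before = [{"id": 1, "amount": 10}, {"id": 2, "amount": 20}]
after = [{"id": 2, "amount": 25}, {"id": 3, "amount": 30}]
diff = compare_tables(before, after, key="id")
```

Here diff reports id 1 as only on the left, id 3 as only on the right, and a single mismatch for id 2 in the amount column. A pandas version would typically express the same idea with DataFrame.merge(..., how="outer", indicator=True).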

Setup

There are two ways to use this package. You can copy the entire dpt folder into a new domain repo and install the package in "development" mode, which lets you edit and adapt the code to your own needs. Alternatively, you can build a Python "wheel", a single distributable file, and install it into the new domain repo like any other Python package.

Prerequisites

This setup assumes you have already set up a Python virtual environment with all necessary dbt packages installed.
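Before installing, it can help to confirm both prerequisites from the environment you are about to use. The sketch below is a hypothetical helper, not part of dpt. Its virtual-environment check only covers venv/virtualenv (where sys.prefix differs from sys.base_prefix), not conda environments, and it only checks that a dbt executable is on PATH, not which adapters are installed.

```python
import shutil
import sys
from typing import Dict


def check_prerequisites() -> Dict[str, bool]:
    """Report whether a virtual environment is active and dbt is on PATH."""
    # Inside a venv, sys.prefix points at the venv while sys.base_prefix
    # still points at the base interpreter.
    in_venv = sys.prefix != sys.base_prefix
    # shutil.which returns None when the executable cannot be found.
    has_dbt = shutil.which("dbt") is not None
    return {"virtualenv_active": in_venv, "dbt_on_path": has_dbt}


if __name__ == "__main__":
    for check, ok in check_prerequisites().items():
        print(f"{check}: {'OK' if ok else 'MISSING'}")
```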

Wheel

This Repo

In your terminal, navigate to the dpt directory

cd dpt

Make sure the Python build tools are up to date

python -m pip install --upgrade pip build twine

Build the wheel

python -m build --wheel

Destination Repo

Copy the wheel, which python -m build writes to the dist/ subdirectory (dpt/dist/), into the root directory of the destination repo

Install the wheel (replace <path-to-wheel-file> with the actual wheel file name, e.g. pggm_dpt_toolkit-1.0.0-py3-none-any.whl)

pip install <path-to-wheel-file>
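When several builds have accumulated, picking the right file by hand is error-prone. A minimal sketch, using hypothetical helper names that are not part of dpt, that finds the most recently modified wheel in a directory and builds the matching pip command. It runs pip through sys.executable so the wheel is installed into the same (virtual) environment as the interpreter running the script.

```python
import sys
from pathlib import Path
from typing import List


def find_latest_wheel(directory: Path) -> Path:
    """Return the most recently modified .whl file in `directory`."""
    wheels = sorted(directory.glob("*.whl"), key=lambda p: p.stat().st_mtime)
    if not wheels:
        raise FileNotFoundError(f"No .whl files found in {directory}")
    return wheels[-1]


def pip_install_command(wheel: Path) -> List[str]:
    """Build a pip install command for `wheel` using the current interpreter."""
    return [sys.executable, "-m", "pip", "install", str(wheel)]
```

For example, subprocess.run(pip_install_command(find_latest_wheel(Path("."))), check=True) would install the newest wheel found in the destination repo's root directory.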

"Development" Mode

Copy the entire dpt directory into the root directory of the destination repo

Destination Repo

Navigate to the dpt directory

cd dpt

Install the dpt package in editable ("development") mode

pip install -e . --index-url https://proget.pggm-intra.intern/pypi/PGGMPythonGallery/simple --trusted-host proget.pggm-intra.intern

Any changes you make to the Python files in the dpt directory immediately change the behavior of the dpt package wherever it is used (CLI, Python script, Jupyter Notebook, etc.).
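To check whether a package was actually installed in editable mode, you can read the direct_url.json record that pip writes for installs from a local path (PEP 610); editable installs of a directory carry "dir_info": {"editable": true}. A minimal sketch, with a hypothetical function name; the distribution name pggm_dpt_toolkit is taken from the file names listed under Download files and is an assumption for your local build.

```python
import json
from importlib import metadata
from typing import Optional


def is_editable_install(dist_name: str) -> Optional[bool]:
    """True if installed with `pip install -e`, False if installed normally,
    None if the distribution is not installed at all."""
    try:
        dist = metadata.distribution(dist_name)
    except metadata.PackageNotFoundError:
        return None
    # No direct_url.json means no direct URL was recorded (e.g. installed from an index).
    raw = dist.read_text("direct_url.json")
    if raw is None:
        return False
    info = json.loads(raw)
    return bool(info.get("dir_info", {}).get("editable", False))
```

For example, is_editable_install("pggm_dpt_toolkit") should return True after the pip install -e . step above.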

Usage

CLI

The CLI can be used from the terminal to compile any model for any environment. It uses the same syntax as the dbt CLI. For example, to compile a model with dbt you would use:

dbt compile -s <model-name>

Example:

dbt compile -s pre_date_date

The dpt CLI works similarly; you can use exactly the same syntax to compile a model:

dpt compile -s <model-name>

Example:

dpt compile -s pre_date_date

The output is formatted more readably, and the compiled query is automatically copied to your clipboard. Under the hood, the CLI sends off a dbt compile command in a virtual terminal and then looks for the compiled SQL code in the target directory. If you want to see the actual dbt command being sent, use the --verbose (shorthand -v) flag:

dpt compile -s <model-name> --verbose

Example:

dpt compile -s pre_date_date --verbose
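The flow described above (run dbt compile, then read the compiled SQL from the target directory) can be sketched as follows. This is not dpt's actual implementation: it assumes dbt's default layout, target/compiled/<project_name>/<model path>.sql, and all function names are hypothetical. Copying to the clipboard is left out because it would need a third-party package.

```python
import subprocess
from pathlib import Path
from typing import Optional


def find_compiled_sql(project_dir: Path, model: str) -> Optional[Path]:
    """Return the newest compiled SQL file for `model`, or None if absent.

    Assumes dbt's default layout: target/compiled/<project_name>/<model path>.sql
    """
    compiled_root = project_dir / "target" / "compiled"
    if not compiled_root.is_dir():
        return None
    matches = list(compiled_root.rglob(f"{model}.sql"))
    if not matches:
        return None
    # If several files share the name, prefer the one written most recently.
    return max(matches, key=lambda p: p.stat().st_mtime)


def read_compiled_sql(project_dir: Path, model: str) -> str:
    """Read the compiled SQL for `model`, failing loudly if it is missing."""
    path = find_compiled_sql(project_dir, model)
    if path is None:
        raise FileNotFoundError(f"No compiled SQL found for model {model!r}")
    return path.read_text(encoding="utf-8")


def compile_model(project_dir: Path, model: str) -> str:
    """Run `dbt compile -s <model>` in project_dir and return the compiled SQL."""
    subprocess.run(["dbt", "compile", "-s", model], cwd=project_dir, check=True)
    return read_compiled_sql(project_dir, model)
```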

The real power of the CLI is that it can compile SQL code for the PRD environment. You can compile the code for any environment using the --env (shorthand -e) flag.

dpt compile -s <model-name> --env <environment-name>

Example:

dpt compile -s pre_date_date --env PRD
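The flags documented above (-s, --env/-e, --verbose/-v) could be modelled with argparse roughly as follows. This is a hypothetical sketch, not dpt's source. In particular, how --env is translated for dbt is not documented here, so passing it on as dbt's --target flag (a target defined in profiles.yml) is an assumption.

```python
import argparse
import shlex
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Parser mirroring the documented `dpt compile` flags."""
    parser = argparse.ArgumentParser(prog="dpt")
    subcommands = parser.add_subparsers(dest="command", required=True)
    compile_cmd = subcommands.add_parser("compile", help="Compile a dbt model")
    compile_cmd.add_argument("-s", "--select", required=True, help="Model to compile")
    compile_cmd.add_argument("-e", "--env", default=None, help="Environment, e.g. PRD")
    compile_cmd.add_argument("-v", "--verbose", action="store_true", help="Print the dbt command")
    return parser


def to_dbt_command(args: argparse.Namespace) -> List[str]:
    """Translate parsed dpt arguments into a dbt compile command."""
    command = ["dbt", "compile", "-s", args.select]
    if args.env:
        # ASSUMPTION: the environment name maps onto a dbt target of the same name.
        command += ["--target", args.env]
    return command


def main(argv: Optional[List[str]] = None) -> List[str]:
    args = build_parser().parse_args(argv)
    command = to_dbt_command(args)
    if args.verbose:
        # Show the exact command that would be sent to dbt
        print(shlex.join(command))
    return command
```

For example, main(["compile", "-s", "pre_date_date", "--env", "PRD", "-v"]) prints and returns the dbt command with --target PRD appended.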

Validation & Notebooks

Example scripts and notebooks have been created to showcase these packages and how to use them.



Download files


Source Distribution

pggm_dpt_toolkit-1.0.0.tar.gz (19.1 kB)

Built Distribution


pggm_dpt_toolkit-1.0.0-py3-none-any.whl (26.1 kB, Python 3)

File details

Details for the file pggm_dpt_toolkit-1.0.0.tar.gz.

File metadata

  • Download URL: pggm_dpt_toolkit-1.0.0.tar.gz
  • Upload date:
  • Size: 19.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.6.13

File hashes

Hashes for pggm_dpt_toolkit-1.0.0.tar.gz
Algorithm Hash digest
SHA256 d5e6dc29b13245dd4624441f75c3e9bb4fce8acaad55a10422433916ecf974d6
MD5 22bc33810498a3d3acbb5294c17789fa
BLAKE2b-256 7d0d8d92b949b5a1f56ecd330d7e65329aa11beea0c65ceab82a8fc62eb4de04


File details

Details for the file pggm_dpt_toolkit-1.0.0-py3-none-any.whl.


File hashes

Hashes for pggm_dpt_toolkit-1.0.0-py3-none-any.whl
Algorithm Hash digest
SHA256 a2b4f895f25da76d3be116d972e5f41e378c85d8d399acb5ce0470bcbdcac1f1
MD5 0db25a4de4be4fb33c96a93396aaccb6
BLAKE2b-256 795385065be5ead1df319ca54dfa37a816211b970f90a3b8f3bd58dd37dd0544
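To check a downloaded file against the SHA256 digests listed above, you can hash it locally with the standard library. A minimal sketch with hypothetical function names; it reads the file in chunks so large files do not need to fit in memory.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected_sha256: str) -> bool:
    """Return True if the file's SHA256 digest matches the published one."""
    return sha256_of(path) == expected_sha256.strip().lower()
```

For example, verify(Path("pggm_dpt_toolkit-1.0.0-py3-none-any.whl"), "a2b4f895f25da76d3be116d972e5f41e378c85d8d399acb5ce0470bcbdcac1f1") should return True for an intact copy of the wheel. pip can enforce the same check at install time through --hash=sha256:<digest> entries in a requirements file combined with --require-hashes.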

