
Data Platform Toolkit (DPT) — SQL magic for Jupyter notebooks with Snowflake, and a DBT compile CLI.


Data Platform Toolbox (DPT)

DPT is a set of tools for working with the Data Platform more efficiently. The package contains four tools:

  • CLI - A command-line tool that compiles any DBT model for any environment
  • notebook_sql - A package for executing SQL queries inside a Python notebook
  • validation:snowflake - A package that uses Snowpark code to compare two tables
  • validation:report - A package that uses pandas code to compare two DataFrames (and can generate an Excel report)

Setup

There are two ways to use this package. You can copy the entire dpt folder to a new domain and install the package in "development" mode, which lets you edit and adapt the code to your own needs. Alternatively, you can build a Python wheel, a single distributable file that you install into the new domain repo like any other Python package.

Prerequisites

This setup assumes you have already set up a Python virtual environment with all necessary DBT packages installed.

Wheel

This Repo

In your terminal, navigate to the dpt directory

cd dpt

Make sure the Python building tools are up to date

python -m pip install --upgrade pip build twine

Build the wheel

python -m build --wheel

Destination Repo

Copy the wheel (the build step writes it to the dist folder by default) into the root directory of the destination repo

Install the wheel (replace <path-to-wheel-file> with actual wheel file name)

pip install <path-to-wheel-file>
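
As a convenience, the wheel produced by the build step can be located with a few lines of Python. The sketch below assumes the default output folder of python -m build (dist, inside the dpt directory). The helper name newest_wheel is hypothetical and not part of dpt.

```python
from pathlib import Path
from typing import Optional


def newest_wheel(dist_dir: str = "dist") -> Optional[Path]:
    """Return the most recently modified .whl file in dist_dir, or None if there is none."""
    # Sort by modification time so the last entry is the most recent build.
    wheels = sorted(Path(dist_dir).glob("*.whl"), key=lambda p: p.stat().st_mtime)
    return wheels[-1] if wheels else None


if __name__ == "__main__":
    wheel = newest_wheel()
    print(wheel if wheel else "No wheel found; run 'python -m build --wheel' first.")
```

The printed path is the value to use in place of <path-to-wheel-file>.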

"Development" Mode

Copy and paste the entire dpt folder/directory into the root directory of the destination repo

Destination Repo

Navigate to the dpt directory

cd dpt

Install the dpt package in editable mode

pip install -e . --index-url https://proget.pggm-intra.intern/pypi/PGGMPythonGallery/simple --trusted-host proget.pggm-intra.intern

Any changes you make to the Python files in the dpt folder will immediately change the behavior of the dpt package wherever it is used (CLI, Python script, Jupyter Notebook, etc.)
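
One way to confirm that the editable install is active is to check where Python loads the package from. This is a minimal sketch: the import name dpt is an assumption based on the folder name, and package_location is a hypothetical helper, not part of the package.

```python
import importlib.util
from typing import Optional


def package_location(import_name: str) -> Optional[str]:
    """Return the file a top-level package is loaded from, or None if it is not installed."""
    spec = importlib.util.find_spec(import_name)
    return spec.origin if spec is not None else None


if __name__ == "__main__":
    # "dpt" is an assumed import name; adjust it if the package imports under another name.
    print(package_location("dpt"))
```

With an editable install, the printed path would be expected to point into the dpt folder in your repo rather than into site-packages.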

Usage

CLI

The CLI can be used from the terminal to compile any model for any environment. It uses the same syntax as the dbt CLI. For example, to compile a model using dbt you would use

dbt compile -s <model-name>

Example:

dbt compile -s pre_date_date

The dpt CLI works similarly; you can use the exact same syntax to compile a model

dpt compile -s <model-name>

Example:

dpt compile -s pre_date_date

The output is formatted more readably, and the compiled query is automatically copied to your clipboard. Under the hood, the CLI sends off a dbt compile command in a virtual terminal and then looks for the compiled SQL code in the target directory. To see the actual dbt command being sent, use the --verbose (shorthand -v) flag

dpt compile -s <model-name> --verbose

Example:

dpt compile -s pre_date_date --verbose
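
The flow described above (run dbt compile, then look for the compiled SQL in the target directory) can be sketched roughly as follows. This is an illustration under assumptions, not dpt's actual implementation: the function names are hypothetical, the lookup relies on dbt's default target/compiled output folder, and the clipboard step is left out.

```python
import subprocess
from pathlib import Path
from typing import Optional


def find_compiled_sql(model: str, project_dir: str = ".") -> Optional[Path]:
    """Search dbt's default target/compiled folder for <model>.sql."""
    compiled_root = Path(project_dir) / "target" / "compiled"
    if not compiled_root.is_dir():
        return None
    matches = sorted(compiled_root.rglob(f"{model}.sql"))
    return matches[0] if matches else None


def compile_model(model: str, project_dir: str = ".", verbose: bool = False) -> Optional[str]:
    """Run dbt compile for one model and return the compiled SQL text, if found."""
    command = ["dbt", "compile", "-s", model]
    if verbose:
        # Mirrors the idea of --verbose: show the dbt command before running it.
        print("Running:", " ".join(command))
    subprocess.run(command, cwd=project_dir, check=True)
    compiled = find_compiled_sql(model, project_dir)
    return compiled.read_text() if compiled else None
```

Calling compile_model("pre_date_date", verbose=True) from a dbt project root would print the dbt command and return the compiled query as a string, provided dbt is installed and the model exists.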

The real power of the CLI is being able to compile SQL code for the PRD environment. You can compile the code for any environment using the --env (shorthand -e) flag.

dpt compile -s <model-name> --env <environment-name>

Example:

dpt compile -s pre_date_date --env PRD
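
To illustrate how the documented flags (-s, --env/-e, --verbose/-v) might be turned into a dbt command, here is a small argparse sketch. It is not taken from the dpt source. In particular, mapping --env onto dbt's --target flag is an assumption, and build_parser and to_dbt_command are hypothetical names.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Parser accepting the compile flags described in this README."""
    parser = argparse.ArgumentParser(prog="dpt")
    subcommands = parser.add_subparsers(dest="command", required=True)
    compile_cmd = subcommands.add_parser("compile")
    compile_cmd.add_argument("-s", dest="model", required=True, help="model to compile")
    compile_cmd.add_argument("-e", "--env", default=None, help="environment, e.g. PRD")
    compile_cmd.add_argument("-v", "--verbose", action="store_true")
    return parser


def to_dbt_command(model: str, env: Optional[str] = None) -> List[str]:
    """Build the dbt command list for a model and an optional environment."""
    command = ["dbt", "compile", "-s", model]
    if env:
        # Assumption: the environment name matches a dbt target in profiles.yml.
        command += ["--target", env]
    return command


if __name__ == "__main__":
    args = build_parser().parse_args(["compile", "-s", "pre_date_date", "--env", "PRD"])
    print(" ".join(to_dbt_command(args.model, args.env)))
```

Running the actual CLI with --verbose shows the real dbt command, which is the authoritative reference for how the environment is passed.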

Validation & Notebooks

Example Jupyter notebooks are bundled with the package and can be extracted to your project using the CLI.

List available examples

dpt examples --list

Copy examples to the current directory

dpt examples

This creates a dpt_examples/ folder containing:

  • sql_notebook/ — Demonstrates how to run SQL queries in Jupyter notebooks using the %%sql magic command, mix SQL with Python, and save results.
  • validation/ — Shows how to compare tables in Snowflake and generate Excel comparison reports.
