Data Platform Toolkit (DPT) — SQL magic for Jupyter notebooks with Snowflake, and a DBT compile CLI.
Data Platform Toolkit (DPT)
This tool was created to make working with the Data Platform more efficient. The package contains four tools:
- CLI - A CLI tool that allows the user to compile any DBT model for any environment
- notebook_sql - A package that allows SQL queries to be executed inside of a Python notebook
- validation:snowflake - A package that uses Snowpark code to compare two tables
- validation:report - A package that uses pandas code to compare two DataFrames (and can generate an Excel report)
Setup
There are two ways to use this package. You can copy the entire dpt folder to a new domain and install the package in "development" mode, which lets you edit and adapt the code to your own needs. Alternatively, you can build a Python wheel, a single distributable file, and install it into the new domain repo like any other Python package.
Prerequisites
This setup assumes you have already set up a Python virtual environment with all necessary DBT packages installed.
- Python 3.11
- Virtual environment set up and active
- VS Code Jupyter extension
- VS Code Python extension
Wheel
This Repo
In your terminal, navigate to the dpt directory
cd dpt
Make sure the Python building tools are up to date
python -m pip install --upgrade pip build twine
Build the wheel
python -m build --wheel
Destination Repo
Copy and paste the wheel into the root directory of the destination repo
Install the wheel (replace <path-to-wheel-file> with the actual wheel file name)
pip install <path-to-wheel-file>
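The copy-and-install step can also be scripted. The sketch below is only an illustration and is not part of dpt: it assumes the wheel sits in the dist/ directory that python -m build writes to by default, and the helper names newest_wheel and pip_install_command are hypothetical.

```python
import sys
from pathlib import Path
from typing import List


def newest_wheel(dist_dir: Path) -> Path:
    """Return the most recently modified .whl file in dist_dir."""
    wheels = sorted(dist_dir.glob("*.whl"), key=lambda p: p.stat().st_mtime)
    if not wheels:
        raise FileNotFoundError(f"no wheel found in {dist_dir}")
    return wheels[-1]


def pip_install_command(wheel: Path) -> List[str]:
    """Build the pip command that installs the wheel into the active interpreter."""
    # Using sys.executable keeps the install inside the active virtual environment.
    return [sys.executable, "-m", "pip", "install", str(wheel)]
```

Calling pip_install_command(newest_wheel(Path("dist"))) from the dpt directory returns the argument list for the install, which could then be passed to subprocess.run.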
"Development" Mode
Copy and paste the entire dpt folder/directory into the root directory of the destination repo
Destination Repo
Navigate to the dpt directory
cd dpt
Install the dpt package in editable mode
pip install -e . --index-url https://proget.pggm-intra.intern/pypi/PGGMPythonGallery/simple --trusted-host proget.pggm-intra.intern
Any changes you make to the Python files in the dpt directory will immediately change the behavior of the dpt package wherever it is used (CLI, Python script, Jupyter notebook, etc.).
Usage
CLI
The CLI can be used from the terminal to compile any model for any environment. It uses the same syntax as the dbt CLI. For example, to compile a model using dbt you would use
dbt compile -s <model-name>
Example:
dbt compile -s pre_date_date
The dpt CLI works similarly: you can use the exact same syntax to compile a model
dpt compile -s <model-name>
Example:
dpt compile -s pre_date_date
The output is formatted more readably, and the compiled query is automatically copied to your clipboard. Under the hood, the CLI sends off a dbt compile command in a virtual terminal and then looks for the compiled SQL code in the target directory. To see the actual dbt command being sent, use the --verbose (shorthand -v) flag
dpt compile -s <model-name> --verbose
Example:
dpt compile -s pre_date_date --verbose
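The "look for the compiled SQL in the target directory" step can be pictured with a short sketch. This is not dpt's actual implementation: it assumes dbt's default layout, in which compiled models are written somewhere below target/compiled/, and the function names find_compiled_sql and read_compiled_sql are hypothetical.

```python
from pathlib import Path


def find_compiled_sql(target_dir: Path, model_name: str) -> Path:
    """Locate the compiled SQL file for a model below target_dir/compiled."""
    # Assumption: dbt writes one <model_name>.sql file per compiled model.
    matches = sorted((target_dir / "compiled").rglob(f"{model_name}.sql"))
    if not matches:
        raise FileNotFoundError(f"no compiled SQL found for model {model_name!r}")
    return matches[0]


def read_compiled_sql(target_dir: Path, model_name: str) -> str:
    """Return the compiled query text for a model."""
    return find_compiled_sql(target_dir, model_name).read_text(encoding="utf-8")
```

After a dbt compile run, read_compiled_sql(Path("target"), "pre_date_date") would return the query text that a tool like this could then print or copy to the clipboard.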
The real power of the CLI is being able to compile SQL code for the PRD environment. You can compile the code for any environment using the --env (shorthand -e) flag.
dpt compile -s <model-name> --env <environment-name>
Example:
dpt compile -s pre_date_date --env PRD
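How the environment name reaches dbt is not described here. One plausible mapping, shown purely as an assumption, is to pass it on through dbt's --target option, which selects a target from profiles.yml. The function name build_dbt_compile_command is hypothetical, and the sketch presumes that a target with the same name as the environment exists.

```python
from typing import List, Optional


def build_dbt_compile_command(model: str, env: Optional[str] = None) -> List[str]:
    """Build a dbt compile command for one model, optionally for a given environment."""
    command = ["dbt", "compile", "-s", model]
    if env is not None:
        # Assumption: the environment name matches a dbt target in profiles.yml.
        command += ["--target", env]
    return command
```

With this mapping, build_dbt_compile_command("pre_date_date", "PRD") yields the arguments for dbt compile -s pre_date_date --target PRD, which is the kind of command the --verbose flag would reveal.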
Validation & Notebooks
Example Jupyter notebooks are bundled with the package and can be extracted to your project using the CLI.
List available examples
dpt examples --list
Copy examples to the current directory
dpt examples
This creates a dpt_examples/ folder containing:
- sql_notebook/ - Demonstrates how to run SQL queries in Jupyter notebooks using the %%sql magic command, mix SQL with Python, and save results.
- validation/ - Shows how to compare tables in Snowflake and generate Excel comparison reports.
File details
Details for the file pggm_dpt_toolkit-1.0.1.tar.gz.
File metadata
- Download URL: pggm_dpt_toolkit-1.0.1.tar.gz
- Upload date:
- Size: 55.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.6.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 034692cba0072562100905aac781621a6e7fefe8776405baddcdd369c67b4070 |
| MD5 | 74dd3f2f7ce83097d63c618411eeb626 |
| BLAKE2b-256 | 90aeccf68e2101a0eb6104c7b483e32e02a7c23c26ec4e093e3fbc5dc141580a |
File details
Details for the file pggm_dpt_toolkit-1.0.1-py3-none-any.whl.
File metadata
- Download URL: pggm_dpt_toolkit-1.0.1-py3-none-any.whl
- Upload date:
- Size: 62.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.6.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b0d38168a711562bbfb1a40154553cf6d8975d947d01bc2490ea7878887020cb |
| MD5 | 1af37ec8efcec6d441f49d9a6345c25f |
| BLAKE2b-256 | 1c466310695f6a28adfe8b33e21aba939a4276996c9092bbb5eede70bddfebe4 |