Data Platform Toolkit (DPT) — SQL magic for Jupyter notebooks with Snowflake, and a DBT compile CLI.

Data Platform Toolkit (DPT)

This toolkit was created to make working with the Data Platform more efficient. The package contains four tools:

  • CLI - A command-line tool that compiles any dbt model for any environment
  • notebook_sql - A package for running SQL queries inside a Python notebook
  • validation:snowflake - A package that uses Snowpark to compare two tables
  • validation:report - A package that uses pandas to compare two DataFrames (and can generate an Excel report)

Setup

You can use this package in two ways. Copy the entire dpt folder to a new domain and install the package in "development" mode, which lets you edit and adapt the code to your own needs. Alternatively, build a Python "wheel", a single distributable file that you can install into the new domain repo like any other Python package.

Prerequisites

This setup assumes you have already set up a Python virtual environment with all necessary dbt packages installed.

Wheel

This Repo

In your terminal, navigate to the dpt directory

cd dpt

Make sure the Python build tools are up to date

python -m pip install --upgrade pip build twine

Build the wheel

python -m build --wheel

Destination Repo

Copy and paste the wheel into the root directory of the destination repo

Install the wheel (replace <path-to-wheel-file> with the path to the actual wheel file)

pip install <path-to-wheel-file>
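
If you are unsure which file to pass to pip, the following minimal sketch may help. It assumes the wheel was built with python -m build --wheel, which writes to a dist/ folder by default. The helper name newest_wheel is hypothetical and is not part of dpt.

```python
from pathlib import Path
from typing import Optional


def newest_wheel(dist_dir: Path) -> Optional[Path]:
    """Return the most recently modified .whl file in dist_dir, or None."""
    wheels = list(dist_dir.glob("*.whl"))
    if not wheels:
        return None
    # Pick the wheel with the latest modification time.
    return max(wheels, key=lambda path: path.stat().st_mtime)


if __name__ == "__main__":
    # Assumption: the script is run from the folder that contains dist/.
    wheel = newest_wheel(Path("dist"))
    if wheel is None:
        print("No wheel found; run `python -m build --wheel` first.")
    else:
        print(f"pip install {wheel}")
```

The printed command can be copied into the terminal of the destination repo once the wheel has been copied there.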

"Development" Mode

Copy and paste the entire dpt folder/directory into the root directory of the destination repo

Destination Repo

Navigate to the dpt directory

cd dpt

Install the dpt package in editable ("development") mode

pip install -e . --index-url https://proget.pggm-intra.intern/pypi/PGGMPythonGallery/simple --trusted-host proget.pggm-intra.intern

Any changes you make to the Python files in the dpt folder will immediately change the behavior of the dpt package wherever it is used (CLI, Python script, Jupyter Notebook, etc.)
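
To confirm that Python is really importing the package from the copied folder, you can inspect where a module is loaded from. This is a minimal sketch using only the standard library. The helper name package_location is hypothetical, and the import name "dpt" in the usage note is an assumption based on the folder name.

```python
import importlib.util
from pathlib import Path
from typing import Optional


def package_location(name: str) -> Optional[Path]:
    """Return the folder a top-level package is imported from, or None."""
    spec = importlib.util.find_spec(name)
    # find_spec returns None for unknown modules; origin is None for
    # namespace packages that have no __init__.py.
    if spec is None or spec.origin is None:
        return None
    return Path(spec.origin).resolve().parent
```

For an editable install, package_location("dpt") would be expected to point inside the dpt folder in your repo rather than inside site-packages.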

Usage

CLI

The CLI can be used from the terminal to compile any model for any environment. It uses the same syntax as the dbt CLI. For example, to compile a model with dbt you would use

dbt compile -s <model-name>

Example:

dbt compile -s pre_date_date

The dpt CLI works similarly: you can use the exact same syntax to compile a model

dpt compile -s <model-name>

Example:

dpt compile -s pre_date_date

The output is formatted more readably, and the compiled query is automatically copied to your clipboard. Under the hood, the CLI sends off a dbt compile command in a virtual terminal and then looks for the compiled SQL code in the target directory. To see the actual dbt command being sent, use the --verbose (shorthand -v) flag

dpt compile -s <model-name> --verbose

Example:

dpt compile -s pre_date_date --verbose
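
The two steps described above (run dbt compile, then read the result from the target directory) can be sketched in Python. This is an illustration under stated assumptions, not dpt's actual implementation: the function names are hypothetical, and the lookup assumes dbt's default layout, where compiled models are written somewhere under target/compiled/.

```python
import subprocess
from pathlib import Path
from typing import Optional


def find_compiled_sql(model_name: str, target_dir: Path) -> Optional[Path]:
    """Return the compiled SQL file for a model, or None if none exists."""
    # Search recursively, because the file sits below the project and
    # model sub-folders inside target/compiled/.
    matches = sorted((target_dir / "compiled").rglob(f"{model_name}.sql"))
    return matches[0] if matches else None


def compile_model(model_name: str, project_dir: Path = Path(".")) -> str:
    """Run `dbt compile` for one model and return the compiled SQL text."""
    command = ["dbt", "compile", "-s", model_name]
    # check=True raises CalledProcessError if dbt exits with an error.
    subprocess.run(command, cwd=project_dir, check=True)
    compiled = find_compiled_sql(model_name, project_dir / "target")
    if compiled is None:
        raise FileNotFoundError(f"No compiled SQL found for {model_name!r}")
    return compiled.read_text()
```

Calling compile_model("pre_date_date") from the dbt project root would require dbt to be installed and configured. Copying the result to the clipboard is left out of the sketch.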

The real power of the CLI is being able to compile SQL code for the PRD environment. You can compile the code for any environment using the --env (shorthand -e) flag.

dpt compile -s <model-name> --env <environment-name>

Example:

dpt compile -s pre_date_date --env PRD
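
How dpt translates --env into a dbt invocation is not documented here. One plausible approach, shown purely as an assumption, is to pass the environment name to dbt's own --target flag, which selects a target defined in profiles.yml. The function name build_dbt_command is hypothetical.

```python
from typing import List, Optional


def build_dbt_command(model_name: str, env: Optional[str] = None) -> List[str]:
    """Build the argument list for a `dbt compile` call."""
    command = ["dbt", "compile", "-s", model_name]
    if env is not None:
        # Assumption: the environment name matches a dbt target name
        # defined in profiles.yml.
        command += ["--target", env]
    return command


if __name__ == "__main__":
    print(" ".join(build_dbt_command("pre_date_date", env="PRD")))
```

Running dpt compile with --verbose shows the command that is actually sent, which is the reliable way to check this.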

Validation & Notebooks

Example Jupyter notebooks are bundled with the package and can be extracted to your project using the CLI.

List available examples

dpt examples --list

Copy examples to the current directory

dpt examples

This creates a dpt_examples/ folder containing:

  • sql_notebook/ — Demonstrates how to run SQL queries in Jupyter notebooks using the %%sql magic command, mix SQL with Python, and save results.
  • validation/ — Shows how to compare tables in Snowflake and generate Excel comparison reports.
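
To give a feel for the kind of comparison the validation examples cover, here is a minimal pandas sketch. It does not use dpt's API: the function name compare_frames, the key column, and the sample data are all hypothetical. It relies only on pandas' DataFrame.merge with indicator=True.

```python
import pandas as pd


def compare_frames(left: pd.DataFrame, right: pd.DataFrame, key: str) -> dict:
    """Report keys missing on either side and keys whose values differ."""
    merged = left.merge(
        right, on=key, how="outer", suffixes=("_left", "_right"), indicator=True
    )
    # Columns present in both frames, apart from the key, are compared.
    shared = [c for c in left.columns if c != key and c in right.columns]
    both = merged[merged["_merge"] == "both"]
    changed = pd.Series(False, index=both.index)
    for column in shared:
        # Note: NaN != NaN, so missing values on both sides count as changed.
        changed |= both[f"{column}_left"] != both[f"{column}_right"]
    return {
        "only_left": merged.loc[merged["_merge"] == "left_only", key].tolist(),
        "only_right": merged.loc[merged["_merge"] == "right_only", key].tolist(),
        "changed": both.loc[changed, key].tolist(),
    }


if __name__ == "__main__":
    old = pd.DataFrame({"id": [1, 2, 3], "amount": [10, 20, 30]})
    new = pd.DataFrame({"id": [2, 3, 4], "amount": [20, 35, 40]})
    print(compare_frames(old, new, key="id"))
```

The bundled validation notebooks remain the reference for how dpt itself performs and reports comparisons.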
