Skip to main content

Package to parse SQL scripts for information

Project description

SQL Express

Introduction

How often do you find yourself in the following situation:

  • Having tons of SQL files to maintain
    • E.g. trust & safety teams with many rules written in SQL
    • E.g. data engineering teams with many ETL jobs written in SQL
  • No time to read & understand the logic of every single SQL
    • E.g. each SQL could be up to 1000s of lines long
    • E.g. logic could be complicated & not well documented
  • Struggle with data quality issues
    • E.g. output has issue because source tables have issue / missing data
  • Struggle with query optimization
    • E.g. SQLs may contain repeated logic that can be combined to save cost & money

Wouldn't it be nice if there is a package that can:

  • Summarize the logic of your SQLs
  • Perform data sanity checks on the required source tables
  • Identify repeated logics across your SQLs

Given a single SQL file, this package is designed to:

  • Parse query structure
  • Extract source tables
  • < more to come >

Given multiple SQL files, this package is further designed to:

  • Bulk perform all functionalities listed above
  • Visualise relationship between source & output tables across SQLs
  • < more to come >

Getting Started

Installation

pip install sqlexpress

Single SQL File

Using command line:

# print query structure
python3 -m sqlexpress structure -f tests/data/example1.sql

# print source tables
python3 -m sqlexpress sources -f tests/data/example1.sql

Using python:

from sqlexpress.parsers import QueryParser
query = open('tests/data/example1.sql', 'r').read()
parser = QueryParser(query)

# print query structure
parser.print()

# get source tables
source_tables = parser.extract_sources()  # ['`project.dataset.raw1`', ...]

Multiple SQL Files

Firstly, create a yaml file containing details of the SQL files (example).

Using command line:

# process yaml file
python3 -m sqlexpress bulk -f tests/data/bulk1.yaml

# visualize relationships between tables
export WEBSERVER_FOLDER=tests/data  # folder containing yaml files
python3 -m sqlexpress webserver     # start web server

Extracted details are stored in a json file (example).

Web Server

Installation

pip install sqlexpress[web]

Home page

Shows all yaml files inside WEBSERVER_FOLDER.

  • Click on Extract button to process yaml file.
  • Click on file name to jump to project page.

home_page

Project page

Shows the following details:

  1. Input - Contents of yaml file
  2. Output - Contents of json file
  3. Graph - Visualisation of source & output tables
  4. Table: List all tables extracted from SQLs

project_page

Under The Hood

Want to identify the structure of the query first so that it is easier to identify other things (e.g. source tables). This is done in 2 steps:

parse_logic

With the parsed structure, we can construct a directed table graph (below) even with long, complicated queries. The source tables can be easily identified from the table graph (nodes with only outward arrows).

table_graph

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

sqlexpress-0.2.4.tar.gz (14.9 kB view details)

Uploaded Source

Built Distribution

sqlexpress-0.2.4-py3-none-any.whl (19.3 kB view details)

Uploaded Python 3

File details

Details for the file sqlexpress-0.2.4.tar.gz.

File metadata

  • Download URL: sqlexpress-0.2.4.tar.gz
  • Upload date:
  • Size: 14.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.0 CPython/3.8.2

File hashes

Hashes for sqlexpress-0.2.4.tar.gz
Algorithm Hash digest
SHA256 aa6bc9286d371f5c7b829c2bb8066bd7834a64217d256efe25875ad53e990e7e
MD5 790bcdaaaba5c7f2c1d0d17fdb935ecf
BLAKE2b-256 860b83db58a6603470154d138657a7340b8f34d31e6de6767dc8b03b5e9cb98f

See more details on using hashes here.

File details

Details for the file sqlexpress-0.2.4-py3-none-any.whl.

File metadata

  • Download URL: sqlexpress-0.2.4-py3-none-any.whl
  • Upload date:
  • Size: 19.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.0 CPython/3.8.2

File hashes

Hashes for sqlexpress-0.2.4-py3-none-any.whl
Algorithm Hash digest
SHA256 5edfd3cb7c3467e6cf49674d2dfc4c0eb7f1c58f1eade21b257fb3537831f412
MD5 e22f6d671e8e0447565f078a53112d50
BLAKE2b-256 0a6ed3856fbf935af85652580830c0bb10e34548c9e7167f8b39c67b48c60692

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page