Skip to main content

SQLBucket - Write your SQL ETL flow and ETL integrity tool.

Project description

https://travis-ci.org/socialpoint-labs/sqlbucket.svg?branch=master

SQLBucket was built to help write, orchestrate and validate SQL ETL. It gives the possibility to set variables and introduces some control flow like if/else and for loops, to make your queries dynamic when writing them. It also implements a very simplistic integration testing framework to validate the results of your ETL pipelines in the form of SQL checks.

Lightweight, it can work as a stand alone service, or be part of your workflow manager environment (Airflow, Luigi, ..).

Installing

Install and update using pip:

pip install -U sqlbucket

SQLBucket works only for Python 3.6 and 3.7, and probably 3.8 although not tested yet.

A Simple Example

To start working with SQLBucket, you need to have a ‘projects’ folder that will contain all your SQL ETL. SQLBucket, essentially works with the following folder structure as a root folder, where you can have as many projects as you want.

projects/
    |-- project1/
    |-- project2/
    |-- project3/
        ...

Inside a project, it must contain a folder called queries that will contains your SQL files to be ran, and a config.yaml that will let you set the order in which those queries must be processed. SQLBucket leverages on the amazing Jinja2 templating library to give you the possibility to set variables in your SQL as well as giving you pure execution flows like for loops. To see in more depth what can be done, see the documentation on how to write SQL in SQLBucket.

In practice, this is how a project folder would look.

projects/
    |-- my_super_project/
        |-- config.yaml
        |-- queries/
            |-- my_super_insert_query.sql
            |-- some_other_query.sql
        |-- integrity/
            |-- test1.sql
            |-- test2.sql

The integrity folder gives you the possibility to write some checks in SQL, that will ran at the end of your ETL to validate your data. Check documentation on integrity for a more detailed explanation on testing the integrity of your ETL.

This is how you would launch a project:

from sqlbucket import SQLBucket

connections = {
'db_test': 'postgresql://user:password@host:5439/database'
}

bucket = SQLBucket(connections=connections)
project = bucket.load_project(
    project_name='fat_etl',
    connection_name='db_test',
    variables={'foo': 1}
)

project.run()
project.run_integrity()

This would trigger logs as below.

documentation/images/terminal.gif

You can also launch a project using the sqlbucket command line interface.

Contributing

For guidance on how to make a contribution to SQLBucket, see the contributing guidelines.

Project details


Release history Release notifications | RSS feed

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

sqlbucket-0.3.2.dev4.tar.gz (13.1 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

sqlbucket-0.3.2.dev4-py3-none-any.whl (14.3 kB view details)

Uploaded Python 3

File details

Details for the file sqlbucket-0.3.2.dev4.tar.gz.

File metadata

  • Download URL: sqlbucket-0.3.2.dev4.tar.gz
  • Upload date:
  • Size: 13.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/40.8.0 requests-toolbelt/0.9.1 tqdm/4.42.1 CPython/3.8.0

File hashes

Hashes for sqlbucket-0.3.2.dev4.tar.gz
Algorithm Hash digest
SHA256 ca7f07e6bc7dbaba79cb426280281c2a674259608369d5c4d454ca07662f799e
MD5 db6b5c262ba923b4e40198c42e8c33ed
BLAKE2b-256 768292c3b62d32d5e47efcfd6be14d37e7b81080535e5ed5dedbe95cc992d5d9

See more details on using hashes here.

File details

Details for the file sqlbucket-0.3.2.dev4-py3-none-any.whl.

File metadata

  • Download URL: sqlbucket-0.3.2.dev4-py3-none-any.whl
  • Upload date:
  • Size: 14.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/40.8.0 requests-toolbelt/0.9.1 tqdm/4.42.1 CPython/3.8.0

File hashes

Hashes for sqlbucket-0.3.2.dev4-py3-none-any.whl
Algorithm Hash digest
SHA256 d886bc8581b7e7286b0f2b6680e273436929cf626fc1d233cee6185ac019c37d
MD5 ca57f7e939e8b44e3f62ba9ae6e20eb8
BLAKE2b-256 b526a4a1aa4d3ee79d7066c8ed68a09c9ae18b826ad7089ad5623a0b8cba8c46

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page