Skip to main content

SQLBucket - Write your SQL ETL flow and ETL integrity tool.

Project description

https://travis-ci.org/socialpoint-labs/sqlbucket.svg?branch=master

SQLBucket was built to help write, orchestrate and validate SQL ETL. It gives the possibility to set variables and introduces some control flow like if/else and for loops, to make your queries dynamic when writing them. It also implements a very simplistic integration testing framework to validate the results of your ETL pipelines in the form of SQL checks.

Lightweight, it can work as a stand alone service, or be part of your workflow manager environment (Airflow, Luigi, ..).

Installing

Install and update using pip:

pip install -U sqlbucket

SQLBucket works only for Python 3.6 and 3.7, and probably 3.8 although not tested yet.

A Simple Example

To start working with SQLBucket, you need to have a ‘projects’ folder that will contain all your SQL ETL. SQLBucket, essentially works with the following folder structure as a root folder, where you can have as many projects as you want.

projects/
    |-- project1/
    |-- project2/
    |-- project3/
        ...

Inside a project, it must contain a folder called queries that will contains your SQL files to be ran, and a config.yaml that will let you set the order in which those queries must be processed. SQLBucket leverages on the amazing Jinja2 templating library to give you the possibility to set variables in your SQL as well as giving you pure execution flows like for loops. To see in more depth what can be done, see the documentation on how to write SQL in SQLBucket.

In practice, this is how a project folder would look.

projects/
    |-- my_super_project/
        |-- config.yaml
        |-- queries/
            |-- my_super_insert_query.sql
            |-- some_other_query.sql
        |-- integrity/
            |-- test1.sql
            |-- test2.sql

The integrity folder gives you the possibility to write some checks in SQL, that will ran at the end of your ETL to validate your data. Check documentation on integrity for a more detailed explanation on testing the integrity of your ETL.

This is how you would launch a project:

from sqlbucket import SQLBucket

connections = {
'db_test': 'postgresql://user:password@host:5439/database'
}

bucket = SQLBucket(connections=connections)
project = bucket.load_project(
    project_name='fat_etl',
    connection_name='db_test',
    variables={'foo': 1}
)

project.run()
project.run_integrity()

This would trigger logs as below.

documentation/images/terminal.gif

You can also launch a project using the sqlbucket command line interface.

Contributing

For guidance on how to make a contribution to SQLBucket, see the contributing guidelines.

Project details


Release history Release notifications | RSS feed

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

sqlbucket-0.3.3.dev3.tar.gz (13.8 kB view hashes)

Uploaded Source

Built Distribution

sqlbucket-0.3.3.dev3-py3-none-any.whl (15.4 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page