sqlbucket

SQLBucket - Write your SQL ETL flow and ETL integrity tool.

These details have not been verified by PyPI

Project links

Homepage

Project description

https://travis-ci.org/socialpoint-labs/sqlbucket.svg?branch=master

SQLBucket is a lightweight framework to help write, orchestrate and validate SQL data pipelines. It gives the possibility to set variables and introduces some control flow using the fantastic Jinja2 library. It also implements a very simplistic unit and integration test framework where you can validate the results of your ETL in the form of SQL checks. With SQLBucket, you can apply TDD principles when writing data pipelines.

It can work as a stand alone service, or be part of your workflow manager environment (Airflow, Luigi, ..).

Installing

Install and update using pip:

pip install -U sqlbucket

SQLBucket now works for python 3.10.

A Simple Example

To start working, you need to instantiate your SQLBucket core object with the project_folder parameter. That folder will contain all your SQL ETL. The python file where you create your SQLBucket object is also a good place to instantiate your command line interface, as shown below.

# my_sqlbucket.py
from sqlbucket import SQLBucket


bucket = SQLBucket(projects_folder='projects')


if __name__ == '__main__':
    bucket.cli()

The following command will create your first project in your projects folder.

python my_sqlbucket.py create-project -n my_first_project

For more info on CLI, please refer to its documentations.

Your projects should now look like the structure below:

projects/
    |-- my_first_project/
        |-- config.yaml
        |-- queries/
            |-- query_one.sql
            |-- query_two.sql
        |-- integrity/
            |-- integrity_one.sql

SQLBucket project structure

An SQLBucket project is made of 3 core components: the configuration, the ETL queries and the integrity check queries.

Configuration

The config.yaml is the core of your project. This is where you can define variables at project level, and configure the order your sql queries must be executed. For a better explanations on how to configure variables you can refer to the usage documentation, and also the variables documentation which also describes environment and connections variables.

ETL queries

The queries folder simply contain your SQL queries. You can organize them in the folder structure of your choice. As long as they are in the queries folder, SQLBucket will find them and execute them when configured to do so. See the documentation on how to write SQL with SQLBucket.

Integrity queries

The integrity folder simply contain SQL queries to help you validate your ETL. You can organize them in the folder structure of your choice. The only convention is to return the result of your integrity (True/False) in a field named passed. The main idea is that integrity is done by SQL itself. Check documentation on integrity for a more detailed explanation on testing the integrity of your ETL. We also have a set of common macros that can be helpful to start with.

See below a full example that will actually first run your ETL, and then run your integrity checks for a given database configuration.

from sqlbucket import SQLBucket

connections = {
    'db_demo': 'postgresql://user:password@host:5439/database'
}

bucket = SQLBucket(connections=connections)
project = bucket.load_project(
    project_name='my_first_project',
    connection_name='db_demo',
    variables={'foo': 1}
)

# to run ETL
project.run()

# to run integrity
project.run_integrity()

We recommend setting your connection urls as environment variables for security purposes.

Template project

To get you up to speed, you can create a fork of the SQLBucket template project and start building SQL data pipelines within minutes.

Contributing

For guidance on how to make a contribution to SQLBucket, see the contributing guidelines.

Project details

These details have not been verified by PyPI

Project links

Homepage

Release history Release notifications | RSS feed

This version

0.4.4

Dec 1, 2022

0.4.3

Oct 18, 2022

0.4.2

Sep 28, 2022

0.4.1

Aug 27, 2020

0.4.0

Mar 27, 2020

0.3.3.dev3 pre-release

Mar 9, 2020

0.3.3.dev2 pre-release

Mar 9, 2020

0.3.3.dev1 pre-release

Mar 9, 2020

0.3.2

Feb 26, 2020

0.3.2.dev4 pre-release

Feb 9, 2020

0.3.2.dev3 pre-release

Feb 9, 2020

0.3.2.dev2 pre-release

Feb 9, 2020

0.3.2.dev1 pre-release

Feb 9, 2020

0.3.2.dev0 pre-release

Feb 9, 2020

0.3.1

Dec 4, 2019

0.3.0

Nov 27, 2019

0.2.12.dev3 pre-release

Nov 27, 2019

0.2.12.dev2 pre-release

Nov 27, 2019

0.2.12.dev1 pre-release

Nov 27, 2019

0.2.12.dev0 pre-release

Nov 27, 2019

0.2.11

Nov 26, 2019

0.2.11.dev1 pre-release

Nov 26, 2019

0.2.11.dev0 pre-release

Nov 22, 2019

0.2.10.dev9 pre-release

Nov 20, 2019

0.2.10.dev8 pre-release

Nov 20, 2019

0.2.10.dev7 pre-release

Nov 20, 2019

0.2.10.dev6 pre-release

Nov 20, 2019

0.2.10.dev4 pre-release

Nov 20, 2019

0.2.10.dev3 pre-release

Nov 20, 2019

0.2.10.dev2 pre-release

Nov 20, 2019

0.2.10.dev1 pre-release

Nov 20, 2019

0.2.10.dev0 pre-release

Nov 20, 2019

0.2.9

Nov 19, 2019

0.2.8

Nov 19, 2019

0.2.7

Nov 19, 2019

0.2.6

Nov 19, 2019

0.2.5

Nov 18, 2019

0.2.4

Nov 6, 2019

0.2.3

Nov 6, 2019

0.2.2

Nov 6, 2019

0.2.1

Oct 10, 2019

0.1.21

Oct 8, 2019

0.1.20

Oct 8, 2019

0.1.19

Oct 8, 2019

0.1.18

Oct 8, 2019

0.1.17

Oct 8, 2019

0.1.16

Oct 8, 2019

0.1.15

Oct 8, 2019

0.1.14

Sep 12, 2019

0.1.13

Sep 12, 2019

0.1.12

Sep 12, 2019

0.1.11

Sep 12, 2019

0.1.10

Sep 11, 2019

0.1.9

Sep 11, 2019

0.1.8

Sep 11, 2019

0.1.7

Sep 6, 2019

0.1.6

Sep 6, 2019

0.1.5

Sep 6, 2019

0.1.4

Sep 6, 2019

0.1.3

Sep 6, 2019

0.1.2

Sep 6, 2019

0.1.1

Sep 6, 2019

0.1.1.dev0 pre-release

Sep 6, 2019

0.1.0

Sep 6, 2019

0.1.0.dev0 pre-release

Sep 6, 2019

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

sqlbucket-0.4.4.tar.gz (15.9 kB view details)

Uploaded Dec 1, 2022 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

sqlbucket-0.4.4-py3-none-any.whl (16.5 kB view details)

Uploaded Dec 1, 2022 Python 3

File details

Details for the file sqlbucket-0.4.4.tar.gz.

File metadata

Download URL: sqlbucket-0.4.4.tar.gz
Upload date: Dec 1, 2022
Size: 15.9 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.2 CPython/3.9.15

File hashes

Hashes for sqlbucket-0.4.4.tar.gz
Algorithm	Hash digest
SHA256	`12863ce991d8331f523b78c2dc4d28818fc448a13a59bc39ceac4ec0a49b90ee`
MD5	`5c4f3bfc98811fbbfcc00d38619a182e`
BLAKE2b-256	`2baf2868e47db69bdf077b0ad9a9eb7901e525f07a5acda0fbe9629d1396802d`

See more details on using hashes here.

File details

Details for the file sqlbucket-0.4.4-py3-none-any.whl.

File metadata

Download URL: sqlbucket-0.4.4-py3-none-any.whl
Upload date: Dec 1, 2022
Size: 16.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.2 CPython/3.9.15

File hashes

Hashes for sqlbucket-0.4.4-py3-none-any.whl
Algorithm	Hash digest
SHA256	`bf3e33c8512f07f04603b38b734d81c2737bcd603d7e8bdaece7c413b775f3e1`
MD5	`a945fd4401532eac97678a36e6ceb54d`
BLAKE2b-256	`b031729f37c060cd08c0fb602a2994eecb0800021cc96c8dc0dc49474946e75a`

See more details on using hashes here.

sqlbucket 0.4.4

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

Installing

A Simple Example

SQLBucket project structure

Template project

Contributing

Links

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes