The smallest DuckDB SQL transformations orchestrator

Development Status
- 4 - Beta
Intended Audience
- Developers
License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language

Project description

yato — yet another transformation orchestrator

yato is the smallest orchestrator on Earth for SQL data transformations on top of DuckDB. Give it a folder of SQL queries and it infers the DAG and runs the queries in the right order.

Installation

yato works with Python 3.8+.

pip install yato-lib

Get Started

Create a folder named sql and put your SQL files in it. You can, for instance, use the 2 queries given in the example folder.

from yato import Yato

yato = Yato(
    # The path of the file in which yato will run the SQL queries.
    # If you want to run it in memory, just set it to :memory:
    database_path="tmp.duckdb",
    # This is the folder where the SQL files are located.
    # The names of the files will determine the name of the table created.
    sql_folder="sql/",
    # The name of the DuckDB schema where the tables will be created.
    schema="transform",
)

# Runs yato against the DuckDB database with the queries in order.
yato.run()
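
The comments above say that file names determine table names. The sketch below illustrates that mapping with the standard library only. It assumes the table name is the file stem qualified by the schema; the helper planned_tables and the two query files are hypothetical and are not part of yato.

```python
import tempfile
from pathlib import Path


def planned_tables(sql_folder, schema):
    """Map each .sql file in the folder to the table name it would produce.

    Assumption: the table name is the file stem, qualified by the schema.
    """
    folder = Path(sql_folder)
    return {
        path.name: f"{schema}.{path.stem}"
        for path in sorted(folder.glob("*.sql"))
    }


# Build a throwaway folder with two hypothetical query files.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "source_orders.sql").write_text("SELECT 1 AS order_id")
    Path(tmp, "orders.sql").write_text("SELECT * FROM source_orders")
    tables = planned_tables(tmp, schema="transform")

print(tables)
```

Naming a file after the table it produces keeps the folder listing readable as a catalogue of the tables you expect to find in the schema.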

You can also run yato from the CLI:

yato run --db tmp.duckdb sql/

Works with dlt

yato is designed to work in tandem with dlt: dlt handles the data loading and yato handles the data transformation.

import dlt
from yato import Yato

yato = Yato(
    database_path="db.duckdb",
    sql_folder="sql/",
    schema="transform",
)

# Restore the database from S3 before running dlt
yato.restore()

pipeline = dlt.pipeline(
    pipeline_name="get_my_data",
    destination="duckdb",
    dataset_name="production",
    credentials="db.duckdb",
)

# my_source is a placeholder for your own dlt source or resource.
data = my_source()

load_info = pipeline.run(data)

# Back up the database after a successful dlt run
yato.backup()
yato.run()

Advanced usage

Mixing SQL and Python transformation

Even though we would love to do everything in SQL, writing a transformation in Python with pandas (or other libraries) is sometimes faster.

This is why you can mix SQL and Python transformations in yato.

To do so, add a Python file to the transformation folder. In this file, implement a Transformation class with a run method. If the transformation depends on other SQL transformations, define the source SQL query in a static method called source_sql.

Below is an example of a transformation (for instance orders.py). The framework will understand that orders needs to run after source_orders.

from yato import Transformation


class Orders(Transformation):
    @staticmethod
    def source_sql():
        return "SELECT * FROM source_orders"

    def run(self, context, *args, **kwargs):
        df = self.get_source(context)

        df["new_column"] = 1

        return df
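
To make the dependency claim concrete, here is a minimal sketch of how an upstream table could be read from source_sql. It is an illustration under assumptions, not yato's implementation: the Orders class is a stand-in that does not import yato, upstream_tables is a hypothetical helper, and a naive regular expression replaces real SQL parsing.

```python
import re


class Orders:
    """Stand-in with the same shape as the example above (no yato import)."""

    @staticmethod
    def source_sql():
        return "SELECT * FROM source_orders"


def upstream_tables(transformation_cls):
    """Collect identifiers that follow FROM or JOIN in source_sql().

    Naive on purpose: a regex is not a SQL parser and will miss
    subqueries, CTEs, quoted names and other real-world cases.
    """
    sql = transformation_cls.source_sql()
    pattern = r"\b(?:FROM|JOIN)\s+([A-Za-z_]\w*)"
    return set(re.findall(pattern, sql, re.IGNORECASE))


deps = upstream_tables(Orders)
print(deps)
```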

How does it work?

yato relies on the amazing SQLGlot library to parse the SQL queries syntactically and build a DAG of their dependencies. It then runs the queries in the right order.
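
The idea can be sketched with the standard library alone. This is a simplified illustration, not yato's code: a naive regular expression stands in for SQLGlot, the three queries are hypothetical, and build_dag is a helper invented for this example. graphlib requires Python 3.9 or later.

```python
import re
from graphlib import TopologicalSorter

# Naive stand-in for a real SQL parser: grabs the identifier after FROM or JOIN.
TABLE_REF = re.compile(r"\b(?:FROM|JOIN)\s+([A-Za-z_][\w.]*)", re.IGNORECASE)


def build_dag(queries):
    """Return {table: set of tables it depends on}, keeping only known tables."""
    known = set(queries)
    dag = {}
    for name, sql in queries.items():
        refs = set(TABLE_REF.findall(sql))
        # Drop references to anything that is not one of our own queries,
        # such as table functions or external tables.
        dag[name] = {ref for ref in refs if ref in known and ref != name}
    return dag


# Hypothetical queries, keyed by the table each one creates.
queries = {
    "orders": "SELECT * FROM source_orders WHERE amount > 0",
    "source_orders": "SELECT * FROM read_csv('orders.csv')",
    "revenue": (
        "SELECT o.day, SUM(o.amount) FROM orders o "
        "JOIN source_orders s ON o.id = s.id GROUP BY o.day"
    ),
}

dag = build_dag(queries)
# TopologicalSorter expects {node: predecessors} and yields dependencies first.
run_order = list(TopologicalSorter(dag).static_order())
print(run_order)  # ['source_orders', 'orders', 'revenue']
```

A real parser matters as soon as queries contain CTEs, subqueries or quoted identifiers, which is where a regex like the one above stops being reliable.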

Release history

- 0.0.9 (Mar 12, 2024)
- 0.0.7 (Mar 7, 2024)
- 0.0.6 (Mar 5, 2024)
- 0.0.5 (Mar 4, 2024)
- 0.0.4 (Mar 4, 2024)
- 0.0.3 (Mar 1, 2024), this version
- 0.0.2 (Feb 28, 2024)
- 0.0.1 (Feb 28, 2024)

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

yato_lib-0.0.3.tar.gz (7.8 kB), uploaded Mar 1, 2024

Built Distribution

yato_lib-0.0.3-py3-none-any.whl (10.0 kB), uploaded Mar 1, 2024, Python 3

Hashes for yato_lib-0.0.3.tar.gz

Algorithm	Hash digest
SHA256	`ecfbc885799689b13409f0bf8a56666e79f6d1ba9ac42c324ea933377807d532`
MD5	`29ca216ac670a2eafa598038b5486df3`
BLAKE2b-256	`158e079f9e043cb6c6940d498391cf465914c59d5f6b9b944d2e7ca6beb7e534`

Hashes for yato_lib-0.0.3-py3-none-any.whl

Algorithm	Hash digest
SHA256	`9ba913474568832b78496b79fd3aca0ad720fa35a00b0c98155b323fb37a0ca4`
MD5	`8d16e140b81c81baa69be6620d6166da`
BLAKE2b-256	`3388d3a4f29a59851869a2cc7be90b728bf3df2dd675ce8ff945cb9847a18afb`
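
If you download a file manually, you can compare it against the SHA256 digest listed above. The sketch below uses only the standard library; the helper names sha256_of and matches_published_hash are made up for this example, and the commented call assumes the sdist was saved under its original file name in the current directory.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Return True when the file's digest equals the published one."""
    return sha256_of(path) == expected_hex.strip().lower()


# Self-check on a throwaway file so the sketch runs without a download.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp, "sample.bin")
    sample.write_bytes(b"yato")
    self_check = matches_published_hash(
        sample, hashlib.sha256(b"yato").hexdigest()
    )

print(self_check)  # True

# Real use, once the sdist has been downloaded next to this script:
# matches_published_hash(
#     "yato_lib-0.0.3.tar.gz",
#     "ecfbc885799689b13409f0bf8a56666e79f6d1ba9ac42c324ea933377807d532",
# )
```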