
The smallest DuckDB SQL transformations orchestrator


yato — yet another transformation orchestrator

yato is the smallest orchestrator on Earth for SQL data transformations on top of DuckDB. You give it a folder of SQL queries; it infers the dependency DAG and runs the queries in the right order.

Installation

yato works with Python 3.8+.

pip install yato-orchestrator

Get Started

Create a folder named sql and put your SQL files in it. You can, for instance, use the two queries given in the example folder.

from yato import Yato

yato = Yato(
    # The path of the file in which yato will run the SQL queries.
    # If you want to run it in memory, just set it to :memory:
    database_path="tmp.duckdb",
    # This is the folder where the SQL files are located.
    # The names of the files will determine the name of the table created.
    sql_folder="sql/",
    # The name of the DuckDB schema where the tables will be created.
    schema="transform",
)

# Runs yato against the DuckDB database with the queries in order.
yato.run()

You can also run yato from the CLI:

yato run --db tmp.duckdb sql/
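
Both invocations above expect a sql/ folder to exist. If you do not have the example folder at hand, the following is a minimal sketch that writes two query files to try with. The file names, column names, and queries are hypothetical and are not the ones shipped in the example folder; the second query reads from the table the first one would create, so the first must run before it.

```python
from pathlib import Path

# Hypothetical file names and queries, for illustration only.
# Each file name (minus .sql) is meant to become a table name.
QUERIES = {
    "orders_clean.sql": "SELECT 1 AS order_id, 42.0 AS amount",
    "orders_summary.sql": (
        "SELECT COUNT(*) AS n_orders, SUM(amount) AS total FROM orders_clean"
    ),
}


def write_sql_folder(folder="sql"):
    """Write each query to <folder>/<file name> and return the created paths."""
    target = Path(folder)
    target.mkdir(parents=True, exist_ok=True)
    paths = []
    for name, query in QUERIES.items():
        path = target / name
        path.write_text(query + "\n", encoding="utf-8")
        paths.append(path)
    return paths


if __name__ == "__main__":
    for created in write_sql_folder():
        print(created)
```

After running this script, the sql/ folder can be passed as sql_folder or as the CLI argument shown above.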

Works with dlt

yato is designed to work in tandem with dlt: dlt handles the data loading and yato handles the data transformation.

import dlt
from yato import Yato

yato = Yato(
    database_path="db.duckdb",
    sql_folder="sql/",
    schema="transform",
)

# Restore the database from S3 before running dlt
yato.restore()

pipeline = dlt.pipeline(
    pipeline_name="get_my_data",
    destination="duckdb",
    dataset_name="production",
    credentials="db.duckdb",
)

# my_source is a placeholder for your own dlt source or resource.
data = my_source()

load_info = pipeline.run(data)

# Back up the database after a successful dlt run
yato.backup()
yato.run()

How does it work?

yato relies on the amazing SQLGlot library to syntactically parse the SQL queries and build a DAG of their dependencies. Then, it runs the queries in the right order.
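
To make the idea concrete, here is a rough sketch of dependency ordering. It is not yato's implementation: it swaps SQLGlot's parser for a naive regular expression that picks up the identifier after FROM or JOIN, and it uses the standard library's graphlib (Python 3.9+) for the topological sort. The table names and queries are hypothetical, and the function names build_dag and run_order are made up for this example.

```python
import re
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Naive stand-in for a real SQL parser: captures the bare identifier that
# follows FROM or JOIN. It ignores subqueries, CTEs, quoting, and schemas.
TABLE_REF = re.compile(r"\b(?:from|join)\s+([A-Za-z_][A-Za-z0-9_]*)", re.IGNORECASE)


def build_dag(queries):
    """Map each table to the set of other tables in `queries` it reads from.

    Assumes the keys of `queries` are lowercase table names.
    """
    dag = {}
    for name, sql in queries.items():
        refs = {ref.lower() for ref in TABLE_REF.findall(sql)}
        # Keep only references to tables produced by other queries;
        # anything else (e.g. raw loaded tables) is treated as already present.
        dag[name] = {ref for ref in refs if ref in queries and ref != name}
    return dag


def run_order(queries):
    """Return table names so that each one comes after its dependencies."""
    return list(TopologicalSorter(build_dag(queries)).static_order())


# Hypothetical queries, keyed by the table each one would create.
queries = {
    "orders_summary": "SELECT COUNT(*) AS n FROM orders_clean JOIN customers USING (customer_id)",
    "orders_clean": "SELECT * FROM raw_orders WHERE amount > 0",
    "customers": "SELECT DISTINCT customer_id FROM orders_clean",
}

if __name__ == "__main__":
    print(run_order(queries))
```

A real parser matters here because a regular expression cannot reliably tell a table from a CTE alias or a subquery, which is presumably why yato delegates that step to SQLGlot.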
