Project description

QuickETL

A fast, flexible Python ETL framework with support for 20+ backends via Ibis

QuickETL is a configuration-driven ETL framework that provides a simple, unified API for data processing across multiple compute backends including DuckDB, Polars, Spark, and pandas.

Features

  • Multi-backend Support: Run the same pipeline on DuckDB, Polars, DataFusion, Spark, pandas, and more via Ibis
  • Configuration-driven: Define pipelines in YAML with variable substitution
  • Quality Checks: Built-in data quality validation (not_null, unique, row_count, accepted_values, expression)
  • 12 Transform Operations: select, rename, filter, derive_column, cast, fill_null, dedup, sort, join, aggregate, union, limit
  • CLI Interface: quicketl run, quicketl validate, quicketl init, quicketl info
  • Airflow Integration: @quicketl_task decorator for DAG tasks
  • Cloud Storage: S3, GCS, Azure via fsspec
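The built-in quality checks map onto the `checks` block of a pipeline file. Here is a minimal sketch covering all five check types; the field names for the `accepted_values` and `expression` checks are assumptions inferred from the check names, not confirmed by this page:

```yaml
checks:
  # Fail if any listed column contains NULLs
  - type: not_null
    columns: [order_id, region]
  # Fail if order_id values are not unique
  - type: unique
    columns: [order_id]
  # Fail if the result has fewer than 1 row
  - type: row_count
    min: 1
  # Hypothetical field names: restrict a column to a known set of values
  - type: accepted_values
    column: region
    values: [NORTH, SOUTH, EAST, WEST]
  # Hypothetical field names: an arbitrary boolean expression that must hold
  - type: expression
    expr: amount >= 0
```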

Installation

# Basic installation (DuckDB + Polars)
pip install quicketl

# With additional backends
pip install quicketl[spark]
pip install quicketl[datafusion]

# With cloud storage
pip install quicketl[aws]
pip install quicketl[gcp]
pip install quicketl[azure]

# All backends and tools
pip install quicketl[all]

Quick Start

CLI Usage

# Initialize a new project
quicketl init my_project
cd my_project

# Run a pipeline
quicketl run pipelines/sample.yml

# Validate configuration
quicketl validate pipelines/sample.yml

# Show available backends
quicketl info --backends

Pipeline Configuration (YAML)

name: sales_etl
description: Process daily sales data
engine: duckdb

source:
  type: file
  path: data/sales.parquet
  format: parquet

transforms:
  - op: filter
    predicate: amount > 0
  - op: derive_column
    name: total_with_tax
    expr: amount * 1.1
  - op: aggregate
    group_by: [region]
    aggs:
      total_sales: sum(amount)
      order_count: count(*)

checks:
  - type: not_null
    columns: [region, total_sales]
  - type: row_count
    min: 1

sink:
  type: file
  path: data/output.parquet
  format: parquet

Python API

from quicketl import Pipeline, QuickETLEngine
from quicketl.config.models import FileSource, FileSink
from quicketl.config.transforms import FilterTransform, DeriveColumnTransform

# From YAML
pipeline = Pipeline.from_yaml("pipeline.yml")
result = pipeline.run()

# Builder pattern
pipeline = (
    Pipeline("my_pipeline", engine="duckdb")
    .source(FileSource(path="data.parquet"))
    .transform(FilterTransform(predicate="amount > 0"))
    .transform(DeriveColumnTransform(name="tax", expr="amount * 0.1"))
    .sink(FileSink(path="output.parquet"))
)
result = pipeline.run()

# Direct engine usage
engine = QuickETLEngine(backend="duckdb")
table = engine.read_file("data.parquet", "parquet")
filtered = engine.filter(table, "amount > 100")
result = engine.to_polars(filtered)

Airflow Integration

from quicketl.integrations.airflow import quicketl_task

@quicketl_task(config_path="pipelines/daily_etl.yml")
def run_daily_etl(**context):
    return {"RUN_DATE": context["ds"]}
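The dictionary returned by the decorated task presumably feeds the YAML variable substitution mentioned under Features. A sketch of how `RUN_DATE` might then be referenced in the pipeline file, assuming a `${VAR}` placeholder syntax (an assumption; the exact substitution syntax is not shown on this page):

```yaml
source:
  type: file
  # ${RUN_DATE} would be filled from the task's returned dict,
  # e.g. the Airflow logical date from context["ds"]
  path: data/sales_${RUN_DATE}.parquet
  format: parquet
```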

Supported Backends

Backend      Type              Installation
DuckDB       Local/Embedded    Included by default
Polars       Local/Embedded    Included by default
DataFusion   Local/Embedded    pip install quicketl[datafusion]
Spark        Distributed       pip install quicketl[spark]
pandas       Local             pip install quicketl[pandas]
PostgreSQL   Database          pip install quicketl[postgres]
MySQL        Database          pip install quicketl[mysql]
ClickHouse   Database          pip install quicketl[clickhouse]
Snowflake    Cloud DW          pip install quicketl[snowflake]
BigQuery     Cloud DW          pip install quicketl[bigquery]
Trino        Distributed SQL   pip install quicketl[trino]
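Because pipelines are backend-agnostic, moving a pipeline between the backends above should only require changing the `engine` key in the YAML (or the `backend` argument to `QuickETLEngine`), provided the matching extra is installed. A sketch:

```yaml
# The same pipeline definition; only this line changes per environment
engine: duckdb      # local, included by default
# engine: spark     # requires: pip install quicketl[spark]
# engine: bigquery  # requires: pip install quicketl[bigquery]
```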

Development

# Clone and install dev dependencies
git clone https://github.com/ameijin/quicketl.git
cd quicketl
pip install -e ".[dev]"

# Run tests
pytest

# Lint
ruff check src/

# Type check
mypy src/

License

MIT License - see LICENSE for details.

Download files

Download the file for your platform.

Source Distribution

quicketl-0.1.0.tar.gz (339.2 kB)

Uploaded Source

Built Distribution


quicketl-0.1.0-py3-none-any.whl (50.7 kB)

Uploaded Python 3

File details

Details for the file quicketl-0.1.0.tar.gz.

File metadata

  • Download URL: quicketl-0.1.0.tar.gz
  • Upload date:
  • Size: 339.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for quicketl-0.1.0.tar.gz

  • SHA256: 270f50eb1dac5ac87cdaa867d895687e05e40bb642b5e9319abb726ed6803e17
  • MD5: 6ea403b006665b8ab3f10fc80bda1358
  • BLAKE2b-256: 15caa1efa7816a2d16ea35a49dced8a4c18860ed9c504934812d3ca7bebd4a48


Provenance

The following attestation bundles were made for quicketl-0.1.0.tar.gz:

Publisher: publish.yml on ameijin/quicketl

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file quicketl-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: quicketl-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 50.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for quicketl-0.1.0-py3-none-any.whl

  • SHA256: c025ad0cf45a9c3a9d9b75d1cbc48b931bf376c810f5d9fe43cc1cba7bb9791d
  • MD5: 2ea85f2181cb402eba815c4449e27c92
  • BLAKE2b-256: 320aedd32326d488ee307a8db5419ad8fc26a62ea104c46472b256bc0370de78


Provenance

The following attestation bundles were made for quicketl-0.1.0-py3-none-any.whl:

Publisher: publish.yml on ameijin/quicketl

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
