QuickETL - Fast & Flexible Python ETL Framework with 20+ backend support via Ibis
QuickETL is a configuration-driven ETL framework that provides a simple, unified API for data processing across multiple compute backends including DuckDB, Polars, Spark, and pandas.
Features
- Multi-backend Support: Run the same pipeline on DuckDB, Polars, DataFusion, Spark, pandas, and more via Ibis
- Configuration-driven: Define pipelines in YAML with variable substitution
- Quality Checks: Built-in data quality validation (not_null, unique, row_count, accepted_values, expression)
- 12 Transform Operations: select, rename, filter, derive_column, cast, fill_null, dedup, sort, join, aggregate, union, limit
- CLI Interface: quicketl run, quicketl validate, quicketl init, quicketl info
- Airflow Integration: @quicketl_task decorator for DAG tasks
- Cloud Storage: S3, GCS, Azure via fsspec
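To make the quality checks listed above concrete, here is a minimal sketch of what not_null, unique, row_count, and accepted_values typically assert, written in plain Python over a list of dicts. The function names and signatures are illustrative assumptions and are not QuickETL's actual implementation, which runs on Ibis tables.

```python
from collections import Counter


def check_not_null(rows, columns):
    """Pass when none of the given columns is None in any row."""
    return all(row.get(col) is not None for row in rows for col in columns)


def check_unique(rows, columns):
    """Pass when no combination of the given column values repeats."""
    keys = Counter(tuple(row.get(col) for col in columns) for row in rows)
    return all(count == 1 for count in keys.values())


def check_row_count(rows, minimum=None, maximum=None):
    """Pass when the number of rows lies within the optional bounds."""
    n = len(rows)
    if minimum is not None and n < minimum:
        return False
    if maximum is not None and n > maximum:
        return False
    return True


def check_accepted_values(rows, column, values):
    """Pass when every value in the column belongs to the allowed set."""
    allowed = set(values)
    return all(row.get(column) in allowed for row in rows)


# Hypothetical sample data, for illustration only.
rows = [
    {"region": "north", "total_sales": 120.0},
    {"region": "south", "total_sales": 80.0},
]

results = {
    "not_null": check_not_null(rows, ["region", "total_sales"]),
    "unique": check_unique(rows, ["region"]),
    "row_count": check_row_count(rows, minimum=1),
    "accepted_values": check_accepted_values(rows, "region", ["north", "south"]),
}
```

Each check returns a boolean, so a pipeline run can fail as soon as any entry in results is False.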
Installation
# Basic installation (DuckDB + Polars)
pip install quicketl
# With additional backends
pip install "quicketl[spark]"
pip install "quicketl[datafusion]"
# With cloud storage
pip install "quicketl[aws]"
pip install "quicketl[gcp]"
pip install "quicketl[azure]"
# All backends and tools
pip install "quicketl[all]"
Quick Start
CLI Usage
# Initialize in existing project
quicketl init
# Or create a new project
quicketl init my_project
cd my_project
# Run a pipeline
quicketl run pipelines/sample.yml
# Validate configuration
quicketl validate pipelines/sample.yml
# Show available backends
quicketl info --backends
Pipeline Configuration (YAML)
name: sales_etl
description: Process daily sales data
engine: duckdb
source:
  type: file
  path: data/sales.parquet
  format: parquet
transforms:
  - op: filter
    predicate: amount > 0
  - op: derive_column
    name: total_with_tax
    expr: amount * 1.1
  - op: aggregate
    group_by: [region]
    aggs:
      total_sales: sum(amount)
      order_count: count(*)
checks:
  - type: not_null
    columns: [region, total_sales]
  - type: row_count
    min: 1
sink:
  type: file
  path: data/output.parquet
  format: parquet
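To show what the sample pipeline computes, the sketch below walks through the same filter, derive_column, aggregate, and check steps in plain Python. The input rows are made up for illustration, and this is only a model of the semantics. QuickETL itself compiles these steps to the selected backend through Ibis.

```python
from collections import defaultdict

# Hypothetical input rows standing in for data/sales.parquet.
sales = [
    {"region": "north", "amount": 100.0},
    {"region": "north", "amount": 50.0},
    {"region": "south", "amount": 80.0},
    {"region": "south", "amount": -5.0},  # removed by the filter step
]

# op: filter (predicate: amount > 0)
kept = [row for row in sales if row["amount"] > 0]

# op: derive_column (total_with_tax = amount * 1.1)
derived = [{**row, "total_with_tax": row["amount"] * 1.1} for row in kept]

# op: aggregate (group_by: [region]; sum(amount) and count(*))
groups = defaultdict(lambda: {"total_sales": 0.0, "order_count": 0})
for row in derived:
    group = groups[row["region"]]
    group["total_sales"] += row["amount"]
    group["order_count"] += 1
output = [{"region": region, **aggs} for region, aggs in sorted(groups.items())]

# checks: not_null on [region, total_sales] and row_count with min: 1
checks_pass = (
    all(r["region"] is not None and r["total_sales"] is not None for r in output)
    and len(output) >= 1
)

print(output)
# [{'region': 'north', 'total_sales': 150.0, 'order_count': 2},
#  {'region': 'south', 'total_sales': 80.0, 'order_count': 1}]
```

Note that total_with_tax does not appear in the result: the aggregate step keeps only the group_by columns and the named aggregations.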
Python API
from quicketl import Pipeline, QuickETLEngine
from quicketl.config.models import FileSource, FileSink
from quicketl.config.transforms import FilterTransform, DeriveColumnTransform
# From YAML
pipeline = Pipeline.from_yaml("pipeline.yml")
result = pipeline.run()
# Builder pattern
pipeline = (
    Pipeline("my_pipeline", engine="duckdb")
    .source(FileSource(path="data.parquet"))
    .transform(FilterTransform(predicate="amount > 0"))
    .transform(DeriveColumnTransform(name="tax", expr="amount * 0.1"))
    .sink(FileSink(path="output.parquet"))
)
result = pipeline.run()
# Direct engine usage
engine = QuickETLEngine(backend="duckdb")
table = engine.read_file("data.parquet", "parquet")
filtered = engine.filter(table, "amount > 100")
result = engine.to_polars(filtered)
Airflow Integration
from quicketl.integrations.airflow import quicketl_task

@quicketl_task(config_path="pipelines/daily_etl.yml")
def run_daily_etl(**context):
    return {"RUN_DATE": context["ds"]}
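The decorated function returns a dictionary of variables, which presumably feeds the YAML variable substitution mentioned under Features. The sketch below illustrates that idea with the standard library only. The ${NAME} placeholder syntax and the render_config helper are assumptions made for illustration, since this README does not specify QuickETL's actual substitution syntax.

```python
from string import Template


def render_config(text, variables):
    """Replace ${NAME} placeholders in text; raises KeyError if one is missing."""
    return Template(text).substitute(variables)


# Hypothetical config fragment using the assumed ${NAME} syntax.
config_text = (
    "source:\n"
    "  type: file\n"
    "  path: data/sales_${RUN_DATE}.parquet\n"
)

# Same shape as the dictionary returned by run_daily_etl above.
rendered = render_config(config_text, {"RUN_DATE": "2024-01-15"})
print(rendered)
```

Failing loudly on a missing variable, as Template.substitute does, is usually preferable in ETL jobs to silently reading from a path that still contains a placeholder.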
Supported Backends
| Backend | Type | Installation |
|---|---|---|
| DuckDB | Local/Embedded | Included by default |
| Polars | Local/Embedded | Included by default |
| DataFusion | Local/Embedded | pip install quicketl[datafusion] |
| Spark | Distributed | pip install quicketl[spark] |
| pandas | Local | pip install quicketl[pandas] |
| PostgreSQL | Database | pip install quicketl[postgres] |
| MySQL | Database | pip install quicketl[mysql] |
| ClickHouse | Database | pip install quicketl[clickhouse] |
| Snowflake | Cloud DW | pip install quicketl[snowflake] |
| BigQuery | Cloud DW | pip install quicketl[bigquery] |
| Trino | Distributed SQL | pip install quicketl[trino] |
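For scripted environment setup, the table above can be encoded as a small lookup. This is a convenience sketch, not part of QuickETL: the EXTRAS mapping simply mirrors the table, and install_command is a hypothetical helper name.

```python
# Backend name -> pip extra, mirroring the table above.
# None means the backend ships with the base install.
EXTRAS = {
    "duckdb": None,
    "polars": None,
    "datafusion": "datafusion",
    "spark": "spark",
    "pandas": "pandas",
    "postgresql": "postgres",
    "mysql": "mysql",
    "clickhouse": "clickhouse",
    "snowflake": "snowflake",
    "bigquery": "bigquery",
    "trino": "trino",
}


def install_command(backend):
    """Return the pip command for a backend listed in the table."""
    key = backend.lower()
    if key not in EXTRAS:
        raise ValueError(f"unknown backend: {backend!r}")
    extra = EXTRAS[key]
    if extra is None:
        return "pip install quicketl"
    # Quoting keeps shells such as zsh from treating [..] as a glob pattern.
    return f'pip install "quicketl[{extra}]"'


print(install_command("PostgreSQL"))  # pip install "quicketl[postgres]"
print(install_command("DuckDB"))      # pip install quicketl
```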
Development
# Clone and install dev dependencies
git clone https://github.com/ameijin/quicketl.git
cd quicketl
pip install -e ".[dev]"
# Run tests
pytest
# Lint
ruff check src/
# Type check
mypy src/
License
MIT License - see LICENSE for details.