QuickETL - Fast & Flexible Python ETL Framework with 20+ backend support via Ibis
QuickETL is a configuration-driven ETL framework that provides a simple, unified API for data processing across multiple compute backends including DuckDB, Polars, Spark, and pandas.
Features
- 20+ Backends: DuckDB, Polars, Spark, pandas, Snowflake, BigQuery, PostgreSQL, and more via Ibis
- Configuration-driven: Define pipelines in YAML with variable substitution
- 18 Transforms: filter, aggregate, join, union, derive_column, window, pivot, unpivot, hash_key, coalesce, cast, fill_null, dedup, sort, select, rename, limit, and more
- 6 Quality Checks: not_null, unique, row_count, accepted_values, expression, and contract (Pandera)
- Data Contracts: Schema validation with Pandera, YAML-defined contracts, and a contract registry
- Multi-Source Pipelines: Join and union across multiple data sources in a single pipeline
- Database Sink: Write to databases with append, truncate, replace, and upsert modes
- Partitioned Writes: Write partitioned Parquet/CSV files by one or more columns
- Workflows: Multi-stage pipeline orchestration with parallel execution
- AI/ML Transforms: Text chunking and embedding generation for RAG pipelines
- Secrets Management: Pluggable providers for AWS Secrets Manager, Azure Key Vault, and env vars
- Telemetry: OpenTelemetry and OpenLineage integration for observability
- CLI & Python API: Use quicketl run or the Pipeline builder pattern
- Cloud Storage: S3, GCS, Azure via fsspec
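To make the "variable substitution" feature concrete, here is a minimal sketch of the general idea using only the Python standard library. The `${NAME}` placeholder syntax, the `render_pipeline` helper, and the `INPUT_DIR`/`OUTPUT_DIR` variable names are assumptions made for illustration; QuickETL's actual syntax and resolution rules may differ, so check its documentation.

```python
from string import Template

# Hypothetical pipeline text. The ${...} placeholder syntax is an assumption
# for illustration, not confirmed QuickETL syntax.
PIPELINE_TEMPLATE = """\
name: sales_etl
engine: duckdb
source:
  type: file
  path: ${INPUT_DIR}/sales.parquet
sink:
  type: file
  path: ${OUTPUT_DIR}/summary.parquet
"""


def render_pipeline(template_text: str, variables: dict[str, str]) -> str:
    """Replace ${NAME} placeholders; raises KeyError if a variable is missing."""
    return Template(template_text).substitute(variables)


rendered = render_pipeline(
    PIPELINE_TEMPLATE,
    {"INPUT_DIR": "data", "OUTPUT_DIR": "output"},
)
print(rendered)
```

Failing loudly on a missing variable (rather than leaving the placeholder in place) is a deliberate choice in this sketch: a half-rendered path is harder to debug than an early error.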
Installation
pip install quicketl
With optional extras:
# Specific backends (quotes keep shells such as zsh from expanding the brackets)
pip install "quicketl[polars]"
pip install "quicketl[spark]"
# AI/ML features
pip install "quicketl[embeddings-openai]"
pip install "quicketl[chunking]"
# Data contracts
pip install "quicketl[contracts]"
# All optional dependencies
pip install "quicketl[all]"
See installation docs for backend-specific extras.
Quick Start
# Create a new project
quicketl init my_project
cd my_project
# Run the sample pipeline
quicketl run pipelines/sample.yml
Or use the Python API:
from quicketl import Pipeline

# From YAML configuration
pipeline = Pipeline.from_yaml("pipeline.yml")
result = pipeline.run()

# Or use the builder pattern
from quicketl.config.models import FileSource, FileSink
from quicketl.config.transforms import FilterTransform, AggregateTransform
from quicketl.config.checks import NotNullCheck

pipeline = (
    Pipeline("sales_summary", engine="duckdb")
    .source(FileSource(path="data/sales.parquet"))
    .transform(FilterTransform(predicate="amount > 0"))
    .transform(AggregateTransform(
        group_by=["region"],
        aggs={"total": "sum(amount)", "count": "count(*)"},
    ))
    .check(NotNullCheck(columns=["region"]))
    .sink(FileSink(path="output/summary.parquet"))
)
result = pipeline.run()
print(result.summary())
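For readers who want to see what the filter and aggregate steps above compute, the following sketch expresses the same logic as plain SQL against an in-memory SQLite database. It does not use QuickETL or Ibis; the `sales` table, its sample rows, and the `order_count` alias are made up for illustration, and the SQL a real backend generates may look different.

```python
import sqlite3

# In-memory database with a small, made-up sales table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 10.0), ("east", 5.0), ("west", 7.0), ("west", -3.0)],
)

# filter (amount > 0) followed by aggregate (group by region).
rows = conn.execute(
    """
    SELECT region, sum(amount) AS total, count(*) AS order_count
    FROM sales
    WHERE amount > 0
    GROUP BY region
    ORDER BY region
    """
).fetchall()
conn.close()

print(rows)
```

Note that the negative `west` row is dropped by the filter before aggregation, so it affects neither the sum nor the count.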
Example Pipeline
name: sales_etl
engine: duckdb
source:
  type: file
  path: data/sales.parquet
transforms:
  - op: filter
    predicate: amount > 0
  - op: derive_column
    name: revenue
    expr: quantity * unit_price
  - op: aggregate
    group_by: [region]
    aggs:
      total: sum(amount)
      order_count: count(*)
checks:
  - type: not_null
    columns: [region, total]
  - type: row_count
    min: 1
sink:
  type: file
  path: output/summary.parquet
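The two checks in this pipeline can be read as simple predicates over the aggregated rows. The sketch below shows that reading in plain Python; the helper names `not_null_failures` and `row_count_ok` and the sample rows are hypothetical and are not QuickETL's implementation or API.

```python
from typing import Any, Optional

Row = dict[str, Any]


def not_null_failures(rows: list[Row], columns: list[str]) -> list[str]:
    """Return the checked columns that contain at least one null (None)."""
    return [col for col in columns if any(row.get(col) is None for row in rows)]


def row_count_ok(rows: list[Row], min_rows: Optional[int] = None) -> bool:
    """Return True when the row count meets the optional minimum."""
    return min_rows is None or len(rows) >= min_rows


# Made-up aggregate output with one null total.
summary = [
    {"region": "east", "total": 15.0, "order_count": 2},
    {"region": "west", "total": None, "order_count": 1},
]

failed_columns = not_null_failures(summary, ["region", "total"])
print(failed_columns)
print(row_count_ok(summary, min_rows=1))
```

In this sample the `not_null` check would flag `total`, while the `row_count` check with `min: 1` would pass.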
Multi-Source Join
name: orders_with_customers
engine: duckdb
sources:
  orders:
    type: file
    path: data/orders.parquet
  customers:
    type: file
    path: data/customers.parquet
transforms:
  - op: join
    right: customers
    "on": [customer_id]
    how: left
  - op: select
    columns: [order_id, customer_name, amount]
sink:
  type: file
  path: output/enriched_orders.parquet
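As a rough illustration of what the left join and select steps produce, the sketch below applies the same semantics to small in-memory lists of dictionaries. The `left_join` and `select` helpers and the sample data are hypothetical and are not part of QuickETL; the point is only that every order is kept, and orders without a matching customer get a null `customer_name`.

```python
from typing import Any

Row = dict[str, Any]


def left_join(left: list[Row], right: list[Row], on: str) -> list[Row]:
    """Keep every left row; attach columns from matching right rows if any."""
    index: dict[Any, list[Row]] = {}
    for row in right:
        index.setdefault(row[on], []).append(row)

    joined: list[Row] = []
    for row in left:
        matches = index.get(row[on])
        if matches:
            # Left-side values win when both sides share a column name.
            joined.extend({**match, **row} for match in matches)
        else:
            joined.append(dict(row))
    return joined


def select(rows: list[Row], columns: list[str]) -> list[Row]:
    """Project each row onto the given columns; missing values become None."""
    return [{col: row.get(col) for col in columns} for row in rows]


# Made-up sample data.
orders = [
    {"order_id": 1, "customer_id": "c1", "amount": 20.0},
    {"order_id": 2, "customer_id": "c2", "amount": 35.5},
    {"order_id": 3, "customer_id": "c9", "amount": 12.0},
]
customers = [
    {"customer_id": "c1", "customer_name": "Ada"},
    {"customer_id": "c2", "customer_name": "Grace"},
]

enriched = select(
    left_join(orders, customers, on="customer_id"),
    ["order_id", "customer_name", "amount"],
)
print(enriched)
```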
Documentation
Full documentation, tutorials, and the API reference are available at quicketl.com.
License
MIT License - see LICENSE for details.
File details
Details for the file quicketl-1.6.0.tar.gz.
File metadata
- Download URL: quicketl-1.6.0.tar.gz
- Upload date:
- Size: 492.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c7e9d935999e556aea562133948bbb27ff133c823519234664429fdd67d075bd |
| MD5 | 70ec06589369d6bc57a166e8ac14b126 |
| BLAKE2b-256 | 19efec076daed33cc6ebdc63d78c0eb50747163bffcfbc2eb371853ead8a212c |
Provenance
The following attestation bundles were made for quicketl-1.6.0.tar.gz:
Publisher: release.yml on ameijin/quicketl
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: quicketl-1.6.0.tar.gz
- Subject digest: c7e9d935999e556aea562133948bbb27ff133c823519234664429fdd67d075bd
- Sigstore transparency entry: 906136562
- Sigstore integration time:
- Permalink: ameijin/quicketl@72ab5b9496538e079272bc32b577887fb64a27b7
- Branch / Tag: refs/heads/main
- Owner: https://github.com/ameijin
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@72ab5b9496538e079272bc32b577887fb64a27b7
- Trigger Event: workflow_run
File details
Details for the file quicketl-1.6.0-py3-none-any.whl.
File metadata
- Download URL: quicketl-1.6.0-py3-none-any.whl
- Upload date:
- Size: 111.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 35a9ea81f714d5587caede3a4f04e9eb130a28ff60b7f1454095ef19c35e63ff |
| MD5 | f871f612ad536bc4f278091f49e7c4b2 |
| BLAKE2b-256 | aceccc74262f7fd4752f3fcadbf380d07a801432e0ce11450cda24d638329e02 |
Provenance
The following attestation bundles were made for quicketl-1.6.0-py3-none-any.whl:
Publisher: release.yml on ameijin/quicketl
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: quicketl-1.6.0-py3-none-any.whl
- Subject digest: 35a9ea81f714d5587caede3a4f04e9eb130a28ff60b7f1454095ef19c35e63ff
- Sigstore transparency entry: 906136771
- Sigstore integration time:
- Permalink: ameijin/quicketl@72ab5b9496538e079272bc32b577887fb64a27b7
- Branch / Tag: refs/heads/main
- Owner: https://github.com/ameijin
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@72ab5b9496538e079272bc32b577887fb64a27b7
- Trigger Event: workflow_run