
QuickETL - Fast & Flexible Python ETL Framework with 20+ backend support via Ibis


QuickETL


License: MIT

QuickETL is a configuration-driven ETL framework that provides a simple, unified API for data processing across multiple compute backends including DuckDB, Polars, Spark, and pandas.

Documentation | GitHub

Features

  • 20+ Backends: DuckDB, Polars, Spark, pandas, Snowflake, BigQuery, PostgreSQL, and more via Ibis
  • Configuration-driven: Define pipelines in YAML with variable substitution
  • 18 Transforms: filter, aggregate, join, union, derive_column, window, pivot, unpivot, hash_key, coalesce, cast, fill_null, dedup, sort, select, rename, limit, and more
  • 6 Quality Checks: not_null, unique, row_count, accepted_values, expression, and contract (Pandera)
  • Data Contracts: Schema validation with Pandera, YAML-defined contracts, and a contract registry
  • Multi-Source Pipelines: Join and union across multiple data sources in a single pipeline
  • Database Sink: Write to databases with append, truncate, replace, and upsert modes
  • Partitioned Writes: Write partitioned Parquet/CSV files by one or more columns
  • Workflows: Multi-stage pipeline orchestration with parallel execution
  • AI/ML Transforms: Text chunking and embedding generation for RAG pipelines
  • Secrets Management: Pluggable providers for AWS Secrets Manager, Azure Key Vault, and environment variables
  • Telemetry: OpenTelemetry and OpenLineage integration for observability
  • CLI & Python API: Use quicketl run or the Pipeline builder pattern
  • Cloud Storage: S3, GCS, and Azure via fsspec
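The feature list mentions variable substitution in YAML but does not show the placeholder syntax. The sketch below assumes ${NAME}-style placeholders (an assumption, not confirmed here) and renders them with Python's standard-library string.Template. It illustrates the idea only and is not QuickETL's own substitution code.

```python
from string import Template

# Hypothetical pipeline YAML; the ${...} placeholder syntax is an assumption.
PIPELINE_TEMPLATE = """\
name: sales_etl
engine: ${ENGINE}
source:
  type: file
  path: data/sales_${RUN_DATE}.parquet
"""


def render_config(text, variables):
    """Replace ${NAME} placeholders; Template.substitute raises KeyError if one is missing."""
    return Template(text).substitute(variables)


rendered = render_config(PIPELINE_TEMPLATE, {"ENGINE": "duckdb", "RUN_DATE": "2024-01-31"})
print(rendered)
```

Failing loudly on a missing variable (substitute rather than safe_substitute) avoids silently running a pipeline against a literal "${RUN_DATE}" path.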

Installation

pip install quicketl

With optional extras:

# Specific backends (extras are quoted so shells such as zsh don't treat the brackets as a glob pattern)
pip install "quicketl[polars]"
pip install "quicketl[spark]"

# AI/ML features
pip install "quicketl[embeddings-openai]"
pip install "quicketl[chunking]"

# Data contracts
pip install "quicketl[contracts]"

# All optional dependencies
pip install "quicketl[all]"

See the installation docs for backend-specific extras.

Quick Start

# Create a new project
quicketl init my_project
cd my_project

# Run the sample pipeline
quicketl run pipelines/sample.yml

Or use the Python API:

from quicketl import Pipeline

# From YAML configuration
pipeline = Pipeline.from_yaml("pipeline.yml")
result = pipeline.run()

# Or use the builder pattern
from quicketl.config.models import FileSource, FileSink
from quicketl.config.transforms import FilterTransform, AggregateTransform
from quicketl.config.checks import NotNullCheck

pipeline = (
    Pipeline("sales_summary", engine="duckdb")
    .source(FileSource(path="data/sales.parquet"))
    .transform(FilterTransform(predicate="amount > 0"))
    .transform(AggregateTransform(
        group_by=["region"],
        aggs={"total": "sum(amount)", "count": "count(*)"},
    ))
    .check(NotNullCheck(columns=["region"]))
    .sink(FileSink(path="output/summary.parquet"))
)
result = pipeline.run()
print(result.summary())
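For reference, here is what the builder pipeline above computes, written in plain Python over a few made-up rows that stand in for data/sales.parquet: keep rows where amount > 0, group by region with sum(amount) and count(*), then fail if region is null. It does not call QuickETL. It is a sketch of the semantics that could serve as an expected-output oracle when testing a real pipeline.

```python
from collections import defaultdict

# Made-up sample rows standing in for data/sales.parquet.
rows = [
    {"region": "north", "amount": 120.0},
    {"region": "north", "amount": -5.0},  # dropped by the filter
    {"region": "south", "amount": 80.0},
    {"region": "south", "amount": 20.0},
    {"region": "east", "amount": 0.0},    # dropped: the predicate is strictly > 0
]


def sales_summary(rows):
    # Equivalent of FilterTransform(predicate="amount > 0")
    kept = [r for r in rows if r["amount"] > 0]

    # Equivalent of AggregateTransform(group_by=["region"],
    #   aggs={"total": "sum(amount)", "count": "count(*)"}); groups keep first-seen order
    groups = defaultdict(lambda: {"total": 0.0, "count": 0})
    for r in kept:
        g = groups[r["region"]]
        g["total"] += r["amount"]
        g["count"] += 1
    summary = [{"region": region, **aggs} for region, aggs in groups.items()]

    # Equivalent of NotNullCheck(columns=["region"])
    if any(row["region"] is None for row in summary):
        raise ValueError("not_null check failed for column 'region'")
    return summary


for row in sales_summary(rows):
    print(row)
```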

Example Pipeline

name: sales_etl
engine: duckdb

source:
  type: file
  path: data/sales.parquet

transforms:
  - op: filter
    predicate: amount > 0
  - op: derive_column
    name: revenue
    expr: quantity * unit_price
  - op: aggregate
    group_by: [region]
    aggs:
      total: sum(amount)
      order_count: count(*)

checks:
  - type: not_null
    columns: [region, total]
  - type: row_count
    min: 1

sink:
  type: file
  path: output/summary.parquet
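The checks block runs against the pipeline output. Assuming not_null fails on any null in the listed columns and row_count with min: 1 fails on a result with fewer rows (as the names suggest), their semantics can be sketched in plain Python as below. The function names are hypothetical and this is not QuickETL's implementation; how QuickETL reports or reacts to a failed check is not modeled.

```python
def check_not_null(rows, columns):
    """Return one failure message per column containing at least one None.

    A missing key is treated as null in this sketch.
    """
    failures = []
    for col in columns:
        nulls = sum(1 for row in rows if row.get(col) is None)
        if nulls:
            failures.append(f"not_null: {col!r} has {nulls} null value(s)")
    return failures


def check_row_count(rows, min_rows):
    """Fail when the output has fewer than min_rows rows."""
    n = len(rows)
    return [f"row_count: {n} < min {min_rows}"] if n < min_rows else []


# Made-up aggregate output; the second row has a null region.
summary = [
    {"region": "north", "total": 120.0, "order_count": 1},
    {"region": None, "total": 5.0, "order_count": 1},
]
failures = check_not_null(summary, ["region", "total"]) + check_row_count(summary, min_rows=1)
print(failures)
```

Collecting messages instead of raising on the first failure lets every check run and report together.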

Multi-Source Join

name: orders_with_customers
engine: duckdb

sources:
  orders:
    type: file
    path: data/orders.parquet
  customers:
    type: file
    path: data/customers.parquet

transforms:
  - op: join
    right: customers
    "on": [customer_id]
    how: left
  - op: select
    columns: [order_id, customer_name, amount]

sink:
  type: file
  path: output/enriched_orders.parquet
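The on key is quoted because YAML 1.1 parsers such as PyYAML read a bare on as the boolean true. The join itself is a left join: every order row is kept, and orders with no matching customer get nulls in the customer columns. Below is a plain-Python sketch of that behaviour on made-up rows, assuming (as the config suggests) that orders is the left side and customers the right. It is not how QuickETL or its backends execute joins.

```python
from collections import defaultdict

# Made-up rows standing in for data/orders.parquet and data/customers.parquet.
orders = [
    {"order_id": 1, "customer_id": 10, "amount": 50.0},
    {"order_id": 2, "customer_id": 11, "amount": 75.0},
    {"order_id": 3, "customer_id": 99, "amount": 20.0},  # no matching customer
]
customers = [
    {"customer_id": 10, "customer_name": "Acme"},
    {"customer_id": 11, "customer_name": "Globex"},
]


def left_join(left, right, on):
    """Keep every left row: one output row per right-side match, or None-filled columns if none."""
    index = defaultdict(list)
    for r in right:
        index[tuple(r[k] for k in on)].append(r)
    right_cols = sorted({c for r in right for c in r} - set(on))
    joined = []
    for row in left:
        key = tuple(row[k] for k in on)
        # As in SQL, a null join key never matches anything.
        matches = [] if None in key else index.get(key, [])
        if matches:
            for m in matches:
                # Right-side columns overwrite same-named left columns in this sketch.
                joined.append({**row, **{c: m.get(c) for c in right_cols}})
        else:
            joined.append({**row, **{c: None for c in right_cols}})
    return joined


def select(rows, columns):
    """Project each row onto the listed columns, in that order."""
    return [{c: row[c] for c in columns} for row in rows]


result = select(left_join(orders, customers, on=["customer_id"]),
                ["order_id", "customer_name", "amount"])
for row in result:
    print(row)
```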

Documentation

Full documentation, tutorials, and API reference are available at quicketl.com.

License

MIT License - see LICENSE for details.

Download files

Source Distribution

quicketl-1.6.0.tar.gz (492.2 kB)

Built Distribution

quicketl-1.6.0-py3-none-any.whl (111.8 kB)

File details

Details for the file quicketl-1.6.0.tar.gz.

File metadata

  • Download URL: quicketl-1.6.0.tar.gz
  • Size: 492.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for quicketl-1.6.0.tar.gz
Algorithm Hash digest
SHA256 c7e9d935999e556aea562133948bbb27ff133c823519234664429fdd67d075bd
MD5 70ec06589369d6bc57a166e8ac14b126
BLAKE2b-256 19efec076daed33cc6ebdc63d78c0eb50747163bffcfbc2eb371853ead8a212c
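To confirm that a downloaded file matches the published digest, compute its SHA256 and compare it with the value above. A minimal sketch using only Python's standard hashlib; the download location in the usage comment is an assumption.

```python
import hashlib

# SHA256 published above for quicketl-1.6.0.tar.gz.
EXPECTED_SDIST_SHA256 = "c7e9d935999e556aea562133948bbb27ff133c823519234664429fdd67d075bd"


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in fixed-size chunks so large archives are never fully loaded into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_hex):
    """Return True when the file's SHA256 matches the expected hex digest (case-insensitive)."""
    return sha256_of(path) == expected_hex.strip().lower()


# Example, assuming the sdist was downloaded to the current directory:
# print(verify("quicketl-1.6.0.tar.gz", EXPECTED_SDIST_SHA256))
```

The same check applies to the wheel with its own SHA256 below. pip can also enforce digests at install time in hash-checking mode (--require-hashes, with a --hash=sha256:... entry per requirement).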

Provenance

The following attestation bundles were made for quicketl-1.6.0.tar.gz:

Publisher: release.yml on ameijin/quicketl

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file quicketl-1.6.0-py3-none-any.whl.

File metadata

  • Download URL: quicketl-1.6.0-py3-none-any.whl
  • Size: 111.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for quicketl-1.6.0-py3-none-any.whl
Algorithm Hash digest
SHA256 35a9ea81f714d5587caede3a4f04e9eb130a28ff60b7f1454095ef19c35e63ff
MD5 f871f612ad536bc4f278091f49e7c4b2
BLAKE2b-256 aceccc74262f7fd4752f3fcadbf380d07a801432e0ce11450cda24d638329e02

Provenance

The following attestation bundles were made for quicketl-1.6.0-py3-none-any.whl:

Publisher: release.yml on ameijin/quicketl
