
A Swiss Army knife for simple ETL operations

Project description

ETLPlus


ETLPlus is a Swiss Army knife for simple ETL operations, offering both a Python package and a command-line interface for data extraction, validation, transformation, and loading.

Getting Started

ETLPlus helps you extract, validate, transform, and load data from files, databases, and APIs, either as a Python library or from the command line.

To get started:

  • See Installation for setup instructions.
  • Try the Quickstart for a minimal working example (CLI and Python).
  • Explore Usage for more detailed options and workflows.
  • See SUPPORT.md for the current support policy, supported Python versions, and response targets.

ETLPlus currently supports Python 3.13 and 3.14.

At a Glance

  • Install with pip install etlplus for the supported CLI, etlplus.ops, the API client, and the implemented built-in file handlers.
  • Use pip install -e ".[dev]" for contributor tooling and pip install -e ".[file]" when you need the remaining scientific and specialty format dependencies.
  • Use pip install -e ".[storage]" when you want cloud storage backends for s3://, azure-blob://, or abfs:// URIs through etlplus.storage and etlplus.file.File.
  • Expect the most stable execution surface from the documented CLI commands, etlplus.ops, implemented file handlers, and etlplus.api.
  • See docs/source/getting-started/compatibility.md for the supported Python versions, platform coverage, and dependency groups.
  • See docs/source/getting-started/quickstart.md if you want the shortest path from install to a working ETL flow.

Detailed file-handler coverage and migration notes are still available later in this README and in docs/source/guides/file-handler-matrix.md, but they are no longer required reading to get started.

Release Status

ETLPlus treats the v1.x line as its stable public release line. The repository still retains some placeholders, stubs, and migration-reference modules for historical or implementation reasons, but they are not part of the supported public contract unless they are explicitly documented as such.

The stable surface for the current v1.x releases is:

  • The documented CLI commands: check, extract, history, load, log, render, report, run, status, transform, and validate
  • The documented Python ETL primitives in etlplus.ops
  • The implemented file handlers listed as implemented in the handler matrix
  • The documented API client and pagination helpers under etlplus.api

The following are not part of the stable execution surface unless explicitly promoted later:

  • Database extract/load execution paths that are still described as placeholders
  • Stubbed file handlers and placeholder formats
  • Defunct or migration-reference modules retained for historical context

Maintainers handling packaging, CI, versioned docs, or release gating should consult RELEASE-CHECKLIST.md.

Features

  • Check data pipeline definitions before running them:

    • Summarize jobs, sources, targets, and transforms
    • Run lightweight runtime and config readiness checks with --readiness
    • Confirm configuration changes by printing focused sections on demand
  • Render SQL DDL from shared table specs:

    • Generate CREATE TABLE or view statements
    • Swap templates or direct output to files for database migrations
  • Extract data from multiple sources:

    • Files (CSV, JSON, XML, YAML)
    • Databases (connection string support; extract is a placeholder today)
    • REST APIs (GET)
  • Validate data with flexible rules:

    • Type checking
    • Required fields
    • Value ranges (min/max)
    • String length constraints
    • Pattern matching
    • Enum validation
  • Transform data with powerful operations:

    • Filter records
    • Map/rename fields
    • Select specific fields
    • Sort data
    • Aggregate functions (avg, count, max, min, sum)
  • Load data to multiple targets:

    • Files (CSV, JSON, XML, YAML)
    • Databases (connection string support; load is a placeholder today)
    • REST APIs (PATCH, POST, PUT)
  • Inspect local run history and reports:

    • List normalized runs with filters and table output
    • Stream raw append events for backend-level troubleshooting
    • Inspect the latest run or aggregate success and duration metrics by job, status, or day

Installation

pip install etlplus

For development:

pip install -e ".[dev]"

The default install includes the non-native dependencies used by the built-in file handlers for common binary, columnar, spreadsheet, and embedded-database formats such as cbor2, duckdb, fastavro, msgpack, openpyxl, odfpy, pandas, pyarrow, pymongo, xlrd, and xlwt.

This is intentional for the stable line. ETLPlus treats the documented CLI, etlplus.ops, etlplus.api, and the implemented built-in file handlers as one supported default runtime surface, so the base install keeps the dependencies needed for that surface together instead of pushing core implemented handlers behind extras.

For development with full optional file-format support:

pip install -e ".[dev,file]"

For runtime-only optional file-format support:

pip install -e ".[file]"

For runtime cloud-storage support:

pip install -e ".[storage]"

The file extra is now reserved for the remaining scientific and specialty format dependencies such as netCDF4, pyreadr, pyreadstat, and xarray.

That split is also intentional: the file extra is reserved for narrower optional workflows rather than for the built-in formats that ETLPlus expects most users of the default runtime to have available.

Quickstart

Get up and running in under a minute.

Command-line interface

# Inspect help and version
etlplus --help
etlplus --version

# One-liner: extract CSV, filter, select, and write JSON
etlplus extract examples/data/sample.csv \
  | etlplus transform --operations '{"filter": {"field": "age", "op": "gt", "value": 25}, "select": ["name", "email"]}' \
  - temp/sample_output.json

Python API

from etlplus.ops import extract, transform, validate, load

data = extract("file", "input.csv")
ops = {"filter": {"field": "age", "op": "gt", "value": 25}, "select": ["name", "email"]}
filtered = transform(data, ops)
rules = {"name": {"type": "string", "required": True}, "email": {"type": "string", "required": True}}
assert validate(filtered, rules)["valid"]
load(filtered, "file", "temp/sample_output.json", file_format="json")

Support ETLPlus

If ETLPlus saves you engineering time, consider supporting the project through the repository sponsor button once the funding links are live on the default branch. Funding helps pay for:

  • Maintenance and bug fixes
  • New file, API, and database connectors
  • Documentation, examples, and release automation
  • Compatibility work for new Python and dependency versions

The preferred sponsorship path is GitHub Sponsors, with Buy Me a Coffee as the lightweight fallback for one-time support.

Support is only one way to contribute. ETLPlus also benefits from codeless contributions such as documentation fixes, issue triage, reproducible bug reports, usage feedback, examples, testing results, answering questions in discussions, and release validation.

For community participation, use GitHub Discussions for questions, docs feedback, examples, and support conversations. Use GitHub Issues for confirmed bugs and concrete feature work. See docs/community-discussions.md for the recommended setup.

Data Connectors

Data connectors abstract the sources that data is extracted from and the targets that data is loaded to. Each connector type is described in the subsections below.

REST APIs (api)

ETLPlus can extract from REST APIs and load results via common HTTP methods. Supported operations include GET for extract and PATCH/POST/PUT for load.
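
As a rough Python sketch of the same idea (assuming the connector type string "api" mirrors the CLI's --source-type api / --target-type api flags; check the etlplus.ops documentation for the exact accepted values):

from etlplus.ops import extract, transform

# "api" as the connector type is an assumption that mirrors the CLI's --source-type api.
records = extract("api", "https://api.example.com/data")

# Downstream operations work on the extracted records like any other source.
active = transform(records, {"filter": {"field": "status", "op": "eq", "value": "active"}})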

Databases (database)

Database connectors use connection strings for extraction and loading, and DDL can be rendered from table specs for migrations or schema checks. Database extract/load operations are currently placeholders; plan to integrate a database client in your runner.
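
Until the database execution path is promoted, one interim pattern is to pull rows with your own database client and hand them to the documented ETL primitives. The sketch below uses the standard-library sqlite3 module purely as an example client; the database file, table, and column names are hypothetical.

import sqlite3

from etlplus.ops import load, transform

# Fetch rows with an ordinary database client (sqlite3 here, purely for illustration).
conn = sqlite3.connect("examples/data/sample.db")  # hypothetical database file
conn.row_factory = sqlite3.Row
rows = [dict(row) for row in conn.execute("SELECT name, email, age FROM customers")]
conn.close()

# Hand the records to the documented ETLPlus primitives for the rest of the pipeline.
selected = transform(rows, {"filter": {"field": "age", "op": "gt", "value": 25}, "select": ["name", "email"]})
load(selected, "file", "temp/customers.json", file_format="json")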

Files (file)

Recognized file formats are listed in the tables below. Support for reading from or writing to a recognized file format is marked as:

  • Y: implemented (may require optional dependencies)
  • N: stubbed or not yet implemented

Handler Architecture

  • File IO is moving to class-based handlers rooted at etlplus/file/base.py (FileHandlerABC, category ABCs, and ReadOnlyFileHandlerABC).
  • etlplus/file/registry.py resolves handlers using an explicit FileFormat -> handler class map.
  • Dispatch is explicit-only: unmapped formats raise an "Unsupported format" error.
  • Module-level etlplus.file.<format>.read() / write() wrapper APIs have been removed.
  • Use handler instances directly (for example, JsonFile().read(path) / JsonFile().write(path, data)) or etlplus.file.File dispatch via File(path, file_format).read() and .write(...); see the sketch after this list.
  • Documentation and examples intentionally use handler class methods, not deprecated module wrappers.
  • Placeholder handlers are split into:
    • etlplus/file/stub.py for generic stub behavior
    • etlplus/file/_stub_categories.py for category-aware internal stub ABCs
  • Scientific/statistical handlers dta, nc, rda, rds, sav, and xpt now implement ScientificDatasetFileHandlerABC dataset hooks.
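
The sketch below illustrates the two dispatch styles named above, reading and writing JSON through a concrete handler and through File dispatch. It assumes JsonFile can be imported from etlplus.file and that the format argument accepts the string "json"; adjust both to the actual module layout and FileFormat values.

from etlplus.file import File, JsonFile  # import path assumed; adjust to the actual module layout

# Direct handler usage: instantiate the concrete handler and call read/write.
records = JsonFile().read("examples/data/sample.json")
JsonFile().write("temp/sample_copy.json", records)

# Registry-backed dispatch: File resolves the handler from the explicit format map.
records = File("examples/data/sample.json", "json").read()
File("temp/sample_copy.json", "json").write(records)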

Current Migration Coverage (Class-Based + Explicit Registry Mapping)

  • Delimited/text: csv, dat, fwf, psv, tab, tsv, txt
  • Semi-structured/config: ini, json, ndjson, properties, toml, xml, yaml
  • Columnar: arrow, feather, orc, parquet
  • Binary/interchange: avro, bson, cbor, msgpack, pb, proto
  • Embedded DB: duckdb, sqlite
  • Spreadsheets: ods, xls, xlsm, xlsx
  • Scientific/statistical: dta, nc, rda, rds, sav, xpt, sas7bdat (read-only), plus single-dataset scientific stubs mat, sylk, zsav
  • Archive wrappers: gz, zip
  • Log/event streams: log
  • Templates: hbs, jinja2, mustache, vm
  • Explicit module-owned stub handlers (via stub.py + _stub_categories.py): stub, accdb, cfg, conf, ion, mdb, numbers, pbf, wks

Handler Matrix Guardrail

The concise matrix below is the migration guardrail for class-based handler coverage. For batch-by-batch maintenance notes and the same matrix in docs, see docs/file-handler-matrix.md.

Format Handler Class Base ABC Read/Write Support Status
accdb AccdbFile StubEmbeddedDatabaseFileHandlerABC read/write stub
arrow ArrowFile ColumnarFileHandlerABC read/write implemented
avro AvroFile BinarySerializationFileHandlerABC read/write implemented
bson BsonFile BinarySerializationFileHandlerABC read/write implemented
cbor CborFile BinarySerializationFileHandlerABC read/write implemented
cfg CfgFile StubSemiStructuredTextFileHandlerABC read/write stub
conf ConfFile StubSemiStructuredTextFileHandlerABC read/write stub
csv CsvFile StandardDelimitedTextFileHandlerABC read/write implemented
dat DatFile DelimitedTextFileHandlerABC read/write implemented
dta DtaFile SingleDatasetScientificFileHandlerABC read/write implemented
duckdb DuckdbFile EmbeddedDatabaseFileHandlerABC read/write implemented
feather FeatherFile ColumnarFileHandlerABC read/write implemented
fwf FwfFile TextFixedWidthFileHandlerABC read/write implemented
gz GzFile ArchiveWrapperFileHandlerABC read/write implemented
hbs HbsFile TemplateFileHandlerABC read/write implemented
hdf5 Hdf5File ScientificDatasetFileHandlerABC read-only implemented
ini IniFile DictPayloadSemiStructuredTextFileHandlerABC read/write implemented
ion IonFile StubSemiStructuredTextFileHandlerABC read/write stub
jinja2 Jinja2File TemplateFileHandlerABC read/write implemented
json JsonFile RecordPayloadSemiStructuredTextFileHandlerABC read/write implemented
log LogFile LogEventFileHandlerABC read/write implemented
mat MatFile StubSingleDatasetScientificFileHandlerABC read/write stub
mdb MdbFile StubEmbeddedDatabaseFileHandlerABC read/write stub
msgpack MsgpackFile BinarySerializationFileHandlerABC read/write implemented
mustache MustacheFile TemplateFileHandlerABC read/write implemented
nc NcFile SingleDatasetScientificFileHandlerABC read/write implemented
ndjson NdjsonFile SemiStructuredTextFileHandlerABC read/write implemented
numbers NumbersFile StubSpreadsheetFileHandlerABC read/write stub
ods OdsFile SpreadsheetFileHandlerABC read/write implemented
orc OrcFile ColumnarFileHandlerABC read/write implemented
parquet ParquetFile ColumnarFileHandlerABC read/write implemented
pb PbFile BinarySerializationFileHandlerABC read/write implemented
pbf PbfFile StubBinarySerializationFileHandlerABC read/write stub
properties PropertiesFile DictPayloadSemiStructuredTextFileHandlerABC read/write implemented
proto ProtoFile BinarySerializationFileHandlerABC read/write implemented
psv PsvFile StandardDelimitedTextFileHandlerABC read/write implemented
rda RdaFile ScientificDatasetFileHandlerABC read/write implemented
rds RdsFile SingleDatasetScientificFileHandlerABC read/write implemented
sas7bdat Sas7bdatFile SingleDatasetScientificFileHandlerABC read-only implemented
sav SavFile SingleDatasetScientificFileHandlerABC read/write implemented
sqlite SqliteFile EmbeddedDatabaseFileHandlerABC read/write implemented
stub StubFile StubFileHandlerABC read/write stub
sylk SylkFile StubSingleDatasetScientificFileHandlerABC read/write stub
tab TabFile StandardDelimitedTextFileHandlerABC read/write implemented
toml TomlFile DictPayloadSemiStructuredTextFileHandlerABC read/write implemented
tsv TsvFile StandardDelimitedTextFileHandlerABC read/write implemented
txt TxtFile PlainTextFileHandlerABC read/write implemented
vm VmFile TemplateFileHandlerABC read/write implemented
wks WksFile StubSpreadsheetFileHandlerABC read/write stub
xls XlsFile ReadOnlySpreadsheetFileHandlerABC read-only implemented
xlsm XlsmFile SpreadsheetFileHandlerABC read/write implemented
xlsx XlsxFile SpreadsheetFileHandlerABC read/write implemented
xml XmlFile SemiStructuredTextFileHandlerABC read/write implemented
xpt XptFile SingleDatasetScientificFileHandlerABC read/write implemented
yaml YamlFile RecordPayloadSemiStructuredTextFileHandlerABC read/write implemented
zip ZipFile ArchiveWrapperFileHandlerABC read/write implemented
zsav ZsavFile StubSingleDatasetScientificFileHandlerABC read/write stub

Stubbed / Placeholder

Format Read Write Description
stub N N Placeholder format for tests and future connectors.

Tabular & Delimited Text

Format Read Write Description
csv Y Y Comma-Separated Values
dat Y Y Generic data file, often delimited or fixed-width
fwf Y Y Fixed-Width Fields
psv Y Y Pipe-Separated Values
tab Y Y Often synonymous with TSV
tsv Y Y Tab-Separated Values
txt Y Y Plain text, often delimited or fixed-width

Semi-Structured Text

Format Read Write Description
cfg N N Config-style key-value pairs
conf N N Config-style key-value pairs
ini Y Y Config-style key-value pairs
json Y Y JavaScript Object Notation
ndjson Y Y Newline-Delimited JSON
properties Y Y Java-style key-value pairs
toml Y Y Tom's Obvious Minimal Language
xml Y Y Extensible Markup Language
yaml Y Y YAML Ain't Markup Language

Columnar / Analytics-Friendly

Format Read Write Description
arrow Y Y Apache Arrow IPC
feather Y Y Apache Arrow Feather
orc Y Y Optimized Row Columnar; common in Hadoop
parquet Y Y Apache Parquet; common in Big Data

Binary Serialization and Interchange

Format Read Write Description
avro Y Y Apache Avro
bson Y Y Binary JSON; common with MongoDB exports/dumps
cbor Y Y Concise Binary Object Representation
ion N N Amazon Ion
msgpack Y Y MessagePack
pb Y Y Protocol Buffers (Google Protobuf)
pbf N N Protocolbuffer Binary Format; often for GIS data
proto Y Y Protocol Buffers schema; often in .pb / .bin

Databases and Embedded Storage

Format Read Write Description
accdb N N Microsoft Access (newer format)
duckdb Y Y DuckDB
mdb N N Microsoft Access (older format)
sqlite Y Y SQLite

Spreadsheets

Format Read Write Description
numbers N N Apple Numbers
ods Y Y OpenDocument
wks N N Lotus 1-2-3
xls Y N Microsoft Excel (BIFF; read-only)
xlsm Y Y Microsoft Excel Macro-Enabled (Open XML)
xlsx Y Y Microsoft Excel (Open XML)

Statistical / Scientific / Numeric Computing

Format Read Write Description
dta Y Y Stata
hdf5 Y N Hierarchical Data Format
mat N N MATLAB
nc Y Y NetCDF
rda Y Y RData workspace/object
rds Y Y R data
sas7bdat Y N SAS data
sav Y Y SPSS data
sylk N N Symbolic Link
xpt Y Y SAS Transport
zsav N N Compressed SPSS data

Logs and Event Streams

Format Read Write Description
log Y Y Generic log file

Data Archives

Format Read Write Description
gz Y Y Gzip-compressed file
zip Y Y ZIP archive

Templates

Format Read Write Description
hbs Y Y Handlebars
jinja2 Y Y Jinja2
mustache Y Y Mustache
vm Y Y Apache Velocity

Usage

Command Line Interface

ETLPlus provides a powerful CLI for ETL operations:

# Show help
etlplus --help

# Show version
etlplus --version

The CLI is implemented with Typer (Click-based). The legacy argparse parser has been removed, so rely on the documented commands/flags and run etlplus <command> --help for current options.

Command Shapes

The core commands accept positional source and target arguments when you want to read from or write to explicit paths or URIs. When you omit them, ETLPlus falls back to standard streams:

  • extract: etlplus extract [SOURCE]
    • Omit SOURCE to read from STDIN.
  • transform: etlplus transform [SOURCE] [TARGET]
    • Omit SOURCE to read from STDIN and omit TARGET to write to STDOUT.
  • load: etlplus load [TARGET]
    • Omit TARGET to write to STDOUT.
  • validate: etlplus validate [SOURCE]
    • Omit SOURCE to read from STDIN and use --output if you want file output instead of STDOUT.

Use --source-format, --target-format, --source-type, and --target-type to override the usual inference rules when a filename, URI, or stream does not provide enough context.

Check Pipelines

Use etlplus check to explore pipeline YAML definitions without running them. The command can print job names, summarize configured sources and targets, drill into specific sections, or run readiness checks.

Inspect config contents:

etlplus check --config examples/configs/pipeline.yml --jobs
etlplus check --config examples/configs/pipeline.yml --summary

Show sources or transforms for troubleshooting:

etlplus check --config examples/configs/pipeline.yml --sources
etlplus check --config examples/configs/pipeline.yml --transforms

Run runtime and config readiness checks:

etlplus check --readiness
etlplus check --readiness --config examples/configs/pipeline.yml

Render SQL DDL

Use etlplus render to turn table schema specs into ready-to-run SQL. Render from a pipeline config or from a standalone schema file, and choose the built-in ddl or view templates (or provide your own).

Render all tables defined in a pipeline:

etlplus render --config examples/configs/pipeline.yml --template ddl

Render a single table in that pipeline:

etlplus render --config examples/configs/pipeline.yml --table customers --template view

Render from a standalone table spec to a file:

etlplus render --spec schemas/customer.yml --template view -o temp/customer_view.sql

Extract Data

Note: For file sources, the format is normally inferred from the filename extension. Use --source-format to override inference when a file lacks an extension or when you want to force a specific parser.

Extract from JSON file:

etlplus extract examples/data/sample.json

Extract from CSV file:

etlplus extract examples/data/sample.csv

Extract from XML file:

etlplus extract examples/data/sample.xml

Extract from REST API:

etlplus extract https://api.example.com/data

Save extracted data to file:

etlplus extract examples/data/sample.csv > temp/sample_output.json

Validate Data

Validate data from file or JSON string:

etlplus validate '{"name": "John", "age": 30}' --rules '{"name": {"type": "string", "required": true}, "age": {"type": "number", "min": 0, "max": 150}}'

Validate from file:

etlplus validate examples/data/sample.json --rules '{"email": {"type": "string", "pattern": "^[\\w.-]+@[\\w.-]+\\.\\w+$"}}'

Transform Data

When piping data through etlplus transform:

  • Use --source-format whenever the SOURCE argument is - or a literal payload, mirroring the etlplus extract semantics.
  • When TARGET is omitted or set to -, etlplus transform emits JSON to STDOUT.
  • When TARGET is a file path or file URI, the transformed payload is written directly.
  • When TARGET is an API or database target and you provide --target-type, the command delegates the transformed payload to etlplus load and prints the downstream load result envelope.
  • --target-format affects file targets and delegated load targets that honor a format hint.
  • Use --source-type to override the inferred source connector type and --target-type to override the inferred target connector type, matching the etlplus extract / etlplus load behavior.

Transform file inputs while overriding connector types:

etlplus transform \
  --operations '{"select": ["name", "email"]}' \
  examples/data/sample.json  --source-type file \
  temp/selected_output.json --target-type file

Filter and select fields:

etlplus transform \
  --operations '{"filter": {"field": "age", "op": "gt", "value": 26}, "select": ["name"]}' \
  '[{"name": "John", "age": 30}, {"name": "Jane", "age": 25}]'

Sort data:

etlplus transform \
  --operations '{"sort": {"field": "age", "reverse": true}}' \
  examples/data/sample.json

Aggregate data:

etlplus transform \
  --operations '{"aggregate": {"field": "age", "func": "sum"}}' \
  examples/data/sample.json

Map/rename fields:

etlplus transform \
  --operations '{"map": {"name": "new_name"}}' \
  examples/data/sample.json

Send transformed data to a REST API through the load path:

etlplus transform \
  --operations '{"select": ["name", "email"]}' \
  examples/data/sample.json \
  https://api.example.com/customers --target-type api

Database targets use the same delegated load path, but the current database load implementation is still a documented placeholder.

Inspect Run History

etlplus run persists local run history keyed by run_id. Use the read/query commands to inspect that history without opening the backend directly.

List recent normalized runs:

etlplus history --job file_to_file_customers --status succeeded --limit 10 --table

Show the latest matching run:

etlplus status --job file_to_file_customers

Stream raw history events:

etlplus log --run-id 8e4a33d7 --follow

Aggregate grouped history metrics:

etlplus report --group-by day --since 2026-03-01T00:00:00Z --table

Load Data

etlplus load consumes JSON from STDIN; provide only the target argument plus optional flags.

Load to JSON file:

etlplus extract examples/data/sample.json \
  | etlplus load temp/sample_output.json --target-type file

Load to CSV file:

etlplus extract examples/data/sample.csv \
  | etlplus load temp/sample_output.csv --target-type file

Load to REST API:

cat examples/data/sample.json \
  | etlplus load https://api.example.com/endpoint --target-type api

Python API

Use ETLPlus as a Python library:

from etlplus.ops import extract, validate, transform, load

# Extract data
data = extract("file", "data.json")

# Validate data
validation_rules = {
    "name": {"type": "string", "required": True},
    "age": {"type": "number", "min": 0, "max": 150}
}
result = validate(data, validation_rules)
if result["valid"]:
    print("Data is valid!")

# Transform data
operations = {
    "filter": {"field": "age", "op": "gt", "value": 18},
    "select": ["name", "email"]
}
transformed = transform(data, operations)

# Load data
load(transformed, "file", "temp/sample_output.json", file_format="json")

For YAML-driven pipelines executed end-to-end (extract → validate → transform → load), see:

  • Authoring: docs/pipeline-guide.md
  • Runner API and internals: see etlplus.ops.run docstrings and docs/pipeline-guide.md.

CLI quick reference for pipelines:

# List jobs or show a pipeline summary
etlplus check --config examples/configs/pipeline.yml --jobs
etlplus check --config examples/configs/pipeline.yml --summary

# Run a job
etlplus run --config examples/configs/pipeline.yml --job file_to_file_customers

# Run a job and emit structured events to STDERR
etlplus run --config examples/configs/pipeline.yml --job file_to_file_customers --event-format jsonl

Complete ETL Pipeline Example

# 1. Extract from CSV
etlplus extract examples/data/sample.csv > temp/sample_extracted.json

# 2. Transform (filter and select fields)
etlplus transform \
  --operations '{"filter": {"field": "age", "op": "gt", "value": 25}, "select": ["name", "email"]}' \
  temp/sample_extracted.json \
  temp/sample_transformed.json

# 3. Validate transformed data
etlplus validate \
  --rules '{"name": {"type": "string", "required": true}, "email": {"type": "string", "required": true}}' \
  temp/sample_transformed.json

# 4. Load to CSV
cat temp/sample_transformed.json \
  | etlplus load temp/sample_output.csv

Format Overrides

--source-format and --target-format override whichever format would normally be inferred from a file extension. This is useful when an input lacks an extension (for example, records.txt that actually contains CSV) or when you intentionally want to treat a file as another format.

Examples (zsh):

# Force CSV parsing for an extension-less file
etlplus extract data.txt --source-type file --source-format csv

# Write CSV to a file without the .csv suffix
etlplus load output.bin --target-type file --target-format csv < data.json

# Leave the flags off when extensions already match the desired format
etlplus extract data.csv --source-type file
etlplus load output.json --target-type file < data.json

Transformation Operations

Filter Operations

Supported operators:

  • eq: Equal
  • ne: Not equal
  • gt: Greater than
  • gte: Greater than or equal
  • lt: Less than
  • lte: Less than or equal
  • in: Value in list
  • contains: List/string contains value

Example:

{
  "filter": {
    "field": "status",
    "op": "in",
    "value": ["active", "pending"]
  }
}
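
The same operation can be applied from Python with etlplus.ops.transform; the records below are made up for illustration:

from etlplus.ops import transform

records = [
    {"id": 1, "status": "active"},
    {"id": 2, "status": "archived"},
    {"id": 3, "status": "pending"},
]

# Keep only the records whose status is in the allowed list.
kept = transform(records, {"filter": {"field": "status", "op": "in", "value": ["active", "pending"]}})
print(kept)  # expected: the records with ids 1 and 3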

Aggregation Functions

Supported functions:

  • sum: Sum of values
  • avg: Average of values
  • min: Minimum value
  • max: Maximum value
  • count: Count of values

Example:

{
  "aggregate": {
    "field": "revenue",
    "func": "sum"
  }
}
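
A rough Python equivalent is shown below; the records are invented, and the exact shape of the aggregate result is defined by etlplus.ops.transform, so treat the final print as illustrative:

from etlplus.ops import transform

orders = [{"revenue": 120.0}, {"revenue": 80.5}, {"revenue": 99.5}]

# Sum the revenue field across all records.
result = transform(orders, {"aggregate": {"field": "revenue", "func": "sum"}})
print(result)  # the summed revenue; the result structure depends on the transform implementation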

Validation Rules

Supported validation rules:

  • type: Data type (string, number, integer, boolean, array, object)
  • required: Field is required (true/false)
  • min: Minimum value for numbers
  • max: Maximum value for numbers
  • minLength: Minimum length for strings
  • maxLength: Maximum length for strings
  • pattern: Regex pattern for strings
  • enum: List of allowed values

Example:

{
  "email": {
    "type": "string",
    "required": true,
    "pattern": "^[\\w.-]+@[\\w.-]+\\.\\w+$"
  },
  "age": {
    "type": "number",
    "min": 0,
    "max": 150
  },
  "status": {
    "type": "string",
    "enum": ["active", "inactive", "pending"]
  }
}
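
Applied through etlplus.ops.validate, those rules would be used roughly as follows. The sample record is invented, and only the "valid" key of the result is assumed, as shown in the Quickstart:

from etlplus.ops import validate

rules = {
    "email": {"type": "string", "required": True, "pattern": "^[\\w.-]+@[\\w.-]+\\.\\w+$"},
    "age": {"type": "number", "min": 0, "max": 150},
    "status": {"type": "string", "enum": ["active", "inactive", "pending"]},
}

records = [{"email": "jane@example.com", "age": 42, "status": "active"}]

result = validate(records, rules)
if not result["valid"]:
    print("Validation failed")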

Development

API Client Docs

Looking for the HTTP client and pagination helpers? See the dedicated docs in etlplus/api/README.md for:

  • Quickstart with EndpointClient
  • Authentication via EndpointCredentialsBearer
  • Pagination with PaginationConfig (page and cursor styles)
  • Tips on records_path and cursor_path

Runner Internals and Connectors

Curious how the pipeline runner composes API requests, pagination, and load calls?

  • Runner overview and helpers: see etlplus.ops.run docstrings and docs/pipeline-guide.md
  • Unified "connector" vocabulary (API/File/DB): etlplus/connector
    • API/file targets reuse the same shapes as sources; API targets typically set a method.

Running Tests

For local CI parity and full coverage of remaining optional file formats, install:

pip install -e ".[dev,file]"
# Lightweight run (uses currently installed extras)
pytest

# Full run with remaining optional file-format dependencies
make test-full

Test Scope and Intent

ETLPlus organizes tests by scope and uses markers for cross-cutting intent.

  • Scope folders:
    • Unit (tests/unit/): isolated function/class behavior, no external services.
    • Integration (tests/integration/): cross-module and boundary behavior.
    • E2E (tests/e2e/): full workflow/system-boundary behavior.
  • Intent markers:
    • smoke: go/no-go viability checks.
    • contract: interface/metadata compatibility checks.

Smoke tests are now treated as an intent marker rather than a primary folder. The legacy path migration is complete; smoke tests live under scope folders and are selected by marker.
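
For example, a smoke-intent test lives in its scope folder and carries the marker, so it can be selected with pytest -m smoke. The test below is a sketch; the file path, test name, and the exact shape returned by transform are illustrative assumptions:

# tests/unit/test_smoke_transform.py (illustrative path)
import pytest

from etlplus.ops import transform


@pytest.mark.smoke
def test_transform_select_smoke():
    # Go/no-go check: the core transform primitive selects fields without raising.
    data = [{"name": "Jane", "email": "jane@example.com", "age": 42}]
    result = transform(data, {"select": ["name"]})
    assert result and "name" in result[0]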

If a test calls etlplus.cli.main() or etlplus.ops.run.run(), it is integration by default. Detailed criteria and marker conventions: CONTRIBUTING.md#testing, tests/README.md.

Code Coverage

pytest tests/unit tests/integration tests/e2e --cov=etlplus --cov-report=html

Linting

make lint
make doclint
make fmt
make typecheck

make lint runs the Ruff-based source checks used in CI, make doclint runs pydocstyle and pydoclint, make fmt applies the supported Ruff-plus-autopep8 formatting path, and make typecheck runs mypy against the shipped package. ETLPlus no longer maintains separate Black or Flake8 contributor paths; Ruff is the authoritative lint gate and autopep8 remains as the compatibility formatter used by CI and pre-commit. .ruff.toml is the canonical line-length source, and any duplicated formatter width in supporting tooling is expected to match it. If an external tool still invokes Flake8, the repository .flake8 file exists only as a compatibility shim for the overlapping basics that Flake8 can understand.

Updating Demo Snippets

DEMO.md shows the real output of etlplus --version captured from a freshly built wheel. Regenerate the snippet (and the companion file docs/snippets/installation_version.md) after changing anything that affects the version string:

make demo-snippets

The helper script in tools/update_demo_snippets.py builds the wheel, installs it into a throwaway virtual environment, runs etlplus --version, and rewrites the snippet between the markers in DEMO.md.

Releasing to the Python Package Index (PyPI)

setuptools-scm derives the package version from Git tags, so publishing is now entirely tag-driven: there is no hand-editing of pyproject.toml, setup.py, or etlplus/__version__.py.

GitHub Releases is the canonical release-history surface for ETLPlus and the first place tagged releases are previewed and announced; PyPI is the subsequent public package-install channel. The docs changelog page links there, and the maintainer-facing release text is drafted from the template and category config in the .github/ folder.

  1. Ensure main is green and the release notes/docs are up to date.

  2. Create and push a SemVer tag matching the v*.*.* pattern:

    git tag -a v1.4.0 -m "Release v1.4.0"
    git push origin v1.4.0
    
  3. GitHub Actions runs the tagged release workflow in .github/workflows/release.yml, builds the sdist/wheel, validates the artifacts, validates the tagged docs build, publishes the GitHub Release, and then publishes to PyPI.

  4. Draft the GitHub Release notes using .github/RELEASE-NOTES-TEMPLATE.md together with the categorized notes configured in .github/release.yml.

The tagged docs publication itself is handled by the Read the Docs GitHub App after the tag push; the release workflow only validates that the docs build cleanly from the tagged source.

If you want an extra smoke-test before tagging, run make dist && pip install dist/*.whl locally; this exercises the same build path the workflow uses.

License

This project is licensed under the MIT License.

Contributing

Code and codeless contributions are welcome! If you’d like to add a new feature, fix a bug, or improve the documentation, please feel free to submit a pull request as follows:

  1. Fork this repository.
  2. Create a new feature branch for your changes (git checkout -b feature/feature-name).
  3. Commit your changes (git commit -m "Add feature").
  4. Push to your branch (git push origin feature/feature-name).
  5. Submit a pull request with a detailed description.

If you choose to be a code contributor, please first refer to the contributor documentation, starting with CONTRIBUTING.md.

Valuable non-code contributions include:

  • Improving or correcting documentation
  • Reporting bugs with clear reproduction steps
  • Testing releases and platform-specific behavior
  • Proposing examples, tutorials, and workflow patterns
  • Answering questions in GitHub Discussions
  • Sponsoring the project through GitHub Sponsors or Buy Me a Coffee

Documentation

Python Packages/Subpackage

Navigate to detailed documentation for each subpackage:

Community Health

Other

Acknowledgments

ETLPlus is inspired by common data engineering workflows and Python software engineering patterns, aiming to increase productivity and reduce boilerplate code. Feedback and contributions are always appreciated!


Download files


Source Distribution

etlplus-1.2.5.tar.gz (596.8 kB)

Uploaded Source

Built Distribution


etlplus-1.2.5-py3-none-any.whl (340.0 kB)

Uploaded Python 3

File details

Details for the file etlplus-1.2.5.tar.gz.

File metadata

  • Download URL: etlplus-1.2.5.tar.gz
  • Upload date:
  • Size: 596.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for etlplus-1.2.5.tar.gz
Algorithm Hash digest
SHA256 20b3adcfa35f148ec73ed4e88214fa4ee216b358cd78390906c2e6a57cec5b1b
MD5 06d6ddf4409acaf88249f4c7d6f2354d
BLAKE2b-256 13c0e5c8cb356e1260daf5393025ff25a585243450373f561a7aea4afa0dda97


Provenance

The following attestation bundles were made for etlplus-1.2.5.tar.gz:

Publisher: release.yml on Dagitali/ETLPlus

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file etlplus-1.2.5-py3-none-any.whl.

File metadata

  • Download URL: etlplus-1.2.5-py3-none-any.whl
  • Upload date:
  • Size: 340.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for etlplus-1.2.5-py3-none-any.whl
Algorithm Hash digest
SHA256 e52b5832e611205e34892dfbe90811bd88505ef5d72ebf839e088d651104e408
MD5 66c49ea359cfc0523bef24685607044e
BLAKE2b-256 ed4092515741e5aab0ec4ac542ac0347fff4c58fd194fcf200fd87afce4b1406


Provenance

The following attestation bundles were made for etlplus-1.2.5-py3-none-any.whl:

Publisher: release.yml on Dagitali/ETLPlus

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
