dpetl — Data package ETL
dpetl is a command-line interface (CLI) tool designed to assist in running the three phases of the ETL (Extract, Transform, Load) process (although currently only the extract phase is implemented).
It is designed to work alongside the Data Package standard specification.
Installation
Install using pip:
pip install dpetl
Usage
Use the --help flag to inspect the CLI documentation:
dpetl --help
Currently, only the extract command is available:
# Run extract using the default datapackage.yaml descriptor
dpetl extract
# Specify a descriptor explicitly
dpetl extract -d path/to/datapackage.yaml
# or
dpetl extract --descriptor path/to/datapackage.yaml
How It Works
The CLI loads a Data Package descriptor (via the frictionless-py Python package) and iterates over its resources.
For each resource, the dpetl extract command reads the custom property:
dptel_extract:
The key mode determines which extractor will run.
Currently available modes:
- api
- email
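The dispatch described above can be pictured with a minimal Python sketch. It is not dpetl's actual source: the descriptor is assumed to be already parsed into a plain dictionary (dpetl itself loads it through frictionless-py), and `EXTRACTORS`, `extract_api`, `extract_email` and `run_extract` are hypothetical names used only for illustration.

```python
from typing import Callable

# Hypothetical extractors: each one only reports what it would do.
def extract_api(resource: dict) -> str:
    return f"api -> {resource['path']}"

def extract_email(resource: dict) -> str:
    return f"email -> {resource['path']}"

# Assumed registry mapping each documented mode to an extractor.
EXTRACTORS: dict[str, Callable[[dict], str]] = {
    "api": extract_api,
    "email": extract_email,
}

def run_extract(descriptor: dict) -> list[str]:
    """Run the extractor selected by each resource's dptel_extract.mode."""
    results = []
    for resource in descriptor.get("resources", []):
        config = resource.get("dptel_extract")
        if not config:
            continue  # assumption: resources without the property are skipped
        mode = config.get("mode")
        if mode not in EXTRACTORS:
            name = resource.get("name")
            raise ValueError(f"Unknown extract mode {mode!r} in resource {name!r}")
        results.append(EXTRACTORS[mode](resource))
    return results

descriptor = {
    "resources": [
        {"name": "invoices", "path": "data/invoices.csv",
         "dptel_extract": {"mode": "api"}},
        {"name": "payroll_from_email", "path": "data/payroll.xlsx",
         "dptel_extract": {"mode": "email"}},
    ]
}
print(run_extract(descriptor))
```

Keeping the mode-to-extractor mapping in one table means a new mode would only need a new entry, which fits the configuration-driven approach the package describes.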
Example Data Package Configuration
# datapackage.yaml
resources:
  - name: invoices
    path: data/invoices.csv
    sources:
      - method: get
        path: https://api.example.com/invoices
    dptel_extract:
      mode: api
  - name: payroll_from_email
    path: data/payroll.xlsx
    dptel_extract:
      mode: email
      mailbox: INBOX # optional (defaults to INBOX)
      criteria:
        subject: "Payroll Report" # optional (defaults to resource name)
Extractors
Email Extractor
- Connects to an IMAP server using environment variables:
  - EMAIL_USER
  - EMAIL_PWD
  - EMAIL_IMAP
  - HTTP_PROXY[^1]
[^1]: Only needed if you run the command behind a corporate network that requires a proxy. The HTTPS_PROXY, http_proxy and https_proxy variables are also accepted. See this Issue's comment to understand why you may have to add authentication (http://<user>:<pwd>@<host>:<port>) to the proxy address.
- Reads configuration from:
dptel_extract:
  mode: email
  mailbox: INBOX # optional (default: INBOX)
  criteria: # optional
    subject: "Report" # optional (default: resource.name)
    from_: "finance@example.com" # optional
    date_gte: 2024-01-01 # optional
Behavior:
- If dptel_extract.mailbox is not provided, INBOX is used.
- If dptel_extract.criteria.subject is not provided, it defaults to the resource name.
- The extractor searches for the most recent matching email.
- All e-mail attachments are saved to resource.path.
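The two defaults listed above can be illustrated with a short Python sketch. It only resolves the configuration and does not connect to any server. The helper names `resolve_email_config` and `to_imap_search` are hypothetical, and the mapping of `subject`, `from_` and `date_gte` onto the IMAP `SUBJECT`, `FROM` and `SINCE` search keys is an assumption about how such criteria could be expressed, not a description of dpetl's internals.

```python
import datetime

# IMAP dates use English month abbreviations regardless of locale.
IMAP_MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
               "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

def resolve_email_config(resource: dict) -> dict:
    """Apply the documented defaults to a resource's dptel_extract block."""
    config = resource.get("dptel_extract", {})
    criteria = dict(config.get("criteria") or {})
    # Documented default: the subject falls back to the resource name.
    criteria.setdefault("subject", resource["name"])
    # Documented default: the mailbox falls back to INBOX.
    return {"mailbox": config.get("mailbox", "INBOX"), "criteria": criteria}

def to_imap_search(criteria: dict) -> str:
    """Build an IMAP SEARCH string from the criteria (assumed mapping)."""
    subject = criteria["subject"]
    parts = [f'SUBJECT "{subject}"']
    if "from_" in criteria:
        sender = criteria["from_"]
        parts.append(f'FROM "{sender}"')
    if "date_gte" in criteria:
        day = criteria["date_gte"]  # expected to be a datetime.date
        month = IMAP_MONTHS[day.month - 1]
        parts.append(f"SINCE {day.day:02d}-{month}-{day.year}")
    return " ".join(parts)

resource = {
    "name": "payroll_from_email",
    "path": "data/payroll.xlsx",
    "dptel_extract": {
        "mode": "email",
        "criteria": {
            "from_": "finance@example.com",
            "date_gte": datetime.date(2024, 1, 1),
        },
    },
}
settings = resolve_email_config(resource)
print(settings["mailbox"])
print(to_imap_search(settings["criteria"]))
```

Because neither `mailbox` nor `subject` is set in this example, the resolved settings fall back to INBOX and to the resource name.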
API Extractor
- Reads resource.sources.
- Searches for a source containing a method.
- Downloads the file.
- Saves it to resource.path.
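The four steps above could look roughly like the following Python sketch. It uses only the standard library and treats the resource as a plain dictionary. `find_source` and `download` are hypothetical helper names, and picking the first source that declares a `method` is an assumption, since the text does not say how multiple candidates are handled.

```python
import urllib.request
from pathlib import Path
from typing import Optional

def find_source(resource: dict) -> Optional[dict]:
    """Return the first entry of resource.sources that declares a method."""
    for source in resource.get("sources", []):
        if "method" in source:
            return source
    return None

def download(resource: dict) -> Path:
    """Fetch the selected source and write the response body to resource.path."""
    source = find_source(resource)
    if source is None:
        raise ValueError(f"No source with a method in resource {resource['name']!r}")
    request = urllib.request.Request(source["path"], method=source["method"].upper())
    target = Path(resource["path"])
    target.parent.mkdir(parents=True, exist_ok=True)  # create data/ if missing
    with urllib.request.urlopen(request) as response:
        target.write_bytes(response.read())
    return target

resource = {
    "name": "invoices",
    "path": "data/invoices.csv",
    "sources": [
        {"title": "Invoices documentation"},  # no method, so it is ignored
        {"method": "get", "path": "https://api.example.com/invoices"},
    ],
}
print(find_source(resource))
```

Calling `download(resource)` would perform the actual request. It is left out of the example because `api.example.com` is a placeholder host.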
Design Philosophy
The dpetl package follows a convention-over-configuration philosophy, treating the Data Package descriptor as the single source of truth for the ETL process.
Each resource declares how it should be processed through structured metadata, enabling reproducible, declarative, and version-controlled data workflows.
The goal is to keep the CLI simple while allowing flexible strategies driven entirely by configuration rather than imperative scripting.