dpetl — Data package ETL
dpetl is a command-line interface (CLI) tool designed to run the three ETL phases (Extract, Transform, Load).[^1]
[^1]: Currently, only the Extract phase is implemented.
It is designed to work alongside the Data Package standard specification.
Installation
dpetl requires Python 3.10 or later. Install it with pip or Poetry:
# using pip
pip install dpetl
# using poetry
poetry add dpetl
Usage
Activate your virtual environment first.
Use the --help flag to inspect the CLI documentation:
dpetl --help
Currently, only the extract command is available:
# Run extract using the default datapackage.yaml descriptor
dpetl extract
# Specify a descriptor explicitly
dpetl extract -d path/to/datapackage.yaml
# or
dpetl extract --descriptor path/to/datapackage.yaml
How It Works
The CLI loads one or more Data Package descriptors (via the frictionless-py Python package) and iterates over their resources.
A .toml file can also be passed as the descriptor (using the -d flag) to run the command for every Data Package it lists.
Create the .toml file following the pattern below:
title = 'dados_orcamentarios'
[datapackages] # required
[datapackages.dados_siafi]
path = 'datapackages/dados_siafi/datapackage.yaml' # descriptor required via path property
[datapackages.dados_sisor]
path = 'datapackages/dados_sisor/datapackage.yaml' # descriptor required via path property
For each resource found, the dpetl extract command reads its dpetl_extract custom property. The mode key determines which extractor runs.
Currently available modes:
- api
- email
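The mode lookup described above can be pictured as a dispatch table. This is a minimal sketch, not dpetl's actual code: the function names, the resource-as-dict representation, and the choice to raise an error for a missing or unknown mode are all assumptions made for illustration.

```python
from typing import Callable


# Hypothetical stand-ins for the real extractors; each receives one resource as a dict.
def extract_api(resource: dict) -> str:
    return f"api extractor selected for {resource['name']}"


def extract_email(resource: dict) -> str:
    return f"email extractor selected for {resource['name']}"


# One entry per supported value of dpetl_extract.mode.
EXTRACTORS: dict[str, Callable[[dict], str]] = {
    "api": extract_api,
    "email": extract_email,
}


def run_extract(resource: dict) -> str:
    """Pick the extractor named by the resource's dpetl_extract.mode key."""
    settings = resource.get("dpetl_extract")
    if settings is None:
        # Assumption: the real tool may simply skip such resources instead.
        raise ValueError(f"resource {resource['name']!r} has no dpetl_extract property")
    mode = settings.get("mode")
    if mode not in EXTRACTORS:
        raise ValueError(f"unknown mode {mode!r}; expected one of {sorted(EXTRACTORS)}")
    return EXTRACTORS[mode](resource)
```

Adding a new mode in this design means registering one more function in the table, which keeps the selection logic driven by the descriptor rather than by branching code.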
Example Data Package Configuration
# datapackage.yaml
resources:
  - name: invoices
    path: data/invoices.csv
    sources:
      - method: get
        path: https://api.example.com/invoices
    dpetl_extract:
      mode: api
  - name: payroll_from_email
    path: data/payroll.xlsx
    dpetl_extract:
      mode: email
      mailbox: INBOX # optional (defaults to INBOX)
      criteria:
        subject: "Payroll Report" # optional (defaults to the resource name; see also the --add-package-name flag)
Extractors
Email Extractor
- Connects to an IMAP server using environment variables:
  - EMAIL_USER
  - EMAIL_PWD
  - EMAIL_IMAP
  - HTTP_PROXY[^2]
[^2]: Needed only when you run the command behind a corporate network that requires a proxy. The HTTP_PROXY, HTTPS_PROXY, http_proxy, and https_proxy environment variables are all accepted. See this issue comment to understand why you may have to include authentication (http://<user>:<pwd>@<host>:<port>) in the proxy address.
- Reads configuration from:
dpetl_extract:
  mode: email
  mailbox: INBOX # optional (defaults to INBOX)
  criteria: # optional
    subject: "Report" # optional (defaults to the resource name; see also the --add-package-name flag)
    from_: "finance@example.com" # optional
    date_gte: 2024-01-01 # optional (see also the --today-email flag)
Behavior:
- If dpetl_extract.mailbox is not provided, INBOX is used.
- If dpetl_extract.criteria.subject is not provided, it defaults to the resource name.
- If the --add-package-name flag is provided, the e-mail subject pattern is {package_name}_{resource_name} instead of just the resource name.
- If the --today-email flag is provided, the date on which the command runs is used in the search criteria.
- The extractor searches for the most recent matching e-mail.
- All e-mail attachments are saved to resource.path.
API Extractor
- Reads resource.sources.
- Searches for a source containing a method.
- Downloads the file.
- Saves it to resource.path.
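The four steps above could look roughly like the following. This is an illustrative sketch using only the standard library, not dpetl's implementation: the function names are hypothetical, and it assumes the source's path property is the URL to request and its method property is the HTTP verb, as in the example descriptor.

```python
import urllib.request
from pathlib import Path


def find_download_source(resource: dict) -> dict:
    """Return the first entry of resource.sources that declares a method."""
    for source in resource.get("sources", []):
        if "method" in source:
            return source
    raise ValueError(f"resource {resource['name']!r} has no source with a method")


def download_resource(resource: dict) -> Path:
    """Fetch the selected source and write the response body to resource.path."""
    source = find_download_source(resource)
    # Assumption: source["path"] is the URL and source["method"] the HTTP verb.
    request = urllib.request.Request(source["path"], method=source["method"].upper())
    target = Path(resource["path"])
    target.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(request) as response:
        target.write_bytes(response.read())
    return target
```

With the invoices resource from the example descriptor, download_resource would issue a GET request to https://api.example.com/invoices and save the body to data/invoices.csv.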
Design Philosophy
The dpetl package follows a convention-over-configuration philosophy, treating the Data Package descriptor as the single source of truth for the ETL process.
Each resource declares how it should be processed through structured metadata, enabling reproducible, declarative, and version-controlled data workflows.
The goal is to keep the CLI simple while allowing flexible strategies driven entirely by configuration rather than imperative scripting.