dpetl — Data package ETL

dpetl is a command-line interface (CLI) tool for running the three phases of an ETL (Extract, Transform, Load) process. Currently, only the extract phase is implemented.

It is designed to work alongside the Data Package standard.

Installation

Install using pip:

pip install dpetl

Usage

Use the --help flag to inspect the CLI documentation:

dpetl --help

Currently, only the extract command is available:

# Run extract using the default datapackage.yaml descriptor
dpetl extract

# Specify a descriptor explicitly
dpetl extract -d path/to/datapackage.yaml
# or
dpetl extract --descriptor path/to/datapackage.yaml

How It Works

The CLI loads a Data Package descriptor (via the frictionless-py Python package) and iterates over its resources.

For each resource, the dpetl extract command reads the custom property:

dptel_extract:

The key mode determines which extractor will run.

Currently available modes:

  • api.
  • email.
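
The mode lookup described above can be pictured as a small dispatch table. This is a minimal sketch under stated assumptions, not dpetl's actual source: the names extract_api, extract_email, EXTRACTORS and run_extract are hypothetical, and the resource is modelled as a plain dict rather than a frictionless-py resource object.

```python
from typing import Callable


# Hypothetical stand-ins for the real extractors; they only describe
# what would happen instead of downloading anything.
def extract_api(resource: dict) -> str:
    return f"api extractor would fetch {resource['name']}"


def extract_email(resource: dict) -> str:
    return f"email extractor would fetch {resource['name']}"


# Map each supported mode to the function that handles it.
EXTRACTORS: dict[str, Callable[[dict], str]] = {
    "api": extract_api,
    "email": extract_email,
}


def run_extract(resource: dict) -> str:
    """Pick an extractor from the resource's dptel_extract.mode key."""
    config = resource.get("dptel_extract") or {}
    mode = config.get("mode")
    if mode not in EXTRACTORS:
        raise ValueError(f"unsupported extract mode: {mode!r}")
    return EXTRACTORS[mode](resource)
```

In a design like this, supporting a new mode would mean adding one entry to the table, which fits the configuration-driven approach the project describes.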

Example Data Package Configuration

# datapackage.yaml
resources:
  - name: invoices
    path: data/invoices.csv
    sources:
      - method: get
        path: https://api.example.com/invoices
    dptel_extract:
      mode: api

  - name: payroll_from_email
    path: data/payroll.xlsx
    dptel_extract:
      mode: email
      mailbox: INBOX  # optional (defaults to INBOX)
      criteria:
        subject: "Payroll Report"  # optional (defaults to resource name)

Extractors

Email Extractor

  • Connects to an IMAP server using environment variables:

    • EMAIL_USER.
    • EMAIL_PWD.
    • EMAIL_IMAP.
    • HTTP_PROXY[^1].

[^1]: Only needed when you run the command from a corporate network that requires a proxy. HTTPS_PROXY, http_proxy and https_proxy are equally acceptable. See this issue's comment to understand why you may have to add authentication (http://<user>:<pwd>@<host>:<port>) to the proxy address.

  • Reads configuration from:
dptel_extract:
  mode: email
  mailbox: INBOX        # optional (default: INBOX)
  criteria:             # optional
    subject: "Report"   # optional (default: resource.name)
    from_: "finance@example.com" # optional
    date_gte: 2024-01-01 # optional

Behavior:

  • If dptel_extract.mailbox is not provided, INBOX is used.
  • If dptel_extract.criteria.subject is not provided, it defaults to the resource name.
  • The extractor searches for the most recent matching email.
  • All e-mail attachments are saved to resource.path.
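
The two defaults listed above can be sketched as a small pure function. This is an illustration only: resolve_email_config is a hypothetical name, the resource is modelled as a plain dict, and the IMAP connection and attachment handling are deliberately left out.

```python
def resolve_email_config(resource: dict) -> dict:
    """Fill in the documented defaults for the email extractor settings."""
    config = resource.get("dptel_extract") or {}

    # Copy the criteria so the caller's descriptor is not mutated.
    criteria = dict(config.get("criteria") or {})

    # subject falls back to the resource name when it is not provided.
    criteria.setdefault("subject", resource["name"])

    return {
        # mailbox falls back to INBOX when it is not provided.
        "mailbox": config.get("mailbox") or "INBOX",
        "criteria": criteria,
    }
```

Resolving the defaults in one place keeps the descriptor short: a resource that only sets mode: email would search INBOX for messages whose subject matches the resource name.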

API Extractor

  • Reads resource.sources.
  • Searches for a source containing a method.
  • Downloads the file.
  • Saves it to resource.path.
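
The four steps above can be sketched with the Python standard library. This is a hedged illustration, not dpetl's implementation: find_source and download are hypothetical names, sources are modelled as plain dicts, and the sketch assumes the source's path holds the URL and its method holds the HTTP verb, as in the example descriptor.

```python
import shutil
import urllib.request
from pathlib import Path
from typing import Optional


def find_source(sources: list) -> Optional[dict]:
    """Return the first source that declares a method, or None."""
    for source in sources:
        if isinstance(source, dict) and "method" in source:
            return source
    return None


def download(source: dict, destination: str) -> None:
    """Fetch source['path'] and write the response body to destination."""
    request = urllib.request.Request(
        source["path"], method=source["method"].upper()
    )
    target = Path(destination)
    # Create the parent folder (e.g. data/) if it does not exist yet.
    target.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(request) as response, target.open("wb") as out:
        shutil.copyfileobj(response, out)
```

For the invoices resource in the example descriptor, find_source would return the source with method: get, and download(source, "data/invoices.csv") would save the response to resource.path.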

Design Philosophy

The dpetl package follows a convention-over-configuration philosophy, treating the Data Package descriptor as the single source of truth for the ETL process.

Each resource declares how it should be processed through structured metadata, enabling reproducible, declarative, and version-controlled data workflows.

The goal is to keep the CLI simple while allowing flexible strategies driven entirely by configuration rather than imperative scripting.
