ModuPipe: A modular and extensible ETL-like pipeline builder

Benefits

  • Entirely typed
  • Abstract, so it fits any use case
  • Class-based for easy configurations and injections

Usage

Extract-Transform-Load (ETL) pipelines are a classic form of data-processing pipeline used in industry. They consist of three main elements:

  1. An Extractor, which returns data in a stream-like structure (Iterator in Python) using a pull strategy.
  2. One or more Mappers (optional), which transform (parse, convert, filter, etc.) the data obtained from the source(s). Mappers can be chained together, and chained to an extractor (with +), to form a new extractor.
  3. A Loader, which receives the possibly transformed data using a push strategy. Several loaders can be grouped (with LoaderList) or chained together (with +).

ModuPipe therefore offers these three roles as interfaces, so that implementations are easy to chain and to swap at any time.
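
To make the three roles concrete, here is a minimal, self-contained sketch of the pattern. It does not use ModuPipe: the classes Extractor, Mapper and Loader below, their methods, and the run helper are hypothetical stand-ins written for illustration, and the library's real interfaces may differ. Only the idea of chaining mappers to an extractor with + comes from the description above.

```python
from typing import Callable, Iterator, List


class Mapper:
    """Hypothetical mapper: wraps a function from one stream to another."""

    def __init__(self, fn: Callable[[Iterator[int]], Iterator[int]]) -> None:
        self._fn = fn

    def apply(self, items: Iterator[int]) -> Iterator[int]:
        return self._fn(items)


class Extractor:
    """Hypothetical extractor: pull side, yields items when iterated."""

    def __init__(self, make_iter: Callable[[], Iterator[int]]) -> None:
        self._make_iter = make_iter

    def __iter__(self) -> Iterator[int]:
        return self._make_iter()

    def __add__(self, mapper: Mapper) -> "Extractor":
        # extractor + mapper gives a new extractor; nothing is computed
        # until the result is iterated.
        return Extractor(lambda: mapper.apply(iter(self)))


class Loader:
    """Hypothetical loader: push side, receives one item at a time."""

    def __init__(self) -> None:
        self.received: List[int] = []

    def load(self, item: int) -> None:
        self.received.append(item)


def run(extractor: Extractor, loader: Loader) -> None:
    """Pull every item from the extractor and push it to the loader."""
    for item in extractor:
        loader.load(item)


numbers = Extractor(lambda: iter(range(1, 6)))
keep_even = Mapper(lambda items: (x for x in items if x % 2 == 0))
times_ten = Mapper(lambda items: (x * 10 for x in items))

sink = Loader()
run(numbers + keep_even + times_ten, sink)
print(sink.received)  # [20, 40]
```

Because each + returns another extractor, the loader never needs to know how many mappers sit upstream, which is what makes the pieces interchangeable.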

A Runnable interface is also offered to represent the concept of "running a pipeline". This enables a powerful composition pattern: a runnable can wrap another runnable to change how it is executed.
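
The composition pattern can be sketched as follows. This is an illustration under assumptions, not ModuPipe's code: the Runnable base class is re-declared locally with a single assumed run method, and FlakyJob and Retry are hypothetical names invented for the example.

```python
from abc import ABC, abstractmethod


class Runnable(ABC):
    """Local stand-in for the interface: anything that can be run."""

    @abstractmethod
    def run(self) -> None:
        ...


class FlakyJob(Runnable):
    """Hypothetical pipeline that fails a fixed number of times, then succeeds."""

    def __init__(self, failures: int) -> None:
        self.remaining_failures = failures
        self.calls = 0

    def run(self) -> None:
        self.calls += 1
        if self.remaining_failures > 0:
            self.remaining_failures -= 1
            raise RuntimeError("transient failure")


class Retry(Runnable):
    """Hypothetical wrapper: runs the inner runnable up to max_attempts times."""

    def __init__(self, inner: Runnable, max_attempts: int) -> None:
        self.inner = inner
        self.max_attempts = max_attempts

    def run(self) -> None:
        for attempt in range(1, self.max_attempts + 1):
            try:
                self.inner.run()
                return
            except Exception:
                # Re-raise only once the last attempt has failed.
                if attempt == self.max_attempts:
                    raise


job = FlakyJob(failures=2)
Retry(job, max_attempts=3).run()
print(job.calls)  # 3
```

Since the wrapper is itself a Runnable, wrappers can be nested, and the wrapped pipeline does not need to change.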

Examples

Usage examples are present in the examples folder.

Discussion

Optimizing pushing to multiple loaders

If you push to multiple loaders (using the LoaderList class or several chained PushTo mappers) and performance matters, consider a multi-processing approach (with modupipe.runnable.MultiProcess) that pushes to one queue per loader. Each queue then becomes the extractor of its loader's own pipeline, and all of these pipelines run in parallel. This is especially useful when at least one of the loaders takes a long time to process each item.

As an example, take a Loader 1 that is very slow and a Loader 2 that is normally fast. You would go from:

┌────── single pipeline ──────┐        ┌──────────────── single pipeline ───────────────┐
 Extractor ┬─⏵ Loader 1 (slow)    OR    Extractor ──⏵ Loader 1 (slow) ──⏵ Loader 2 (late)
           └─⏵ Loader 2 (late)

to:

┌────── pipeline 1 ──────┐               ┌────────── pipeline 2 ─────────┐
 Extractor ┬─⏵ PutToQueue ──⏵ Queue 1 ⏴── GetFromQueue ──⏵ Loader 1 (slow)
           └─⏵ PutToQueue ──⏵ Queue 2 ⏴── GetFromQueue ──⏵ Loader 2 (not late)
                                         └──────────── pipeline 3 ───────────┘

This will of course not shorten Loader 1's processing time, but the other loaders no longer wait for it, so their performance can improve considerably.
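
The one-queue-per-loader layout in the diagram can be sketched with the standard library alone. Two assumptions apply: the sketch uses threads (threading and queue.Queue) rather than processes to stay short and self-contained, and the names fan_out, consume, slow_loader and fast_loader are hypothetical. It does not show how modupipe.runnable.MultiProcess is used.

```python
import queue
import threading
import time
from typing import Callable, Iterable, List

_DONE = object()  # sentinel telling a consumer that its queue is finished


def consume(q: "queue.Queue", load: Callable[[int], None]) -> None:
    """Pipeline 2 or 3 in the diagram: the queue acts as the extractor."""
    while True:
        item = q.get()
        if item is _DONE:
            break
        load(item)


def fan_out(items: Iterable[int], loaders: List[Callable[[int], None]]) -> None:
    """Pipeline 1 in the diagram: put every item on one queue per loader."""
    queues = [queue.Queue() for _ in loaders]
    workers = [
        threading.Thread(target=consume, args=(q, load))
        for q, load in zip(queues, loaders)
    ]
    for worker in workers:
        worker.start()
    for item in items:
        for q in queues:
            q.put(item)
    for q in queues:
        q.put(_DONE)
    for worker in workers:
        worker.join()


slow_seen: List[int] = []
fast_seen: List[int] = []


def slow_loader(item: int) -> None:
    time.sleep(0.01)  # simulated slow write
    slow_seen.append(item)


def fast_loader(item: int) -> None:
    fast_seen.append(item)


fan_out(range(5), [slow_loader, fast_loader])
print(slow_seen, fast_seen)
```

Each loader drains its own queue, so the fast loader finishes its items without waiting for the slow one. With real processes the same shape would use multiprocessing.Queue and multiprocessing.Process, and the sentinel would need to be a picklable value such as None, because an identity check on object() does not survive pickling.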
