ModuPipe: A modular and extensible ETL-like pipeline builder
Benefits
- Entirely typed
- Abstract, so it fits any use case
- Class-based for easy configurations and injections
Usage
Extract-Transform-Load (ETL) pipelines are a classic form of data-processing pipeline used in the industry. They consist of 3 main elements:
- An Extractor, which returns data in a stream-like structure (Iterator in Python) using a pull strategy.
- Some Mapper (optional), which transforms (parses, converts, filters, etc.) the data obtained from the source(s). Mappers can be chained together and chained to an extractor (with +) in order to form a new extractor.
- A Loader, which receives the maybe-transformed data using a push strategy. Loaders can be multiple (with LoaderList) or chained together (with +).
Therefore, those 3 processes are offered as interfaces, easily chainable and interchangeable at any time.
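The three roles above can be illustrated with a minimal, self-contained sketch. It does not import modupipe: every class and method name below (`extract`, `map`, `load`, `MappedExtractor`, `Numbers`, `Keep`, `Apply`, `Collect`, `run`) is a hypothetical stand-in chosen for illustration, and the library's actual signatures may differ. It only shows how chaining with + can turn an extractor plus a mapper into a new extractor.

```python
from abc import ABC, abstractmethod
from typing import Callable, Generic, Iterator, List, TypeVar

T = TypeVar("T")
U = TypeVar("U")


class Mapper(ABC, Generic[T, U]):
    """Transforms a stream of items (hypothetical interface)."""

    @abstractmethod
    def map(self, items: Iterator[T]) -> Iterator[U]: ...


class Extractor(ABC, Generic[T]):
    """Produces items lazily, pulled by the consumer (hypothetical interface)."""

    @abstractmethod
    def extract(self) -> Iterator[T]: ...

    def __add__(self, mapper: "Mapper[T, U]") -> "Extractor[U]":
        # Extractor + Mapper yields a new Extractor.
        return MappedExtractor(self, mapper)


class MappedExtractor(Extractor[U]):
    def __init__(self, source: Extractor, mapper: Mapper) -> None:
        self.source = source
        self.mapper = mapper

    def extract(self) -> Iterator[U]:
        return self.mapper.map(self.source.extract())


class Loader(ABC, Generic[T]):
    """Receives items one at a time, pushed by the pipeline (hypothetical interface)."""

    @abstractmethod
    def load(self, item: T) -> None: ...


class Numbers(Extractor[int]):
    def __init__(self, count: int) -> None:
        self.count = count

    def extract(self) -> Iterator[int]:
        return iter(range(self.count))


class Keep(Mapper[T, T]):
    def __init__(self, predicate: Callable[[T], bool]) -> None:
        self.predicate = predicate

    def map(self, items: Iterator[T]) -> Iterator[T]:
        return (item for item in items if self.predicate(item))


class Apply(Mapper[T, U]):
    def __init__(self, function: Callable[[T], U]) -> None:
        self.function = function

    def map(self, items: Iterator[T]) -> Iterator[U]:
        return (self.function(item) for item in items)


class Collect(Loader[T]):
    def __init__(self) -> None:
        self.items: List[T] = []

    def load(self, item: T) -> None:
        self.items.append(item)


def run(extractor: Extractor, loader: Loader) -> None:
    # Pull from the extractor, push to the loader.
    for item in extractor.extract():
        loader.load(item)


collected: Collect = Collect()
run(Numbers(6) + Keep(lambda x: x % 2 == 0) + Apply(lambda x: x * 10), collected)
print(collected.items)  # [0, 20, 40]
```

Because each + returns another extractor, mappers can be swapped or reordered without touching the loader.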
A Runnable interface is also offered to represent the concept of "running a pipeline". This enables a powerful composition pattern for wrapping the execution behaviour of runnables.
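As a rough sketch of that composition pattern, the snippet below wraps one runnable inside another to change how it executes. It is self-contained and does not use modupipe: the `Runnable` base class, the `run` method name, and the `Task`, `Retry` and `Sequence` wrappers are all assumptions made for illustration.

```python
from abc import ABC, abstractmethod
from typing import Callable, List


class Runnable(ABC):
    """Anything that can be run (hypothetical interface)."""

    @abstractmethod
    def run(self) -> None: ...


class Task(Runnable):
    """Adapts a plain callable to the Runnable interface."""

    def __init__(self, action: Callable[[], None]) -> None:
        self.action = action

    def run(self) -> None:
        self.action()


class Retry(Runnable):
    """Runs the wrapped runnable again on failure, up to `attempts` times."""

    def __init__(self, inner: Runnable, attempts: int) -> None:
        self.inner = inner
        self.attempts = attempts

    def run(self) -> None:
        for attempt in range(1, self.attempts + 1):
            try:
                self.inner.run()
                return
            except Exception:
                if attempt == self.attempts:
                    raise  # out of attempts: propagate the last error


class Sequence(Runnable):
    """Runs several runnables one after the other."""

    def __init__(self, runnables: List[Runnable]) -> None:
        self.runnables = runnables

    def run(self) -> None:
        for runnable in self.runnables:
            runnable.run()


calls: List[str] = []
log: List[str] = []


def flaky() -> None:
    # Fails on the first two calls, succeeds on the third.
    calls.append("try")
    if len(calls) < 3:
        raise RuntimeError("transient failure")


job = Sequence([Retry(Task(flaky), attempts=3), Task(lambda: log.append("done"))])
job.run()
print(len(calls), log)  # 3 ['done']
```

Since every wrapper is itself a runnable, wrappers can be nested in any order without the wrapped pipeline knowing about them.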
Examples
Usage examples are present in the examples folder.
Discussion
Optimizing pushing to multiple loaders
If you have multiple loaders (using the LoaderList class or many chained PushTo mappers) and performance matters, consider a multi-processing approach (with modupipe.runnable.MultiProcess) that pushes to one queue per loader. Each queue then acts as the extractor for its own loader, and all of these pipelines run in parallel. This is especially useful when at least one of the loaders takes a long time to process each item.
As an example, take a Loader 1 which is very slow and a Loader 2 which is normally fast. You would go from:
┌────── single pipeline ──────┐          ┌──────────────── single pipeline ───────────────┐
Extractor ┬─⏵ Loader 1 (slow)      OR    Extractor ──⏵ Loader 1 (slow) ──⏵ Loader 2 (late)
          └─⏵ Loader 2 (late)
to:
┌────── pipeline 1 ──────┐           ┌────────── pipeline 2 ─────────┐
Extractor ┬─⏵ PutToQueue ──⏵ Queue 1 ⏴── GetFromQueue ──⏵ Loader 1 (slow)
          └─⏵ PutToQueue ──⏵ Queue 2 ⏴── GetFromQueue ──⏵ Loader 2 (not late)
                                     └──────────── pipeline 3 ───────────┘
This will of course not reduce Loader 1's processing time, but the other loaders no longer have to wait for it, which can greatly improve their performance.
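The queue-per-loader idea can be sketched with the standard library alone. Note the substitution: to stay short and portable, this sketch uses threads and `queue.Queue` rather than separate processes, and it does not use modupipe.runnable.MultiProcess. The helper names (`put_to_queues`, `get_from_queue`, `consume`, `run_fan_out`) and the sentinel-based shutdown are assumptions made for illustration, not the library's API.

```python
import queue
import threading
import time
from typing import Callable, Iterable, Iterator, List

_DONE = object()  # sentinel marking the end of the stream


def put_to_queues(items: Iterable[object], queues: List[queue.Queue]) -> None:
    # "Pipeline 1": push every item to one queue per loader.
    for item in items:
        for q in queues:
            q.put(item)
    for q in queues:
        q.put(_DONE)


def get_from_queue(q: queue.Queue) -> Iterator[object]:
    # Each queue becomes the extractor of its own pipeline.
    while True:
        item = q.get()
        if item is _DONE:
            return
        yield item


def consume(q: queue.Queue, load: Callable[[object], None]) -> None:
    for item in get_from_queue(q):
        load(item)


def run_fan_out(items: Iterable[object], loaders: List[Callable[[object], None]]) -> None:
    queues = [queue.Queue() for _ in loaders]
    workers = [
        threading.Thread(target=consume, args=(q, load))
        for q, load in zip(queues, loaders)
    ]
    for worker in workers:
        worker.start()
    put_to_queues(items, queues)
    for worker in workers:
        worker.join()


slow_seen: List[object] = []
fast_seen: List[object] = []


def slow_loader(item: object) -> None:
    time.sleep(0.01)  # simulate a slow destination
    slow_seen.append(item)


def fast_loader(item: object) -> None:
    fast_seen.append(item)


run_fan_out(range(5), [slow_loader, fast_loader])
print(slow_seen, fast_seen)  # [0, 1, 2, 3, 4] [0, 1, 2, 3, 4]
```

Both loaders receive every item in order, but the fast loader drains its queue without waiting on the slow one. With real processes, the queues would need to be process-safe (for example `multiprocessing.Queue`) and the items picklable.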