Python-only framework to easily build ETLs.
cupyd
Python framework to create your own ETLs.
Features
- Simple but powerful syntax.
- Modular approach that encourages re-using components across different ETLs.
- Parallelism out of the box, without the need to write multiprocessing code.
- Very compatible:
  - Runs on Unix, Windows & macOS.
  - Python >= 3.9
- Lightweight:
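The syntax mentioned above connects nodes with the `>>` operator, as the Usage example shows. As a rough illustration of how such chaining can be expressed in Python, here is a toy model built on `__rshift__`. The `Node` class, its `parent` and `children` attributes, and the `paths` helper are hypothetical names invented for this sketch; it is not cupyd's implementation and makes no claim about how cupyd works internally.

```python
from typing import List, Optional, Union


class Node:
    """Toy pipeline node (hypothetical, for illustration only)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.parent: Optional["Node"] = None
        self.children: List["Node"] = []

    def _root(self) -> "Node":
        # Climb parent links to find the first node of the chain this node is in.
        node = self
        while node.parent is not None:
            node = node.parent
        return node

    def __rshift__(self, other: Union["Node", List["Node"]]) -> "Node":
        # `a >> b` puts b downstream of a; `a >> [b, c]` creates two branches.
        targets = other if isinstance(other, list) else [other]
        for target in targets:
            # `x >> y` evaluates to y, so climb back to x before attaching.
            head = target._root()
            head.parent = self
            self.children.append(head)
        # Returning the downstream node lets `a >> b >> c` read left to right.
        return targets[-1]


def paths(node: Node, prefix: tuple = ()) -> List[List[str]]:
    """List every root-to-leaf path by node name."""
    prefix = prefix + (node.name,)
    if not node.children:
        return [list(prefix)]
    result: List[List[str]] = []
    for child in node.children:
        result.extend(paths(child, prefix))
    return result


ext, factorial = Node("ext"), Node("factorial")
even_only, even_ldr = Node("even_only"), Node("even_ldr")
odd_only, odd_ldr = Node("odd_only"), Node("odd_ldr")

ext >> factorial >> [even_only >> even_ldr, odd_only >> odd_ldr]

for path in paths(ext):
    print(" -> ".join(path))
# ext -> factorial -> even_only -> even_ldr
# ext -> factorial -> odd_only -> odd_ldr
```

Because each node only knows its neighbours, the same node class can be instantiated again and wired into a different graph, which is the kind of reuse the modular approach above encourages.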
Usage
In this example we compute the factorials of 50,000 integers using multiprocessing and store the results in two separate lists: one for even results and one for odd results.
import math
from typing import Any, Iterator, Optional

from cupyd import ETL, Extractor, Transformer, Loader, Filter


class IntegerExtractor(Extractor):

    def __init__(self, total_items: int):
        super().__init__()
        self.total_items = total_items
        # generated integers will be passed onto each worker in buckets of size 10
        self.configuration.bucket_size = 10

    def extract(self) -> Iterator[int]:
        for item in range(self.total_items):
            yield item


class Factorial(Transformer):

    def transform(self, item: int) -> int:
        return math.factorial(item)


class EvenOnly(Filter):

    def filter(self, item: int) -> Optional[int]:
        # `item & 1` is 1 for odd numbers, so odd items are discarded here
        return None if item & 1 else item


class OddOnly(Filter):

    def filter(self, item: int) -> Optional[int]:
        # keep odd items, discard even ones
        return item if item & 1 else None


class ListLoader(Loader):

    def __init__(self):
        super().__init__()
        # keep this Loader in the main process so its list is readable after the run
        self.configuration.run_in_main_process = True
        self.items = []

    def start(self):
        self.items = []

    def load(self, item: Any):
        self.items.append(item)


if __name__ == "__main__":
    # 1. Define the ETL Nodes
    ext = IntegerExtractor(total_items=50_000)
    factorial = Factorial()
    even_only = EvenOnly()
    odd_only = OddOnly()
    even_ldr = ListLoader()
    odd_ldr = ListLoader()

    # 2. Connect the Nodes to determine the data flow. Notice the ETL branches after the
    # factorial is computed
    ext >> factorial >> [even_only >> even_ldr, odd_only >> odd_ldr]

    # 3. Run the ETL with 8 workers (multiprocessing Processes)
    etl = ETL(extractor=ext)
    etl.run(workers=8, show_progress=True, monitor_performance=True)

    # 4. You can access the results stored in both Loaders after the ETL is finished
    even_factorials = even_ldr.items
    odd_factorials = odd_ldr.items
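To make the data flow of this example easier to follow, here is a single-process, standard-library-only sketch of the same steps: extract integers in buckets, compute each factorial, then route each result to the even or the odd list. The `buckets` and `run_sequentially` functions are hypothetical helpers written for this illustration; they do not use cupyd and are not a description of how cupyd schedules work across processes.

```python
import math
from itertools import islice
from typing import Iterable, Iterator, List, Tuple


def buckets(items: Iterable[int], size: int) -> Iterator[List[int]]:
    """Group a stream of items into lists of at most `size` items."""
    iterator = iter(items)
    while True:
        bucket = list(islice(iterator, size))
        if not bucket:
            return
        yield bucket


def run_sequentially(total_items: int, bucket_size: int = 10) -> Tuple[List[int], List[int]]:
    """Plain-Python stand-in for the extract -> transform -> filter -> load flow."""
    even_results: List[int] = []
    odd_results: List[int] = []
    # Extract: integers arrive in buckets, mirroring `bucket_size = 10` above.
    for bucket in buckets(range(total_items), bucket_size):
        for item in bucket:
            # Transform: compute the factorial.
            result = math.factorial(item)
            # Filter + load: each branch keeps only results of its own parity.
            if result % 2 == 0:
                even_results.append(result)
            else:
                odd_results.append(result)
    return even_results, odd_results


even_factorials, odd_factorials = run_sequentially(25)
print(len(even_factorials), odd_factorials)
# 23 [1, 1]
```

Only 0! and 1! are odd (both equal 1), so the odd list holds exactly two items however many integers are processed. This sketch preserves input order because it runs in one process; it makes no claim about result ordering in a multi-worker run.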
For more information, see the examples directory.
💘 (Project under construction)