cupyd
Python framework to create your own ETLs.
Features
- Simple but powerful syntax.
- Modular approach that encourages re-using components across different ETLs.
- Parallelism out of the box, without the need to write multiprocessing code.
- Very compatible:
    - Runs on Unix, Windows & macOS.
    - Requires Python >= 3.9.
- Lightweight.
Usage
In this example we compute the factorial of 20,000 integers using multiprocessing and store the results in two separate lists: one for even values and another for odd values.
import math
from typing import Any, Iterator, Optional

from cupyd import ETL, Extractor, Transformer, Loader, Filter


class IntegerExtractor(Extractor):

    def __init__(self, total_items: int):
        super().__init__()
        self.total_items = total_items
        # generated integers will be passed to the workers in buckets of size 10
        self.configuration.bucket_size = 10

    def extract(self) -> Iterator[int]:
        for item in range(self.total_items):
            yield item


class Factorial(Transformer):

    def transform(self, item: int) -> int:
        return math.factorial(item)


class EvenOnly(Filter):

    def filter(self, item: int) -> Optional[int]:
        # keep even values only (lowest bit is 0)
        return None if item & 1 else item


class OddOnly(Filter):

    def filter(self, item: int) -> Optional[int]:
        # keep odd values only (lowest bit is 1)
        return item if item & 1 else None


class ListLoader(Loader):

    def __init__(self):
        super().__init__()
        self.configuration.run_in_main_process = True
        self.items = []

    def start(self):
        self.items = []

    def load(self, item: Any):
        self.items.append(item)


if __name__ == "__main__":

    # 1. Define the ETL Nodes
    ext = IntegerExtractor(total_items=20_000)
    factorial = Factorial()
    even_only = EvenOnly()
    odd_only = OddOnly()
    even_ldr = ListLoader()
    odd_ldr = ListLoader()

    # 2. Connect the Nodes to determine the data flow. Notice the ETL branches after the
    # factorial is computed
    ext >> factorial >> [even_only >> even_ldr, odd_only >> odd_ldr]

    # 3. Run the ETL with 8 workers (multiprocessing Processes)
    etl = ETL(extractor=ext)
    etl.run(workers=8, show_progress=True, monitor_performance=True)

    # 4. You can access the results stored in both Loaders after the ETL is finished
    even_factorials = even_ldr.items
    odd_factorials = odd_ldr.items
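To make the wiring in step 2 easier to picture, here is a minimal sketch of how a `>>` operator with list branching could be built in plain Python. It is not cupyd's implementation: the `Node`, `Chain`, `_as_chain` and `edges` names are hypothetical and exist only for this illustration. The idea is that `a >> b` returns an object that remembers both the first and the last node of the chain, so that a chain placed inside a list can be attached by its first node.

```python
from typing import List, Tuple, Union


class Node:
    """Hypothetical stand-in for an ETL node (not a cupyd class)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.children: List["Node"] = []

    def __rshift__(self, other: object) -> "Chain":
        # `node >> x` starts a chain whose head and tail are both `node`.
        return Chain(self, self) >> other


class Chain:
    """A run of connected nodes, remembered by its first and last node."""

    def __init__(self, head: Node, tail: Node) -> None:
        self.head = head
        self.tail = tail

    def __rshift__(self, other: object) -> "Chain":
        if isinstance(other, list):
            # Branching: link the current tail to the head of every branch.
            for branch in other:
                self.tail.children.append(_as_chain(branch).head)
            return self
        nxt = _as_chain(other)
        self.tail.children.append(nxt.head)
        # The new chain keeps the old head and ends at the new tail.
        return Chain(self.head, nxt.tail)


def _as_chain(obj: Union[Node, Chain]) -> Chain:
    """Wrap a bare node so nodes and chains can be handled uniformly."""
    return obj if isinstance(obj, Chain) else Chain(obj, obj)


def edges(node: Node) -> List[Tuple[str, str]]:
    """List (parent, child) name pairs in depth-first order."""
    result: List[Tuple[str, str]] = []
    for child in node.children:
        result.append((node.name, child.name))
        result.extend(edges(child))
    return result


ext, factorial = Node("ext"), Node("factorial")
even_only, even_ldr = Node("even_only"), Node("even_ldr")
odd_only, odd_ldr = Node("odd_only"), Node("odd_ldr")

# Same shape as the data flow in the example above.
ext >> factorial >> [even_only >> even_ldr, odd_only >> odd_ldr]

for parent, child in edges(ext):
    print(f"{parent} -> {child}")
```

Because the nodes themselves are mutated while the expression is evaluated, the graph stays reachable from `ext` afterwards, which mirrors how the example passes only the extractor to `ETL`.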
For more information, see the examples directory.
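When checking the output of the factorial example, a single-process reference can be handy. The sketch below uses only the standard library; the `extract` and `split_factorials` helpers are hypothetical names for this illustration and are not part of cupyd. It assumes each list should end up with every factorial of the matching parity.

```python
import math
from typing import Iterator, List, Tuple


def extract(total_items: int) -> Iterator[int]:
    """Yield 0, 1, ..., total_items - 1, like the extractor in the example."""
    yield from range(total_items)


def split_factorials(total_items: int) -> Tuple[List[int], List[int]]:
    """Compute n! for each extracted n and split the results by parity."""
    even: List[int] = []
    odd: List[int] = []
    for n in extract(total_items):
        value = math.factorial(n)
        # The lowest bit is 1 for odd values and 0 for even ones.
        (odd if value & 1 else even).append(value)
    return even, odd


# A small input keeps the single-process run fast.
even_factorials, odd_factorials = split_factorials(10)
print(len(even_factorials), len(odd_factorials))  # 8 2
print(odd_factorials)  # [1, 1]
```

Since n! is even for every n >= 2, only 0! and 1! are odd, so the odd list holds just two values however many integers are extracted. With several workers the arrival order of results may differ from the sequential order, so comparing sorted lists is the safer check.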
💘 (Project under construction)