
Python framework to easily build ETLs.

Project description

cupyd


                                                  __     
                                                 /\ \    
  ___       __  __      _____       __  __       \_\ \   
 /'___\    /\ \/\ \    /\ '__`\    /\ \/\ \      /'_` \  
/\ \__/    \ \ \_\ \   \ \ \L\ \   \ \ \_\ \    /\ \L\ \ 
\ \____\    \ \____/    \ \ ,__/    \/`____ \   \ \___,_\
 \/____/     \/___/      \ \ \/      `/___/> \   \/__,_ /
                          \ \_\         /\___/           
                           \/_/         \/__/

Python framework to create your own ETLs.

Features

  • Simple but powerful syntax.
  • Modular approach that encourages re-using components across different ETLs.
  • Parallelism out of the box, without having to write multiprocessing code.
  • Very compatible:
    • Runs on Unix, Windows and macOS.
    • Python >= 3.9
  • Lightweight:
    • No dependencies for its core version.
    • [WIP] The API version will require Falcon, a minimalist ASGI/WSGI framework that has no dependencies of its own.
    • [WIP] The Dashboard (full) version will require Falcon and Dash.
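To give a sense of the multiprocessing code that cupyd saves you from writing, here is a rough standard-library sketch of the same computation as the Usage example, built on `multiprocessing.Pool`. It is only a comparison and says nothing about how cupyd works internally; the helper names `split_by_parity` and `parallel_factorials` are hypothetical, and `chunksize` is used as a loose analogue of cupyd's `bucket_size`.

```python
import math
from multiprocessing import Pool
from typing import Iterable, List, Tuple


def split_by_parity(values: Iterable[int]) -> Tuple[List[int], List[int]]:
    """Split values into (even, odd) lists, preserving their order."""
    even: List[int] = []
    odd: List[int] = []
    for value in values:
        # value & 1 is 1 for odd numbers and 0 for even numbers
        (odd if value & 1 else even).append(value)
    return even, odd


def parallel_factorials(
    total_items: int, workers: int = 8, chunksize: int = 10
) -> Tuple[List[int], List[int]]:
    """Hypothetical helper: factorials of 0..total_items-1 in worker processes, split by parity."""
    with Pool(processes=workers) as pool:
        # chunksize sends inputs to workers in batches, loosely like bucket_size
        results = pool.imap(math.factorial, range(total_items), chunksize=chunksize)
        # consume the iterator before the pool is shut down by the with-block
        return split_by_parity(results)


if __name__ == "__main__":
    # a smaller input than the Usage example keeps this quick to run
    even, odd = parallel_factorials(1_000, workers=4)
    print(len(even), len(odd))  # 998 2
```

`imap` yields results in input order. Only 0! and 1! are odd (both equal 1), so almost every factorial lands in the even list. The `if __name__ == "__main__":` guard is required on platforms that start workers with the spawn method (Windows, macOS), which is also why the Usage example below has one.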

Usage

In this example, we compute the factorials of 20,000 integers (0 to 19,999) using multiprocessing, and store the results in two separate lists: one for even values and another for odd values.

import math
from typing import Any, Iterator, Optional

from cupyd import ETL, Extractor, Transformer, Loader, Filter


class IntegerExtractor(Extractor):

    def __init__(self, total_items: int):
        super().__init__()
        self.total_items = total_items

        # generated integers will be passed to the workers in buckets of size 10
        self.configuration.bucket_size = 10

    def extract(self) -> Iterator[int]:
        for item in range(self.total_items):
            yield item


class Factorial(Transformer):

    def transform(self, item: int) -> int:
        return math.factorial(item)


class EvenOnly(Filter):

    # Optional[int] instead of `int | None`, which fails at definition time on Python 3.9
    def filter(self, item: int) -> Optional[int]:
        # item & 1 is 0 for even numbers: keep them, return None for odd ones
        return None if item & 1 else item


class OddOnly(Filter):

    def filter(self, item: int) -> Optional[int]:
        # item & 1 is 1 for odd numbers: keep them, return None for even ones
        return item if item & 1 else None


class ListLoader(Loader):

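    def __init__(self):
        super().__init__()
        # run this Loader in the main process so that self.items can still be
        # read after etl.run() returns
        self.configuration.run_in_main_process = True
        self.items = []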

    def start(self):
        self.items = []

    def load(self, item: Any):
        self.items.append(item)


if __name__ == "__main__":
    # 1. Define the ETL Nodes
    ext = IntegerExtractor(total_items=20_000)
    factorial = Factorial()
    even_only = EvenOnly()
    odd_only = OddOnly()
    even_ldr = ListLoader()
    odd_ldr = ListLoader()

    # 2. Connect the Nodes to determine the data flow. Notice the ETL branches after the
    # factorial is computed
    ext >> factorial >> [even_only >> even_ldr, odd_only >> odd_ldr]

    # 3. Run the ETL with 8 workers (multiprocessing Processes)
    etl = ETL(extractor=ext)
    etl.run(workers=8, show_progress=True, monitor_performance=True)

    # 4. You can access the results stored in both Loaders after the ETL is finished
    even_factorials = even_ldr.items
    odd_factorials = odd_ldr.items
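Step 2 wires the Nodes together with `>>`, and a list on the right-hand side creates branches. The source does not describe how cupyd implements this, so the following is only one possible way to get the same behaviour by overloading `__rshift__`, using a hypothetical `Node` class. The subtle part is that `even_only >> even_ldr` evaluates to its last node (`even_ldr`), so each branch has to be attached through the first node of its sub-chain.

```python
from __future__ import annotations

from typing import List, Optional, Union


class Node:
    """Hypothetical pipeline node supporting `a >> b` and `a >> [b, c]`."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.parent: Optional[Node] = None
        self.children: List[Node] = []

    def head(self) -> Node:
        # Walk upstream to the first node of the chain this node belongs to
        node = self
        while node.parent is not None:
            node = node.parent
        return node

    def __rshift__(self, other: Union[Node, List[Node]]) -> Union[Node, List[Node]]:
        targets = other if isinstance(other, list) else [other]
        for target in targets:
            # `even_only >> even_ldr` returns even_ldr, so connect to the head
            # of each sub-chain rather than to its last node
            head = target.head()
            head.parent = self
            self.children.append(head)
        # Returning the right operand lets `a >> b >> c` connect b to c
        return other

    def describe(self, depth: int = 0) -> List[str]:
        """Return the subtree rooted here as indented lines."""
        lines = ["  " * depth + self.name]
        for child in self.children:
            lines.extend(child.describe(depth + 1))
        return lines


ext, fact = Node("ext"), Node("factorial")
even_only, even_ldr = Node("even_only"), Node("even_ldr")
odd_only, odd_ldr = Node("odd_only"), Node("odd_ldr")

# Same shape as step 2 of the Usage example
ext >> fact >> [even_only >> even_ldr, odd_only >> odd_ldr]
print("\n".join(ext.describe()))
```

Python evaluates `ext >> fact` first, then builds the list (wiring each branch internally), and finally attaches both branch heads to `factorial`, producing the tree ext, factorial, then the two branches `even_only >> even_ldr` and `odd_only >> odd_ldr`.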

For more information, see the examples directory.


💘 (Project under construction)



Download files


Source Distribution

cupyd-0.2.0.tar.gz (26.3 kB)

Uploaded Source

Built Distribution

cupyd-0.2.0-py3-none-any.whl (36.8 kB)

Uploaded Python 3

File details

Details for the file cupyd-0.2.0.tar.gz.

File metadata

  • Download URL: cupyd-0.2.0.tar.gz
  • Upload date:
  • Size: 26.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/5.1.1 CPython/3.12.7

File hashes

Hashes for cupyd-0.2.0.tar.gz
Algorithm     Hash digest
SHA256        12bded78210013f45d0d5572ea96f9b9babfe503e45f0c116f95bae55ca9cb58
MD5           be99daa5a6e4875ecd7cbea05bbea7e4
BLAKE2b-256   f4d67a03d64197dea245d9894b96685466f719a2d11ed5979e304cbdb1f219e7
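To check a downloaded file against the SHA256 digest listed for cupyd-0.2.0.tar.gz, a minimal sketch using only Python's `hashlib`. It assumes the archive has already been downloaded into the current directory; the helper name `sha256_of` is hypothetical.

```python
import hashlib
from pathlib import Path

# SHA256 digest published for cupyd-0.2.0.tar.gz
EXPECTED_SHA256 = "12bded78210013f45d0d5572ea96f9b9babfe503e45f0c116f95bae55ca9cb58"


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA256 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # iter() with a sentinel keeps calling fh.read until it returns b"" (EOF)
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    archive = Path("cupyd-0.2.0.tar.gz")  # assumed download location
    if archive.exists():
        actual = sha256_of(archive)
        print("OK" if actual == EXPECTED_SHA256 else f"MISMATCH: {actual}")
    else:
        print(f"{archive} not found; download it first")
```

Reading in chunks keeps memory use constant regardless of file size. pip can also print a file's SHA256 digest with `pip hash cupyd-0.2.0.tar.gz`.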


File details

Details for the file cupyd-0.2.0-py3-none-any.whl.

File metadata

  • Download URL: cupyd-0.2.0-py3-none-any.whl
  • Upload date:
  • Size: 36.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/5.1.1 CPython/3.12.7

File hashes

Hashes for cupyd-0.2.0-py3-none-any.whl
Algorithm     Hash digest
SHA256        1f58f8a1327e521142c3f872af97c2b079e8196959119f160a96ef22f68c14f3
MD5           02d4d2022ca0519aaf265afd5028a5a2
BLAKE2b-256   afa8680dc08e2d6a9beb6a1a8b4ef8e062d8bb1950ed0a9d3a488b1c097849b3

