buildable-dataclasses

License: MIT

Batch-processing pipelines via nested dataclasses

example batch-processing pipeline

  • the dependencies have to be computed before FooBar itself can be computed
flowchart TD
foobar["FooBar"] --> | depends-on | dependency-a["NeededData"]
foobar["FooBar"] --> | depends-on | dependency-b["AnotherThing"]

DAG declared via Buildable-dataclasses (foobar-example)

from dataclasses import dataclass, field

@dataclass
class FooBar(Buildable):
    dependency_a: NeededData  # another buildable dataclass
    dependency_b: AnotherThing  # yet another buildable dataclass

    # result field: excluded from __init__ and filled in by _build_self
    some_result: SomeDataContainer = field(init=False)  # whatever type you want

    def _build_self(self):
        self.some_result = some_fancy_processing(self.dependency_a, self.dependency_b)

data = FooBar(NeededData(), AnotherThing())
data.build()  # first builds the dependencies, then the dataclass itself (_build_self)
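
To make the build order concrete, here is a minimal sketch of a dependency-first build that does not use the library at all. `ToyBuildable` and `build_log` are hypothetical names introduced only for illustration; the real `Buildable` may be implemented quite differently.

```python
from dataclasses import dataclass, field, fields

# Records the order in which objects were built (illustration only).
build_log: list[str] = []


@dataclass
class ToyBuildable:
    """Hypothetical stand-in for Buildable, NOT the library's implementation."""

    def build(self) -> "ToyBuildable":
        # Depth-first: build every field that is itself a ToyBuildable ...
        for f in fields(self):
            value = getattr(self, f.name, None)
            if isinstance(value, ToyBuildable):
                value.build()
        # ... and only then build this object.
        self._build_self()
        return self

    def _build_self(self) -> None:
        build_log.append(type(self).__name__)


@dataclass
class NeededData(ToyBuildable):
    pass


@dataclass
class AnotherThing(ToyBuildable):
    pass


@dataclass
class FooBar(ToyBuildable):
    dependency_a: NeededData
    dependency_b: AnotherThing
    some_result: str = field(init=False, default="")

    def _build_self(self) -> None:
        super()._build_self()
        names = (type(self.dependency_a).__name__, type(self.dependency_b).__name__)
        self.some_result = " + ".join(names)


data = FooBar(NeededData(), AnotherThing()).build()
print(build_log)  # dependencies appear before FooBar
```

Walking the dataclass fields keeps the DAG implicit in the type annotations, so no separate pipeline definition is needed.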
  • so far everything happens in memory; what about writing results to disk?

BuildableData (foobar-example)

  • writing processed data to disk (or S3/wherever)
from dataclasses import dataclass
from pathlib import Path

@dataclass
class FooBarData(BuildableData):
    dependency_a: NeededData
    dependency_b: AnotherThing

    @property
    def processed_data_file(self) -> str:
        return f"{self.data_dir}/processed_data.whatever"
    
    @property
    def _is_data_valid(self) -> bool:
        return Path(self.processed_data_file).is_file()

    def _build_data(self) -> None:
        processed_data=fancy_processing(self.dependency_a, self.dependency_b)
        write(self.processed_data_file, processed_data)
        
    def example_reading_method(self)->ProcessedData:
        return read(self.processed_data_file)

data = FooBarData(NeededData(), AnotherThing())
data.build()  # processes the data and writes it to disk
# in another Python process, or on another day
probably_already_processed = FooBarData(NeededData(), AnotherThing())
probably_already_processed.build()  # does NOT reprocess your data, because "_is_data_valid" returns True
my_data = probably_already_processed.example_reading_method()
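
The skip-if-valid behaviour shown above can be sketched in isolation as follows. This is a toy model under stated assumptions: `ToyBuildableData`, its explicit `data_dir` argument and the `build_count` counter are hypothetical and only mimic what `BuildableData` appears to do; they are not the library's API.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass
class ToyBuildableData:
    """Hypothetical stand-in for BuildableData, NOT the library's implementation."""

    data_dir: str  # assumption: passed in explicitly here
    build_count: int = 0  # counts how often _build_data actually ran

    @property
    def processed_data_file(self) -> Path:
        return Path(self.data_dir) / "processed_data.txt"

    @property
    def _is_data_valid(self) -> bool:
        # The data counts as valid once the output file exists.
        return self.processed_data_file.is_file()

    def _build_data(self) -> None:
        self.processed_data_file.write_text("expensive result")
        self.build_count += 1

    def build(self) -> "ToyBuildableData":
        # Only do the work when no valid result is on disk yet.
        if not self._is_data_valid:
            self._build_data()
        return self


with tempfile.TemporaryDirectory() as tmp:
    first = ToyBuildableData(tmp).build()   # writes the file
    second = ToyBuildableData(tmp).build()  # finds the file, skips processing
    counts = (first.build_count, second.build_count)
    content = second.processed_data_file.read_text()
```

Because validity is checked against the file system rather than in-memory state, a fresh object in a new process sees the earlier result and skips the processing step.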
