
buildable-dataclasses

License: MIT

batch-processing pipelines via nested dataclasses

example batch-processing pipeline

  • the dependencies have to be computed before the FooBar itself can be computed
flowchart TD
foobar["FooBar"] --> | depends-on | dependency-a["NeededData"]
foobar["FooBar"] --> | depends-on | dependency-b["AnotherThing"]
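Because the dependencies form a DAG, any topological order of the graph is a valid build order. As a standalone illustration that does not use buildable-dataclasses at all, Python's standard-library `graphlib` (Python 3.9+) can compute such an order for the diagram above; the node names are simply the labels from the diagram.

```python
from graphlib import TopologicalSorter

# The diagram encoded as: node -> set of nodes it depends on.
dependencies = {
    "FooBar": {"NeededData", "AnotherThing"},
    "NeededData": set(),
    "AnotherThing": set(),
}

# static_order() yields each node only after all of its dependencies.
build_order = list(TopologicalSorter(dependencies).static_order())
print(build_order)
```

Which of the two dependencies comes first is not fixed, but `FooBar` always comes last.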

DAG declared via buildable-dataclasses (foobar-example)

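from dataclasses import dataclass, field

# Buildable comes from the buildable-dataclasses package; NeededData, AnotherThing,
# SomeDataContainer and some_fancy_processing are placeholders for your own code.

@dataclass
class FooBar(Buildable):
    dependency_a: NeededData  # another buildable dataclass
    dependency_b: AnotherThing  # yet another buildable dataclass

    # whatever type you want; computed by _build_self, so not an __init__ argument
    some_result: SomeDataContainer = field(init=False)

    def _build_self(self):
        self.some_result = some_fancy_processing(self.dependency_a, self.dependency_b)


data = FooBar(NeededData(), AnotherThing())
data.build()  # first builds the dependencies, then the dataclass itself (via _build_self)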
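To illustrate the "dependencies first, then `_build_self`" order of `build()`, here is a toy stand-in written from scratch. `MiniBuildable` and `build_log` are hypothetical names, and the real `Buildable` may discover and build its dependencies differently; this only sketches the idea of a depth-first walk over dataclass fields.

```python
from dataclasses import dataclass, field, fields


class MiniBuildable:
    """Hypothetical stand-in for Buildable, NOT the library's implementation."""

    _is_built = False  # set to True on the instance once it has been built

    def build(self) -> None:
        if self._is_built:
            return  # already built, e.g. a dependency shared by two parents
        # Depth-first: build every field value that is itself buildable ...
        for f in fields(self):
            # default None: fields declared with init=False may not be set yet
            value = getattr(self, f.name, None)
            if isinstance(value, MiniBuildable):
                value.build()
        # ... and only then build this object itself.
        self._build_self()
        self._is_built = True

    def _build_self(self) -> None:
        pass  # subclasses put their own processing here


build_log: list[str] = []  # records the order in which objects are built


@dataclass
class NeededData(MiniBuildable):
    def _build_self(self) -> None:
        build_log.append("NeededData")


@dataclass
class AnotherThing(MiniBuildable):
    def _build_self(self) -> None:
        build_log.append("AnotherThing")


@dataclass
class FooBar(MiniBuildable):
    dependency_a: NeededData
    dependency_b: AnotherThing
    some_result: str = field(init=False)  # computed by _build_self

    def _build_self(self) -> None:
        build_log.append("FooBar")
        self.some_result = "processed"


data = FooBar(NeededData(), AnotherThing())
data.build()
print(build_log)  # ['NeededData', 'AnotherThing', 'FooBar']
```

The `_is_built` flag is a design choice of this sketch: without it, a dependency shared by two parents would be built twice.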
  • so far everything happens in memory; what about writing results to disk?

BuildableData (foobar-example)

  • writing processed data to disk (or S3/wherever)
from dataclasses import dataclass
from pathlib import Path

# BuildableData comes from the buildable-dataclasses package; NeededData, AnotherThing,
# ProcessedData, fancy_processing, write and read are placeholders for your own code.

@dataclass
class FooBarData(BuildableData):
    dependency_a: NeededData
    dependency_b: AnotherThing

    @property
    def processed_data_file(self) -> str:
        # data_dir is not defined in this class; presumably supplied by BuildableData
        return f"{self.data_dir}/processed_data.whatever"

    @property
    def _is_data_valid(self) -> bool:
        # the data counts as built once the output file exists
        return Path(self.processed_data_file).is_file()

    def _build_data(self) -> None:
        processed_data = fancy_processing(self.dependency_a, self.dependency_b)
        write(self.processed_data_file, processed_data)

    def example_reading_method(self) -> ProcessedData:
        return read(self.processed_data_file)


data = FooBarData(NeededData(), AnotherThing())
data.build()  # processes the data and writes it to disk
# in another Python process, or on another day
probably_already_processed = FooBarData(NeededData(), AnotherThing())
probably_already_processed.build()  # does NOT reprocess the data, because _is_data_valid returns True
my_data = probably_already_processed.example_reading_method()
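The caching behaviour described above comes down to checking `_is_data_valid` before calling `_build_data`. Below is a minimal sketch of that check. `MiniBuildableData` and `build_calls` are hypothetical, `data_dir` is passed in explicitly (the real `BuildableData` presumably manages it itself), dependency building is left out, and a temporary directory stands in for a real data directory.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path


class MiniBuildableData:
    """Hypothetical stand-in for BuildableData, NOT the library's implementation."""

    @property
    def _is_data_valid(self) -> bool:
        raise NotImplementedError

    def _build_data(self) -> None:
        raise NotImplementedError

    def build(self) -> None:
        # (Building dependencies is omitted in this sketch.)
        # Only run the expensive step if no valid result exists yet.
        if not self._is_data_valid:
            self._build_data()


build_calls: list[str] = []  # records every time the expensive step really runs


@dataclass
class FooBarData(MiniBuildableData):
    data_dir: str  # assumption: passed in explicitly for this sketch

    @property
    def processed_data_file(self) -> str:
        return f"{self.data_dir}/processed_data.txt"

    @property
    def _is_data_valid(self) -> bool:
        return Path(self.processed_data_file).is_file()

    def _build_data(self) -> None:
        build_calls.append(self.data_dir)
        Path(self.processed_data_file).write_text("processed")

    def example_reading_method(self) -> str:
        return Path(self.processed_data_file).read_text()


with tempfile.TemporaryDirectory() as tmp:
    FooBarData(tmp).build()  # first run: processes the data and writes the file
    again = FooBarData(tmp)  # a fresh object, as in a later process
    again.build()  # the file already exists, so _build_data is skipped
    result = again.example_reading_method()

print(len(build_calls), result)  # 1 processed
```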

Source Distribution

buildable_dataclasses-0.1.1.tar.gz (9.3 kB)