buildable-dataclasses
batch-processing pipelines via nested dataclasses
example batch-processing pipeline
- the dependencies have to be computed before the FooBar itself can be computed
flowchart TD
foobar["FooBar"] --> | depends-on | dependency-a["NeededData"]
foobar["FooBar"] --> | depends-on | dependency-b["AnotherThing"]
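The "dependencies first" order in the flowchart above is a topological order of the graph. As a minimal sketch that uses only the standard library's `graphlib` (not this package), with node names copied from the flowchart:

```python
from graphlib import TopologicalSorter

# Map each node to the set of nodes it depends on (names taken from the flowchart).
dependencies = {"FooBar": {"NeededData", "AnotherThing"}}

# static_order() yields a node only after all of its dependencies were yielded.
build_order = list(TopologicalSorter(dependencies).static_order())
print(build_order)  # "FooBar" always comes last; the order of the two dependencies may vary
```

The buildable dataclasses below express the same graph through their fields instead of an explicit dictionary.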
DAG declared via Buildable-dataclasses (foobar-example)
from dataclasses import dataclass, field

@dataclass
class FooBar(Buildable):
    dependency_a: NeededData  # another buildable dataclass
    dependency_b: AnotherThing  # yet another buildable dataclass
    # whatever type you want; init=False because it is only set in _build_self
    some_result: SomeDataContainer = field(init=False)

    def _build_self(self):
        self.some_result = some_fancy_processing(self.dependency_a, self.dependency_b)

data = FooBar(NeededData(), AnotherThing())
data.build()  # first builds the dependencies, then the dataclass itself (_build_self)
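To make the build order concrete, here is a self-contained sketch of how such a base class could behave. The `Buildable` class below is a hypothetical stand-in written for illustration, not the library's implementation; `build_log` and the string result are likewise assumptions.

```python
from dataclasses import dataclass, field, fields


class Buildable:
    """Hypothetical stand-in for illustration, NOT the library's implementation."""

    def build(self) -> None:
        # Build every field that is itself a Buildable first (depth-first) ...
        for f in fields(self):
            value = getattr(self, f.name, None)
            if isinstance(value, Buildable):
                value.build()
        # ... then build this object.
        self._build_self()

    def _build_self(self) -> None:
        pass  # leaf nodes may have nothing to do


build_log = []  # records the order in which _build_self is called


@dataclass
class NeededData(Buildable):
    def _build_self(self) -> None:
        build_log.append("NeededData")


@dataclass
class AnotherThing(Buildable):
    def _build_self(self) -> None:
        build_log.append("AnotherThing")


@dataclass
class FooBar(Buildable):
    dependency_a: NeededData
    dependency_b: AnotherThing
    some_result: str = field(init=False, default="")

    def _build_self(self) -> None:
        build_log.append("FooBar")
        self.some_result = "built from a and b"


data = FooBar(NeededData(), AnotherThing())
data.build()
print(build_log)  # ['NeededData', 'AnotherThing', 'FooBar']
```

In this sketch the dependencies are discovered by iterating over the dataclass fields, so the declaration order of the fields determines the build order.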
- everything so far happens in memory; what about writing results to disk?
BuildableData (foobar-example)
- writing processed data to disk (or S3/wherever)
from dataclasses import dataclass
from pathlib import Path

@dataclass
class FooBarData(BuildableData):
    dependency_a: NeededData
    dependency_b: AnotherThing

    @property
    def processed_data_file(self) -> str:
        return f"{self.data_dir}/processed_data.whatever"

    @property
    def _is_data_valid(self) -> bool:
        return Path(self.processed_data_file).is_file()

    def _build_data(self) -> None:
        processed_data = fancy_processing(self.dependency_a, self.dependency_b)
        write(self.processed_data_file, processed_data)

    def example_reading_method(self) -> ProcessedData:
        return read(self.processed_data_file)

data = FooBarData(NeededData(), AnotherThing())
data.build()  # processes data and writes it to disk

# in another python-process or on another day
probably_already_processed = FooBarData(NeededData(), AnotherThing())
probably_already_processed.build()  # does NOT reprocess your data because "_is_data_valid" returns True
my_data = probably_already_processed.example_reading_method()
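The "build only if the data on disk is not valid yet" behaviour can be tried out without the library. The following is a minimal sketch under stated assumptions: `CachedSquares`, its `build_count` field, and its `build` method are hypothetical and only mimic the pattern described above; they are not the library's `BuildableData`.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass
class CachedSquares:
    """Hypothetical example class; it does not use the library."""

    data_dir: str
    build_count: int = 0  # counts how often the expensive step really ran

    @property
    def processed_data_file(self) -> str:
        return f"{self.data_dir}/squares.txt"

    @property
    def _is_data_valid(self) -> bool:
        return Path(self.processed_data_file).is_file()

    def _build_data(self) -> None:
        self.build_count += 1
        Path(self.processed_data_file).write_text("0 1 4 9")

    def build(self) -> None:
        # Only do the work when no valid result exists on disk yet.
        if not self._is_data_valid:
            self._build_data()

    def read(self) -> list[int]:
        text = Path(self.processed_data_file).read_text()
        return [int(x) for x in text.split()]


with tempfile.TemporaryDirectory() as tmp:
    first = CachedSquares(tmp)
    first.build()  # file is missing, so the data gets written
    second = CachedSquares(tmp)
    second.build()  # file already exists, so nothing is recomputed
    counts = (first.build_count, second.build_count)
    values = second.read()

print(counts, values)  # (1, 0) [0, 1, 4, 9]
```

A file-existence check is the simplest possible validity test; a stricter check (for example comparing a checksum) would fit into the same property.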
Source Distribution
File details
Details for the file buildable_dataclasses-0.1.1.tar.gz.
File metadata
- Download URL: buildable_dataclasses-0.1.1.tar.gz
- Upload date:
- Size: 9.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/5.0.0 CPython/3.12.3
File hashes
Algorithm | Hash digest
---|---
SHA256 | cb989491a075317aa21281a91de3e176aad164c0262a393345798de82b2ca1bd
MD5 | 240c85eae069fda29eda9505a0b4e646
BLAKE2b-256 | cfd291c304b7d9e7c31219dab3a6f5b61c3cae6c23fb0e884f7ad3120ff77989