Dataset SDK for consistent reads and writes of batch, online, and streaming data.
Welcome to zdatasets
==================================================
TODO
```python
import pandas as pd
from metaflow import FlowSpec, step

from zdatasets import Dataset, Mode
from zdatasets.metaflow import DatasetParameter
from zdatasets.plugins import BatchOptions


# Can also invoke from CLI (no trailing backslash inside the single-quoted JSON,
# where bash would keep it literally and break the JSON):
# > python zdatasets/tutorials/0_hello_dataset_flow.py run \
#     --hello_dataset '{"name": "HelloDataset", "mode": "READ_WRITE",
#     "options": {"type": "BatchOptions", "partition_by": "region"}}'
class HelloDatasetFlow(FlowSpec):
    # A read/write batch dataset, partitioned by the "region" column
    hello_dataset = DatasetParameter(
        "hello_dataset",
        default=Dataset("HelloDataset", mode=Mode.READ_WRITE, options=BatchOptions(partition_by="region")),
    )

    @step
    def start(self):
        df = pd.DataFrame({"region": ["A", "A", "A", "B", "B", "B"], "zpid": [1, 2, 3, 4, 5, 6]})
        print("saving data_frame: \n", df.to_string(index=False))

        # Example of writing to a dataset
        self.hello_dataset.write(df)

        # Save the dataset as a flow artifact so later steps can read it
        self.output_dataset = self.hello_dataset
        self.next(self.end)

    @step
    def end(self):
        print(f"I have dataset \n{self.output_dataset=}")

        # Read back only the region="A" partition as a pandas DataFrame
        df: pd.DataFrame = self.output_dataset.to_pandas(partitions=dict(region="A"))
        print('self.output_dataset.to_pandas(partitions=dict(region="A")):')
        print(df.to_string(index=False))


if __name__ == "__main__":
    HelloDatasetFlow()
```
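To make the expected result of the `end` step concrete, the sketch below mimics `partition_by="region"` with plain pandas: rows are grouped into one frame per region value, and asking for `partitions=dict(region="A")` returns only the matching rows. It is an in-memory illustration that assumes partition filtering behaves like an equality filter on the partition column; it is not how zdatasets stores data, and the helpers `write_partitioned` and `read_partitions` are hypothetical.

```python
import pandas as pd


# Hypothetical helper: split a frame into one sub-frame per value of the partition
# column, roughly what a Hive-style layout (region=A/, region=B/) would hold.
def write_partitioned(df: pd.DataFrame, partition_by: str) -> dict:
    return {key: part.reset_index(drop=True) for key, part in df.groupby(partition_by)}


# Hypothetical helper: keep only the partition whose key equals the requested value.
def read_partitions(store: dict, partition_by: str, partitions: dict) -> pd.DataFrame:
    wanted = partitions[partition_by]
    matches = [part for key, part in store.items() if key == wanted]
    # Empty frame if no partition matches the requested value
    return pd.concat(matches, ignore_index=True) if matches else pd.DataFrame()


df = pd.DataFrame({"region": ["A", "A", "A", "B", "B", "B"], "zpid": [1, 2, 3, 4, 5, 6]})
store = write_partitioned(df, partition_by="region")
subset = read_partitions(store, "region", partitions=dict(region="A"))
print(subset.to_string(index=False))
```

Under that assumption, the flow's final print should show only the three region A rows (zpid 1, 2 and 3).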
Download files
- Source distribution: zdatasets-1.2.5.tar.gz (54.8 kB)
- Built distribution: zdatasets-1.2.5-py3-none-any.whl (84.7 kB)
File details: zdatasets-1.2.5.tar.gz
- Download URL: zdatasets-1.2.5.tar.gz
- Size: 54.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.9.17
File hashes

Algorithm | Hash digest
---|---
SHA256 | 170225853920c6a778416244409d09d7010603122948715ffa2ab2d55b37eb43
MD5 | b91e064c122ca6ec8e2a0b18b3498e1e
BLAKE2b-256 | ab554b31c6a25c395625168c89820fc0c4ed2a9c8af1f1168c915806d7c38e68
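The published digests let you check a download before installing it. A minimal sketch using Python's standard `hashlib` module: it reads the file in chunks and compares the hex digest with the SHA256 value listed for the source distribution. The function names `sha256_of` and `verify_sha256`, and the assumption that the archive sits in the current directory, are illustrative.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA256 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sha256(path: Path, expected: str) -> bool:
    """Compare a file's SHA256 digest with a published value (case-insensitive)."""
    return sha256_of(path) == expected.strip().lower()


if __name__ == "__main__":
    # Published SHA256 of zdatasets-1.2.5.tar.gz, copied from the file hashes table
    expected = "170225853920c6a778416244409d09d7010603122948715ffa2ab2d55b37eb43"
    archive = Path("zdatasets-1.2.5.tar.gz")  # assumption: downloaded to the current directory
    if archive.exists():
        print("OK" if verify_sha256(archive, expected) else "MISMATCH")
    else:
        print(f"{archive} not found")
```

pip can enforce the same check at install time through its hash-checking mode (`--require-hashes`, with `--hash=sha256:...` entries in a requirements file).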
File details: zdatasets-1.2.5-py3-none-any.whl
- Download URL: zdatasets-1.2.5-py3-none-any.whl
- Size: 84.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.9.17
File hashes

Algorithm | Hash digest
---|---
SHA256 | 139733fb02e83fa7e999f2cdc1a0b21227683a63cc0fd15b1192f7e1c3e0ff38
MD5 | eb4f6e0d4969acb97f632be4223fdaa6
BLAKE2b-256 | 7336fbd77aee42dfd95451dbabe1aae63708401dad852f7183fc92f56fe9358b
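The "Python 3" tag comes from the wheel's filename, which follows the wheel naming convention `{distribution}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl` (PEP 427): `py3-none-any` means any Python 3 interpreter, no ABI dependency, any platform. A minimal sketch of splitting such a name into its fields; `split_wheel_filename` and `WheelName` are illustrative names, and for real use the `packaging` library's `parse_wheel_filename` also handles name normalization and other edge cases.

```python
from typing import NamedTuple, Optional


class WheelName(NamedTuple):
    distribution: str
    version: str
    build: Optional[str]
    python_tag: str
    abi_tag: str
    platform_tag: str


def split_wheel_filename(filename: str) -> WheelName:
    """Split a wheel filename into the fields of the PEP 427 naming convention."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:  # no optional build tag
        dist, version, py, abi, plat = parts
        build = None
    elif len(parts) == 6:  # optional build tag present
        dist, version, build, py, abi, plat = parts
    else:
        raise ValueError(f"unexpected number of fields in {filename!r}")
    return WheelName(dist, version, build, py, abi, plat)


print(split_wheel_filename("zdatasets-1.2.5-py3-none-any.whl"))
# → WheelName(distribution='zdatasets', version='1.2.5', build=None, python_tag='py3', abi_tag='none', platform_tag='any')
```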