
Dataset SDK for consistent reading and writing of batch, online, and streaming data.

Project description


Welcome to zdatasets
====================

TODO

import pandas as pd
from metaflow import FlowSpec, step

from zdatasets import Dataset, Mode
from zdatasets.metaflow import DatasetParameter
from zdatasets.plugins import BatchOptions


# Can also be invoked from the CLI:
#  > python zdatasets/tutorials/0_hello_dataset_flow.py run \
#    --hello_dataset '{"name": "HelloDataset", "mode": "READ_WRITE", \
#    "options": {"type": "BatchOptions", "partition_by": "region"}}'
class HelloDatasetFlow(FlowSpec):
    hello_dataset = DatasetParameter(
        "hello_dataset",
        default=Dataset("HelloDataset", mode=Mode.READ_WRITE, options=BatchOptions(partition_by="region")),
    )

    @step
    def start(self):
        df = pd.DataFrame({"region": ["A", "A", "A", "B", "B", "B"], "zpid": [1, 2, 3, 4, 5, 6]})
        print("saving data_frame: \n", df.to_string(index=False))

        # Example of writing to a dataset
        self.hello_dataset.write(df)

        # Store the dataset as a flow artifact so the next step can read it
        self.output_dataset = self.hello_dataset

        self.next(self.end)

    @step
    def end(self):
        print(f"I have dataset \n{self.output_dataset=}")

        # Read back only the region="A" partition as a pandas DataFrame
        df: pd.DataFrame = self.output_dataset.to_pandas(partitions=dict(region="A"))
        print('self.output_dataset.to_pandas(partitions=dict(region="A")):')
        print(df.to_string(index=False))


if __name__ == "__main__":
    HelloDatasetFlow()
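The flow above writes with `BatchOptions(partition_by="region")` and reads back with `to_pandas(partitions=dict(region="A"))`. The sketch below illustrates the general idea behind that pair of calls using only the Python standard library. It assumes a Hive-style `column=value` directory layout, which is an illustration only and not necessarily how zdatasets stores data; `write_partitioned` and `read_partitioned` are hypothetical helpers, not part of the zdatasets API.

```python
import csv
import tempfile
from collections import defaultdict
from pathlib import Path


def write_partitioned(rows, root, partition_by):
    """Hypothetical helper: write rows (dicts) under root/<col>=<value>/part-0.csv."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[partition_by]].append(row)
    for value, group in groups.items():
        part_dir = Path(root) / f"{partition_by}={value}"
        part_dir.mkdir(parents=True, exist_ok=True)
        # The partition column is encoded in the directory name, so drop it from the file.
        fields = [k for k in group[0] if k != partition_by]
        with open(part_dir / "part-0.csv", "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=fields)
            writer.writeheader()
            for row in group:
                writer.writerow({k: row[k] for k in fields})


def read_partitioned(root, partitions):
    """Hypothetical helper: read only directories matching every key=value in `partitions`."""
    rows = []
    for part_dir in sorted(Path(root).iterdir()):
        key, _, value = part_dir.name.partition("=")
        if key in partitions and str(partitions[key]) != value:
            continue  # Partition pruning: skip non-matching directories without opening them.
        with open(part_dir / "part-0.csv", newline="") as f:
            for row in csv.DictReader(f):
                row[key] = value  # Restore the partition column from the directory name.
                rows.append(row)
    return rows


# Same data as the flow: three rows in region A, three in region B.
rows = [{"region": r, "zpid": z} for r, z in zip("AAABBB", range(1, 7))]
with tempfile.TemporaryDirectory() as root:
    write_partitioned(rows, root, partition_by="region")
    result = read_partitioned(root, partitions={"region": "A"})
print(result)
```

Because the filter is applied to directory names, the region="B" files are never opened; note that CSV round-tripping returns every value as a string.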



Download files

Source Distribution

zdatasets-1.2.5.tar.gz (54.8 kB)

Built Distribution

zdatasets-1.2.5-py3-none-any.whl (84.7 kB, Python 3)

File details

Details for the file zdatasets-1.2.5.tar.gz.

File metadata

  • Download URL: zdatasets-1.2.5.tar.gz
  • Upload date:
  • Size: 54.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.17

File hashes

Hashes for zdatasets-1.2.5.tar.gz
Algorithm    Hash digest
SHA256       170225853920c6a778416244409d09d7010603122948715ffa2ab2d55b37eb43
MD5          b91e064c122ca6ec8e2a0b18b3498e1e
BLAKE2b-256  ab554b31c6a25c395625168c89820fc0c4ed2a9c8af1f1168c915806d7c38e68

File details

Details for the file zdatasets-1.2.5-py3-none-any.whl.

File metadata

  • Download URL: zdatasets-1.2.5-py3-none-any.whl
  • Upload date:
  • Size: 84.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.17

File hashes

Hashes for zdatasets-1.2.5-py3-none-any.whl
Algorithm    Hash digest
SHA256       139733fb02e83fa7e999f2cdc1a0b21227683a63cc0fd15b1192f7e1c3e0ff38
MD5          eb4f6e0d4969acb97f632be4223fdaa6
BLAKE2b-256  7336fbd77aee42dfd95451dbabe1aae63708401dad852f7183fc92f56fe9358b
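To check a downloaded archive against the SHA256 digests listed above, a minimal sketch using Python's standard `hashlib` might look like this. The `EXPECTED_SHA256` table simply copies the published values; `sha256_of` and `verify` are hypothetical helper names, not part of zdatasets.

```python
import hashlib
from pathlib import Path

# SHA256 digests published above for zdatasets 1.2.5.
EXPECTED_SHA256 = {
    "zdatasets-1.2.5.tar.gz": "170225853920c6a778416244409d09d7010603122948715ffa2ab2d55b37eb43",
    "zdatasets-1.2.5-py3-none-any.whl": "139733fb02e83fa7e999f2cdc1a0b21227683a63cc0fd15b1192f7e1c3e0ff38",
}


def sha256_of(path, chunk_size=1 << 20):
    """Hash the file in 1 MiB chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path):
    """Return True if the file's SHA256 matches the published digest for its file name."""
    name = Path(path).name
    expected = EXPECTED_SHA256.get(name)
    if expected is None:
        raise KeyError(f"no published digest for {name}")
    return sha256_of(path) == expected
```

For example, `verify("zdatasets-1.2.5.tar.gz")` returns True only for an untampered download. pip can enforce the same check at install time through its hash-checking mode (`--require-hashes` with `--hash=sha256:...` entries in a requirements file).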
