Dataset SDK for consistent reading and writing of batch, online, and streaming data.

Project description

Welcome to @datasets

TODO

import pandas as pd
from metaflow import FlowSpec, Parameter, current, step

from datasets import DatasetType, Mode


# Can also invoke from CLI:
#  > python datasets/tutorials/0_hello_dataset_flow.py run \
#    --hello_dataset '{"name": "foo", "partition_by": "region", "mode": "Write"}'
class HelloDatasetFlow(FlowSpec):
    hello_dataset = Parameter(
        "hello_dataset",
        default=dict(name="HelloDataset", partition_by="region", mode=Mode.Write),
        type=DatasetType,
    )

    @step
    def start(self):
        df = pd.DataFrame({"region": ["A", "A", "A", "B", "B", "B"], "zpid": [1, 2, 3, 4, 5, 6]})
        print("saving df: \n", df.to_string(index=False))

        # Example of writing to a dataset
        print(f"{self.hello_dataset.program_name=}")
        self.hello_dataset.write(df)

        self.next(self.end)

    @step
    def end(self):
        print(f"I have dataset \n{self.hello_dataset=}")

        # Read back the data written in `start` as a pandas DataFrame,
        # passing the current Metaflow run ID
        df: pd.DataFrame = self.hello_dataset.to_pandas(run_id=current.run_id)
        print("self.hello_dataset.to_pandas():\n", df.to_string(index=False))

        # Save this dataset as an output artifact of the flow
        self.output_dataset = self.hello_dataset


if __name__ == "__main__":
    HelloDatasetFlow()
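The flow writes its DataFrame with `partition_by="region"`. This page does not show how zdatasets lays partitions out on storage, so the stdlib sketch below is only an illustration: it groups the same six rows Hive-style (`region=A`, `region=B`), a common convention for partitioned datasets, and drops the partition column from each row because its value is already encoded in the key. `partition_rows` is a hypothetical helper, not part of zdatasets.

```python
from collections import defaultdict

# The same rows as the DataFrame built in HelloDatasetFlow.start().
rows = [
    {"region": "A", "zpid": 1},
    {"region": "A", "zpid": 2},
    {"region": "A", "zpid": 3},
    {"region": "B", "zpid": 4},
    {"region": "B", "zpid": 5},
    {"region": "B", "zpid": 6},
]


def partition_rows(rows, partition_by):
    """Group rows into Hive-style partitions keyed by "<column>=<value>".

    Hypothetical helper: assumes a Hive-style layout, which may differ
    from what zdatasets actually writes.
    """
    partitions = defaultdict(list)
    for row in rows:
        key = f"{partition_by}={row[partition_by]}"
        # The partition value lives in the key, so drop it from the row.
        partitions[key].append({k: v for k, v in row.items() if k != partition_by})
    return dict(partitions)


for path, part in partition_rows(rows, "region").items():
    print(path, part)
```

Each key corresponds to one partition directory in a Hive-style layout, so a reader that only needs `region=B` can skip the other partition entirely.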

Download files

Download the file for your platform.

Source Distribution

zdatasets-0.0.10.tar.gz (47.7 kB)

Built Distribution

zdatasets-0.0.10-py3-none-any.whl (75.1 kB)
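The wheel name follows the PEP 427 convention `{distribution}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl`, so `py3-none-any` marks a pure-Python wheel that installs on any Python 3 interpreter, ABI, and platform. A minimal stdlib sketch of splitting the name into those fields follows; `parse_wheel_filename` here is a hypothetical helper, and for real tooling the `packaging` library's `packaging.utils.parse_wheel_filename` is the more robust choice.

```python
import re
from typing import NamedTuple, Optional


class WheelName(NamedTuple):
    distribution: str
    version: str
    build: Optional[str]
    python_tag: str
    abi_tag: str
    platform_tag: str


# PEP 427: {distribution}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl
# The optional build tag must start with a digit, which keeps it
# distinguishable from the python tag (e.g. "py3", "cp39").
_WHEEL_RE = re.compile(
    r"^(?P<distribution>[^-]+)-(?P<version>[^-]+)"
    r"(?:-(?P<build>\d[^-]*))?"
    r"-(?P<python_tag>[^-]+)-(?P<abi_tag>[^-]+)-(?P<platform_tag>[^-]+)\.whl$"
)


def parse_wheel_filename(filename: str) -> WheelName:
    """Split a wheel filename into its PEP 427 components."""
    match = _WHEEL_RE.match(filename)
    if match is None:
        raise ValueError(f"not a valid wheel filename: {filename!r}")
    return WheelName(**match.groupdict())


print(parse_wheel_filename("zdatasets-0.0.10-py3-none-any.whl"))
```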

File details

Details for the file zdatasets-0.0.10.tar.gz.

File metadata

  • Download URL: zdatasets-0.0.10.tar.gz
  • Size: 47.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.0 CPython/3.9.12

File hashes

Hashes for zdatasets-0.0.10.tar.gz
Algorithm Hash digest
SHA256 6783e0a1c261e1dccdddb322c1d54b56906cc8339c031ae33904e2e453fba21b
MD5 123e65a1a7066ce64b6c5a0eef5a2614
BLAKE2b-256 472d7af7748baa8841c10474d79ac0257592b8bffaf2541f5d594abfb8e9a804

File details

Details for the file zdatasets-0.0.10-py3-none-any.whl.

File metadata

  • Download URL: zdatasets-0.0.10-py3-none-any.whl
  • Size: 75.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.0 CPython/3.9.12

File hashes

Hashes for zdatasets-0.0.10-py3-none-any.whl
Algorithm Hash digest
SHA256 222ce99db0feb30162e0b549fe85edffe016e084b568e8b1fa74c2e859fadff3
MD5 913ec7af9c36244f326576fa012a5469
BLAKE2b-256 a83d5ffd15492b3a21f56c4192ed9eba8efe975732c1fc2cafc29bd8c9c62d01
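A downloaded file can be checked against the SHA256 digests listed above before installing it. A minimal stdlib sketch follows; `sha256_of`, `verify`, and `EXPECTED_SHA256` are illustrative names, not part of zdatasets or pip. pip can also enforce hashes on its own through its hash-checking mode (`--require-hashes`).

```python
import hashlib
from pathlib import Path

# Published SHA256 digests from the file details above.
EXPECTED_SHA256 = {
    "zdatasets-0.0.10.tar.gz": "6783e0a1c261e1dccdddb322c1d54b56906cc8339c031ae33904e2e453fba21b",
    "zdatasets-0.0.10-py3-none-any.whl": "222ce99db0feb30162e0b549fe85edffe016e084b568e8b1fa74c2e859fadff3",
}


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        # Read fixed-size chunks until f.read() returns b"" at EOF.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected: str) -> bool:
    """Compare a file's SHA256 digest with an expected hex digest (case-insensitive)."""
    return sha256_of(path) == expected.lower()
```

For example, `verify(Path("zdatasets-0.0.10.tar.gz"), EXPECTED_SHA256["zdatasets-0.0.10.tar.gz"])` returns `True` only if the local file matches the published digest. Reading in chunks keeps memory use flat regardless of file size.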
