Dataset SDK for consistent read/write [batch, online, streaming] data.

Welcome to zdatasets

Development

Set the version in pyproject.toml to a dev version (e.g., 1.3.0.dev1) when starting development.
Bump the dev version (e.g., 1.3.0.dev1 → 1.3.0.dev2) each time you have a change you want to test in other repositories.
After every change, confirm that the GitHub workflow runs succeed at https://github.com/zillow/zdatasets/actions.
Dev versions are published to TestPyPI at https://test.pypi.org/project/zdatasets/#history.
While testing your changes, you may need to reference your pull request in other repositories' pyproject.toml instead of using the dev version. For example:

dataset = [
  "zdatasets[kubernetes] @ git+https://github.com/zillow/zdatasets.git@refs/pull/42/head"
]

Bump the release version (e.g., 1.3.0.dev2 → 1.3.1) before merging your code change.
Confirm the release of the new version in PyPI at https://pypi.org/project/zdatasets/#history.
Create the release in https://github.com/zillow/zdatasets/releases.
If you run into authentication issues when publishing to PyPI, ask for help in the #open-source Slack channel.
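
The dev-version bump described in the steps above can be sketched in a few lines of Python. This is only an illustration: the helper names `start_dev` and `bump_dev` are hypothetical and not part of zdatasets, and the sketch assumes versions follow the `X.Y.Z.devN` shape used in the examples.

```python
import re

# Plain release such as "1.3.0".
_RELEASE = re.compile(r"^\d+(?:\.\d+)*$")
# Dev version such as "1.3.0.dev1": a release part followed by ".devN".
_DEV_VERSION = re.compile(r"^(?P<release>\d+(?:\.\d+)*)\.dev(?P<n>\d+)$")


def start_dev(release: str) -> str:
    """Hypothetical helper: first dev version for a release, e.g. 1.3.0 -> 1.3.0.dev1."""
    if _RELEASE.match(release) is None:
        raise ValueError(f"not a plain release version: {release!r}")
    return f"{release}.dev1"


def bump_dev(version: str) -> str:
    """Hypothetical helper: next dev version, e.g. 1.3.0.dev1 -> 1.3.0.dev2."""
    match = _DEV_VERSION.match(version)
    if match is None:
        raise ValueError(f"not a dev version: {version!r}")
    return f"{match['release']}.dev{int(match['n']) + 1}"
```

The resulting string would still need to be written into pyproject.toml by hand or by whatever release tooling the repository uses.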

Example

import pandas as pd
from metaflow import FlowSpec, step

from zdatasets import Dataset, Mode
from zdatasets.metaflow import DatasetParameter
from zdatasets.plugins import BatchOptions


# Can also invoke from CLI:
#  > python zdatasets/tutorials/0_hello_dataset_flow.py run \
#    --hello_dataset '{"name": "HelloDataset", "mode": "READ_WRITE", \
#    "options": {"type": "BatchOptions", "partition_by": "region"}}'
class HelloDatasetFlow(FlowSpec):
    hello_dataset = DatasetParameter(
        "hello_dataset",
        default=Dataset("HelloDataset", mode=Mode.READ_WRITE, options=BatchOptions(partition_by="region")),
    )

    @step
    def start(self):
        df = pd.DataFrame({"region": ["A", "A", "A", "B", "B", "B"], "zpid": [1, 2, 3, 4, 5, 6]})
        print("saving data_frame: \n", df.to_string(index=False))

        # Example of writing to a dataset
        self.hello_dataset.write(df)

        # save this as an output dataset
        self.output_dataset = self.hello_dataset

        self.next(self.end)

    @step
    def end(self):
        print(f"I have dataset \n{self.output_dataset=}")

        # Read back only the region="A" partition as a pandas DataFrame
        df: pd.DataFrame = self.output_dataset.to_pandas(partitions=dict(region="A"))
        print('self.output_dataset.to_pandas(partitions=dict(region="A")):')
        print(df.to_string(index=False))


if __name__ == "__main__":
    HelloDatasetFlow()
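
The JSON passed to `--hello_dataset` in the CLI comment of the example is easy to mistype by hand. As a rough sketch, it could be built with the standard library instead. The helper name `dataset_cli_arg` is hypothetical; the keys and values mirror the CLI comment in the example, and nothing here calls zdatasets itself.

```python
import json
import shlex


def dataset_cli_arg(name: str, mode: str, partition_by: str) -> str:
    """Hypothetical helper: build the JSON string shown in the CLI comment."""
    payload = {
        "name": name,
        "mode": mode,
        "options": {"type": "BatchOptions", "partition_by": partition_by},
    }
    return json.dumps(payload)


arg = dataset_cli_arg("HelloDataset", "READ_WRITE", "region")

# Assemble the command as a list so the JSON stays a single argument.
command = [
    "python",
    "zdatasets/tutorials/0_hello_dataset_flow.py",
    "run",
    "--hello_dataset",
    arg,
]

# shlex.join quotes the JSON so the line can be pasted into a POSIX shell.
print(shlex.join(command))
```

Passing `command` as a list to `subprocess.run` would avoid shell quoting altogether.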

Release history

Version	Release date
1.3.0 (this version)	Feb 2, 2026
1.2.5	Jun 26, 2023
0.2.5	Apr 27, 2023
0.2.4	Dec 8, 2022
0.2.3	Dec 8, 2022
0.2.2	Nov 17, 2022
0.2.1	Oct 6, 2022
0.2.0	Oct 5, 2022
0.1.3	Oct 5, 2022
0.1.2	Aug 20, 2022
0.1.1	May 31, 2022
0.0.11	May 23, 2022
0.0.10	May 11, 2022
0.0.8.dev2 (pre-release)	Apr 27, 2022
0.0.4	Dec 5, 2021

Download files

Source Distribution

zdatasets-1.3.0.tar.gz (55.3 kB)

Uploaded Feb 2, 2026 Source

Built Distribution

zdatasets-1.3.0-py3-none-any.whl (86.0 kB)

Uploaded Feb 2, 2026 Python 3

File details

Details for the file zdatasets-1.3.0.tar.gz.

File metadata

Download URL: zdatasets-1.3.0.tar.gz
Upload date: Feb 2, 2026
Size: 55.3 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for zdatasets-1.3.0.tar.gz
Algorithm	Hash digest
SHA256	`a5376c92a53d28a96832b2ecffb652d52b0c08f6966f97dfeb070ef62984b476`
MD5	`6b481ed7d4791a06a2533fc91b55cde3`
BLAKE2b-256	`fafaa4e4a63d421909eeae8659e90383a67e06b216234b5c59f5fd1927254881`

File details

Details for the file zdatasets-1.3.0-py3-none-any.whl.

File metadata

Download URL: zdatasets-1.3.0-py3-none-any.whl
Upload date: Feb 2, 2026
Size: 86.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for zdatasets-1.3.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`d66905334358b8f7d2ea30b5853ee1ba1627891ca1281fbbd3dcc3f22e892abd`
MD5	`494df7c920816716539d19ece86ca8ec`
BLAKE2b-256	`3f6dd654a34ca9225c0e8cdb00918cecf51ff9e1b930dac815693039501bfe77`
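
The SHA256 digests listed above can be used to check a downloaded file before installing it. The following is a minimal sketch using only `hashlib`; the function names `sha256_of` and `verify` are illustrative, and it assumes the files were saved under their original names.

```python
import hashlib
from pathlib import Path

# SHA256 digests copied from the "File hashes" tables for version 1.3.0.
EXPECTED_SHA256 = {
    "zdatasets-1.3.0.tar.gz": "a5376c92a53d28a96832b2ecffb652d52b0c08f6966f97dfeb070ef62984b476",
    "zdatasets-1.3.0-py3-none-any.whl": "d66905334358b8f7d2ea30b5853ee1ba1627891ca1281fbbd3dcc3f22e892abd",
}


def sha256_of(path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files are never read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path) -> bool:
    """Return True if the file's SHA256 matches the digest listed for its name."""
    expected = EXPECTED_SHA256.get(Path(path).name)
    if expected is None:
        raise KeyError(f"no known digest for {Path(path).name!r}")
    return sha256_of(path) == expected
```

For example, `verify("zdatasets-1.3.0.tar.gz")` should return `True` only for an unmodified copy of the source distribution. pip's hash-checking mode (`--require-hashes`) is an alternative that performs the same kind of check at install time.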
