A dataset SDK for consistently reading and writing batch, online, and streaming data.
Welcome to @datasets
TODO
import pandas as pd
from metaflow import FlowSpec, Parameter, current, step

from datasets import DatasetType, Mode


# Can also invoke from CLI:
#  > python datasets/tutorials/0_hello_dataset_flow.py run \
#    --hello_dataset '{"name": "foo", "partition_by": "region", "mode": "Write"}'
class HelloDatasetFlow(FlowSpec):
    hello_dataset = Parameter(
        "hello_dataset",
        default=dict(name="HelloDataset", partition_by="region", mode=Mode.Write),
        type=DatasetType,
    )

    @step
    def start(self):
        df = pd.DataFrame({"region": ["A", "A", "A", "B", "B", "B"], "zpid": [1, 2, 3, 4, 5, 6]})
        print("saving df: \n", df.to_string(index=False))

        # Example of writing to a dataset
        print(f"{self.hello_dataset.program_name=}")
        self.hello_dataset.write(df)

        self.next(self.end)

    @step
    def end(self):
        print(f"I have dataset \n{self.hello_dataset=}")

        # hello_dataset to_pandas()
        df: pd.DataFrame = self.hello_dataset.to_pandas(run_id=current.run_id)
        print("self.hello_dataset.to_pandas():\n", df.to_string(index=False))

        # save this as an output dataset
        self.output_dataset = self.hello_dataset


if __name__ == "__main__":
    HelloDatasetFlow()
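The flow above declares its dataset with partition_by="region". As a rough illustration of what partitioning rows by a column means, here is a standard-library-only sketch. It does not use the SDK and makes no claim about how write() stores partitions internally; the helper name split_by_partition is hypothetical, and the rows simply mirror the DataFrame from the start step.

```python
from collections import defaultdict

# Same values as the DataFrame built in the flow's start step, as plain dicts.
rows = [
    {"region": "A", "zpid": 1},
    {"region": "A", "zpid": 2},
    {"region": "A", "zpid": 3},
    {"region": "B", "zpid": 4},
    {"region": "B", "zpid": 5},
    {"region": "B", "zpid": 6},
]


def split_by_partition(rows, partition_by):
    """Group rows by the value of the partition column.

    Hypothetical helper for illustration only; not part of the datasets SDK.
    """
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[partition_by]].append(row)
    return dict(partitions)


partitions = split_by_partition(rows, "region")
for key, part in sorted(partitions.items()):
    print(f"region={key}: zpids {[r['zpid'] for r in part]}")
# prints:
# region=A: zpids [1, 2, 3]
# region=B: zpids [4, 5, 6]
```

Each distinct value of the partition column ends up with its own group of rows, which is the general idea behind reading or writing one partition at a time.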