Streaming lets users create PyTorch-compatible datasets that can be streamed from cloud-based object stores.
Reason this release was yanked:
Hang bug when torch.distributed isn't initialized
A Data Streaming Library for Efficient Neural Network Training
👋 Welcome
Streaming is a PyTorch-compatible dataset that lets users stream training data from cloud-based object stores or read it from local disk. It is a drop-in replacement for your PyTorch IterableDataset class, so it's easy to get streaming:
dataloader = torch.utils.data.DataLoader(dataset=ImageStreamingDataset(remote='s3://...'))
See the quick start guide and user guide for details on how to use the Streaming dataset.
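To make the idea of streaming with a local cache concrete, here is a minimal sketch. It is not the library's implementation: the class name `CachedShardDataset`, the `shard-*.txt` file layout, and the use of a plain directory as the "remote" store are all assumptions made for illustration. It only shows the general pattern of fetching a shard on first use and then reading samples from the cached copy.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Iterator, List


class CachedShardDataset:
    """Hypothetical illustration: iterate over samples stored in shard files,
    copying each shard from a 'remote' directory into a local cache the first
    time it is needed. Each shard is a text file with one sample per line.
    """

    def __init__(self, remote: Path, local: Path) -> None:
        self.remote = Path(remote)
        self.local = Path(local)
        self.local.mkdir(parents=True, exist_ok=True)
        # Sorted shard names give a deterministic iteration order.
        self.shards = sorted(p.name for p in self.remote.glob("shard-*.txt"))

    def _fetch(self, name: str) -> Path:
        """Return the cached path for a shard, copying it on a cache miss."""
        cached = self.local / name
        if not cached.exists():
            shutil.copyfile(self.remote / name, cached)
        return cached

    def __iter__(self) -> Iterator[str]:
        # Shards are fetched lazily, one at a time, as iteration reaches them.
        for name in self.shards:
            with self._fetch(name).open() as f:
                for line in f:
                    yield line.rstrip("\n")


def demo() -> List[str]:
    """Build two tiny shards in a temporary 'remote' directory and stream them."""
    with tempfile.TemporaryDirectory() as tmp:
        remote = Path(tmp) / "remote"
        remote.mkdir()
        (remote / "shard-0.txt").write_text("a\nb\n")
        (remote / "shard-1.txt").write_text("c\n")
        dataset = CachedShardDataset(remote, Path(tmp) / "cache")
        return list(dataset)
```

The sketch uses only the standard library so it runs without dependencies. To pass a class like this to `torch.utils.data.DataLoader`, it would need to subclass `torch.utils.data.IterableDataset`.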
Key Benefits
- High performance, accurate streaming of training data from cloud storage
- Efficiently train anywhere, independent of training data location
- Cloud-native, no persistent storage required
- Enhanced data security—data exists ephemerally on training cluster
🚀 Quickstart
💾 Installation
Streaming is available via pip:
pip install mosaicml-streaming
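After installing, it can be useful to confirm which version is present, since this page describes a yanked release. The following is a small sketch using the standard library's `importlib.metadata`. The helper names `installed_version` and `describe_install` are hypothetical, and the yanked set contains only the 0.2.4 release named on this page.

```python
from importlib import metadata
from typing import Optional

# Assumption: only the release described on this page is listed as yanked.
YANKED_RELEASES = {"0.2.4"}


def installed_version(dist: str = "mosaicml-streaming") -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist)
    except metadata.PackageNotFoundError:
        return None


def describe_install(version: Optional[str]) -> str:
    """Summarize the install state, flagging known yanked releases."""
    if version is None:
        return "not installed"
    if version in YANKED_RELEASES:
        return f"{version} (yanked, upgrade recommended)"
    return version
```

Calling `describe_install(installed_version())` gives a one-line summary of the current environment.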
Examples
Please check our Examples section for the end-to-end model training workflow using Streaming datasets.
📚 Documentation
Getting started guides, examples, API reference, and other useful information can be found in our docs.
💫 Contributors
We welcome any contributions, pull requests, or issues!
To start contributing, see our Contributing page.
P.S.: We're hiring!
✍️ Citation
@misc{mosaicml2022streaming,
author = {The Mosaic ML Team},
title = {streaming},
year = {2022},
howpublished = {\url{https://github.com/mosaicml/streaming/}},
}
Hashes for mosaicml_streaming-0.2.4-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 99e4617b6d37bcebc73b7c2889ead18cd55d5721d0c2d7c97ae73168ed769971
MD5 | cdefe175e1781f52f35147d2d4453e5c
BLAKE2b-256 | 3a3f550dde84c83969be795065eefc6c345fb337392eec54875400be0fa965e6
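A downloaded wheel can be checked against the SHA256 digest listed above. The sketch below uses Python's `hashlib`. The function names and the chunk size are illustrative choices, and the path to the downloaded file is something you would supply yourself.

```python
import hashlib

# SHA256 digest for mosaicml_streaming-0.2.4-py3-none-any.whl, from the table above.
EXPECTED_SHA256 = "99e4617b6d37bcebc73b7c2889ead18cd55d5721d0c2d7c97ae73168ed769971"


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path: str, expected: str = EXPECTED_SHA256) -> bool:
    """Return True if the file's SHA256 digest equals the expected value."""
    return sha256_of(path) == expected.lower()
```

For example, `matches("mosaicml_streaming-0.2.4-py3-none-any.whl")` would return `True` only if the local file is byte-for-byte identical to the published wheel.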