Streaming lets users create PyTorch compatible datasets that can be streamed from cloud-based object stores
Reason this release was yanked:
Hang bug when torch.distributed isn't initialized
A Data Streaming Library for Efficient Neural Network Training
👋 Welcome
Streaming is a PyTorch-compatible dataset that lets users stream training data from cloud-based object stores or read it from local disk. As a drop-in replacement for your PyTorch IterableDataset class, it’s easy to get streaming:
dataloader = torch.utils.data.DataLoader(dataset=ImageStreamingDataset(remote='s3://...'))
Please check the quick start guide and user guide on how to use the Streaming Dataset.
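To make the idea concrete, here is a minimal, dependency-free sketch of the general pattern the description above implies: shards are copied from a remote location into a local cache the first time they are needed, and samples are then read from the cache. This is not the library's implementation. The class name `CachedShardReader`, the one-sample-per-line shard format, and the use of a plain directory as a stand-in for an object store are all assumptions made for illustration. In real use such a class would subclass `torch.utils.data.IterableDataset`; torch is omitted here so the sketch runs on its own.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Iterator, List


class CachedShardReader:
    """Hypothetical illustration: iterate samples, fetching each shard on first use."""

    def __init__(self, remote: Path, local: Path) -> None:
        self.remote = remote  # stand-in for an object store prefix (assumption)
        self.local = local    # cache directory on the training node
        self.local.mkdir(parents=True, exist_ok=True)

    def _fetch(self, name: str) -> Path:
        # Copy the shard into the cache only if it is not already there.
        cached = self.local / name
        if not cached.exists():
            shutil.copy(self.remote / name, cached)
        return cached

    def __iter__(self) -> Iterator[str]:
        # Visit shards in sorted order and yield one sample per line.
        for shard in sorted(p.name for p in self.remote.iterdir()):
            with open(self._fetch(shard), encoding="utf-8") as f:
                for line in f:
                    yield line.rstrip("\n")


def demo() -> List[str]:
    # Build a fake "remote" with two small shards, then stream it through a cache.
    with tempfile.TemporaryDirectory() as tmp:
        remote = Path(tmp) / "remote"
        remote.mkdir()
        (remote / "shard0.txt").write_text("a\nb\n", encoding="utf-8")
        (remote / "shard1.txt").write_text("c\n", encoding="utf-8")
        reader = CachedShardReader(remote, Path(tmp) / "cache")
        return list(reader)
```

Calling `demo()` streams both shards through the temporary cache and returns the samples in shard order. Because the cache lives in a temporary directory, nothing persists after the run.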
Key Benefits
- High performance, accurate streaming of training data from cloud storage
- Efficiently train anywhere, independent of training data location
- Cloud-native, no persistent storage required
- Enhanced data security—data exists ephemerally on training cluster
🚀 Quickstart
💾 Installation
Streaming can be installed with pip:
pip install mosaicml-streaming
Examples
Please check our Examples section for the end-to-end model training workflow using Streaming datasets.
📚 Documentation
Getting started guides, examples, API reference, and other useful information can be found in our docs.
💫 Contributors
We welcome any contributions, pull requests, or issues!
To start contributing, see our Contributing page.
P.S.: We're hiring!
✍️ Citation
@misc{mosaicml2022streaming,
author = {The Mosaic ML Team},
title = {streaming},
year = {2022},
howpublished = {\url{https://github.com/mosaicml/streaming/}},
}
File details
Details for the file mosaicml-streaming-0.2.4.tar.gz.
File metadata
- Download URL: mosaicml-streaming-0.2.4.tar.gz
- Upload date:
- Size: 94.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.11.2
File hashes
Algorithm | Hash digest
---|---
SHA256 | 0e856e4e1fbb82875e5237e370e4ffd4f471eb71310c6cf8d5ebade4ab690d7d
MD5 | e32f6894af1f50ab98be537f215a3a28
BLAKE2b-256 | 6161c47190fd5dd3837829b7b29cba9b6ebad5655ebf22fb0cd5fc26f8d817bc
File details
Details for the file mosaicml_streaming-0.2.4-py3-none-any.whl.
File metadata
- Download URL: mosaicml_streaming-0.2.4-py3-none-any.whl
- Upload date:
- Size: 116.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.11.2
File hashes
Algorithm | Hash digest
---|---
SHA256 | 99e4617b6d37bcebc73b7c2889ead18cd55d5721d0c2d7c97ae73168ed769971
MD5 | cdefe175e1781f52f35147d2d4453e5c
BLAKE2b-256 | 3a3f550dde84c83969be795065eefc6c345fb337392eec54875400be0fa965e6
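The SHA256 digests listed above can be checked against a downloaded file with Python's standard `hashlib` module. The sketch below is one way to do that. The helper names `sha256_of` and `matches` are made up for this example, and the file is assumed to already be on local disk.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel stops when read() returns empty bytes (EOF).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path: Path, expected_hex: str) -> bool:
    """Compare a file's digest against a published value, ignoring case."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, `matches(Path("mosaicml-streaming-0.2.4.tar.gz"), "0e856e4e1fbb82875e5237e370e4ffd4f471eb71310c6cf8d5ebade4ab690d7d")` should return `True` for an intact copy of the source distribution.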