Streaming lets users create PyTorch-compatible datasets that can be streamed from cloud-based object stores.
A Data Streaming Library for Efficient Neural Network Training
[Website] - [Getting Started] - [Docs] - [We're Hiring!]
👋 Welcome
Streaming is a PyTorch-compatible dataset that lets you stream training data from cloud-based object stores; it can also read files from local disk. Because it is a drop-in replacement for your PyTorch IterableDataset class, it is easy to start streaming:
dataloader = torch.utils.data.DataLoader(dataset=ImageStreamingDataset(remote='s3://...'))
See the quick start guide and the user guide for how to use the Streaming dataset.
Key Benefits
- High performance, accurate streaming of training data from cloud storage
- Efficiently train anywhere, independent of training data location
- Cloud-native, no persistent storage required
- Enhanced data security: data exists only ephemerally on the training cluster
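To make the "no persistent storage" and "ephemeral data" points concrete, here is a minimal conceptual sketch. It is not how the Streaming library is implemented: the function name `stream_samples` is hypothetical, and a local directory stands in for the object store so the code runs without cloud credentials.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Iterator


def stream_samples(remote_dir: Path) -> Iterator[bytes]:
    """Yield each shard's bytes, fetching shards into a temporary cache on demand.

    `remote_dir` is a local directory standing in for an object store
    (an assumption made purely for illustration).
    """
    with tempfile.TemporaryDirectory() as cache:
        for shard in sorted(remote_dir.iterdir()):
            local_copy = Path(cache) / shard.name
            shutil.copy(shard, local_copy)  # stand-in for a download
            yield local_copy.read_bytes()
    # Once iteration finishes, the temporary cache has been deleted,
    # so no training data remains on local disk.
```

The cache lives only as long as the iteration does, which is the property the list above describes.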
🚀 Quickstart
💾 Installation
Streaming can be installed with pip:
pip install mosaicml-streaming
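After installing, one way to confirm that the package is present is to ask Python's standard `importlib.metadata` for the installed version. This is a small sketch; the helper name `installed_version` is hypothetical, and only the distribution name `mosaicml-streaming` comes from the command above.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str = "mosaicml-streaming") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Prints a version string if the package is installed, otherwise None.
    print(installed_version())
```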
Examples
See our Examples section for end-to-end model training workflows that use Streaming datasets.
📚 Documentation
Getting started guides, examples, API reference, and other useful information can be found in our docs.
💫 Contributors
We welcome any contributions, pull requests, or issues!
To start contributing, see our Contributing page.
P.S.: We're hiring!
✍️ Citation
@misc{mosaicml2022streaming,
author = {The Mosaic ML Team},
title = {streaming},
year = {2022},
howpublished = {\url{https://github.com/mosaicml/streaming/}},
}
Hashes for mosaicml_streaming-0.1.1-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 6ec7877adc4ae2d2f07c7e2de4e7d4f636fa71b7798f5c1de18ac5595cc89b65
MD5 | c07a802fbbe0bdcf0fb25dc514ed8ad2
BLAKE2b-256 | 46ee948d815b0ff45209df2ec7d10117c1dd1bc3068a848008fa9bc277631157
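If you download the wheel manually, you can compare its SHA256 digest against the value listed above. The following is a minimal sketch using only the standard library; the helper names `sha256_of` and `matches_published_digest` are hypothetical, and the file path you pass in depends on where you saved the wheel.

```python
import hashlib

# SHA256 digest published above for mosaicml_streaming-0.1.1-py3-none-any.whl.
EXPECTED_SHA256 = "6ec7877adc4ae2d2f07c7e2de4e7d4f636fa71b7798f5c1de18ac5595cc89b65"


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path: str, expected: str = EXPECTED_SHA256) -> bool:
    """Return True if the file's SHA256 digest equals the expected value."""
    return sha256_of(path) == expected.lower()
```

Calling `matches_published_digest` with the path to the downloaded wheel should return `True` for an unmodified file.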