Streaming lets users create PyTorch-compatible datasets that can be streamed from cloud-based object stores.
A Data Streaming Library for Efficient Neural Network Training
👋 Welcome
Streaming is a PyTorch-compatible dataset library that lets you stream training data from cloud-based object stores or read it from local disk. Because it is a drop-in replacement for your PyTorch IterableDataset class, it's easy to get streaming:
dataloader = torch.utils.data.DataLoader(dataset=ImageStreamingDataset(remote='s3://...'))
See the quick start guide and the user guide for how to use the Streaming dataset.
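To make the idea of streaming through a local cache concrete, here is a minimal standard-library sketch. It is not the library's implementation or API: the class `CachedShardStream`, the `.jsonl` shard format, and the use of a plain directory as a stand-in for the remote object store are all assumptions made for illustration.

```python
import json
import shutil
import tempfile
from pathlib import Path
from typing import Any, Dict, Iterator, List


class CachedShardStream:
    """Hypothetical illustration (not the Streaming API): iterate samples from
    shard files, copying each shard from a "remote" directory into a local
    cache the first time it is needed."""

    def __init__(self, remote: Path, local: Path) -> None:
        self.remote = Path(remote)
        self.local = Path(local)
        self.local.mkdir(parents=True, exist_ok=True)
        # Sort shard names so that iteration order is deterministic.
        self.shard_names = sorted(p.name for p in self.remote.glob("*.jsonl"))

    def _fetch(self, name: str) -> Path:
        """Return the cached path of a shard, copying it over if it is missing."""
        cached = self.local / name
        if not cached.exists():
            # Stand-in for a download from an object store.
            shutil.copyfile(self.remote / name, cached)
        return cached

    def __iter__(self) -> Iterator[Dict[str, Any]]:
        # Shards are fetched lazily, one at a time, as iteration reaches them.
        for name in self.shard_names:
            with self._fetch(name).open() as f:
                for line in f:
                    yield json.loads(line)


def demo() -> List[int]:
    """Build two tiny shards in a temporary "remote" and stream them."""
    with tempfile.TemporaryDirectory() as tmp:
        remote, local = Path(tmp, "remote"), Path(tmp, "local")
        remote.mkdir()
        for shard_id in range(2):
            lines = [json.dumps({"id": shard_id * 2 + i}) for i in range(2)]
            (remote / f"shard{shard_id}.jsonl").write_text("\n".join(lines) + "\n")
        return [sample["id"] for sample in CachedShardStream(remote, local)]


if __name__ == "__main__":
    print(demo())
```

In a PyTorch setting, iteration logic of this kind would typically live in a `torch.utils.data.IterableDataset` subclass so that a `DataLoader` can consume it.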
Key Benefits
- High-performance, accurate streaming of training data from cloud storage
- Efficiently train anywhere, independent of training data location
- Cloud-native, no persistent storage required
- Enhanced data security: data exists only ephemerally on the training cluster
🚀 Quickstart
💾 Installation
Streaming can be installed with pip:
pip install mosaicml-streaming
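After installing, one way to confirm that the package is visible to your interpreter is to query its distribution metadata. The sketch below uses only the standard library; the helper name `installed_version` is hypothetical and is not part of Streaming.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Prints the version string if the package is installed, otherwise None.
    print(installed_version("mosaicml-streaming"))
```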
Examples
Please check our Examples section for the end-to-end model training workflow using Streaming datasets.
📚 Documentation
Getting started guides, examples, API reference, and other useful information can be found in our docs.
💫 Contributors
We welcome any contributions, pull requests, or issues!
To start contributing, see our Contributing page.
P.S.: We're hiring!
✍️ Citation
@misc{mosaicml2022streaming,
    author = {The Mosaic ML Team},
    title = {streaming},
    year = {2022},
    howpublished = {\url{https://github.com/mosaicml/streaming/}},
}
Hashes for mosaicml-streaming-0.1.1b0.tar.gz
Algorithm | Hash digest
---|---
SHA256 | 34bfc945e88b9942e78a73da43014481acb4366d60839d962834bb7a6802702a
MD5 | fe9cc2d8b8840d58c026c26f89facaa5
BLAKE2b-256 | 66a52bf0554dff812884126ee13de96d01b547af026b7403256cdd4beb685337
Hashes for mosaicml_streaming-0.1.1b0-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | d0d0c4875b2f8ce2921e92c5eccbf06840dfd2551fda0dcf3b17d590eabc3128
MD5 | 6d56112d399397508550366289378bad
BLAKE2b-256 | d9b76e357e8f98d925df78771d4c0534806436cbe18e37bbc559b070da0929aa
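A downloaded file can be checked against the SHA256 digests listed above. The following is a minimal sketch using Python's `hashlib`; the helper names `sha256_of_file` and `matches_published_hash` are hypothetical, and the usage comment assumes the archive was saved to the current directory.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large archives need not fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: Path, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Example (assumes the file is in the current directory):
# matches_published_hash(
#     Path("mosaicml-streaming-0.1.1b0.tar.gz"),
#     "34bfc945e88b9942e78a73da43014481acb4366d60839d962834bb7a6802702a",
# )
```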