Streaming lets users create PyTorch-compatible datasets that can be streamed from cloud-based object stores.

Reason this release was yanked:

Hang bug when torch.distributed isn't initialized

Project description


A Data Streaming Library for Efficient Neural Network Training

[Website] - [Getting Started] - [Docs] - [We're Hiring!]


👋 Welcome

Streaming is a PyTorch-compatible dataset library that lets you stream training data from cloud-based object stores or read it from local disk. Because it is a drop-in replacement for your PyTorch IterableDataset class, getting started takes one line:

dataloader = torch.utils.data.DataLoader(dataset=ImageStreamingDataset(remote='s3://...'))

See the quick start guide and the user guide for how to use a Streaming dataset.
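To make the idea behind the one-liner above concrete, here is a minimal sketch of lazy shard caching: shards are copied from a "remote" location into a local cache only when iteration first reaches them. This is not the Streaming library's implementation or API. The class name LazyShardDataset, the plain-text shard format, and the use of a local directory as a stand-in for an object store are all assumptions made for illustration.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Iterator, List


class LazyShardDataset:
    """Hypothetical stand-in for illustration; not a class from the library."""

    def __init__(self, remote: Path, local: Path) -> None:
        self.remote = remote  # plays the role of the object store
        self.local = local    # cache directory on the training node
        self.local.mkdir(parents=True, exist_ok=True)
        # Shard names are listed up front; shard contents are fetched lazily.
        self.shards = sorted(p.name for p in remote.glob("*.txt"))

    def _fetch(self, name: str) -> Path:
        cached = self.local / name
        if not cached.exists():
            # A real implementation would download from cloud storage here.
            shutil.copyfile(self.remote / name, cached)
        return cached

    def __iter__(self) -> Iterator[str]:
        # One sample per line, shard by shard.
        for name in self.shards:
            with self._fetch(name).open() as f:
                for line in f:
                    yield line.rstrip("\n")


def demo() -> List[str]:
    """Build two tiny shards in a temp directory and stream them back."""
    with tempfile.TemporaryDirectory() as tmp:
        remote = Path(tmp) / "remote"
        remote.mkdir()
        (remote / "shard0.txt").write_text("a\nb\n")
        (remote / "shard1.txt").write_text("c\n")
        dataset = LazyShardDataset(remote, Path(tmp) / "cache")
        return list(dataset)
```

Because the cache lives in a temporary directory, the data disappears when the run ends, which mirrors the "data exists ephemerally" point in the benefits list.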

Key Benefits

  • High performance, accurate streaming of training data from cloud storage
  • Efficiently train anywhere, independent of training data location
  • Cloud-native, no persistent storage required
  • Enhanced data security—data exists ephemerally on training cluster

🚀 Quickstart

💾 Installation

Install Streaming with pip:

pip install mosaicml-streaming
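Since this page notes that release 0.2.4 was yanked, it can be useful to check which version actually got installed. The sketch below uses only the standard library; the YANKED set is an assumption built from this page alone, and the helper names are hypothetical.

```python
from importlib import metadata
from typing import Optional

# Assumption: only the release described on this page is listed here.
YANKED = {"0.2.4"}


def installed_version(package: str = "mosaicml-streaming") -> Optional[str]:
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def is_yanked(version: Optional[str]) -> bool:
    """True if the given version is in the local YANKED set."""
    return version in YANKED


if __name__ == "__main__":
    version = installed_version()
    if version is None:
        print("mosaicml-streaming is not installed")
    elif is_yanked(version):
        print(f"mosaicml-streaming {version} was yanked; consider another release")
    else:
        print(f"mosaicml-streaming {version} is installed")
```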

Examples

See our Examples section for end-to-end model training workflows that use Streaming datasets.

📚 Documentation

Getting started guides, examples, API reference, and other useful information can be found in our docs.

💫 Contributors

We welcome any contributions, pull requests, or issues!

To start contributing, see our Contributing page.

P.S.: We're hiring!

✍️ Citation

@misc{mosaicml2022streaming,
    author = {The Mosaic ML Team},
    title = {streaming},
    year = {2022},
    howpublished = {\url{https://github.com/mosaicml/streaming/}},
}

Download files

Source Distribution

mosaicml-streaming-0.2.4.tar.gz (94.8 kB)

Uploaded Source

Built Distribution

mosaicml_streaming-0.2.4-py3-none-any.whl (116.0 kB)

Uploaded Python 3

File details

Details for the file mosaicml-streaming-0.2.4.tar.gz.

File metadata

  • Download URL: mosaicml-streaming-0.2.4.tar.gz
  • Size: 94.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.11.2

File hashes

Hashes for mosaicml-streaming-0.2.4.tar.gz

Algorithm     Hash digest
SHA256        0e856e4e1fbb82875e5237e370e4ffd4f471eb71310c6cf8d5ebade4ab690d7d
MD5           e32f6894af1f50ab98be537f215a3a28
BLAKE2b-256   6161c47190fd5dd3837829b7b29cba9b6ebad5655ebf22fb0cd5fc26f8d817bc
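
The digests above can be used to confirm that a downloaded file is intact. The following is a minimal sketch using Python's standard hashlib module; the function names are hypothetical, and it assumes the archive has already been downloaded to the given path.

```python
import hashlib

# SHA256 digest listed above for mosaicml-streaming-0.2.4.tar.gz.
EXPECTED_SHA256 = "0e856e4e1fbb82875e5237e370e4ffd4f471eb71310c6cf8d5ebade4ab690d7d"


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path: str, expected_hex: str) -> bool:
    """Compare the file's SHA256 digest with the published hex string."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, matches("mosaicml-streaming-0.2.4.tar.gz", EXPECTED_SHA256) should return True for an unmodified copy of the source distribution.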

File details

Details for the file mosaicml_streaming-0.2.4-py3-none-any.whl.

File hashes

Hashes for mosaicml_streaming-0.2.4-py3-none-any.whl

Algorithm     Hash digest
SHA256        99e4617b6d37bcebc73b7c2889ead18cd55d5721d0c2d7c97ae73168ed769971
MD5           cdefe175e1781f52f35147d2d4453e5c
BLAKE2b-256   3a3f550dde84c83969be795065eefc6c345fb337392eec54875400be0fa965e6
