
Streaming lets users create PyTorch-compatible datasets that can be streamed from cloud-based object stores.



A Data Streaming Library for Efficient Neural Network Training




👋 Welcome

Streaming is a PyTorch-compatible dataset library that streams training data from cloud-based object stores or reads it from local disk. It is a drop-in replacement for your PyTorch IterableDataset class, so getting started takes one line:

dataloader = torch.utils.data.DataLoader(dataset=ImageStreamingDataset(remote='s3://...'))

See the quick start guide and the user guide for details on using the Streaming dataset.
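To make the idea behind the DataLoader example above more concrete, here is a toy sketch of on-demand streaming: shards are copied from a "remote" location into a local cache only when iteration reaches them. This is not the Streaming library's implementation or API. The class name ToyStreamingDataset, the text-file shard format, and the use of a local directory in place of an object store are all assumptions made for illustration, and the sketch uses only the Python standard library.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Iterator, List


class ToyStreamingDataset:
    """Toy illustration only; NOT the Streaming library's implementation."""

    def __init__(self, remote: Path, local: Path) -> None:
        self.remote = remote  # stands in for an object store (assumption)
        self.local = local    # local cache directory
        self.local.mkdir(parents=True, exist_ok=True)

    def _fetch(self, name: str) -> Path:
        # Copy a shard into the cache the first time it is requested.
        cached = self.local / name
        if not cached.exists():
            shutil.copyfile(self.remote / name, cached)
        return cached

    def __iter__(self) -> Iterator[str]:
        # Walk shards in name order, fetching each one lazily.
        for shard in sorted(p.name for p in self.remote.glob("*.txt")):
            with open(self._fetch(shard)) as f:
                for line in f:
                    yield line.rstrip("\n")


def demo() -> List[str]:
    # Build a fake "remote" with two shards, then stream it via a cache.
    with tempfile.TemporaryDirectory() as tmp:
        remote = Path(tmp) / "remote"
        remote.mkdir()
        (remote / "shard0.txt").write_text("a\nb\n")
        (remote / "shard1.txt").write_text("c\n")
        dataset = ToyStreamingDataset(remote, Path(tmp) / "cache")
        return list(dataset)
```

Because the cache lives in a temporary directory, nothing persists after demo() returns, which loosely mirrors the "no persistent storage required" benefit. The real library's behavior is described in its docs.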

Key Benefits

  • High-performance, accurate streaming of training data from cloud storage
  • Efficiently train anywhere, independent of training data location
  • Cloud-native, no persistent storage required
  • Enhanced data security: data exists only ephemerally on the training cluster

🚀 Quickstart

💾 Installation

Streaming is available via pip:

pip install mosaicml-streaming
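After installing, it can be handy to confirm which version ended up in the environment. The following is a minimal sketch using only the standard library's importlib.metadata. The helper name installed_version is hypothetical, and the distribution name is taken from the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "mosaicml-streaming") -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    print(found if found is not None else "mosaicml-streaming is not installed")
```

The lookup reads installed package metadata, so it does not import the package itself.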

Examples

See our Examples section for end-to-end model training workflows that use Streaming datasets.

📚 Documentation

Getting started guides, examples, API reference, and other useful information can be found in our docs.

💫 Contributors

We welcome any contributions, pull requests, or issues!

To start contributing, see our Contributing page.

P.S.: We're hiring!

✍️ Citation

@misc{mosaicml2022streaming,
    author = {The Mosaic ML Team},
    title = {streaming},
    year = {2022},
    howpublished = {\url{https://github.com/mosaicml/streaming/}},
}

Download files


Source Distribution

mosaicml-streaming-0.2.1.tar.gz (75.8 kB)

Uploaded Source

Built Distribution

mosaicml_streaming-0.2.1-py3-none-any.whl (107.1 kB)

Uploaded Python 3

File details

Details for the file mosaicml-streaming-0.2.1.tar.gz.

File metadata

  • Download URL: mosaicml-streaming-0.2.1.tar.gz
  • Upload date:
  • Size: 75.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.11.1

File hashes

Hashes for mosaicml-streaming-0.2.1.tar.gz
Algorithm Hash digest
SHA256 a7db3e06b09b939ff173cbcb7a1d8e8ec5911f4cd9bd0bd007a07c625438b659
MD5 40d33115c8bb4d0f45a1712afc04a67f
BLAKE2b-256 b6d334bd2a49cd55672574345f4996bf643a3ba0ffa8a0b68f720b79b8e8f712
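
A downloaded archive can be checked against the SHA256 digest listed above. The sketch below uses only hashlib from the standard library. The helper names sha256_of and matches are hypothetical, and the expected digest is copied from the table for mosaicml-streaming-0.2.1.tar.gz.

```python
import hashlib

# SHA256 digest listed above for mosaicml-streaming-0.2.1.tar.gz
EXPECTED_SHA256 = "a7db3e06b09b939ff173cbcb7a1d8e8ec5911f4cd9bd0bd007a07c625438b659"


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large archives are not loaded into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path: str, expected: str) -> bool:
    """Compare a file's SHA256 digest with an expected hex string."""
    return sha256_of(path) == expected.lower()
```

Assuming the archive sits in the current directory, matches("mosaicml-streaming-0.2.1.tar.gz", EXPECTED_SHA256) returns True only when the file is byte-for-byte identical to the published one.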


File details

Details for the file mosaicml_streaming-0.2.1-py3-none-any.whl.

File hashes

Hashes for mosaicml_streaming-0.2.1-py3-none-any.whl
Algorithm Hash digest
SHA256 b29408450d72094fa5bb659f3ffddc2dd6daa080b3f1e39691b1e96dda5f9b6b
MD5 3f9a1f712c99db0b5bfdd96d8297ce17
BLAKE2b-256 563ecdbb6cdf75f6929fa280afa31b4952dadec2970e3bf37ebfb4ce0b232080

