
Streaming lets users create PyTorch-compatible datasets that can be streamed from cloud-based object stores.

Project description


A Data Streaming Library for Efficient Neural Network Training



👋 Welcome

Streaming is a PyTorch-compatible dataset that lets you stream training data from cloud-based object stores or read it from local disk. It works as a drop-in replacement for your PyTorch IterableDataset class, so it's easy to get streaming:

dataloader = torch.utils.data.DataLoader(dataset=ImageStreamingDataset(remote='s3://...'))

See the quick start guide and user guide to learn how to use the Streaming Dataset.
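To make the idea of streaming from a remote store through a local cache concrete, here is a minimal standard-library sketch. It is not the Streaming library's implementation or API: the class `CachedShardStream`, the `.jsonl` shard layout, and the use of a plain directory as a stand-in for an object store are all assumptions made for illustration.

```python
import json
import shutil
import tempfile
from pathlib import Path
from typing import Iterator


class CachedShardStream:
    """Hypothetical illustration: iterate over samples stored in shards,
    copying each shard from a 'remote' directory into a local cache the
    first time it is needed. Not the Streaming library's implementation."""

    def __init__(self, remote: Path, local: Path) -> None:
        self.remote = remote
        self.local = local
        self.local.mkdir(parents=True, exist_ok=True)

    def _fetch(self, name: str) -> Path:
        cached = self.local / name
        if not cached.exists():
            # Stand-in for a download from a cloud object store.
            shutil.copy(self.remote / name, cached)
        return cached

    def __iter__(self) -> Iterator[dict]:
        # Visit shards in a fixed order and yield one sample per line.
        for shard in sorted(p.name for p in self.remote.glob("*.jsonl")):
            with open(self._fetch(shard)) as f:
                for line in f:
                    yield json.loads(line)


def demo() -> list:
    # The temporary directory is deleted on exit, so the cached copy
    # exists only for the duration of the run.
    with tempfile.TemporaryDirectory() as root:
        remote = Path(root) / "remote"
        remote.mkdir()
        # Write two tiny shards with two samples each.
        for i in range(2):
            with open(remote / f"shard{i}.jsonl", "w") as f:
                for j in range(2):
                    f.write(json.dumps({"id": i * 2 + j}) + "\n")
        stream = CachedShardStream(remote, Path(root) / "cache")
        return [sample["id"] for sample in stream]
```

Calling `demo()` walks both shards through the cache and returns the sample ids in order. A real streaming dataset would also handle shuffling, multiple workers, and download failures, none of which this sketch attempts.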

Key Benefits

  • High performance, accurate streaming of training data from cloud storage
  • Efficiently train anywhere, independent of training data location
  • Cloud-native, no persistent storage required
  • Enhanced data security—data exists ephemerally on training cluster

🚀 Quickstart

💾 Installation

Streaming can be installed with pip:

pip install mosaicml-streaming
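After installing, one way to confirm which version is present is to query the package metadata from Python. The helper name `installed_version` below is a hypothetical convenience, not part of Streaming; it relies only on the standard `importlib.metadata` module (Python 3.8+).

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Prints a version string if the package is installed, otherwise None.
    print(installed_version("mosaicml-streaming"))
```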

Examples

See our Examples section for end-to-end model training workflows that use Streaming datasets.

📚 Documentation

Getting started guides, examples, API reference, and other useful information can be found in our docs.

💫 Contributors

We welcome any contributions, pull requests, or issues!

To start contributing, see our Contributing page.

P.S.: We're hiring!

✍️ Citation

@misc{mosaicml2022streaming,
    author = {The Mosaic ML Team},
    title = {streaming},
    year = {2022},
    howpublished = {\url{https://github.com/mosaicml/streaming/}},
}


Download files


Source Distribution

mosaicml-streaming-0.2.2.tar.gz (76.7 kB, source)

Built Distribution

mosaicml_streaming-0.2.2-py3-none-any.whl (108.4 kB, Python 3)

File details

Details for the file mosaicml-streaming-0.2.2.tar.gz.

File metadata

  • Download URL: mosaicml-streaming-0.2.2.tar.gz
  • Size: 76.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.11.1

File hashes

Hashes for mosaicml-streaming-0.2.2.tar.gz

Algorithm     Hash digest
SHA256        a11357945624ebbc3c4c84d3b8fc13f6521c4d4170f97872a687b1ae381e9dfa
MD5           b6c68aaf48763047cc15be9165eb00fc
BLAKE2b-256   69c017c3c295fdb2d43d14efe763ef0ce85d1b7782146043a7823c07dcad4c62

See more details on using hashes here.
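A downloaded file can be checked against the published SHA256 digest with Python's standard `hashlib` module. The sketch below is one way to do it; the function names `sha256_of_file` and `matches_published_hash` are hypothetical helpers introduced here, and the file is assumed to already be on local disk.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks until read() returns empty bytes.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: Path, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, `matches_published_hash(Path("mosaicml-streaming-0.2.2.tar.gz"), "a11357945624ebbc3c4c84d3b8fc13f6521c4d4170f97872a687b1ae381e9dfa")` should return `True` for an intact copy of the source distribution listed above.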

File details

Details for the file mosaicml_streaming-0.2.2-py3-none-any.whl.

File hashes

Hashes for mosaicml_streaming-0.2.2-py3-none-any.whl

Algorithm     Hash digest
SHA256        be183f40d5353a6cfc4c716a920243cea6696a896d0874c864de39040105ba80
MD5           8c40e8a07ce088317ddfd6d582521fab
BLAKE2b-256   a5114b41c13d9d492c6105ed153662c4c18598e341cac5ad08f7fd2b7264ea3c

