Skip to main content

Unified storage framework for machine learning datasets

Project description

Space: Unified Storage for Machine Learning

Unify data in your entire machine learning lifecycle with Space, a comprehensive storage solution that seamlessly handles data from ingestion to training.

Key Features:

  • Ground Truth Database
    • Store and manage data in open source file formats, locally or in the cloud.
    • Ingest from various sources, including ML datasets, files, and labeling tools.
    • Support data manipulation (append, insert, update, delete) and version control.
  • OLAP Database and Lakehouse
  • Distributed Data Processing Pipelines
    • Integrate with processing frameworks like Ray for efficient data transformation.
    • Store processed results as Materialized Views (MVs); incrementally update MVs when the source is changed.
  • Seamless Training Framework Integration
    • Access Space datasets and MVs directly via random access interfaces.
    • Convert to popular ML dataset formats (e.g., TFDS, HuggingFace, Ray).

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

space-datasets-0.0.4.tar.gz (117.0 kB view hashes)

Uploaded Source

Built Distribution

space_datasets-0.0.4-py3-none-any.whl (172.5 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page