Unified storage framework for machine learning datasets
Project description
Space: Unified Storage for Machine Learning
Unify data in your entire machine learning lifecycle with Space, a comprehensive storage solution that seamlessly handles data from ingestion to training.
Key Features:
- Ground Truth Database
- Store and manage multimodal data in open source file formats, row or columnar, local or in cloud.
- Ingest from various sources, including ML datasets, files, and labeling tools.
- Support data manipulation (append, insert, update, delete) and version control.
- OLAP Database and Lakehouse
- Iceberg style open table format.
- Optimized for unstructued data via reference operations.
- Quickly analyze data using SQL engines like DuckDB.
- Distributed Data Processing Pipelines
- Integrate with processing frameworks like Ray for efficient data transformation.
- Store processed results as Materialized Views (MVs); incrementally update MVs when the source is changed.
- Seamless Training Framework Integration
- Access Space datasets and MVs directly via random access interfaces.
- Convert to popular ML dataset formats (e.g., TFDS, HuggingFace, Ray).
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
space-datasets-0.0.8.tar.gz
(122.7 kB
view hashes)
Built Distribution
space_datasets-0.0.8-py3-none-any.whl
(179.4 kB
view hashes)
Close
Hashes for space_datasets-0.0.8-py3-none-any.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 3a19ffadb9ff27977308e2d933f6ed8325b57577a7f29925d62ab65b43eb8e71 |
|
MD5 | 150fa95100608cb59769e5cb24ca61a5 |
|
BLAKE2b-256 | 6cfc1245af7b159a84fd3b2b356d9a408a58668cd3d66e7b0958aa125d38f0ed |