Simple, composable standard for storing datasets as one sample per file
Files Dataset
Dead simple standard for storing/loading datasets as files. Supports TAR and ZIP archives.
pip install files-dataset
Format
A dataset folder looks something like this:
my-dataset/
  meta.json
  car-images/
    001.jpg
    002.jpg
    003.jpg
  train-images/
    images.tar
  ...
meta.json:
{
  "files_dataset": {
    "cars": {
      "archive": "car-images/*.jpg",
      "num_files": 3000 // optionally specify the number of files
    },
    "trains": {
      "archive": "train-images/images.tar",
      "format": "tar",
      "num_files": 10000
    }
  },
  // you can add other stuff if you want to
}
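To make the layout concrete, here is a minimal sketch that builds a tiny dataset folder and its meta.json using only the Python standard library. The helper name `write_example_dataset` and the placeholder file contents are assumptions made for illustration; they are not part of files-dataset. Note that the file written here is strict JSON, so it omits the `//` comments shown above.

```python
import json
import tempfile
from pathlib import Path


def write_example_dataset(root: Path, num_cars: int = 3) -> Path:
    """Create a tiny dataset folder in the layout described above (hypothetical helper)."""
    car_dir = root / "car-images"
    car_dir.mkdir(parents=True, exist_ok=True)
    for i in range(1, num_cars + 1):
        # Placeholder bytes stand in for real JPEG data.
        (car_dir / f"{i:03d}.jpg").write_bytes(b"fake-jpeg-%d" % i)

    meta = {
        "files_dataset": {
            "cars": {"archive": "car-images/*.jpg", "num_files": num_cars},
        },
        # Keys outside "files_dataset" are free-form.
        "note": "extra keys are allowed",
    }
    meta_path = root / "meta.json"
    meta_path.write_text(json.dumps(meta, indent=2))
    return meta_path


root = Path(tempfile.mkdtemp()) / "my-dataset"
meta_path = write_example_dataset(root)
loaded = json.loads(meta_path.read_text())
print(loaded["files_dataset"]["cars"])
```

The resulting folder contains `meta.json` next to `car-images/`, matching the tree shown earlier.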
Usage
import files_dataset as fds

ds = fds.Dataset.read('path/to/my-dataset')
num_samples = ds.len('cars', 'trains')  # int | None
for x in ds.samples('cars', 'trains'):
    x['cars']    # the current car image
    x['trains']  # the current train image (extracted from the TAR archive)
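One way to picture what such a loop yields is to pair files from a glob pattern with members of a TAR archive. The sketch below does this with the standard library only. It is an illustration under stated assumptions, not the library's implementation: the names `iter_glob`, `iter_tar`, and `iter_samples` are hypothetical, and the ordering (sorted file names for the glob, archive order for the TAR) is assumed.

```python
import glob
import io
import os
import tarfile
import tempfile
from typing import Iterator


def iter_glob(pattern: str) -> Iterator[bytes]:
    """Yield the contents of each file matching the pattern, sorted by path."""
    for path in sorted(glob.glob(pattern)):
        with open(path, "rb") as f:
            yield f.read()


def iter_tar(path: str) -> Iterator[bytes]:
    """Yield the contents of each regular file in a TAR archive, in archive order."""
    with tarfile.open(path, "r") as tar:
        for member in tar:
            if member.isfile():
                extracted = tar.extractfile(member)
                if extracted is not None:
                    yield extracted.read()


def iter_samples(base: str) -> Iterator[dict[str, bytes]]:
    """Pair the i-th car file with the i-th train file (assumed pairing rule)."""
    cars = iter_glob(os.path.join(base, "car-images", "*.jpg"))
    trains = iter_tar(os.path.join(base, "train-images", "images.tar"))
    for car, train in zip(cars, trains):
        yield {"cars": car, "trains": train}


# Build a small fixture with two loose files and a two-member TAR archive.
base = tempfile.mkdtemp()
os.makedirs(os.path.join(base, "car-images"))
os.makedirs(os.path.join(base, "train-images"))
for i in (1, 2):
    with open(os.path.join(base, "car-images", f"{i:03d}.jpg"), "wb") as f:
        f.write(f"car-{i}".encode())
with tarfile.open(os.path.join(base, "train-images", "images.tar"), "w") as tar:
    for i in (1, 2):
        data = f"train-{i}".encode()
        info = tarfile.TarInfo(name=f"{i:03d}.jpg")
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))

samples = list(iter_samples(base))
print(len(samples))
```

Because `zip` stops at the shorter iterator, this sketch silently drops unpaired files; whether files-dataset behaves the same way is not stated in this description.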
A common convenience is to glob several datasets and chain their samples:
import files_dataset as fds

datasets = fds.glob('path/to/datasets/*')  # list[fds.Dataset]
for x in fds.chain(datasets, 'trains', 'cars'):
    ...
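The idea of chaining can be sketched with `itertools.chain` over several dataset folders. This is only a conceptual illustration using the standard library: `iter_key`, `chain_key`, and `make_dataset` are hypothetical names, and the sketch handles glob-style entries only (no `"format"` key), reading the pattern from each folder's meta.json.

```python
import glob
import itertools
import json
import os
import tempfile
from typing import Iterable, Iterator


def iter_key(dataset_dir: str, key: str) -> Iterator[bytes]:
    """Yield file contents for one key, using the glob pattern in meta.json."""
    with open(os.path.join(dataset_dir, "meta.json")) as f:
        meta = json.load(f)
    pattern = meta["files_dataset"][key]["archive"]
    for path in sorted(glob.glob(os.path.join(dataset_dir, pattern))):
        with open(path, "rb") as f:
            yield f.read()


def chain_key(dataset_dirs: Iterable[str], key: str) -> Iterator[bytes]:
    """Iterate one key across several datasets, one dataset after another."""
    return itertools.chain.from_iterable(iter_key(d, key) for d in dataset_dirs)


def make_dataset(root: str, name: str, count: int) -> None:
    """Write a small fixture dataset with `count` placeholder car files."""
    base = os.path.join(root, name)
    os.makedirs(os.path.join(base, "car-images"))
    for i in range(1, count + 1):
        with open(os.path.join(base, "car-images", f"{i:03d}.jpg"), "wb") as f:
            f.write(f"{name}-{i}".encode())
    meta = {"files_dataset": {"cars": {"archive": "car-images/*.jpg", "num_files": count}}}
    with open(os.path.join(base, "meta.json"), "w") as f:
        json.dump(meta, f)


root = tempfile.mkdtemp()
make_dataset(root, "a", 2)
make_dataset(root, "b", 1)

dataset_dirs = sorted(glob.glob(os.path.join(root, "*")))
chained = list(chain_key(dataset_dirs, "cars"))
print(len(chained))
```

Chaining lazily means only one dataset's files are open at a time, which keeps memory use flat as more dataset folders are added.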
And that's it! Simple.