Simple, composable standard for storing datasets as one sample per file
Files Dataset
Dead simple standard for storing/loading datasets as files. Supports TAR and ZIP archives.
pip install files-dataset
Format
A dataset folder looks something like this:
my-dataset/
  meta.json
  car-images/
    001.jpg
    002.jpg
    003.jpg
  train-images/
    images.tar
  ...
meta.json:
{
  "files_dataset": {
    "cars": {
      "archive": "car-images/*.jpg",
      "num_files": 3000 // optionally specify the number of files
    },
    "trains": {
      "archive": "train-images/images.tar",
      "format": "tar",
      "num_files": 10000
    }
  },
  // you can add other stuff if you want to
}
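To make the layout concrete, here is a minimal sketch that builds a tiny dataset folder of this shape using only the Python standard library. It does not use `files_dataset` itself; the helper name `write_example_dataset` and the fake image bytes are made up for illustration. It writes strict JSON, so the `//` annotations shown above are left out.

```python
import io
import json
import tarfile
import tempfile
from pathlib import Path


def write_example_dataset(root: Path, n: int = 3) -> dict:
    """Create a small dataset folder following the layout described above.

    Hypothetical helper for illustration; not part of files_dataset.
    """
    cars = root / "car-images"
    trains = root / "train-images"
    cars.mkdir(parents=True)
    trains.mkdir(parents=True)

    # One sample per file: 001.jpg, 002.jpg, ...
    for i in range(1, n + 1):
        (cars / f"{i:03d}.jpg").write_bytes(b"fake car image %d" % i)

    # The same idea, but with all samples packed into one TAR archive.
    with tarfile.open(str(trains / "images.tar"), "w") as tar:
        for i in range(1, n + 1):
            data = b"fake train image %d" % i
            info = tarfile.TarInfo(name=f"{i:03d}.jpg")
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))

    # The "files_dataset" entry maps each key to its archive.
    meta = {
        "files_dataset": {
            "cars": {"archive": "car-images/*.jpg", "num_files": n},
            "trains": {
                "archive": "train-images/images.tar",
                "format": "tar",
                "num_files": n,
            },
        }
    }
    (root / "meta.json").write_text(json.dumps(meta, indent=2))
    return meta


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "my-dataset"
    meta = write_example_dataset(root)
    written = sorted(
        p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()
    )
```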
Usage
import files_dataset as fds

ds = fds.Dataset.read('path/to/my-dataset')
num_samples = ds.len('cars', 'trains') # int | None

# Iterate over both keys together, one sample at a time
for x in ds.samples('cars', 'trains'):
    x['cars'] # the current car image
    x['trains'] # the current train image (extracted from the TAR archive)
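Conceptually, each sample pairs one file from every requested key. The sketch below illustrates that idea with the standard library only. It is not the library's implementation, and the helper names (`iter_glob`, `iter_tar`, `iter_samples`) are hypothetical. It assumes files are matched by position, in sorted order for globs and in archive order for TAR files.

```python
import glob
import io
import os
import tarfile
import tempfile
from typing import Dict, Iterator


def iter_glob(pattern: str) -> Iterator[bytes]:
    """Yield the contents of each file matching a glob, in sorted order."""
    for path in sorted(glob.glob(pattern)):
        with open(path, "rb") as f:
            yield f.read()


def iter_tar(path: str) -> Iterator[bytes]:
    """Yield the contents of each regular file in a TAR archive, in archive order."""
    with tarfile.open(path, "r") as tar:
        for member in tar:
            if member.isfile():
                yield tar.extractfile(member).read()


def iter_samples(**sources: Iterator[bytes]) -> Iterator[Dict[str, bytes]]:
    """Zip several per-key streams into one stream of {key: bytes} samples."""
    keys = list(sources)
    for values in zip(*sources.values()):
        yield dict(zip(keys, values))


# Build a throwaway dataset with two car files and a two-member TAR archive.
with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, "car-images"))
    for i in (1, 2):
        with open(os.path.join(root, "car-images", f"{i:03d}.jpg"), "wb") as f:
            f.write(b"car-%d" % i)

    tar_path = os.path.join(root, "images.tar")
    with tarfile.open(tar_path, "w") as tar:
        for i in (1, 2):
            data = b"train-%d" % i
            info = tarfile.TarInfo(name=f"{i:03d}.jpg")
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))

    # Read everything while the temporary folder still exists.
    samples = list(
        iter_samples(
            cars=iter_glob(os.path.join(root, "car-images", "*.jpg")),
            trains=iter_tar(tar_path),
        )
    )
```

Because `zip` stops at the shortest stream, a key with fewer files silently truncates the result in this sketch; a real loader may prefer to raise an error instead.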
A common convenience is to load several datasets with a glob and iterate over them as one:
import files_dataset as fds

datasets = fds.glob('path/to/datasets/*') # list[fds.Dataset]
for x in fds.chain(datasets, 'trains', 'cars'):
    ...
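As a rough mental model, chaining can be thought of as concatenating the sample streams of each dataset, one after another. The sketch below shows that with `itertools.chain`. It assumes chaining means plain concatenation, uses in-memory lists in place of real datasets, and `chain_samples` is a hypothetical name, not the library's function.

```python
from itertools import chain
from typing import Dict, Iterable, Iterator


def chain_samples(
    streams: Iterable[Iterable[Dict[str, bytes]]], *keys: str
) -> Iterator[Dict[str, bytes]]:
    """Concatenate sample streams, keeping only the requested keys."""
    for sample in chain.from_iterable(streams):
        yield {k: sample[k] for k in keys}


# Two stand-in "datasets", each a list of already-loaded samples.
dataset_a = [{"cars": b"a1", "trains": b"t1", "extra": b"x"}]
dataset_b = [{"cars": b"a2", "trains": b"t2", "extra": b"y"}]

merged = list(chain_samples([dataset_a, dataset_b], "trains", "cars"))
```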
And that's it! Simple.