Simple, composable standard for storing datasets as one sample per file
Files Dataset
Dead simple standard for storing/loading datasets as files. Supports TAR and ZIP archives.
pip install files-dataset
Format
A dataset folder looks something like this:
my-dataset/
  meta.json
  car-images/
    001.jpg
    002.jpg
    003.jpg
  train-images/
    images.tar
  ...
meta.json:
{
  "files_dataset": {
    "cars": {
      "archive": "car-images/*.jpg",
      "num_files": 3000 // optionally specify the number of files
    },
    "trains": {
      "archive": "train-images/images.tar",
      "format": "tar",
      "num_files": 10000
    }
  },
  // you can add other stuff if you want to
}
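To make the layout concrete, here is a minimal sketch that writes a tiny dataset folder in this format using only the Python standard library. It does not use files-dataset itself; the helper name `write_example_dataset` and the placeholder file contents are assumptions made for illustration. The sketch writes strict JSON without the `//` comments shown above, since Python's `json` module does not accept comments.

```python
import io
import json
import tarfile
import tempfile
from pathlib import Path


def write_example_dataset(root: Path) -> dict:
    """Create a tiny dataset folder following the layout described above."""
    cars = root / "car-images"
    trains = root / "train-images"
    cars.mkdir(parents=True)
    trains.mkdir(parents=True)

    # One sample per file: three placeholder "images".
    for i in range(1, 4):
        (cars / f"{i:03d}.jpg").write_bytes(b"car-%d" % i)

    # Two placeholder samples packed into a single TAR archive.
    with tarfile.open(trains / "images.tar", "w") as tar:
        for i in range(1, 3):
            data = b"train-%d" % i
            info = tarfile.TarInfo(name=f"{i:03d}.jpg")
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))

    # Same keys as the meta.json example, with counts matching this toy data.
    meta = {
        "files_dataset": {
            "cars": {"archive": "car-images/*.jpg", "num_files": 3},
            "trains": {
                "archive": "train-images/images.tar",
                "format": "tar",
                "num_files": 2,
            },
        }
    }
    (root / "meta.json").write_text(json.dumps(meta, indent=2))
    return meta


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "my-dataset"
    meta = write_example_dataset(root)
    # Collect the relative paths of everything that was written.
    written = sorted(
        p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()
    )
```

Here `written` lists the same kind of tree as the folder example: the loose `.jpg` files, the TAR archive, and `meta.json` at the top level.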
Usage
import files_dataset as fds

ds = fds.Dataset.read('path/to/my-dataset')
num_samples = ds.len('cars', 'trains') # int | None

for x in ds.samples('cars', 'trains'):
    x['cars'] # the current car image
    x['trains'] # the current train image (extracted from the TAR archive)
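For intuition, the following sketch shows one way per-key files could be paired into samples using only the standard library. It is not the library's implementation: `iter_files` and `iter_samples` are hypothetical helpers, and the assumptions that files are paired by sorted order and that a missing "format" key means a glob pattern are mine.

```python
import io
import json
import tarfile
import tempfile
from pathlib import Path
from typing import Iterator


def iter_files(root: Path, spec: dict) -> Iterator[bytes]:
    """Yield the raw bytes of each file described by one meta.json entry."""
    if spec.get("format") == "tar":
        with tarfile.open(root / spec["archive"]) as tar:
            for member in tar:
                if member.isfile():
                    yield tar.extractfile(member).read()
    else:
        # Assumption: no "format" key means "archive" is a glob pattern.
        for path in sorted(root.glob(spec["archive"])):
            yield path.read_bytes()


def iter_samples(root: Path, *keys: str) -> Iterator[dict]:
    """Yield one dict per sample, pairing the i-th file of every key."""
    meta = json.loads((root / "meta.json").read_text())["files_dataset"]
    streams = [iter_files(root, meta[key]) for key in keys]
    for files in zip(*streams):
        yield dict(zip(keys, files))


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "car-images").mkdir()
    (root / "train-images").mkdir()
    for i in (1, 2):
        (root / "car-images" / f"{i:03d}.jpg").write_bytes(b"car-%d" % i)
    with tarfile.open(root / "train-images" / "images.tar", "w") as tar:
        for i in (1, 2):
            data = b"train-%d" % i
            info = tarfile.TarInfo(name=f"{i:03d}.jpg")
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    meta = {
        "files_dataset": {
            "cars": {"archive": "car-images/*.jpg"},
            "trains": {"archive": "train-images/images.tar", "format": "tar"},
        }
    }
    (root / "meta.json").write_text(json.dumps(meta))

    samples = list(iter_samples(root, "cars", "trains"))
```

Each element of `samples` is a dict keyed by the requested names, mirroring the `x['cars']` and `x['trains']` access in the usage example.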
A common convenience is to glob several dataset folders and iterate over them as one:
import files_dataset as fds

datasets = fds.glob('path/to/datasets/*') # list[fds.Dataset]

for x in fds.chain(datasets, 'trains', 'cars'):
    ...
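The idea behind chaining can be sketched with `itertools.chain.from_iterable`, which lazily concatenates several iterators. This is only an illustration of the pattern, not how files-dataset implements it: `folder_samples` and `chain_folders` are hypothetical helpers, and each "dataset" here is simplified to a folder of `.txt` files.

```python
import itertools
import tempfile
from pathlib import Path
from typing import Iterator


def folder_samples(folder: Path) -> Iterator[dict]:
    """Hypothetical stand-in for one dataset: one sample per .txt file."""
    for path in sorted(folder.glob("*.txt")):
        yield {"name": path.name, "data": path.read_text()}


def chain_folders(base: Path, pattern: str) -> Iterator[dict]:
    """Lazily concatenate the samples of every folder matching the pattern."""
    folders = sorted(p for p in base.glob(pattern) if p.is_dir())
    return itertools.chain.from_iterable(folder_samples(f) for f in folders)


with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    # Two toy datasets with two and one samples respectively.
    for name, items in {"ds-a": ["1", "2"], "ds-b": ["3"]}.items():
        (base / name).mkdir()
        for item in items:
            (base / name / f"{item}.txt").write_text(f"{name}:{item}")

    chained = [sample["data"] for sample in chain_folders(base, "ds-*")]
```

Because the chain is lazy, files from the second folder are only opened once the first folder's samples are exhausted.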
And that's it! Simple.