Simple, composable standard for storing datasets as one sample per file
Files Dataset
Dead simple standard for storing/loading datasets as files. Supports TAR and ZIP archives.
pip install files-dataset
Format
A dataset folder looks something like this:
my-dataset/
    meta.json
    car-images/
        001.jpg
        002.jpg
        003.jpg
    train-images/
        images.tar
    ...
meta.json:
{
    "files_dataset": {
        "cars": {
            "archive": "car-images/*.jpg",
            "num_files": 3000 // optionally specify the number of files
        },
        "trains": {
            "archive": "train-images/images.tar",
            "format": "tar",
            "num_files": 10000
        }
    },
    // you can add other stuff if you want to
}
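The layout above can also be produced programmatically. The following is a minimal sketch, not part of files-dataset: `build_meta` is a hypothetical helper that uses only the standard library and assumes the field names shown in the example (`archive`, `format`, `num_files`). It counts the files behind each entry and returns a payload ready to be written as `meta.json`. Note that Python's standard `json` module rejects `//` comments, so a file written this way contains none; whether the library itself tolerates comments is not stated here.

```python
import json
import tarfile
from pathlib import Path


def build_meta(root: Path, entries: dict[str, dict]) -> dict:
    """Fill in "num_files" for each entry and return the meta.json payload.

    Hypothetical helper: field names follow the example above.
    """
    out = {}
    for name, entry in entries.items():
        entry = dict(entry)  # copy so the caller's dict is left untouched
        if entry.get("format") == "tar":
            # Count regular files inside the TAR archive.
            with tarfile.open(root / entry["archive"]) as tar:
                entry["num_files"] = sum(1 for m in tar if m.isfile())
        else:
            # Assumption: without "format", "archive" is a glob relative to root.
            entry["num_files"] = len(list(root.glob(entry["archive"])))
        out[name] = entry
    return {"files_dataset": out}


def write_meta(root: Path, entries: dict[str, dict]) -> None:
    """Write the payload to <root>/meta.json as plain JSON (no comments)."""
    (root / "meta.json").write_text(json.dumps(build_meta(root, entries), indent=4))
```

Calling `write_meta(Path('my-dataset'), {...})` with the two entries from the example would regenerate `meta.json` with up-to-date counts.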
Usage
import files_dataset as fds
ds = fds.Dataset.read('path/to/my-dataset')
num_samples = ds.len('cars', 'trains') # int | None
for x in ds.samples('cars', 'trains'):
    x['cars'] # the current car image
    x['trains'] # the current train image (extracted from the TAR archive)
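To make the idea of "one sample per file" concrete, here is a rough standard-library sketch of what iterating such a dataset could look like. It is not the library's implementation: `iter_files` and `iter_samples` are hypothetical names, samples are returned as raw bytes, and glob matches are assumed to be read in sorted order.

```python
import tarfile
from pathlib import Path
from typing import Iterator


def iter_files(root: Path, entry: dict) -> Iterator[bytes]:
    """Yield the raw bytes of each file described by one meta.json entry."""
    if entry.get("format") == "tar":
        # Read members in archive order, skipping directories.
        with tarfile.open(root / entry["archive"]) as tar:
            for member in tar:
                if member.isfile():
                    yield tar.extractfile(member).read()
    else:
        # Assumption: "archive" is a glob pattern; files are read in sorted order.
        for path in sorted(root.glob(entry["archive"])):
            yield path.read_bytes()


def iter_samples(root: Path, meta: dict, *keys: str) -> Iterator[dict[str, bytes]]:
    """Zip the requested entries together, yielding one dict per sample."""
    streams = [iter_files(root, meta["files_dataset"][k]) for k in keys]
    for values in zip(*streams):  # stops at the shortest entry
        yield dict(zip(keys, values))
```

Because the entries are zipped, the i-th file of each entry ends up in the same sample, regardless of whether it came from a folder or an archive.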
A common convenience is to load several datasets at once and iterate over them as one:
import files_dataset as fds
datasets = fds.glob('path/to/datasets/*') # list[fds.Dataset]
for x in fds.chain(datasets, 'trains', 'cars'):
    ...
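Conceptually, chaining means exhausting the samples of one dataset before moving on to the next. The sketch below illustrates that idea with `itertools.chain`; it is only an assumption about the behaviour, not the library's code. `chain_samples` and `read_from_list` are hypothetical, and plain lists stand in for dataset folders.

```python
from itertools import chain
from typing import Callable, Iterable, Iterator

Sample = dict[str, bytes]


def chain_samples(
    datasets: Iterable[object],
    read: Callable[..., Iterator[Sample]],
    *keys: str,
) -> Iterator[Sample]:
    """Yield every sample of the first dataset, then the second, and so on."""
    return chain.from_iterable(read(ds, *keys) for ds in datasets)


def read_from_list(ds: list[Sample], *keys: str) -> Iterator[Sample]:
    """Stand-in reader: a 'dataset' here is just a list of sample dicts."""
    for sample in ds:
        yield {k: sample[k] for k in keys}


a = [{"cars": b"c0", "trains": b"t0"}]
b = [{"cars": b"c1", "trains": b"t1"}, {"cars": b"c2", "trains": b"t2"}]
merged = list(chain_samples([a, b], read_from_list, "trains", "cars"))
```

Here `merged` holds the single sample from `a` followed by the two samples from `b`, each restricted to the requested keys.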
And that's it! Simple.