Simple, composable standard for storing datasets as one sample per file

Files Dataset

Dead simple standard for storing/loading datasets as files. Supports TAR and ZIP archives.

pip install files-dataset

Format

A dataset folder looks something like this:

my-dataset/
  meta.json
  car-images/
    001.jpg
    002.jpg
    003.jpg
  train-images/
    images.tar
  ...

meta.json:

{
  "files_dataset": {
    "cars": {
      "archive": "car-images/*.jpg",
      "num_files": 3000 // optionally specify the number of files
    },
    "trains": {
      "archive": "train-images/images.tar",
      "format": "tar",
      "num_files": 10000
    }
  },
  // you can add other stuff if you want to
}
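
To make the layout concrete, here is a minimal sketch that writes a tiny dataset folder in this format using only the Python standard library. It is not part of files-dataset: the helper name write_example_dataset and the placeholder file contents are made up for illustration, and it writes strict JSON (the // comments in the listing above are treated as annotations, since Python's json module does not accept them).

```python
import io
import json
import tarfile
import tempfile
from pathlib import Path


def write_example_dataset(root: Path) -> dict:
    """Create a tiny dataset folder following the layout described above.

    Hypothetical helper for illustration; not part of files-dataset.
    """
    # Loose files: one sample per file, matched later by a glob pattern.
    cars = root / "car-images"
    cars.mkdir(parents=True)
    for i in range(1, 4):
        (cars / f"{i:03d}.jpg").write_bytes(f"car-{i}".encode())

    # Archived files: every sample lives inside a single TAR.
    trains = root / "train-images"
    trains.mkdir()
    with tarfile.open(trains / "images.tar", "w") as tar:
        for i in range(1, 4):
            payload = f"train-{i}".encode()
            info = tarfile.TarInfo(name=f"{i:03d}.jpg")
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))

    # Same keys as the meta.json example, with counts matching this toy data.
    meta = {
        "files_dataset": {
            "cars": {"archive": "car-images/*.jpg", "num_files": 3},
            "trains": {
                "archive": "train-images/images.tar",
                "format": "tar",
                "num_files": 3,
            },
        }
    }
    (root / "meta.json").write_text(json.dumps(meta, indent=2))
    return meta


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "my-dataset"
    meta = write_example_dataset(root)
    written = sorted(
        p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()
    )
    print(written)
```

The payloads here are short byte strings rather than real JPEGs; only the folder structure and the meta.json keys follow the format shown above.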

Usage

import files_dataset as fds

ds = fds.Dataset.read('path/to/my-dataset')
num_samples = ds.len('cars', 'trains') # int | None

for x in ds.samples('cars', 'trains'):
  x['cars'] # the current car image
  x['trains'] # the current train image (extracted from the TAR archive)
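
For intuition, iterating several keys together can be pictured as zipping one stream of files per key. The sketch below does that with the standard library only. It is an illustration under stated assumptions, not the library's implementation: read_key and samples are hypothetical helpers, samples are returned as raw bytes, loose files are read in sorted name order, and TAR members are read in archive order.

```python
import glob
import io
import json
import tarfile
import tempfile
from pathlib import Path
from typing import Iterator


def read_key(root: Path, entry: dict) -> Iterator[bytes]:
    """Yield the raw bytes of each file described by one meta.json entry."""
    if entry.get("format") == "tar":
        with tarfile.open(root / entry["archive"]) as tar:
            for member in tar:
                if member.isfile():
                    yield tar.extractfile(member).read()
    else:
        # Assumption: without "format", "archive" is a glob pattern.
        # Sorting gives a stable, name-based order.
        for path in sorted(glob.glob(str(root / entry["archive"]))):
            yield Path(path).read_bytes()


def samples(root: Path, *keys: str) -> Iterator[dict]:
    """Yield one dict per sample, holding the i-th file of every requested key."""
    entries = json.loads((root / "meta.json").read_text())["files_dataset"]
    streams = [read_key(root, entries[k]) for k in keys]
    for values in zip(*streams):
        yield dict(zip(keys, values))


# Build a two-sample toy dataset so the sketch runs on its own.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "car-images").mkdir()
    (root / "train-images").mkdir()
    with tarfile.open(root / "train-images" / "images.tar", "w") as tar:
        for i in (1, 2):
            (root / "car-images" / f"{i:03d}.jpg").write_bytes(f"car-{i}".encode())
            payload = f"train-{i}".encode()
            info = tarfile.TarInfo(name=f"{i:03d}.jpg")
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    meta = {
        "files_dataset": {
            "cars": {"archive": "car-images/*.jpg"},
            "trains": {"archive": "train-images/images.tar", "format": "tar"},
        }
    }
    (root / "meta.json").write_text(json.dumps(meta))

    out = list(samples(root, "cars", "trains"))
    print(out)
```

Because zip stops at the shortest stream, keys with different file counts would be truncated to the smaller one in this sketch; how the library itself handles that case is not stated here.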

A common convenience is to glob several datasets and chain their samples:

import files_dataset as fds

datasets = fds.glob('path/to/datasets/*') # list[fds.Dataset]
for x in fds.chain(datasets, 'trains', 'cars'):
  ...
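
Chaining presumably means iterating each dataset's samples in turn, one dataset after another. The sketch below shows that idea with itertools. It assumes this behavior rather than documenting it: chain_samples, read_fake, and the in-memory fake data are hypothetical stand-ins, not files-dataset APIs.

```python
from itertools import chain
from typing import Callable, Iterable, Iterator


def chain_samples(
    datasets: Iterable,
    read: Callable[..., Iterable[dict]],
    *keys: str,
) -> Iterator[dict]:
    """Yield the samples of every dataset back to back (hypothetical helper)."""
    return chain.from_iterable(read(ds, *keys) for ds in datasets)


# In-memory stand-ins for two datasets, keyed by a made-up name.
fake = {
    "a": [{"trains": "t1", "cars": "c1"}],
    "b": [{"trains": "t2", "cars": "c2"}, {"trains": "t3", "cars": "c3"}],
}


def read_fake(name: str, *keys: str) -> Iterator[dict]:
    """Stand-in for reading one dataset: keep only the requested keys."""
    for row in fake[name]:
        yield {k: row[k] for k in keys}


rows = list(chain_samples(["a", "b"], read_fake, "trains"))
print(rows)
```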

And that's it! Simple.
