
datadings is a collection of tools to prepare datasets for machine learning. It's easy to use, space-efficient, and blazingly fast. It is based on two simple principles:

  • Datasets are collections of individual data samples.

  • Each sample is a dictionary with descriptive keys.

For supervised training with images, samples are dictionaries like this:

{"key": unique_key, "image": imagedata, "label": label}

msgpack is used as an efficient storage format for most supported datasets.
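
For illustration, here is how a sample like the one above round-trips through the msgpack Python package; the field values below are made up, and in practice datadings handles the packing and unpacking for you:

import msgpack

# A made-up sample in the datadings format: a dict with descriptive keys.
sample = {"key": "example_0001", "image": b"<raw image bytes>", "label": 3}

# Serialize to compact msgpack bytes and decode back into a dictionary.
packed = msgpack.packb(sample)
restored = msgpack.unpackb(packed, raw=False)
print(len(packed), restored["key"], restored["label"])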

Check out the documentation for more details.

Supported datasets

Dataset        | Short Description
ADE20k         | Scene Parsing, Segmentation
ANP460         | Our own eye-tracking dataset (Jalpa)
CAMVID         | Motion-based Segmentation
CAT2000        | MIT Saliency
CIFAR          | 32x32 color image classification with 10/100 classes
Cityscapes     | Segmentation, Semantic understanding of urban street scenes
Coutrot1       | Eye-Tracking, Saliency
FIGRIMFixation | Eye-Tracking, Saliency
ILSVRC2012     | ImageNet Large Scale Visual Recognition Challenge
ImageNet21k    | A superset of ILSVRC2012 with 11 M images for 10450 classes
InriaBuildings | Inria Aerial Image Labeling Dataset (buildings), Segmentation, Remote Sensing
MIT1003        | Eye-Tracking, Saliency, Learning to predict where humans look
MIT300         | Eye-Tracking, Saliency
Places2017     | MIT Places, Scene Recognition
Places365      | MIT Places365, Scene Recognition
RIT18          | High-res Multispectral Semantic Segmentation, Remote Sensing
SALICON2015    | Saliency in Context, Eye-Tracking
SALICON2017    | Saliency in Context, Eye-Tracking
VOC2012        | Pascal Visual Object Classes Challenge
Vaihingen      | Remote Sensing, Semantic Object Classification, Segmentation
YFCC100m       | Yahoo Flickr Creative Commons, 100 million pictures

Command line tools

  • datadings-write creates new dataset files.

  • datadings-cat prints the (abbreviated) contents of a dataset file.

  • datadings-shuffle shuffles an existing dataset file.

  • datadings-merge merges two or more dataset files.

  • datadings-split splits a dataset file into two or more subsets.

  • datadings-bench runs some basic read performance benchmarks.

Basic usage

Each dataset defines modules to read and write in the datadings.sets package. For most datasets, the reading module contains only additional metadata like class labels and distributions.

Let’s consider the MIT1003 dataset as an example.

MIT1003_write is an executable that creates dataset files. It can be called directly or through datadings-write. Three files will be written:

  • MIT1003.msgpack contains the sample data

  • MIT1003.msgpack.index contains the index for random access

  • MIT1003.msgpack.md5 contains MD5 hashes of both files
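
As a quick sanity check after writing, you can recompute the MD5 hashes of the first two files with the standard library and compare them by eye against the values stored in MIT1003.msgpack.md5 (a minimal sketch, assuming the files sit in the current directory):

import hashlib

def md5sum(path, chunk_size=1 << 20):
    # Read the file in chunks so large datasets do not need to fit in memory.
    digest = hashlib.md5()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()

for name in ('MIT1003.msgpack', 'MIT1003.msgpack.index'):
    print(name, md5sum(name))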

Reading all samples sequentially, using a MsgpackReader as a context manager:

from datadings.reader import MsgpackReader
with MsgpackReader('MIT1003.msgpack') as reader:
    for sample in reader:
        # do dataset things
        print(sample['key'])

This standard iterator returns dictionaries. Use the rawiter() method to get samples as msgpack-encoded bytes instead.

Reading specific samples:

reader.seek_key('i14020903.jpeg')
print(reader.next()['key'])
reader.seek_index(100)
print(reader.next()['key'])

Reading samples as raw bytes:

raw = reader.rawnext()
for raw in reader.rawiter():
    print(type(raw), len(raw))
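
Because samples are stored as msgpack (see above), raw bytes can be decoded back into the usual dictionaries with the msgpack package. The standard iterator already does this decoding for you; this sketch only shows what the bytes contain, assuming the default msgpack encoding:

import msgpack
from datadings.reader import MsgpackReader

with MsgpackReader('MIT1003.msgpack') as reader:
    raw = reader.rawnext()
    # Decode the msgpack bytes back into a sample dictionary.
    sample = msgpack.unpackb(raw, raw=False)
    print(sorted(sample.keys()))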

Number of samples:

print(len(reader))

You can also change the order and selection of iterated samples with augments. For example, to randomize the order of samples, wrap the reader in a Shuffler:

from datadings.reader import MsgpackReader, Shuffler
with Shuffler(MsgpackReader('MIT1003.msgpack')) as reader:
    for sample in reader:
        # do dataset things, but in random order!
        print(sample['key'])

A common use case is to iterate over the whole dataset multiple times. This can be done with the Cycler:

from datadings.reader import Cycler, MsgpackReader
with Cycler(MsgpackReader('MIT1003.msgpack')) as reader:
    for sample in reader:
        # do dataset things, but FOREVER!
        print(sample['key'])
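
Augments take a reader and behave like one themselves, so they can presumably be nested. A sketch that combines the two wrappers above to iterate endlessly over shuffled samples (the break is only there to end the example):

from datadings.reader import Cycler, MsgpackReader, Shuffler

with Cycler(Shuffler(MsgpackReader('MIT1003.msgpack'))) as reader:
    for step, sample in enumerate(reader):
        print(step, sample['key'])
        if step >= 9:
            # stop the endless iteration after a handful of samples
            break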
