dumb-datasets

A lightweight wrapper around HuggingFace datasets.

Features

🔄 Complete wrapper around HuggingFace datasets with extended functionality
🚀 Cached dataset loading with smart retries and error handling
🛠️ Rich helper functions for common dataset operations
📊 Streamlined data processing pipelines with fluent API
🔍 Type validation with Pydantic models
🔌 Extension points via hooks and adapters
📋 Feature definition and inference utilities
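
This README does not describe the caching and retry behaviour in the second bullet any further. The sketch below illustrates only the general pattern, and it is an assumption about the idea rather than dumb-datasets' actual implementation. A loader retries transient network errors with exponential backoff and memoizes successful results. `with_retries`, `flaky_loader`, and the retry parameters are hypothetical names.

```python
import functools
import time
from typing import Any, Callable, Tuple, Type

# Hypothetical sketch of "cached loading with retries"; not dumb-datasets' code.
TRANSIENT_ERRORS: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError)


def with_retries(max_attempts: int = 3, base_delay: float = 0.1):
    """Retry the wrapped call on transient errors, doubling the delay each time."""
    def decorator(fn: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(fn)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            for attempt in range(1, max_attempts + 1):
                try:
                    return fn(*args, **kwargs)
                except TRANSIENT_ERRORS:
                    if attempt == max_attempts:
                        raise  # out of attempts: surface the last error
                    time.sleep(base_delay * 2 ** (attempt - 1))
        return wrapper
    return decorator


calls = {"n": 0}


@functools.lru_cache(maxsize=None)  # outer layer: cache successful results only
@with_retries(max_attempts=3, base_delay=0.01)  # inner layer: retry failures
def flaky_loader(name: str) -> Tuple[str, ...]:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("simulated network hiccup")
    return tuple(f"{name}-row-{i}" for i in range(3))


rows = flaky_loader("squad")   # fails twice, succeeds on the third attempt
again = flaky_loader("squad")  # served from the cache; the loader is not re-run
print(calls["n"])  # 3
```

Returning a tuple keeps the cached value immutable, so no caller can modify a result shared through the cache. `lru_cache` does not cache exceptions, so a call that ultimately fails starts over on the next attempt.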

Installation

pip install dumb-datasets

Or with Poetry:

poetry add dumb-datasets

Quick Usage

from dumb_datasets import load_dataset, Features, Value

# Load a dataset with automatic caching and error handling
dataset = load_dataset("squad")

# Get dataset info
info = dataset.info()
print(f"Dataset has {info['num_rows']} rows with features: {info['features']}")

# Apply transformations with a fluent API
processed = (dataset
    .filter(lambda x: len(x["question"]) > 10)
    .map_columns(lambda x: x.lower(), ["question", "context"])
    .shuffle(seed=42))

# Define custom features
features = Features({
    "text": Value("string"),
    "label": Value("int64")
})

# Use session for consistent settings
from dumb_datasets import Session
session = Session(cache_dir="/tmp/datasets", api_token="YOUR_HF_TOKEN")
new_dataset = session.get_dataset("glue", name="mnli")
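
The fluent chain above works because each method returns a new dataset object and leaves the existing one unmodified. The toy `Rows` class below demonstrates that pattern on a plain list of dicts. It is hypothetical: it borrows only the method names from the example and does not reproduce how dumb-datasets wraps Hugging Face `datasets`.

```python
import random
from typing import Any, Callable, Dict, Iterable, List

Row = Dict[str, Any]


class Rows:
    """Hypothetical toy container: every method returns a new Rows object."""

    def __init__(self, rows: Iterable[Row]) -> None:
        self._rows: List[Row] = [dict(r) for r in rows]  # shallow-copy each row

    def filter(self, predicate: Callable[[Row], bool]) -> "Rows":
        return Rows(r for r in self._rows if predicate(r))

    def map_columns(self, fn: Callable[[Any], Any], columns: List[str]) -> "Rows":
        # Apply fn to the named columns only; other columns are left untouched.
        return Rows({k: (fn(v) if k in columns else v) for k, v in r.items()}
                    for r in self._rows)

    def shuffle(self, seed: int) -> "Rows":
        shuffled = list(self._rows)
        random.Random(seed).shuffle(shuffled)  # seeded RNG: reproducible order
        return Rows(shuffled)

    def __len__(self) -> int:
        return len(self._rows)

    def to_list(self) -> List[Row]:
        return [dict(r) for r in self._rows]


data = Rows([
    {"question": "What Is SQuAD?", "context": "A QA Dataset."},
    {"question": "Why?", "context": "Short."},
    {"question": "Who Wrote This Example?", "context": "Nobody In Particular."},
])

processed = (data
    .filter(lambda x: len(x["question"]) > 10)
    .map_columns(str.lower, ["question", "context"])
    .shuffle(seed=42))

print(len(processed))  # 2
```

Because no step mutates its input, `data` still holds all three rows after the chain runs. The same seed also produces the same order on every run.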

Advanced Usage

from dumb_datasets import (
    Dataset,
    ClassLabel,
    infer_features_from_dict,
    save_dataset_sample
)

# Infer features from examples
example = {"text": "Hello world", "score": 0.95, "labels": ["positive", "greeting"]}
features = infer_features_from_dict(example)

# Save samples for inspection
save_dataset_sample(dataset, "samples.json", num_examples=5)

# Register an adapter for custom dataset loading
# (my_custom_loader_function is a placeholder for your own loader callable)
from dumb_datasets import register_adapter
register_adapter("my_format", my_custom_loader_function)

# Use hooks for custom processing
from dumb_datasets import add_hook
add_hook("after_load", lambda ds: print(f"Loaded dataset with {len(ds)} examples"))
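
This README documents `infer_features_from_dict` only through the example above. Conceptually, feature inference maps each Python value to a dtype name. The hypothetical `infer_schema` below sketches that idea with plain strings. It uses dtype names that Hugging Face `Value` accepts ("string", "int64", "float64", "bool") and describes a list by the type of its first element. The real function presumably returns a `Features` object and may handle these cases differently.

```python
from typing import Any, Dict, Union

# Hypothetical sketch; not dumb-datasets' infer_features_from_dict.
Schema = Union[str, Dict[str, Any]]


def infer_dtype(value: Any) -> Schema:
    # bool must be checked before int: bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "bool"
    if isinstance(value, int):
        return "int64"
    if isinstance(value, float):
        return "float64"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        if not value:
            raise ValueError("cannot infer the element type of an empty list")
        return {"sequence": infer_dtype(value[0])}  # assumes homogeneous lists
    if isinstance(value, dict):
        return {k: infer_dtype(v) for k, v in value.items()}
    raise TypeError(f"unsupported type: {type(value).__name__}")


def infer_schema(example: Dict[str, Any]) -> Dict[str, Schema]:
    return {name: infer_dtype(v) for name, v in example.items()}


example = {"text": "Hello world", "score": 0.95, "labels": ["positive", "greeting"]}
print(infer_schema(example))
# {'text': 'string', 'score': 'float64', 'labels': {'sequence': 'string'}}
```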

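The `register_adapter` and `add_hook` calls in Advanced Usage suggest a registry pattern. Loaders are looked up by format name, and callbacks registered for an event such as "after_load" run in order. The `PluginRegistry` class below is a hypothetical, self-contained sketch of that pattern, meant only for reasoning about those two calls. It is not dumb-datasets' internal design, and the toy "csv_line" adapter exists only for the demonstration.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, Dict, List


class PluginRegistry:
    """Hypothetical registry illustrating the adapter + hook pattern."""

    def __init__(self) -> None:
        self.adapters: Dict[str, Callable[[str], Any]] = {}
        self.hooks: DefaultDict[str, List[Callable[[Any], None]]] = defaultdict(list)

    def register_adapter(self, fmt: str, loader: Callable[[str], Any]) -> None:
        # Re-registering the same format name replaces the previous loader.
        self.adapters[fmt] = loader

    def add_hook(self, event: str, fn: Callable[[Any], None]) -> None:
        # Hooks for one event run in the order they were added.
        self.hooks[event].append(fn)

    def _run_hooks(self, event: str, payload: Any) -> None:
        for fn in self.hooks.get(event, []):  # .get avoids creating empty entries
            fn(payload)

    def load(self, source: str, fmt: str) -> Any:
        if fmt not in self.adapters:
            raise KeyError(f"no adapter registered for format {fmt!r}")
        data = self.adapters[fmt](source)
        self._run_hooks("after_load", data)
        return data


registry = PluginRegistry()
seen: List[int] = []
# Toy adapter: treat the "source" as a comma-separated string of records.
registry.register_adapter("csv_line", lambda source: source.split(","))
registry.add_hook("after_load", lambda ds: seen.append(len(ds)))

records = registry.load("a,b,c", fmt="csv_line")
print(records, seen)  # ['a', 'b', 'c'] [3]
```
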
Getting started with your project

First, create a repository on GitHub with the same name as this project, and then run the following commands:

git init -b master
git add .
git commit -m "init .gitignore"
git remote add origin git@github.com:nlile/dumb-datasets.git
git push -u origin master

Finally, install the environment and the pre-commit hooks with:

make install

You are now ready to start development on your project! The CI/CD pipeline will be triggered when you open a pull request, merge to master, or when you create a new release.

To finalize the set-up for publishing to PyPI or Artifactory, see here. For activating the automatic documentation with MkDocs, see here. To enable the code coverage reports, see here.

Releasing a new version

Create an API Token on PyPI.
Add the API Token to your project's secrets with the name PYPI_TOKEN by visiting this page.
Create a new release on GitHub.
Create a new tag in the form *.*.*.
For more details, see here.
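
You can check locally whether a tag has the *.*.* form before pushing it. The sketch below assumes the release workflow treats *.*.* as a shell-style glob, as GitHub Actions `on.push.tags` filters do. Python's `fnmatch` only approximates that syntax; for example, GitHub's `*` does not match `/`, while `fnmatch`'s does. `looks_like_release_tag` is a hypothetical helper.

```python
from fnmatch import fnmatchcase

RELEASE_TAG_PATTERN = "*.*.*"  # the tag form given in the release steps


def looks_like_release_tag(tag: str) -> bool:
    # fnmatchcase: case-sensitive glob match; "*" matches any run of characters.
    return fnmatchcase(tag, RELEASE_TAG_PATTERN)


for tag in ["0.0.2", "0.0.1a0", "v0.0.2", "0.1", "latest"]:
    print(tag, looks_like_release_tag(tag))
```

A glob this loose accepts any name with at least two dots, including `v0.0.2` and pre-releases such as `0.0.1a0`. A stricter check such as the regex `^\d+\.\d+\.\d+$` would reject `v0.0.2`, but it would also reject pre-release tags.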

Repository initiated with nlile/cookiecutter-poetry.

Release history

0.0.2 (Mar 25, 2025)

0.0.1a0, pre-release (Mar 24, 2025), this version

Download files

Source Distribution

dumb_datasets-0.0.1a0.tar.gz (13.1 kB, source), uploaded Mar 24, 2025

Built Distribution

dumb_datasets-0.0.1a0-py3-none-any.whl (14.9 kB, Python 3), uploaded Mar 24, 2025

File details

Details for the file dumb_datasets-0.0.1a0.tar.gz.

File metadata

Download URL: dumb_datasets-0.0.1a0.tar.gz
Upload date: Mar 24, 2025
Size: 13.1 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: poetry/1.7.1 CPython/3.11.11 Linux/6.8.0-1021-azure

File hashes

Hashes for dumb_datasets-0.0.1a0.tar.gz
Algorithm	Hash digest
SHA256	`a6b70af7ef95ba385f5599733620d8be73cfb83812b2eb50baee00168d8f5907`
MD5	`727a963dc58a022cf0321e136b008af1`
BLAKE2b-256	`0b37139763c51a3574abbb5df7a5dc7d01b10b0d709c18a697d9694fd72a2dbd`

File details

Details for the file dumb_datasets-0.0.1a0-py3-none-any.whl.

File metadata

Download URL: dumb_datasets-0.0.1a0-py3-none-any.whl
Upload date: Mar 24, 2025
Size: 14.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: poetry/1.7.1 CPython/3.11.11 Linux/6.8.0-1021-azure

File hashes

Hashes for dumb_datasets-0.0.1a0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`c690b145959abdf9a00cc76d8a34166f3af8a5b92f9197e2b6a1e109ead786c0`
MD5	`e3f7adb506a7c70e551543e577b754dc`
BLAKE2b-256	`76426c958943e51c3784777169f4552896a55c1f04d000977c8a280d99c39cec`
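
To check a downloaded file against the SHA256 digests listed above, hash it locally and compare. The sketch below uses only the standard library and copies the two published digests. `sha256_of` and `verify` are illustrative helper names.

```python
import hashlib
from pathlib import Path

# SHA256 digests as published in the file hashes tables above.
EXPECTED_SHA256 = {
    "dumb_datasets-0.0.1a0.tar.gz":
        "a6b70af7ef95ba385f5599733620d8be73cfb83812b2eb50baee00168d8f5907",
    "dumb_datasets-0.0.1a0-py3-none-any.whl":
        "c690b145959abdf9a00cc76d8a34166f3af8a5b92f9197e2b6a1e109ead786c0",
}


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Stream the file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path) -> bool:
    """Compare a local file's digest with the published one for its file name."""
    expected = EXPECTED_SHA256.get(path.name)
    if expected is None:
        raise KeyError(f"no published digest for {path.name}")
    return sha256_of(path) == expected


# Usage, after downloading the wheel into the current directory:
# print(verify(Path("dumb_datasets-0.0.1a0-py3-none-any.whl")))
```

pip can also enforce digests at install time through its hash-checking mode (`--require-hashes`).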
