prt-datasets

prt-datasets is a small collection of synthetic and common example datasets packaged as PyTorch Datasets and Lightning DataModules. It provides utilities and ready-to-use DataModules for common examples used in experiments and tutorials such as MNIST (classification) and synthetic regression datasets (circle, cubic, thermistor). The goal of this project is to make it easy to prototype training and uncertainty estimation workflows with minimal setup.

Features

  • Lightweight PyTorch Dataset implementations for common toy problems
  • Lightning DataModule wrappers for easy integration with PyTorch Lightning
  • Built-in examples: MNIST (wrapper), Circle, Cubic, Thermistor

Installation

Requires Python 3.11 or later. The project declares the following runtime dependencies in pyproject.toml:

  • lightning
  • numpy
  • requests
  • torch

To install from source in editable mode with the dev/test extras, run (the quotes keep shells such as zsh from interpreting the square brackets):

python -m pip install -e ".[dev]"

Or install the package normally:

python -m pip install .

If you only want runtime dependencies, install them directly:

python -m pip install lightning numpy requests torch
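Before installing, it can help to confirm that the interpreter meets the version requirement and to see which runtime dependencies are already importable. The following is a minimal sketch using only the standard library; `check_environment` is a hypothetical helper written for this example and is not part of prt-datasets.

```python
import sys
from importlib.util import find_spec

# Import names of the runtime dependencies listed above.
RUNTIME_DEPS = ("lightning", "numpy", "requests", "torch")


def check_environment(min_python=(3, 11), deps=RUNTIME_DEPS):
    """Report whether Python is new enough and which deps cannot be found.

    Hypothetical helper for illustration only.
    """
    return {
        "python_ok": sys.version_info >= min_python,
        # find_spec returns None when a top-level module is not installed.
        "missing": [name for name in deps if find_spec(name) is None],
    }


report = check_environment()
print(report)
```

An empty `missing` list together with `python_ok` set to `True` suggests the runtime requirements are already satisfied.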

Quick examples

Below are short examples showing how to use DataModules and Datasets in this repository.

Note: the package exposes modules under prt_datasets. Import paths shown assume the package is installed or the repository root is on PYTHONPATH.

Circle (regression)

The CircleDataModule creates a synthetic 2D circle dataset and exposes train/val/test dataloaders.

from prt_datasets.regression.circle import CircleDataModule

dm = CircleDataModule(batch_size=128, num_workers=4, seed=0)
dm.prepare_data()
dm.setup()

train_loader = dm.train_dataloader()
for x, y in train_loader:
    # x: angle values, y: 2D coordinates on a noisy circle
    break
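To make the shape of this data concrete, here is a minimal sketch of how a noisy circle dataset could be built with plain PyTorch. It is an illustration under stated assumptions, not the package's CircleDataset: the class name `NoisyCircleDataset`, the unit radius, the uniform angle sampling, and the `noise_std` parameter are all hypothetical.

```python
import math

import torch
from torch.utils.data import DataLoader, Dataset


class NoisyCircleDataset(Dataset):
    """Hypothetical stand-in: maps an angle to a noisy point on the unit circle."""

    def __init__(self, n_samples: int = 512, noise_std: float = 0.05, seed: int = 0):
        gen = torch.Generator().manual_seed(seed)
        # Angles sampled uniformly in [0, 2*pi), shape (N, 1).
        self.x = torch.rand(n_samples, 1, generator=gen) * 2 * math.pi
        # Clean (cos, sin) coordinates plus Gaussian noise, shape (N, 2).
        clean = torch.cat([torch.cos(self.x), torch.sin(self.x)], dim=1)
        self.y = clean + noise_std * torch.randn(n_samples, 2, generator=gen)

    def __len__(self) -> int:
        return self.x.shape[0]

    def __getitem__(self, idx: int):
        return self.x[idx], self.y[idx]


dataset = NoisyCircleDataset()
loader = DataLoader(dataset, batch_size=128, shuffle=True)
x, y = next(iter(loader))
print(x.shape, y.shape)  # x has shape (128, 1), y has shape (128, 2)
```

Seeding a dedicated `torch.Generator` keeps the data reproducible without touching the global random state.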

Cubic (regression)

CubicDataModule provides noisy samples of the function y = x^3, drawn from separate train and test input ranges so you can experiment with interpolation and epistemic uncertainty.

from prt_datasets.regression.cubic import CubicDataModule

dm = CubicDataModule(batch_size=64, num_workers=4, seed=42)
dm.setup()
loader = dm.train_dataloader()
for x, y in loader:
    # x, y are tensors shaped (B, 1)
    break
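The idea of separate train and test ranges can be sketched in a few lines of plain PyTorch. This is not the package's implementation: the helper `make_cubic_split`, the ranges [-4, 4] and [-6, 6], and the noise level are assumptions chosen only for illustration.

```python
import torch
from torch.utils.data import TensorDataset


def make_cubic_split(n: int, low: float, high: float, noise_std: float, seed: int) -> TensorDataset:
    """Hypothetical helper: sample y = x^3 + Gaussian noise for x uniform in [low, high)."""
    gen = torch.Generator().manual_seed(seed)
    x = torch.rand(n, 1, generator=gen) * (high - low) + low
    y = x**3 + noise_std * torch.randn(n, 1, generator=gen)
    return TensorDataset(x, y)


# Assumed ranges: the test inputs extend beyond the inputs seen in training.
train_set = make_cubic_split(200, -4.0, 4.0, noise_std=3.0, seed=42)
test_set = make_cubic_split(200, -6.0, 6.0, noise_std=3.0, seed=43)

x_train, y_train = train_set.tensors
x_test, y_test = test_set.tensors
print(x_train.shape, y_train.shape)  # both have shape (200, 1)
```

A model fitted on `train_set` has seen no inputs with |x| > 4, so its predictive uncertainty on those test points is a natural thing to inspect.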

MNIST (classification)

MNISTDataModule is a thin wrapper around torchvision.datasets.MNIST. It normalizes images with the standard MNIST mean and standard deviation and exposes the usual Lightning DataModule dataloaders.

from prt_datasets.classification.mnist import MNISTDataModule

dm = MNISTDataModule(root='data', batch_size=64)
dm.prepare_data()
dm.setup()
train_loader = dm.train_dataloader()
for imgs, labels in train_loader:
    # imgs: normalized image batch, labels: digit classes
    break
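For reference, the normalization step can be sketched as follows. The constants 0.1307 and 0.3081 are the commonly quoted MNIST mean and standard deviation; whether MNISTDataModule uses exactly these values is an assumption, and `normalize_mnist` is a hypothetical helper applied here to random data rather than real images.

```python
import torch

# Commonly quoted MNIST statistics for pixel values scaled to [0, 1].
# Assumption: the package's own constants are not confirmed here.
MNIST_MEAN, MNIST_STD = 0.1307, 0.3081


def normalize_mnist(imgs: torch.Tensor) -> torch.Tensor:
    """Normalize a float image batch in [0, 1], e.g. shape (B, 1, 28, 28)."""
    return (imgs - MNIST_MEAN) / MNIST_STD


# Random stand-in for a batch of 64 grayscale 28x28 images.
fake_batch = torch.rand(64, 1, 28, 28)
normalized = normalize_mnist(fake_batch)
print(normalized.shape)  # same shape as the input: (64, 1, 28, 28)
```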

API overview

  • prt_datasets.classification.MNISTDataset, MNISTDataModule
  • prt_datasets.regression.CircleDataset, CircleDataModule
  • prt_datasets.regression.CubicDataset, CubicDataModule
  • prt_datasets.regression.ThermistorDataset, ThermistorModel

Refer to the docstrings in the source files for parameter details and behaviors.

Tests

This repository uses pytest. To install the dev extras and run the test suite:

python -m pip install -e ".[dev]"
pytest -q

The tests under tests/ exercise basic dataset behavior.
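As an illustration of what "basic dataset behavior" checks can look like, here is a sketch of pytest-style tests. They run against a hypothetical stand-in dataset (`make_dataset`) rather than the package's classes, and they are not taken from the repository's test suite.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset


def make_dataset(n: int = 100, seed: int = 0) -> TensorDataset:
    """Hypothetical stand-in dataset: x uniform in [0, 1), y = x^3."""
    gen = torch.Generator().manual_seed(seed)
    x = torch.rand(n, 1, generator=gen)
    return TensorDataset(x, x**3)


def test_length_and_item_shapes():
    ds = make_dataset(n=100)
    assert len(ds) == 100
    x, y = ds[0]
    assert x.shape == (1,) and y.shape == (1,)


def test_same_seed_is_reproducible():
    a, b = make_dataset(seed=0), make_dataset(seed=0)
    assert torch.equal(a.tensors[0], b.tensors[0])


def test_batches_cover_dataset():
    ds = make_dataset(n=100)
    loader = DataLoader(ds, batch_size=32)
    # Batches of 32, 32, 32 and 4 should add up to the full dataset.
    assert sum(x.shape[0] for x, _ in loader) == len(ds)
```

Saved in a file whose name starts with `test_`, these functions would be collected by pytest automatically.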

Contributing

Contributions are welcome. A few guidelines:

  • Open an issue to discuss larger changes before implementing them.
  • Keep changes small and focused. Add tests for new functionality.
  • Follow the repository style and type annotations where present.

License

This project is provided under the terms of the license in LICENSE.md.

Maintainer

Gavin Strunk

If you spot mistakes or want more example datasets, file an issue or send a PR.
