prt-datasets

prt-datasets is a small collection of synthetic and common example datasets packaged as PyTorch Datasets and Lightning DataModules. It includes ready-to-use DataModules for examples that recur in experiments and tutorials: MNIST (classification) and three synthetic regression datasets (circle, cubic, thermistor). The goal is to make it easy to prototype training and uncertainty-estimation workflows with minimal setup.

Features

  • Lightweight PyTorch Dataset implementations for common toy problems
  • Lightning DataModule wrappers for easy integration with PyTorch Lightning
  • Built-in examples: MNIST (wrapper), Circle, Cubic, Thermistor

Installation

Requires Python 3.11 or later. The project declares the following runtime dependencies in pyproject.toml:

  • lightning
  • numpy
  • requests
  • torch

To install from source (editable) with pip and the dev/test extras, run:

python -m pip install -e ".[dev]"

Or install the package normally:

python -m pip install .

If you only want runtime dependencies, install them directly:

python -m pip install lightning numpy requests torch
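After installing, it can be handy to confirm that the runtime dependencies listed above are visible to the interpreter. The following is a minimal sketch using only the standard library; the helper name `missing_packages` is hypothetical and not part of prt-datasets.

```python
from importlib.util import find_spec

# Runtime dependencies as listed in this README.
RUNTIME_DEPS = ["lightning", "numpy", "requests", "torch"]


def missing_packages(names):
    """Return the top-level package names that cannot be found for import."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(RUNTIME_DEPS)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All runtime dependencies are importable.")
```

This only checks that each package can be located; it does not import the packages or verify their versions.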

Quick examples

Below are short examples showing how to use DataModules and Datasets in this repository.

Note: the package exposes modules under prt_datasets. Import paths shown assume the package is installed or the repository root is on PYTHONPATH.

Circle (regression)

The CircleDataModule creates a synthetic 2D circle dataset and exposes train/val/test dataloaders.

from prt_datasets.regression.circle import CircleDataModule

dm = CircleDataModule(batch_size=128, num_workers=4, seed=0)
dm.prepare_data()
dm.setup()

train_loader = dm.train_dataloader()
for x, y in train_loader:
    # x: angle values, y: 2D coordinates on noisy circle
    break
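To make the shape of the data concrete, here is a rough NumPy sketch of how angle inputs and noisy 2D circle targets could be generated. It is an illustration only, not the package's implementation: the function name `make_noisy_circle`, the radius, the noise level, and the sample count are all assumptions.

```python
import numpy as np


def make_noisy_circle(n_samples=256, radius=1.0, noise_std=0.05, seed=0):
    """Sample angles and noisy 2D points near a circle (illustrative only)."""
    rng = np.random.default_rng(seed)
    # Inputs: one angle per sample, shape (n_samples, 1).
    theta = rng.uniform(0.0, 2.0 * np.pi, size=(n_samples, 1))
    # Clean targets: points on the circle, shape (n_samples, 2).
    clean = radius * np.concatenate([np.cos(theta), np.sin(theta)], axis=1)
    # Add independent Gaussian noise to both coordinates.
    noisy = clean + rng.normal(0.0, noise_std, size=clean.shape)
    return theta.astype(np.float32), noisy.astype(np.float32)


x, y = make_noisy_circle()
print(x.shape, y.shape)  # (256, 1) (256, 2)
```

Fixing the seed makes the sample reproducible, which mirrors the `seed` argument passed to CircleDataModule above.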

Cubic (regression)

The CubicDataModule provides samples of the function y = x^3 + noise with separate train/test ranges so you can experiment with interpolation/epistemic uncertainty.

from prt_datasets.regression.cubic import CubicDataModule

dm = CubicDataModule(batch_size=64, num_workers=4, seed=42)
dm.setup()
loader = dm.train_dataloader()
for x, y in loader:
    # x, y are tensors shaped (B, 1)
    break
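The idea of sampling y = x^3 + noise over separate train and test ranges can be sketched in a few lines of NumPy. The ranges, noise level, and sample counts below are arbitrary assumptions chosen for illustration, and `make_cubic` is a hypothetical helper; consult the CubicDataModule docstrings for the values the package actually uses.

```python
import numpy as np


def make_cubic(low, high, n_samples, noise_std=3.0, seed=42):
    """Sample x uniformly in [low, high) and return (x, x**3 + noise)."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(low, high, size=(n_samples, 1))
    y = x**3 + rng.normal(0.0, noise_std, size=x.shape)
    return x.astype(np.float32), y.astype(np.float32)


# Assumed ranges: the test range extends beyond the training range, so some
# test points fall where the model has seen no data.
x_train, y_train = make_cubic(-4.0, 4.0, n_samples=200, seed=42)
x_test, y_test = make_cubic(-6.0, 6.0, n_samples=100, seed=43)

print(x_train.shape, y_train.shape)  # (200, 1) (200, 1)
```

A model fit on `x_train` can then be evaluated on `x_test` to see how its predictions and uncertainty estimates behave away from the training data.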

MNIST (classification)

MNISTDataModule is a thin wrapper around torchvision.datasets.MNIST. It normalizes data to the standard MNIST mean/std and provides Lightning DataModule loaders.

from prt_datasets.classification.mnist import MNISTDataModule

dm = MNISTDataModule(root='data', batch_size=64)
dm.prepare_data()
dm.setup()
train_loader = dm.train_dataloader()
for imgs, labels in train_loader:
    break
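For reference, normalizing to a dataset mean and standard deviation amounts to a per-pixel affine transform. The sketch below uses the commonly quoted MNIST statistics (mean 0.1307, std 0.3081); whether MNISTDataModule uses exactly these constants is an assumption, and `normalize_mnist` is a hypothetical helper, not part of the package.

```python
import numpy as np

# Commonly quoted MNIST statistics for pixels scaled to [0, 1] (assumed here).
MNIST_MEAN = 0.1307
MNIST_STD = 0.3081


def normalize_mnist(images_uint8):
    """Scale uint8 pixels to [0, 1], then standardize with the MNIST mean/std."""
    scaled = images_uint8.astype(np.float32) / 255.0
    return (scaled - MNIST_MEAN) / MNIST_STD


# Two dummy images in (N, C, H, W) layout: one all black, one all white.
batch = np.zeros((2, 1, 28, 28), dtype=np.uint8)
batch[1] = 255
normalized = normalize_mnist(batch)

print(normalized.shape)  # (2, 1, 28, 28)
```

With these constants a black pixel maps to roughly -0.42 and a white pixel to roughly 2.82.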

API overview

  • prt_datasets.classification.MNISTDataset, MNISTDataModule
  • prt_datasets.regression.CircleDataset, CircleDataModule
  • prt_datasets.regression.CubicDataset, CubicDataModule
  • prt_datasets.regression.ThermistorDataset, ThermistorModel

Refer to the docstrings in the source files for parameter details and behaviors.

Tests

This repository uses pytest for tests. To run the test suite:

python -m pip install -e ".[dev]"
pytest -q

There are tests under tests/ that exercise basic dataset behaviors.

Contributing

Contributions are welcome. A few guidelines:

  • Open an issue to discuss larger changes before implementing them.
  • Keep changes small and focused. Add tests for new functionality.
  • Follow the repository style and type annotations where present.

License

This project is provided under the terms of the license in LICENSE.md.

Maintainer

Gavin Strunk

If you spot mistakes or want more example datasets, file an issue or send a PR.
