prt-datasets

prt-datasets is a small collection of synthetic and common example datasets packaged as PyTorch Datasets and Lightning DataModules. It covers examples often used in experiments and tutorials, such as MNIST (classification) and synthetic regression datasets (circle, cubic, thermistor). The goal is to make it easy to prototype training and uncertainty estimation workflows with minimal setup.

Features

  • Lightweight PyTorch Dataset implementations for common toy problems
  • Lightning DataModule wrappers for easy integration with PyTorch Lightning
  • Built-in examples: MNIST (wrapper), Circle, Cubic, Thermistor

Installation

Requires Python 3.11 or later. The project declares the following runtime dependencies in pyproject.toml:

  • lightning
  • numpy
  • requests
  • torch

To install from source (editable) with pip and the dev/test extras, run:

python -m pip install -e ".[dev]"

Or install the package normally:

python -m pip install .

If you only want runtime dependencies, install them directly:

python -m pip install lightning numpy requests torch

Quick examples

Below are short examples showing how to use DataModules and Datasets in this repository.

Note: the package exposes modules under prt_datasets. Import paths shown assume the package is installed or the repository root is on PYTHONPATH.

Circle (regression)

The CircleDataModule creates a synthetic 2D circle dataset and exposes train/val/test dataloaders.

from prt_datasets.regression.circle import CircleDataModule

dm = CircleDataModule(batch_size=128, num_workers=4, seed=0)
dm.prepare_data()
dm.setup()

train_loader = dm.train_dataloader()
for x, y in train_loader:
	# x: angle values, y: 2D coordinates on noisy circle
	break
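
To make the shape of the data concrete, here is a minimal, dependency-free sketch of how a noisy circle dataset of this kind could be generated. It is not the package's implementation: the function name make_noisy_circle and the radius and noise_std values are assumptions chosen for illustration.

```python
import math
import random


def make_noisy_circle(n_samples, radius=1.0, noise_std=0.05, seed=0):
    """Return (angles, points) for a noisy circle.

    radius and noise_std are illustrative assumptions, not the
    package's actual defaults.
    """
    rng = random.Random(seed)  # local RNG so results are reproducible per seed
    angles, points = [], []
    for _ in range(n_samples):
        theta = rng.uniform(0.0, 2.0 * math.pi)
        # Ideal point on the circle plus independent Gaussian noise per axis.
        px = radius * math.cos(theta) + rng.gauss(0.0, noise_std)
        py = radius * math.sin(theta) + rng.gauss(0.0, noise_std)
        angles.append(theta)
        points.append((px, py))
    return angles, points


angles, points = make_noisy_circle(n_samples=1000, seed=0)
```

Each angle plays the role of x and each (px, py) pair the role of y in the loop above. Using a seeded local generator mirrors the seed argument of the DataModule: the same seed yields the same samples.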

Cubic (regression)

The CubicDataModule provides samples of the function y = x^3 + noise with separate train/test ranges so you can experiment with interpolation/epistemic uncertainty.

from prt_datasets.regression.cubic import CubicDataModule

dm = CubicDataModule(batch_size=64, num_workers=4, seed=42)
dm.setup()
loader = dm.train_dataloader()
for x, y in loader:
	# x, y are tensors shaped (B, 1)
	break
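
The idea of separate train and test ranges can be sketched without any dependencies. The helper make_cubic, the ranges, and the noise level below are hypothetical and may differ from what CubicDataModule uses.

```python
import random


def make_cubic(n_samples, x_range, noise_std=3.0, seed=0):
    """Sample y = x^3 + Gaussian noise with x drawn uniformly from x_range."""
    rng = random.Random(seed)
    xs = [rng.uniform(*x_range) for _ in range(n_samples)]
    ys = [x ** 3 + rng.gauss(0.0, noise_std) for x in xs]
    return xs, ys


# Hypothetical ranges: train on a narrow interval, test on a wider one,
# so part of the test set lies outside the region seen during training.
train_x, train_y = make_cubic(200, x_range=(-4.0, 4.0), seed=42)
test_x, test_y = make_cubic(200, x_range=(-6.0, 6.0), seed=43)
```

A model fitted on the training interval has no data to constrain it on the extra test region, which is where an uncertainty estimate is expected to grow.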

MNIST (classification)

MNISTDataModule is a thin wrapper around torchvision.datasets.MNIST. It normalizes data to the standard MNIST mean/std and provides Lightning DataModule loaders.

from prt_datasets.classification.mnist import MNISTDataModule

dm = MNISTDataModule(root='data', batch_size=64)
dm.prepare_data()
dm.setup()
train_loader = dm.train_dataloader()
for imgs, labels in train_loader:
	break
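
The normalization step can be illustrated in plain Python. The constants 0.1307 and 0.3081 are the mean and standard deviation commonly quoted for MNIST; that the package uses exactly these values is an assumption, and the helper names are hypothetical.

```python
MNIST_MEAN = 0.1307  # commonly quoted mean of MNIST pixels scaled to [0, 1]
MNIST_STD = 0.3081   # commonly quoted standard deviation on the same scale


def normalize_pixel(value):
    """Map a raw 0-255 pixel intensity to a standardized float."""
    scaled = value / 255.0  # first bring the pixel into [0, 1]
    return (scaled - MNIST_MEAN) / MNIST_STD


def normalize_image(image):
    """Normalize a 2D list of raw pixel intensities."""
    return [[normalize_pixel(p) for p in row] for row in image]


# An all-black 28x28 image as a stand-in for a real digit.
blank = [[0] * 28 for _ in range(28)]
normalized = normalize_image(blank)
```

After this transform, background pixels map to a negative value (about -0.42) and fully white pixels to about 2.82, so the inputs are roughly zero-mean with unit variance across the dataset.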

API overview

  • prt_datasets.classification.MNISTDataset, MNISTDataModule
  • prt_datasets.regression.CircleDataset, CircleDataModule
  • prt_datasets.regression.CubicDataset, CubicDataModule
  • prt_datasets.regression.ThermistorDataset, ThermistorModel

Refer to the docstrings in the source files for parameter details and behaviors.

Tests

This repository uses pytest for tests. To run the test suite:

python -m pip install -e ".[dev]"
pytest -q

The tests under tests/ exercise basic dataset behaviors.
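
As a rough idea of what a basic dataset test can look like, here is a sketch in pytest style. ToyCubicDataset is a hypothetical stand-in that follows the map-style dataset protocol (__len__ and __getitem__). It is not a class from this package, and the repository's own tests may be structured differently.

```python
import random


class ToyCubicDataset:
    """Hypothetical noise-free stand-in for a map-style dataset."""

    def __init__(self, n_samples=100, seed=0):
        rng = random.Random(seed)
        self.x = [rng.uniform(-1.0, 1.0) for _ in range(n_samples)]
        self.y = [x ** 3 for x in self.x]

    def __len__(self):
        return len(self.x)

    def __getitem__(self, index):
        return self.x[index], self.y[index]


def test_length_matches_request():
    assert len(ToyCubicDataset(n_samples=32)) == 32


def test_same_seed_gives_same_samples():
    a, b = ToyCubicDataset(seed=7), ToyCubicDataset(seed=7)
    assert a[0] == b[0]


def test_targets_follow_the_function():
    x, y = ToyCubicDataset(seed=1)[5]
    assert y == x ** 3
```

pytest collects functions whose names start with test_, so a file like this placed under tests/ would run with the command above.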

Contributing

Contributions are welcome. A few guidelines:

  • Open an issue to discuss larger changes before implementing them.
  • Keep changes small and focused. Add tests for new functionality.
  • Follow the repository style and type annotations where present.

License

This project is provided under the terms of the license in LICENSE.md.

Maintainer

Gavin Strunk

If you spot mistakes or want more example datasets, file an issue or send a PR.
