prt-datasets
prt-datasets is a small collection of synthetic and common example datasets packaged as PyTorch Datasets and Lightning DataModules. It provides utilities and ready-to-use DataModules for examples commonly used in experiments and tutorials, such as MNIST (classification) and synthetic regression datasets (circle, cubic, thermistor). The goal is to make it easy to prototype training and uncertainty estimation workflows with minimal setup.
Features
- Lightweight PyTorch Dataset implementations for common toy problems
- Lightning DataModule wrappers for easy integration with PyTorch Lightning
- Built-in examples: MNIST (wrapper), Circle, Cubic, Thermistor
Installation
Requires Python 3.11 or later. The project declares the following runtime dependencies in pyproject.toml:
- lightning
- numpy
- requests
- torch
To install from source (editable) with pip and the dev/test extras, run:
python -m pip install -e ".[dev]"
Or install the package normally:
python -m pip install .
If you only want runtime dependencies, install them directly:
python -m pip install lightning numpy requests torch
Quick examples
Below are short examples showing how to use DataModules and Datasets in this repository.
Note: the package exposes modules under prt_datasets. Import paths shown assume the package is
installed or the repository root is on PYTHONPATH.
Circle (regression)
The CircleDataModule creates a synthetic 2D circle dataset and exposes train/val/test dataloaders.
from prt_datasets.regression.circle import CircleDataModule
dm = CircleDataModule(batch_size=128, num_workers=4, seed=0)
dm.prepare_data()
dm.setup()
train_loader = dm.train_dataloader()
for x, y in train_loader:
    # x: angle values, y: 2D coordinates on noisy circle
    break
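To make the shape of this data concrete, here is a minimal sketch of how an angle-to-coordinates dataset could be built in plain PyTorch. It is not the package's implementation: the class name ToyCircleDataset, the unit radius, the noise level, and the sample count are all assumptions chosen for illustration.

```python
import math

import torch
from torch.utils.data import DataLoader, Dataset


class ToyCircleDataset(Dataset):
    """Hypothetical stand-in: maps an angle to a noisy point on the unit circle."""

    def __init__(self, n_samples: int = 256, noise_std: float = 0.05, seed: int = 0):
        gen = torch.Generator().manual_seed(seed)
        # Angles drawn uniformly from [0, 2*pi), shape (N, 1)
        self.x = torch.rand(n_samples, 1, generator=gen) * 2 * math.pi
        # Clean (cos, sin) coordinates plus Gaussian noise, shape (N, 2)
        clean = torch.cat([torch.cos(self.x), torch.sin(self.x)], dim=1)
        self.y = clean + noise_std * torch.randn(n_samples, 2, generator=gen)

    def __len__(self) -> int:
        return self.x.shape[0]

    def __getitem__(self, idx: int):
        return self.x[idx], self.y[idx]


dataset = ToyCircleDataset(n_samples=256, seed=0)
loader = DataLoader(dataset, batch_size=128, shuffle=False)
x_batch, y_batch = next(iter(loader))  # shapes (128, 1) and (128, 2)
```

Seeding a dedicated torch.Generator keeps the samples reproducible without touching the global random state.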
Cubic (regression)
The CubicDataModule provides samples of the function y = x^3 + noise with separate train/test ranges
so you can experiment with interpolation/epistemic uncertainty.
from prt_datasets.regression.cubic import CubicDataModule
dm = CubicDataModule(batch_size=64, num_workers=4, seed=42)
dm.setup()
loader = dm.train_dataloader()
for x, y in loader:
    # x, y are tensors shaped (B, 1)
    break
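The idea of separate train and test ranges can be sketched without the package. The helper make_cubic, the interval bounds, and the noise level below are hypothetical; in particular, making the test interval wider than the training interval is an assumption, not something the package documentation states.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset


def make_cubic(n: int, low: float, high: float, noise_std: float, seed: int) -> TensorDataset:
    """Sample x uniformly in [low, high) and return pairs (x, x**3 + Gaussian noise)."""
    gen = torch.Generator().manual_seed(seed)
    x = low + (high - low) * torch.rand(n, 1, generator=gen)
    y = x ** 3 + noise_std * torch.randn(n, 1, generator=gen)
    return TensorDataset(x, y)


# Hypothetical ranges: train on a narrow interval, evaluate on a wider one
train_ds = make_cubic(n=200, low=-4.0, high=4.0, noise_std=3.0, seed=42)
test_ds = make_cubic(n=100, low=-6.0, high=6.0, noise_std=3.0, seed=43)

train_loader = DataLoader(train_ds, batch_size=64, shuffle=True)
x, y = next(iter(train_loader))  # each shaped (64, 1)
```

Test points that fall outside the training interval are where a model's predictive uncertainty is expected to grow, which is what makes this kind of split useful for uncertainty experiments.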
MNIST (classification)
MNISTDataModule is a thin wrapper around torchvision.datasets.MNIST. It normalizes data to the
standard MNIST mean/std and provides Lightning DataModule loaders.
from prt_datasets.classification.mnist import MNISTDataModule
dm = MNISTDataModule(root='data', batch_size=64)
dm.prepare_data()
dm.setup()
train_loader = dm.train_dataloader()
for imgs, labels in train_loader:
    break
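For reference, the normalization step can be sketched in plain PyTorch. The constants 0.1307 and 0.3081 are the mean and standard deviation commonly quoted for MNIST pixels scaled to [0, 1]; that MNISTDataModule uses exactly these values is an assumption, and normalize_mnist is a hypothetical helper, not part of the package.

```python
import torch

# Commonly quoted MNIST statistics for pixels scaled to [0, 1] (assumed here)
MNIST_MEAN = 0.1307
MNIST_STD = 0.3081


def normalize_mnist(images_uint8: torch.Tensor) -> torch.Tensor:
    """Convert uint8 images (N, 28, 28) to normalized floats (N, 1, 28, 28)."""
    scaled = images_uint8.float() / 255.0
    return ((scaled - MNIST_MEAN) / MNIST_STD).unsqueeze(1)


# Random stand-in batch so the sketch runs without downloading MNIST
gen = torch.Generator().manual_seed(0)
fake = torch.randint(0, 256, (8, 28, 28), dtype=torch.uint8, generator=gen)
batch = normalize_mnist(fake)  # shape (8, 1, 28, 28), dtype float32
```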
API overview
- prt_datasets.classification.MNISTDataset, MNISTDataModule
- prt_datasets.regression.CircleDataset, CircleDataModule
- prt_datasets.regression.CubicDataset, CubicDataModule
- prt_datasets.regression.ThermistorDataset, ThermistorModel
Refer to the docstrings in the source files for parameter details and behaviors.
Tests
This repository uses pytest for tests. To run the test suite:
python -m pip install -e ".[dev]"
pytest -q
There are tests under tests/ that exercise basic dataset behaviors.
Contributing
Contributions are welcome. A few guidelines:
- Open an issue to discuss larger changes before implementing them.
- Keep changes small and focused. Add tests for new functionality.
- Follow the repository style and type annotations where present.
License
This project is provided under the terms of the license in LICENSE.md.
Maintainer
Gavin Strunk
If you spot mistakes or want more example datasets, file an issue or send a PR.