Custom PyTorch Dataset implementations for publicly available datasets across various modalities
any-gold
Have you ever wanted to experiment with a new dataset, only to lose a few hours before you could even access the data? We have, and we believe it should not be that way.
Any Gold is thus a comprehensive collection of custom PyTorch Dataset implementations for publicly available datasets across various modalities.
Purpose
The goal of this repository is to provide custom PyTorch Dataset classes
that are compatible with PyTorch's DataLoader to facilitate experimentation
with publicly available datasets. Each dataset implementation includes
automated download functionality that caches the data locally before use. Instead of spending time on data access,
you can focus on experimenting with the data.
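The download-then-cache behaviour described above can be pictured with a minimal sketch. This is not any-gold's actual code: the helper name `ensure_cached`, its parameters, and the cache layout are assumptions made for illustration, and only the standard library is used.

```python
from pathlib import Path
from urllib.request import urlretrieve


def ensure_cached(url: str, cache_dir: Path, filename: str) -> Path:
    """Return the local path of `filename`, downloading it from `url` only if missing.

    Hypothetical helper: any-gold's real download logic may differ.
    """
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / filename
    if not target.exists():
        # Download to a temporary name first so an interrupted transfer
        # never leaves a partial file that looks like a valid cache entry.
        partial = target.with_name(target.name + ".part")
        urlretrieve(url, partial)
        partial.replace(target)
    return target
```

Calling the helper a second time with the same arguments returns the cached path without touching the network, which is the property that makes repeated experiments cheap.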
Features
- PyTorch Integration: All datasets implement the PyTorch Dataset interface
- Automatic Downloads: Built-in functionality to download and cache datasets locally
- Multimodal Support: Datasets spanning various data types and domains
- Consistent API: Uniform interface across different dataset implementations
- Minimal Dependencies: Core dependencies are managed with uv
Available Datasets
Image Datasets
- PlantSeg: Large-scale in-the-wild dataset for plant disease segmentation (Paper, Zenodo)
- MVTecADDataset: Anomaly detection dataset for industrial inspection (Paper, Hugging Face)
- KPITask1PatchLevel: A dataset for kidney disease segmentation (Paper, Synapse)
- DeepGlobeRoadExtraction: Road extraction from satellite images (Paper, Kaggle)
- ISIC2018SkinLesionDataset: A dataset for skin lesion segmentation (Paper, Hugging Face)
Usage
import any_gold as ag
from torch.utils.data import DataLoader
# Initialize dataset (downloads data if not already present)
dataset = ag.AnyDataset()
# Create DataLoader
dataloader = DataLoader(dataset, batch_size=32, shuffle=True)
# Iterate through batches
for batch in dataloader:
    # Your training/evaluation code here
    pass
Contributing
Process
Contributions are welcome! To contribute to this project:
- Fork the repository on GitHub
- Clone your fork: git clone https://github.com/yourusername/any-gold.git
- Create a new branch for your feature: git checkout -b feature-name
- Install development dependencies (see below)
- Set up pre-commit hooks: uv run pre-commit install
- Implement a new class that inherits from AnyDataset
- Include download functionality for the dataset
- Add appropriate documentation and tests (pytest) for your dataset class
- Ensure code passes all pre-commit checks
- Submit a pull request to the main repository
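The implementation steps above (a new dataset class, its download functionality, and a test) might look roughly like the sketch below. The real interface of AnyDataset is not shown in this README, so the sketch uses a stand-in base class, `SketchBaseDataset`, with hypothetical `_is_cached` and `_download` hooks; every class name, method name, and the sample format are assumptions. It relies only on `__len__` and `__getitem__`, the two methods a map-style PyTorch dataset is expected to provide.

```python
import tempfile
from pathlib import Path


class SketchBaseDataset:
    """Stand-in for any_gold.AnyDataset (hypothetical; the real base class may differ)."""

    def __init__(self, root: Path) -> None:
        self.root = Path(root)
        # Download only when the local cache is missing.
        if not self._is_cached():
            self._download()

    def _is_cached(self) -> bool:
        raise NotImplementedError

    def _download(self) -> None:
        raise NotImplementedError


class ToyTextDataset(SketchBaseDataset):
    """Toy dataset whose 'download' just writes three labels to disk."""

    def _data_file(self) -> Path:
        return self.root / "toy.txt"

    def _is_cached(self) -> bool:
        return self._data_file().exists()

    def _download(self) -> None:
        # A real dataset would fetch its files from a remote host here.
        self.root.mkdir(parents=True, exist_ok=True)
        self._data_file().write_text("healthy\nrust\nblight\n")

    def __len__(self) -> int:
        return len(self._data_file().read_text().splitlines())

    def __getitem__(self, index: int) -> dict[str, object]:
        lines = self._data_file().read_text().splitlines()
        return {"index": index, "label": lines[index]}


def test_toy_dataset_downloads_once_and_indexes() -> None:
    """pytest-style test: plain asserts, discovered through the `test_` prefix."""
    with tempfile.TemporaryDirectory() as tmp:
        dataset = ToyTextDataset(Path(tmp) / "cache")
        assert len(dataset) == 3
        assert dataset[1] == {"index": 1, "label": "rust"}
        # A second instance reuses the cached file instead of rewriting it.
        mtime = dataset._data_file().stat().st_mtime_ns
        ToyTextDataset(Path(tmp) / "cache")
        assert dataset._data_file().stat().st_mtime_ns == mtime
```

In the repository itself, a test like this would presumably live alongside the other tests and could be run with `uv run pytest`; the test layout is not described here, so treat that as an assumption as well.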
We use pre-commit hooks to maintain code quality:
- Ruff for linting and formatting
- MyPy for type checking
Installation
Dependencies in this repository are managed with uv,
a fast Python package installer and resolver. The dependencies are defined in the
pyproject.toml file.
# Clone the repository
git clone https://github.com/yourusername/any-gold.git
cd any-gold
# Install dependencies with uv
uv sync --all-extras
source .venv/bin/activate
Release Process
To release a new version of the any-gold package:
- Create a new branch for the release: git checkout -b release-vX.Y.Z
- Update the version vX.Y.Z in pyproject.toml
- Commit the changes with a message like release vX.Y.Z
- Merge the branch into main
- Trigger a new release on GitHub with the tag vX.Y.Z