Custom PyTorch Dataset implementations for publicly available datasets across various modalities

any-gold

Have you ever wanted to experiment with a new dataset, only to lose a few hours before you could even access the data? We have, and we believe it should not be that way anymore.

Any Gold is thus a comprehensive collection of custom PyTorch Dataset implementations for publicly available datasets across various modalities.

Purpose

The goal of this repository is to provide custom PyTorch Dataset classes that are compatible with PyTorch's DataLoader, making it easier to experiment with publicly available datasets. Each dataset implementation includes automated download functionality that caches the data locally before use. Instead of spending time getting access to the data, you can focus on experimenting with it.
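To make the download-then-cache idea concrete, here is a minimal sketch of a map-style dataset that populates a local directory once and reuses it afterwards. It is not the library's implementation: `CachedTextDataset`, `_download`, and the file names are hypothetical, and the "download" only writes small local files so the example runs offline. The import falls back to `object` so the sketch also runs where torch is not installed.

```python
import tempfile
from pathlib import Path

try:
    from torch.utils.data import Dataset
except ImportError:  # fallback so the sketch also runs without torch installed
    Dataset = object


class CachedTextDataset(Dataset):
    """Hypothetical map-style dataset that fetches its files once and reuses them."""

    def __init__(self, root: Path) -> None:
        self.root = Path(root)
        self.download_count = 0
        # Only fetch when the cache directory is missing or empty.
        if not self.root.exists() or not any(self.root.iterdir()):
            self._download()
        self.files = sorted(self.root.glob("*.txt"))

    def _download(self) -> None:
        # Stand-in for a real download: writes three small files locally.
        self.root.mkdir(parents=True, exist_ok=True)
        for i in range(3):
            (self.root / f"sample_{i}.txt").write_text(f"sample {i}")
        self.download_count += 1

    def __len__(self) -> int:
        return len(self.files)

    def __getitem__(self, index: int) -> str:
        return self.files[index].read_text()


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "cache"
    first = CachedTextDataset(cache)   # populates the cache
    second = CachedTextDataset(cache)  # reuses the cached files
    n_items = len(second)
    first_item = second[0]
    downloads = (first.download_count, second.download_count)
```

Because the class only relies on `__len__` and `__getitem__`, an instance can be passed to a DataLoader like any other map-style dataset.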

Features

  • PyTorch Integration: All datasets implement the PyTorch Dataset interface
  • Automatic Downloads: Built-in functionality to download and cache datasets locally
  • Multimodal Support: Datasets spanning various data types and domains
  • Consistent API: Uniform interface across different dataset implementations
  • Minimal Dependencies: Core dependencies are managed with uv

Available Datasets

Image Datasets

  • PlantSeg: Large-scale in-the-wild dataset for plant disease segmentation (Paper, Zenodo)
  • MVTecADDataset: Anomaly detection dataset for industrial inspection (Paper, Hugging Face)
  • KPITask1PatchLevel: A dataset for kidney disease segmentation (Paper, Synapse)
  • DeepGlobeRoadExtraction: Road extraction from satellite images (Paper, Kaggle)
  • ISIC2018SkinLesionDataset: A dataset for skin lesion segmentation (Paper, Hugging Face)

Usage

import any_gold as ag
from torch.utils.data import DataLoader

# Initialize dataset (downloads data if not already present)
dataset = ag.AnyDataset()

# Create DataLoader
dataloader = DataLoader(dataset, batch_size=32, shuffle=True)

# Iterate through batches
for batch in dataloader:
    # Your training/evaluation code here
    pass

Contributing

Process

Contributions are welcome! To contribute to this project:

  1. Fork the repository on GitHub
  2. Clone your fork: git clone https://github.com/yourusername/any-gold.git
  3. Create a new branch for your feature: git checkout -b feature-name
  4. Install development dependencies (see below)
  5. Set up pre-commit hooks: uv run pre-commit install
  6. Implement a new class that inherits from AnyDataset
  7. Include download functionality for the dataset
  8. Add appropriate documentation and tests (pytest) for your dataset class
  9. Ensure code passes all pre-commit checks
  10. Submit a pull request to the main repository
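As an illustration of the testing step, the sketch below shows the kind of pytest checks a new dataset class might ship with. `ToyDataset` is a hypothetical stand-in written for this example (it does not inherit from the library's AnyDataset, whose constructor is not described here), and its "download" only writes a small local file. `tmp_path` is pytest's built-in temporary-directory fixture.

```python
from pathlib import Path


class ToyDataset:
    """Hypothetical stand-in for a dataset class under test."""

    def __init__(self, root: Path) -> None:
        self.root = Path(root)
        marker = self.root / "data.txt"
        # Simulated download: create the data file only if it is not cached yet.
        if not marker.exists():
            self.root.mkdir(parents=True, exist_ok=True)
            marker.write_text("a\nb\nc")
        self.samples = marker.read_text().splitlines()

    def __len__(self) -> int:
        return len(self.samples)

    def __getitem__(self, index: int) -> str:
        return self.samples[index]


def test_download_creates_cache(tmp_path: Path) -> None:
    # Constructing the dataset should leave the data file in the cache directory.
    ToyDataset(tmp_path / "toy")
    assert (tmp_path / "toy" / "data.txt").exists()


def test_len_and_getitem(tmp_path: Path) -> None:
    # The dataset should expose every sample through len() and indexing.
    dataset = ToyDataset(tmp_path / "toy")
    assert len(dataset) == 3
    assert dataset[0] == "a"
    assert dataset[len(dataset) - 1] == "c"
```

Tests for a real dataset would likely need to avoid hitting the network, for example by pointing the class at a small fixture directory.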

We use pre-commit hooks to maintain code quality:

  • Ruff for linting and formatting
  • MyPy for type checking

Installation

Dependencies in this repository are managed with uv, a fast Python package installer and resolver. The dependencies are defined in the pyproject.toml file.

# Clone the repository
git clone https://github.com/yourusername/any-gold.git
cd any-gold

# Install dependencies with uv
uv sync --all-extras
source .venv/bin/activate

Release Process

To release a new version of the any-gold package:

  1. Create a new branch for the release: git checkout -b release-vX.Y.Z
  2. Update the version vX.Y.Z in pyproject.toml
  3. Commit the changes with a message like release vX.Y.Z
  4. Merge the branch into main
  5. Trigger a new release on GitHub with the tag vX.Y.Z

Download files

Source Distribution

any_gold-0.2.0.tar.gz (20.0 kB view details)

Uploaded Source

Built Distribution

any_gold-0.2.0-py3-none-any.whl (26.9 kB view details)

Uploaded Python 3

File details

Details for the file any_gold-0.2.0.tar.gz.

File metadata

  • Download URL: any_gold-0.2.0.tar.gz
  • Upload date:
  • Size: 20.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for any_gold-0.2.0.tar.gz
Algorithm Hash digest
SHA256 66fd36fd51fc905d8519b1ff8a76d72c48203239f140122e273deb54cedf53f4
MD5 378dab7dca4a880f9b15d129a4e9b052
BLAKE2b-256 66a5589d2e087649a9d695991e809b14f675a73c3e86316ac1a344d668c28fc2
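
A downloaded file can be checked against a published SHA256 digest with the standard library. The sketch below is one way to do that; `sha256_of` and `verify` are hypothetical helper names, and the demonstration hashes a small temporary file rather than the actual distribution.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected_hex: str) -> bool:
    """Return True when the file's digest equals the expected hex string."""
    return sha256_of(path) == expected_hex.lower()


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.bin"
    sample.write_bytes(b"hello")
    expected = hashlib.sha256(b"hello").hexdigest()
    matches = verify(sample, expected)         # digest of the same bytes
    rejects = not verify(sample, "0" * 64)     # an obviously wrong digest
```

To check the source distribution, pass the path of the downloaded any_gold-0.2.0.tar.gz and the SHA256 value listed above to `verify`.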

Provenance

The following attestation bundles were made for any_gold-0.2.0.tar.gz:

Publisher: release.yaml on goldener-data/any-gold

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file any_gold-0.2.0-py3-none-any.whl.

File metadata

  • Download URL: any_gold-0.2.0-py3-none-any.whl
  • Upload date:
  • Size: 26.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for any_gold-0.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 20ad19976ab5219295d6ec25ec5972006cc482800af026bbf0195041292b3759
MD5 31973c7516f0e3331fb5486870b94b08
BLAKE2b-256 520882777d49696167f1cfdb2773d47ae7dc8f63173388b180eda2bc957bf8e3

Provenance

The following attestation bundles were made for any_gold-0.2.0-py3-none-any.whl:

Publisher: release.yaml on goldener-data/any-gold
