
Custom PyTorch Dataset implementations for publicly available datasets across various modalities

any-gold

Have you ever wanted to experiment with a new dataset, only to lose a few hours before you could even access the data? We have, and we truly believe it should not be like that anymore.

Any Gold is thus a comprehensive collection of custom PyTorch Dataset implementations for publicly available datasets across various modalities.

Purpose

The goal of this repository is to provide custom PyTorch Dataset classes that work with PyTorch's DataLoader, making it easier to experiment with publicly available datasets. Each dataset implementation includes automated download functionality that caches the data locally before use. Instead of spending time getting access to the data, you can focus on experimenting with it.
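
The download-then-cache behaviour can be pictured with a small sketch. This is not any-gold's actual code: `ensure_cached`, `fake_fetch`, and the `.complete` marker file are hypothetical names chosen for illustration, and the "download" only writes a local file so that the example runs offline.

```python
import tempfile
from pathlib import Path
from typing import Callable, List


def ensure_cached(root: Path, fetch: Callable[[Path], None]) -> Path:
    """Populate `root` with `fetch` on first use, then reuse it."""
    marker = root / ".complete"  # hypothetical marker written after a successful fetch
    if not marker.exists():
        root.mkdir(parents=True, exist_ok=True)
        fetch(root)     # the expensive step: runs only when the cache is empty
        marker.touch()  # record success so that later calls skip the fetch
    return root


calls: List[Path] = []  # records every simulated download


def fake_fetch(target: Path) -> None:
    """Stand-in for a real download: writes one small file."""
    calls.append(target)
    (target / "sample.txt").write_text("hello")


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "toy-dataset"
    ensure_cached(cache, fake_fetch)  # first call triggers the fetch
    ensure_cached(cache, fake_fetch)  # second call finds the marker and returns
    download_count = len(calls)
    cached_text = (cache / "sample.txt").read_text()
```

Writing the marker only after the fetch returns means an interrupted download is retried on the next call rather than being mistaken for a complete cache.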

Features

  • PyTorch Integration: All datasets implement the PyTorch Dataset interface
  • Automatic Downloads: Built-in functionality to download and cache datasets locally
  • Multimodal Support: Datasets spanning various data types and domains
  • Consistent API: Uniform interface across different dataset implementations
  • Minimal Dependencies: Core dependencies are managed with uv

Available Datasets

Image Datasets

  • PlantSeg: Large-scale in-the-wild dataset for plant disease segmentation (Paper, Zenodo)
  • MVTecADDataset: Anomaly detection dataset for industrial inspection (Paper, Hugging Face)
  • KPITask1PatchLevel: A dataset for kidney disease segmentation (Paper, Synapse)
  • DeepGlobeRoadExtraction: Road extraction from satellite images (Paper, Kaggle)
  • ISIC2018SkinLesionDataset: A dataset for skin lesion segmentation (Paper, Hugging Face)
  • PascalVOC2012Segmentation: A dataset for semantic segmentation (Website, Kaggle)

Usage

import any_gold as ag
from torch.utils.data import DataLoader

# Initialize dataset (downloads data if not already present)
dataset = ag.AnyDataset()

# Create DataLoader
dataloader = DataLoader(dataset, batch_size=32, shuffle=True)

# Iterate through batches
for batch in dataloader:
    # Your training/evaluation code here
    pass
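
To see what the loop receives without downloading anything, here is a rough pure-Python picture of batching a map-style dataset. It is a simplified stand-in, not PyTorch's implementation: `ToySquares` and `iter_batches` are hypothetical, and a real DataLoader also collates samples into tensors and can load them in worker processes.

```python
import random
from typing import Iterator, List, Tuple


class ToySquares:
    """Hypothetical map-style dataset: item i is the pair (i, i * i)."""

    def __init__(self, size: int) -> None:
        self.size = size

    def __len__(self) -> int:
        return self.size

    def __getitem__(self, index: int) -> Tuple[int, int]:
        if not 0 <= index < self.size:
            raise IndexError(index)
        return index, index * index


def iter_batches(
    dataset: ToySquares, batch_size: int, shuffle: bool, seed: int = 0
) -> Iterator[List[Tuple[int, int]]]:
    """Yield lists of samples, optionally visiting indices in shuffled order."""
    indices = list(range(len(dataset)))
    if shuffle:
        random.Random(seed).shuffle(indices)  # seeded for reproducibility
    for start in range(0, len(indices), batch_size):
        # The final slice may be shorter than batch_size
        yield [dataset[i] for i in indices[start:start + batch_size]]


batches = list(iter_batches(ToySquares(70), batch_size=32, shuffle=True))
batch_sizes = [len(batch) for batch in batches]
```

With 70 samples and a batch size of 32, the last batch holds the 6 leftover samples. DataLoader behaves the same way by default, and passing `drop_last=True` discards that partial batch.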

Contributing

Process

Contributions are welcome! To contribute to this project:

  1. Fork the repository on GitHub
  2. Clone your fork: git clone https://github.com/yourusername/any-gold.git
  3. Create a new branch for your feature: git checkout -b feature-name
  4. Install development dependencies (see below)
  5. Set up pre-commit hooks: uv run pre-commit install
  6. Implement a new class that inherits from AnyDataset
  7. Include download functionality for the dataset
  8. Add appropriate documentation and tests (pytest) for your dataset class
  9. Ensure code passes all pre-commit checks
  10. Submit a pull request to the main repository
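
Steps 6 to 8 might look roughly like the sketch below. The interface of AnyDataset is not shown in this document, so the sketch does not inherit from it and only follows the map-style protocol (`__len__`, `__getitem__`); a real contribution would inherit from AnyDataset and follow its conventions. `ToyLinesDataset`, `_default_fetch`, and the injected `fetch` callable are hypothetical. Injecting the fetch function lets the pytest test run without network access.

```python
from pathlib import Path
from typing import Callable, List


def _default_fetch(root: Path) -> None:
    """Placeholder for the real download logic (hypothetical)."""
    raise NotImplementedError("replace with the real download logic")


class ToyLinesDataset:
    """Hypothetical dataset whose samples are the lines of one text file."""

    def __init__(
        self, root: Path, fetch: Callable[[Path], None] = _default_fetch
    ) -> None:
        self.root = Path(root)
        self.file = self.root / "lines.txt"
        if not self.file.exists():  # fetch only when nothing is cached yet
            self.root.mkdir(parents=True, exist_ok=True)
            fetch(self.root)
        self.samples: List[str] = self.file.read_text().splitlines()

    def __len__(self) -> int:
        return len(self.samples)

    def __getitem__(self, index: int) -> str:
        return self.samples[index]


def test_toy_lines_dataset(tmp_path: Path) -> None:
    """pytest injects `tmp_path`, a fresh temporary directory per test."""

    def fake_fetch(root: Path) -> None:
        (root / "lines.txt").write_text("a\nb\nc\n")

    dataset = ToyLinesDataset(tmp_path, fetch=fake_fetch)
    assert len(dataset) == 3
    assert dataset[0] == "a"

    # A second instance must reuse the cached file; the default fetch
    # would raise NotImplementedError if it were called here.
    again = ToyLinesDataset(tmp_path)
    assert again[2] == "c"
```
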

We use pre-commit hooks to maintain code quality:

  • Ruff for linting and formatting
  • MyPy for type checking

Installation

Dependencies in this repository are managed with uv, a fast Python package installer and resolver. The dependencies are defined in the pyproject.toml file.

# Clone the repository
git clone https://github.com/yourusername/any-gold.git
cd any-gold

# Install dependencies with uv
uv sync --all-extras
source .venv/bin/activate

Release Process

To release a new version of the any-gold package:

  1. Create a new branch for the release: git checkout -b release-vX.Y.Z
  2. Update the version vX.Y.Z in pyproject.toml
  3. Run uv sync to update the lock file
  4. Commit the changes with a message like release vX.Y.Z
  5. Merge the branch into main
  6. Trigger a new release on GitHub with the tag vX.Y.Z

