Custom PyTorch Dataset implementations for publicly available datasets across various modalities

any-gold

Have you ever wanted to experiment with a new dataset, only to lose a few hours before you could even access the data? We have, and we believe it should not be that way anymore.

Any Gold is thus a comprehensive collection of custom PyTorch Dataset implementations for publicly available datasets across various modalities.

Purpose

The goal of this repository is to provide custom PyTorch Dataset classes that are compatible with PyTorch's DataLoader, making it easier to experiment with publicly available datasets. Each dataset implementation includes automated download functionality that caches the data locally before use. Instead of spending time getting access to the data, you can focus on experimenting with it.
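The "download once, then reuse the local copy" behaviour described above can be sketched with the standard library alone. This is a minimal illustration, not the code any-gold uses: `ensure_cached`, the `.complete` marker file, and the `fetch` callable are all hypothetical names introduced here.

```python
from pathlib import Path
from typing import Callable


def ensure_cached(root: Path, name: str, fetch: Callable[[Path], None]) -> Path:
    """Return the local directory for `name`, fetching it only if missing.

    `fetch` is a hypothetical callable that writes the dataset files into the
    directory it receives; a real one would download and unpack an archive.
    """
    target = root / name
    marker = target / ".complete"
    if marker.exists():
        # A previous run finished successfully, so reuse the cached copy.
        return target
    target.mkdir(parents=True, exist_ok=True)
    fetch(target)
    # Written last, so an interrupted fetch is retried on the next call.
    marker.touch()
    return target


if __name__ == "__main__":
    import tempfile

    calls = []

    def fake_fetch(directory: Path) -> None:
        # Stand-in for a real download: records the call and writes one file.
        calls.append(directory)
        (directory / "sample.txt").write_text("hello")

    with tempfile.TemporaryDirectory() as tmp:
        first = ensure_cached(Path(tmp), "toy", fake_fetch)
        second = ensure_cached(Path(tmp), "toy", fake_fetch)
        print(first == second, len(calls))  # True 1
```

Writing the marker only after the fetch completes is one simple way to avoid treating a half-downloaded directory as a valid cache.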

Features

  • PyTorch Integration: All datasets implement the PyTorch Dataset interface
  • Automatic Downloads: Built-in functionality to download and cache datasets locally
  • Multimodal Support: Datasets spanning various data types and domains
  • Consistent API: Uniform interface across different dataset implementations
  • Minimal Dependencies: Core dependencies are managed with uv
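
To make "implement the PyTorch Dataset interface" concrete: a map-style dataset only needs `__len__` and `__getitem__`. The sketch below is a toy written for illustration (`SquaresDataset` and `iter_batches` are hypothetical and not part of any-gold). It deliberately avoids importing torch so it runs anywhere; with PyTorch installed, the class would subclass `torch.utils.data.Dataset` and a `DataLoader` would do the batching.

```python
from typing import Iterator, List, Tuple


class SquaresDataset:
    """Toy map-style dataset: item i is the pair (i, i * i).

    With PyTorch installed this class would subclass torch.utils.data.Dataset;
    the two methods below are the part a DataLoader relies on.
    """

    def __init__(self, size: int) -> None:
        self.size = size

    def __len__(self) -> int:
        return self.size

    def __getitem__(self, index: int) -> Tuple[int, int]:
        if not 0 <= index < self.size:
            raise IndexError(index)
        return index, index * index


def iter_batches(dataset, batch_size: int) -> Iterator[List[Tuple[int, int]]]:
    """Simplified stand-in for sequential, unshuffled batching."""
    for start in range(0, len(dataset), batch_size):
        stop = min(start + batch_size, len(dataset))
        yield [dataset[i] for i in range(start, stop)]


batches = list(iter_batches(SquaresDataset(5), batch_size=2))
print(batches)  # [[(0, 0), (1, 1)], [(2, 4), (3, 9)], [(4, 16)]]
```

Unlike this stand-in, a real `DataLoader` also collates each batch (for example into tensors) and can shuffle or load in parallel.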

Available Datasets

Image Datasets

  • PlantSeg: Large-scale in-the-wild dataset for plant disease segmentation (Paper, Zenodo)
  • MVTecADDataset: Anomaly detection dataset for industrial inspection (Paper, Hugging Face)
  • KPITask1PatchLevel: A dataset for kidney disease segmentation (Paper, Synapse)
  • DeepGlobeRoadExtraction: Road extraction from satellite images (Paper, Kaggle)

Usage

import any_gold as ag
from torch.utils.data import DataLoader

# Initialize dataset (downloads data if not already present).
# AnyDataset is the base class (see Contributing); in practice, instantiate
# one of the dataset classes listed above.
dataset = ag.AnyDataset()

# Create DataLoader
dataloader = DataLoader(dataset, batch_size=32, shuffle=True)

# Iterate through batches
for batch in dataloader:
    # Your training/evaluation code here
    pass

Contributing

Process

Contributions are welcome! To contribute to this project:

  1. Fork the repository on GitHub
  2. Clone your fork: git clone https://github.com/yourusername/any-gold.git
  3. Create a new branch for your feature: git checkout -b feature-name
  4. Install development dependencies (see below)
  5. Set up pre-commit hooks: uv run pre-commit install
  6. Implement a new class that inherits from AnyDataset
  7. Include download functionality for the dataset
  8. Add appropriate documentation and tests (pytest) for your dataset class
  9. Ensure code passes all pre-commit checks
  10. Submit a pull request to the main repository

We use pre-commit hooks to maintain code quality:

  • Ruff for linting and formatting
  • MyPy for type checking
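
Steps 6 to 8 of the process above might look roughly like the following. This is a sketch under assumptions: the constructor and abstract methods of `AnyDataset` are not shown in this document, so `ToyTextDataset` is a hypothetical, standalone class that writes two small files in place of a real download. A real contribution would inherit from `AnyDataset` and follow its actual interface.

```python
from pathlib import Path
from typing import List


class ToyTextDataset:
    """Hypothetical dataset class; a real one would inherit from AnyDataset."""

    def __init__(self, root: Path) -> None:
        self.root = Path(root)
        if not self.root.exists():
            self._download()
        self.files: List[Path] = sorted(self.root.glob("*.txt"))

    def _download(self) -> None:
        # Placeholder for the real download; writes two small files instead.
        self.root.mkdir(parents=True)
        for i in range(2):
            (self.root / f"sample_{i}.txt").write_text(f"sample {i}")

    def __len__(self) -> int:
        return len(self.files)

    def __getitem__(self, index: int) -> str:
        return self.files[index].read_text()


def test_toy_text_dataset(tmp_path: Path) -> None:
    """pytest injects `tmp_path`, a fresh temporary directory per test."""
    dataset = ToyTextDataset(tmp_path / "toy")
    assert len(dataset) == 2
    assert dataset[0] == "sample 0"
    # A second instance must reuse the files instead of downloading again.
    assert len(ToyTextDataset(tmp_path / "toy")) == 2
```

Keeping the download behind a single method makes it straightforward to replace with a local fixture in tests, so the test suite does not depend on network access.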

Installation

Dependencies in this repository are managed with uv, a fast Python package installer and resolver. The dependencies are defined in the pyproject.toml file.

# Clone the repository
git clone https://github.com/yourusername/any-gold.git
cd any-gold

# Install dependencies with uv
uv sync --all-extras
source .venv/bin/activate

Release Process

To release a new version of the any-gold package:

  1. Create a new branch for the release: git checkout -b release-vX.Y.Z
  2. Update the version in pyproject.toml to vX.Y.Z
  3. Commit the changes with a message like release vX.Y.Z
  4. Merge the branch into main
  5. Trigger a new release on GitHub with the tag vX.Y.Z

Download files

Source Distribution: any_gold-0.1.1.tar.gz (19.5 kB)

Built Distribution: any_gold-0.1.1-py3-none-any.whl (25.4 kB)

Publisher: release.yaml on goldener-data/any-gold
