SpuCo: Spurious Correlations Datasets and Benchmarks

SpuCo (Spurious Correlations Datasets and Benchmarks)

SpuCo is a Python package developed to advance research on mitigating spurious correlations. Spurious correlations arise when machine learning models learn to exploit easy features that are not predictive of class membership but are correlated with a given class in the training data. At test time, this leads to catastrophically poor performance on the groups of data that lack such spurious features.
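
To make the failure mode concrete, here is a minimal, self-contained sketch in plain Python. It does not use the SpuCo API; the function names (`group_accuracies`, `worst_group_accuracy`) and the toy data are assumptions made for illustration. It compares average accuracy with the accuracy of the worst (class, spurious attribute) group for a hypothetical model that predicts from the spurious attribute alone.

```python
from collections import defaultdict


def group_accuracies(labels, spurious, predictions):
    """Return the accuracy of each (class label, spurious attribute) group."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for y, s, p in zip(labels, spurious, predictions):
        total[(y, s)] += 1
        correct[(y, s)] += int(p == y)
    return {group: correct[group] / total[group] for group in total}


def worst_group_accuracy(labels, spurious, predictions):
    """Return the lowest accuracy over all groups present in the data."""
    return min(group_accuracies(labels, spurious, predictions).values())


# Toy data (illustrative only): the spurious attribute matches the
# class label in 8 of the 10 examples.
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
spurious = [0, 0, 0, 0, 1, 1, 1, 1, 1, 0]

# Hypothetical model that ignores the true class feature and simply
# predicts the spurious attribute.
predictions = list(spurious)

average_accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
per_group = group_accuracies(labels, spurious, predictions)
worst = worst_group_accuracy(labels, spurious, predictions)

print("average accuracy:", average_accuracy)
print("per-group accuracy:", per_group)
print("worst-group accuracy:", worst)
```

In this toy setting the average accuracy looks acceptable, while the two groups in which the spurious attribute disagrees with the label are always misclassified.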

Diagram illustrating the spurious correlations problem

Link to Paper: https://arxiv.org/abs/2306.11957

The SpuCo package is designed to help researchers and practitioners evaluate the robustness of their machine learning algorithms against spurious correlations that may exist in real-world data. SpuCo provides:

  • Modular implementations of current state-of-the-art (SOTA) methods to address spurious correlations
  • SpuCoMNIST: a controllable synthetic dataset that explores real-world data properties such as spurious feature difficulty, label noise, and feature noise
  • SpuCoAnimals: a large-scale vision dataset curated from ImageNet to explore real-world spurious correlations
  • SpuCoSun: a large-scale vision dataset created using backgrounds from SUN397 (class feature) and foregrounds (spurious feature) generated with a text-to-image diffusion model, corresponding to OpenImagesV7. Two versions of this dataset are provided: SpuCoSun Easy and SpuCoSun Hard, with easy and hard spurious features, respectively.
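
As a rough illustration of what "controllable" can mean for a synthetic dataset, the sketch below generates toy (true label, spurious feature, observed label) triples. It is not SpuCoMNIST and does not use the SpuCo API: the function `make_toy_split` and its parameters are hypothetical, and `correlation_strength` is an assumed knob that is not listed among the properties above. Only label noise is taken from the list.

```python
import random


def make_toy_split(n, num_classes, correlation_strength, label_noise, seed=0):
    """Generate n toy examples as (true_label, spurious_feature, observed_label).

    correlation_strength: probability that the spurious feature matches the class.
    label_noise: probability that the observed label is flipped to another class.
    Both parameters are illustrative assumptions, not SpuCo arguments.
    """
    rng = random.Random(seed)
    examples = []
    for _ in range(n):
        true_label = rng.randrange(num_classes)
        other_classes = [c for c in range(num_classes) if c != true_label]

        # The spurious feature agrees with the class with the given probability.
        if rng.random() < correlation_strength:
            spurious_feature = true_label
        else:
            spurious_feature = rng.choice(other_classes)

        # Label noise replaces the true label with a different class.
        if rng.random() < label_noise:
            observed_label = rng.choice(other_classes)
        else:
            observed_label = true_label

        examples.append((true_label, spurious_feature, observed_label))
    return examples


train = make_toy_split(n=1000, num_classes=5, correlation_strength=0.95, label_noise=0.1)
aligned = sum(t == s for t, s, _ in train) / len(train)
noisy = sum(t != o for t, _, o in train) / len(train)
print(f"fraction with aligned spurious feature: {aligned:.3f}")
print(f"fraction with noisy labels: {noisy:.3f}")
```

Varying the two probabilities changes how strongly a model is tempted by the spurious feature and how reliable the labels are, which is the kind of control a synthetic benchmark offers.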

Note: This project is under active development.

Quickstart

Refer to quickstart for scripts and notebooks to get started with SpuCo.

You can explore the data with the notebook: Explore Data

You can find scripts / notebooks for training with SOTA methods in the folders under quickstart. These are organized by dataset name.

Installation

pip install spuco

Requires Python >= 3.10
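
Before or after installing, the version requirement can be checked from Python itself. The sketch below uses only the standard library (`sys` and `importlib.metadata`); the helper names `python_is_supported` and `installed_version` are hypothetical and are not part of SpuCo.

```python
import sys
from importlib import metadata

# Minimum interpreter version stated in the installation notes.
MIN_PYTHON = (3, 10)


def python_is_supported(version_info=sys.version_info):
    """Return True if the given (major, minor, ...) version meets the minimum."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def installed_version(package="spuco"):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("Python OK:", python_is_supported())
    print("spuco version:", installed_version() or "not installed")
```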

Using with GuildAI

Creating GPU-affinitized queues

for i in {0..7}; do guild run queue -b --gpus="$i" -y; done
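
The loop above starts one background queue per GPU, for GPUs 0 through 7. If the number of GPUs varies between machines, the same command lines can be generated programmatically. The sketch below only builds and prints the commands and does not run them; the helper name `queue_commands` and the `num_gpus` parameter are assumptions, while the flags are copied from the command above.

```python
import shlex


def queue_commands(num_gpus=8):
    """Build one `guild run queue` command per GPU index, as argument lists."""
    return [
        ["guild", "run", "queue", "-b", f"--gpus={gpu}", "-y"]
        for gpu in range(num_gpus)
    ]


# Print the commands so they can be reviewed before being executed.
for command in queue_commands():
    print(shlex.join(command))
```

To actually launch the queues, each argument list could be passed to `subprocess.run`, assuming the `guild` executable is on the `PATH`.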

About Us

This package is maintained by Siddharth Joshi from the BigML group at UCLA, headed by Professor Baharan Mirzasoleiman.
