SpuCo: Spurious Correlations Datasets and Benchmarks
SpuCo is a Python package developed to support research on addressing spurious correlations. Spurious correlations arise when machine learning models learn to exploit features that are easy to learn and correlated with a given class in the training data, but are not actually predictive of class membership. At test time, this leads to catastrophically poor performance on groups of data that lack these spurious features.
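To make the failure mode concrete, here is a minimal toy sketch in plain Python. It does not use the SpuCo API; the data, the `group_accuracies` helper, and the "model" that simply predicts the spurious attribute are all illustrative assumptions. It shows how average accuracy can look healthy while accuracy on the minority groups collapses.

```python
from collections import defaultdict

def group_accuracies(labels, spurious, predictions):
    """Return accuracy for each (label, spurious attribute) group."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for y, s, p in zip(labels, spurious, predictions):
        total[(y, s)] += 1
        correct[(y, s)] += int(p == y)
    return {group: correct[group] / total[group] for group in total}

# Toy test set: class label y and spurious attribute s, both in {0, 1}.
# Groups with s == y are the majority (45 examples each);
# groups with s != y are the minority (5 examples each).
labels = [0] * 45 + [0] * 5 + [1] * 45 + [1] * 5
spurious = [0] * 45 + [1] * 5 + [1] * 45 + [0] * 5

# Hypothetical model that outputs the spurious attribute instead of the class.
predictions = list(spurious)

per_group = group_accuracies(labels, spurious, predictions)
average = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
worst = min(per_group.values())

print(f"average accuracy: {average:.2f}")
print(f"worst-group accuracy: {worst:.2f}")
```

In this toy setup the model is right on every majority-group example and wrong on every minority-group example, which is why worst-group accuracy is commonly reported alongside average accuracy when studying spurious correlations.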
Link to Paper: https://arxiv.org/abs/2306.11957
The SpuCo package is designed to help researchers and practitioners evaluate the robustness of their machine learning algorithms against spurious correlations that may exist in real-world data. SpuCo provides:
- Modular implementations of current state-of-the-art (SOTA) methods to address spurious correlations
- SpuCoMNIST: a controllable synthetic dataset that explores real-world data properties such as spurious feature difficulty, label noise, and feature noise
- SpuCoAnimals: a large-scale vision dataset curated from ImageNet to explore real-world spurious correlations
- SpuCoSun: a large-scale vision dataset created using backgrounds from SUN397 (class feature) and foregrounds (spurious feature) generated by a text-to-image diffusion model, corresponding to OpenImagesV7. Two versions of this dataset are provided: SpuCoSun Easy and SpuCoSun Hard, with easy and hard spurious features, respectively.
Note: This project is under active development.
Quickstart
Refer to quickstart for scripts and notebooks to get started with SpuCo.
You can explore the data with the notebook: Explore Data
You can find scripts / notebooks for training with SOTA methods in the folders under quickstart. These are organized by dataset name.
Installation
pip install spuco
Requires Python >= 3.10
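As a quick sanity check after installing, the following sketch uses only the Python standard library to confirm that the interpreter meets the version requirement and to report which version of the package is installed, if any. The `check_environment` helper is a hypothetical name introduced here for illustration; it is not part of SpuCo.

```python
import sys
from importlib import metadata

def check_environment(package="spuco", min_python=(3, 10)):
    """Return (python_ok, installed_version_or_None) for the given package."""
    python_ok = sys.version_info >= min_python
    try:
        installed = metadata.version(package)
    except metadata.PackageNotFoundError:
        # The distribution is not installed in this environment.
        installed = None
    return python_ok, installed

python_ok, installed = check_environment()
print(f"Python >= 3.10: {python_ok}")
print(f"spuco version: {installed if installed else 'not installed'}")
```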
Using with GuildAI
Creating GPU-affinitized queues (the loop below starts one background queue for each of GPUs 0 through 7):
for i in {0..7}; do guild run queue -b --gpus="$i" -y; done
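On machines with a different number of GPUs, the same commands can be generated programmatically. The sketch below only builds and prints the command lines, reusing the exact flags from the shell loop above; `queue_commands` is a hypothetical helper, and actually launching the queues assumes GuildAI is installed and on the PATH.

```python
import shlex

def queue_commands(gpu_ids):
    """Build one `guild run queue` command per GPU index."""
    return [
        ["guild", "run", "queue", "-b", f"--gpus={gpu}", "-y"]
        for gpu in gpu_ids
    ]

# GPUs 0..7, matching the shell loop; adjust the range for your machine.
commands = queue_commands(range(8))

for cmd in commands:
    # Dry run: print each command instead of executing it.
    # To start the queues, call subprocess.run(cmd, check=True) here
    # (this needs `import subprocess` and a working GuildAI install).
    print(shlex.join(cmd))
```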
About Us
This package is maintained by Siddharth Joshi from the BigML group at UCLA, headed by Professor Baharan Mirzasoleiman.
File details
Details for the file spuco-2.0.3.tar.gz.
File metadata
- Download URL: spuco-2.0.3.tar.gz
- Upload date:
- Size: 95.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.10.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d8f6431a573953d16da2cc5815ad3c98fcf1964624e4549d709edba6976a0b8f |
| MD5 | 41e4046a3a0ef22ba92b380523f779bb |
| BLAKE2b-256 | 47673f362357bdad7f052239a3061d396651b1a06ce2498d6d5c624ced07e9b2 |
File details
Details for the file spuco-2.0.3-py3-none-any.whl.
File metadata
- Download URL: spuco-2.0.3-py3-none-any.whl
- Upload date:
- Size: 127.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.10.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 79e3ac8b6dd6b08616ac6a20ae41eb635ca553ea2b75e1a6f04a28dd6fb2ce86 |
| MD5 | 083ac5b01f92f3ea82d6bc0ac2d4376f |
| BLAKE2b-256 | c779e9b08f4900a6aa475e4852d1f64401b337eabd987048292f9a76323a1f25 |
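The SHA256 digests listed above can be used to check a manually downloaded file. The following is a minimal sketch using Python's standard `hashlib` module; the helper names `sha256_of` and `verify` are hypothetical, and the file paths assume the downloads sit in the current directory under their published names.

```python
import hashlib

# SHA256 digests copied from the file details above.
EXPECTED_SHA256 = {
    "spuco-2.0.3.tar.gz":
        "d8f6431a573953d16da2cc5815ad3c98fcf1964624e4549d709edba6976a0b8f",
    "spuco-2.0.3-py3-none-any.whl":
        "79e3ac8b6dd6b08616ac6a20ae41eb635ca553ea2b75e1a6f04a28dd6fb2ce86",
}

def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify(path, expected_hex):
    """Return True if the file's SHA256 digest matches the expected value."""
    return sha256_of(path) == expected_hex.lower()
```

For example, `verify("spuco-2.0.3.tar.gz", EXPECTED_SHA256["spuco-2.0.3.tar.gz"])` should return `True` for an intact download of the source distribution.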