
Erasing concepts from neural representations with provable guarantees


Least-Squares Concept Erasure (LEACE)

Concept erasure aims to remove specified features from a representation. It can be used to improve fairness (e.g. preventing a classifier from using gender or race) and interpretability (e.g. removing a concept to observe changes in model behavior). This is the repo for LEAst-squares Concept Erasure (LEACE), a closed-form method which provably prevents all linear classifiers from detecting a concept while inflicting the least possible damage to the representation. You can check out the paper here.
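
For readers who want the formula, the eraser has a simple closed form; the following paraphrases the paper and uses its notation, which is not defined elsewhere in this README. Let Σ_XX be the covariance matrix of the representation X, Σ_XZ its cross-covariance with the concept labels Z, W = (Σ_XX^{1/2})^+ the corresponding whitening transform, and P the orthogonal projection onto the column space of W Σ_XZ. The representation is then erased via

    r(x) = x - W^+ P W (x - E[X])

that is, the centered representation is whitened, the directions correlated with the concept are projected out, and the result is mapped back to the original basis.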

Installation

We require Python 3.10 or later. You can install the package from PyPI:

pip install concept-erasure

Usage

The two main classes in this repo are LeaceFitter and LeaceEraser.

  • LeaceFitter keeps track of the covariance and cross-covariance statistics needed to compute the LEACE erasure function. These statistics can be updated incrementally with LeaceFitter.update(). The erasure function is lazily computed when the .eraser property is accessed. This class uses O(d²) memory, where d is the dimensionality of the representation, so you may want to discard it after computing the erasure function (see the sketch after this list).
  • LeaceEraser is a compact representation of the LEACE erasure function, using only O(dk) memory, where k is the number of classes in the concept you're trying to erase (or equivalently, the dimensionality of the concept if it's not categorical).
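
To make the hand-off between the two classes concrete, here is a minimal sketch (random data and illustrative sizes, not from the paper's experiments): accumulate statistics with LeaceFitter, extract the compact LeaceEraser, and discard the fitter to free its O(d²) statistics.

import torch

from concept_erasure import LeaceFitter

d = 128  # illustrative feature dimensionality

# z_dim=1 because each example below gets a single scalar concept label
fitter = LeaceFitter(d, 1, dtype=torch.float64)

# Pretend these mini-batches of random data arrive from a stream
for _ in range(4):
    x = torch.randn(512, d, dtype=torch.float64)
    z = torch.randint(0, 2, (512,))
    fitter.update(x, z)

eraser = fitter.eraser  # compact LeaceEraser
del fitter              # drops the O(d^2) covariance statistics

x_clean = eraser(torch.randn(8, d, dtype=torch.float64))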

Batch usage

In most cases, you probably have a batch of feature vectors X and concept labels Z and want to erase the concept from X. The easiest way to do this is by using the LeaceEraser.fit() convenience method:

import torch
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

from concept_erasure import LeaceEraser

n, d, k = 2048, 128, 2

X, Y = make_classification(
    n_samples=n,
    n_features=d,
    n_classes=k,
    random_state=42,
)
X_t = torch.from_numpy(X)
Y_t = torch.from_numpy(Y)

# Logistic regression does learn something before concept erasure
real_lr = LogisticRegression(max_iter=1000).fit(X, Y)
beta = torch.from_numpy(real_lr.coef_)
assert beta.norm(p=torch.inf) > 0.1

eraser = LeaceEraser.fit(X_t, Y_t)
X_ = eraser(X_t)

# But learns nothing after
null_lr = LogisticRegression(max_iter=1000, tol=0.0).fit(X_.numpy(), Y)
beta = torch.from_numpy(null_lr.coef_)
assert beta.norm(p=torch.inf) < 1e-4

Streaming usage

If you have a stream of data, you can use LeaceFitter.update() to update the statistics. This is useful if you have a large dataset and want to avoid storing it all in memory.

from concept_erasure import LeaceFitter
from sklearn.datasets import make_classification
import torch

n, d, k = 2048, 128, 2

X, Y = make_classification(
    n_samples=n,
    n_features=d,
    n_classes=k,
    random_state=42,
)
X_t = torch.from_numpy(X)
Y_t = torch.from_numpy(Y)

# z_dim=1 because the concept labels Y_t are a single integer per example
fitter = LeaceFitter(d, 1, dtype=X_t.dtype)

# Compute cross-covariance matrix using batched updates
for x, y in zip(X_t.chunk(2), Y_t.chunk(2)):
    fitter.update(x, y)

# Erase the concept; the eraser can be applied to a single example or a whole batch
x_ = fitter.eraser(X_t[0])

Paper replication

Scripts used to generate the part-of-speech tags for the concept scrubbing experiments can be found in this repo. We plan to upload the tagged datasets to the HuggingFace Hub shortly.

Concept scrubbing

The concept scrubbing code is a bit messy right now, and will probably be refactored soon. We found it necessary to write bespoke implementations for different HuggingFace model families. So far we've implemented LLaMA and GPT-NeoX. These can be found in the concept_erasure.scrubbing submodule.
