
Red-teaming large language models for training data leakage


Pandora’s White-Box

Precise Training Data Detection and Extraction from Large Language Models

By Jeffrey G. Wang, Jason Wang, Marvin Li, and Seth Neel

Overview

pandora_llm is a red-teaming library for Large Language Models (LLMs) that assesses their vulnerability to training data leakage. It provides a unified PyTorch API for evaluating membership inference attacks (MIAs) and training data extraction.
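As a rough illustration of how a membership inference attack is commonly evaluated, the sketch below compares attack scores for known training members against scores for non-members. It is not pandora_llm's API: the function names `roc_auc` and `tpr_at_fpr`, the toy scores, and the convention that a higher score means "more likely a member" are all assumptions made for this example.

```python
from typing import Sequence


def roc_auc(member_scores: Sequence[float], nonmember_scores: Sequence[float]) -> float:
    """Probability that a random member outscores a random non-member.

    Ties count as half a win (the Mann-Whitney formulation of ROC AUC).
    """
    wins = 0.0
    for m in member_scores:
        for n in nonmember_scores:
            if m > n:
                wins += 1.0
            elif m == n:
                wins += 0.5
    return wins / (len(member_scores) * len(nonmember_scores))


def tpr_at_fpr(
    member_scores: Sequence[float],
    nonmember_scores: Sequence[float],
    max_fpr: float,
) -> float:
    """Best true-positive rate while keeping the false-positive rate <= max_fpr.

    A sample is predicted "member" when its score is strictly above the threshold.
    """
    # Candidate thresholds: every observed score, plus -inf (predict everything as member).
    candidates = [float("-inf")] + sorted(set(member_scores) | set(nonmember_scores))
    best = 0.0
    for threshold in candidates:
        fpr = sum(s > threshold for s in nonmember_scores) / len(nonmember_scores)
        if fpr <= max_fpr:
            tpr = sum(s > threshold for s in member_scores) / len(member_scores)
            best = max(best, tpr)
    return best


# Toy scores (hypothetical): higher means "more likely seen during training".
members = [0.9, 0.8, 0.7, 0.4]
nonmembers = [0.6, 0.5, 0.3, 0.2]
print(roc_auc(members, nonmembers))          # 0.875
print(tpr_at_fpr(members, nonmembers, 0.0))  # 0.75
```

An AUC near 0.5 means the attack cannot distinguish members from non-members; the true-positive rate at a very low false-positive rate is often reported alongside it because it shows whether any samples can be identified with high confidence.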

You can read our paper and website for a technical introduction to the subject. Please refer to the documentation for the API reference as well as tutorials on how to use this codebase.

pandora_llm abides by the following core principles:

  • Open Access — Ensuring that these tools are open-source for all.
  • Reproducible — Committing to providing all necessary code details to ensure replicability.
  • Self-Contained — Designing each attack to be self-contained, so that a method can be understood without reading the entire codebase or working through unnecessary layers of abstraction, and so that new code is easy to contribute.
  • Model-Agnostic — Supporting any HuggingFace model and dataset, making it easy to apply to any situation.
  • Usability — Prioritizing easy-to-use starter scripts and comprehensive documentation so anyone can effectively use pandora_llm regardless of prior background.

We hope that our package helps LLM providers safety-check their models before release, and empowers the public to hold them accountable for their use of data.

Installation

From source:

git clone https://github.com/safr-ai-lab/pandora-llm.git
cd pandora-llm
pip install -e .

From pip:

pip install pandora-llm

Quickstart

We maintain a collection of starter scripts in our codebase under experiments/. If you are creating a new attack, we recommend copying a starter script to use as a template.

python experiments/mia/run_loss.py --model_name EleutherAI/pythia-70m-deduped --model_revision step98000 --num_samples 2000 --pack --seed 229
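The script name suggests a loss-based membership inference attack. As a minimal sketch of that general technique (not the code in run_loss.py), the example below scores a sequence by its mean next-token negative log-likelihood, on the assumption that sequences seen during training tend to have lower loss. The helper names `log_softmax`, `sequence_loss`, and `membership_score` and the toy logits are hypothetical; a real run would take the logits from the model under test.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one vocabulary-sized logit vector."""
    peak = max(logits)
    log_norm = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return [x - log_norm for x in logits]


def sequence_loss(
    logits_per_position: Sequence[Sequence[float]],
    token_ids: Sequence[int],
) -> float:
    """Mean next-token negative log-likelihood of a token sequence.

    logits_per_position[i] holds the logits for predicting token_ids[i + 1],
    so at most the first len(token_ids) - 1 rows are used.
    """
    targets = token_ids[1:]
    if not targets:
        raise ValueError("need at least two tokens to score a sequence")
    nll = 0.0
    for logits, target in zip(logits_per_position, targets):
        nll -= log_softmax(logits)[target]
    return nll / len(targets)


def membership_score(
    logits_per_position: Sequence[Sequence[float]],
    token_ids: Sequence[int],
) -> float:
    """Higher score = more likely a training member (negated loss)."""
    return -sequence_loss(logits_per_position, token_ids)


# Toy example with a 3-token vocabulary (hypothetical logits).
tokens = [0, 1, 2]
uniform = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]     # model is unsure
confident = [[0.0, 10.0, 0.0], [0.0, 0.0, 10.0]]  # model strongly predicts the true next token
print(sequence_loss(uniform, tokens))    # equals ln(3) for uniform logits
print(membership_score(confident, tokens) > membership_score(uniform, tokens))  # True
```

Thresholding this score yields a membership prediction; where to set the threshold is a separate calibration choice that this sketch does not cover.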

You can reproduce the experiments described in our paper through the shell scripts provided in the scripts/ folder.

bash scripts/pretrain_mia_baselines.sh

Contributing

We welcome contributions! Please submit pull requests to our GitHub repository.

Citation

If you use our code or otherwise find this library useful, please cite our paper:

@article{wang2024pandora,
  title={Pandora's White-Box: Increased Training Data Leakage in Open LLMs},
  author={Wang, Jeffrey G and Wang, Jason and Li, Marvin and Neel, Seth},
  journal={arXiv preprint arXiv:2402.17012},
  year={2024}
}

