Red-teaming large language models for training data leakage
Project description
Pandora’s White-Box
Precise Training Data Detection and Extraction from Large Language Models
By Jeffrey G. Wang, Jason Wang, Marvin Li, and Seth Neel
Overview
pandora_llm is a library for red-teaming Large Language Models (LLMs) that assesses their vulnerability to training data leakage.
It provides a unified PyTorch API for evaluating membership inference attacks (MIAs) and training data extraction.
You can read our paper and website for a technical introduction to the subject. Please refer to the documentation for the API reference as well as tutorials on how to use this codebase.
pandora_llm abides by the following core principles:
- Open Access — Ensuring that these tools are open-source for all.
- Reproducible — Committing to providing all necessary code details to ensure replicability.
- Self-Contained — Designing attacks that are self-contained, so each method's workings are transparent without digging through the entire codebase or unnecessary layers of abstraction, and so new code is easy to contribute.
- Model-Agnostic — Supporting any HuggingFace model and dataset, making it easy to apply to any situation.
- Usability — Prioritizing easy-to-use starter scripts and comprehensive documentation so anyone can effectively use pandora_llm regardless of prior background.
We hope that our package guides LLM providers to safety-check their models before release, and empowers the public to hold them accountable for their use of data.
Installation
From source:
git clone https://github.com/safr-ai-lab/pandora-llm.git
pip install -e .
From pip:
pip install pandora-llm
Quickstart
We maintain a collection of starter scripts in our codebase under experiments/. If you are creating a new attack, we recommend copying a starter script as a template.
python experiments/mia/run_loss.py --model_name EleutherAI/pythia-70m-deduped --model_revision step98000 --num_samples 2000 --pack --seed 229
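To make the LOSS-style attack concrete, here is a minimal sketch of the per-example statistic such a script evaluates, written with plain HuggingFace transformers. It is not pandora_llm's internal API, and the threshold is purely illustrative; in practice the threshold is calibrated on known non-member data.

```python
# Hedged sketch: the per-example loss statistic behind a LOSS-style MIA.
# This is NOT pandora_llm's API; it only illustrates the statistic using
# plain HuggingFace transformers on the same model checkpoint.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "EleutherAI/pythia-70m-deduped"
tokenizer = AutoTokenizer.from_pretrained(model_name, revision="step98000")
model = AutoModelForCausalLM.from_pretrained(model_name, revision="step98000").eval()

@torch.no_grad()
def loss_statistic(text: str) -> float:
    """Average token-level cross-entropy; lower values hint at training-set membership."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    out = model(ids, labels=ids)
    return out.loss.item()

# A sample is flagged as a likely training member if its loss falls below a
# threshold; the value 3.0 here is a placeholder, not a calibrated choice.
score = loss_statistic("The quick brown fox jumps over the lazy dog.")
print("member" if score < 3.0 else "non-member", score)
```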
You can reproduce the experiments described in our paper through the shell scripts provided in the scripts/ folder.
bash scripts/pretrain_mia_baselines.sh
Contributing
We welcome contributions! Please submit pull requests on our GitHub repository.
Citation
If you use our code or otherwise find this library useful, please cite our paper:
@article{wang2024pandora,
title={Pandora's White-Box: Increased Training Data Leakage in Open LLMs},
author={Wang, Jeffrey G and Wang, Jason and Li, Marvin and Neel, Seth},
journal={arXiv preprint arXiv:2402.17012},
year={2024}
}
Project details
Download files
Source Distribution
Built Distribution
File details
Details for the file pandora_llm-0.1.0.tar.gz.
File metadata
- Download URL: pandora_llm-0.1.0.tar.gz
- Upload date:
- Size: 42.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.0 CPython/3.10.14
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 99f8f8785d2f1295afeaad9970f7dbb2608edff597da05419428c3e753433e3a |
| MD5 | 47ddcfc99b6d5a7c437fae209b253588 |
| BLAKE2b-256 | 1f13a18fa3bfdcfe2ca2eb37d5aeb43f3a12da65b9ed99d551b2d41228ab0bea |
File details
Details for the file pandora_llm-0.1.0-py3-none-any.whl.
File metadata
- Download URL: pandora_llm-0.1.0-py3-none-any.whl
- Upload date:
- Size: 55.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.0 CPython/3.10.14
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 130cf9a0b50730c789f8bc65ed281494d8336e5f59f9ebd7e46b422322d0e886 |
| MD5 | ad5fa0b3c9de0725d3c84ca3e41baa83 |
| BLAKE2b-256 | b2b28866293b9b96832126844b6f7de6d1e4c372169cb7acde569f952cd0e8f0 |
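If you want to confirm that a downloaded file matches the digests listed above, a minimal sketch using only the Python standard library (the filename assumes the source distribution sits in the current directory):

```python
# Minimal sketch: verify a downloaded distribution against the published SHA256 digest.
# Swap in the filename/digest pair for the file you actually downloaded.
import hashlib

expected = "99f8f8785d2f1295afeaad9970f7dbb2608edff597da05419428c3e753433e3a"  # sdist SHA256 from the table above
with open("pandora_llm-0.1.0.tar.gz", "rb") as f:
    digest = hashlib.sha256(f.read()).hexdigest()

print("OK" if digest == expected else "MISMATCH", digest)
```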