
Red-teaming large language models for training data leakage

Overview

pandora_llm is a red-teaming library for Large Language Models (LLMs) that assesses their vulnerability to training data leakage.

It provides a unified PyTorch API for evaluating membership inference attacks (MIAs).
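To make the evaluation side of a membership inference attack concrete, here is a minimal sketch in plain Python. It is not pandora_llm's API: the function names `auc` and `tpr_at_fpr` and the toy loss values are assumptions made for illustration. The sketch assumes each sample already has an attack score where a higher score means "more likely a training member" (for a loss-based attack, the negated loss), and it reports two common summary numbers.

```python
def auc(member_scores, nonmember_scores):
    """Probability that a random member outscores a random non-member (ties count 0.5)."""
    wins = 0.0
    for m in member_scores:
        for n in nonmember_scores:
            if m > n:
                wins += 1.0
            elif m == n:
                wins += 0.5
    return wins / (len(member_scores) * len(nonmember_scores))


def tpr_at_fpr(member_scores, nonmember_scores, max_fpr):
    """Best true-positive rate among thresholds whose false-positive rate is <= max_fpr.

    A sample is predicted to be a member when its score is >= the threshold.
    """
    best = 0.0
    for threshold in sorted(set(member_scores) | set(nonmember_scores)):
        fpr = sum(s >= threshold for s in nonmember_scores) / len(nonmember_scores)
        if fpr <= max_fpr:
            tpr = sum(s >= threshold for s in member_scores) / len(member_scores)
            best = max(best, tpr)
    return best


# Toy, made-up losses: members tend to have lower loss than non-members.
member_losses = [0.9, 1.1, 1.4, 2.0]
nonmember_losses = [1.2, 1.9, 2.3, 2.8]

# Negate the loss so that a higher score means "more likely a member".
member_scores = [-loss for loss in member_losses]
nonmember_scores = [-loss for loss in nonmember_losses]

print("AUC:", auc(member_scores, nonmember_scores))
print("TPR at 0% FPR:", tpr_at_fpr(member_scores, nonmember_scores, 0.0))
```

An AUC of 0.5 would mean the attack cannot tell members from non-members; values closer to 1.0 indicate more leakage. The pairwise loop is quadratic and only meant for small illustrative inputs.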

Please refer to the documentation for the API reference as well as tutorials on how to use this codebase.

pandora_llm abides by the following core principles:

  • Open Access — Ensuring that these tools are open-source for all.
  • Reproducible — Committing to providing all necessary code details to ensure replicability.
  • Self-Contained — Designing attacks that are self-contained, making it transparent to understand the workings of the method without having to peer through the entire codebase or unnecessary levels of abstraction, and making it easy to contribute new code.
  • Model-Agnostic — Supporting any HuggingFace model and dataset, making it easy to apply to any situation.
  • Usability — Prioritizing easy-to-use starter scripts and comprehensive documentation so anyone can effectively use pandora_llm regardless of prior background.

We hope that our package helps LLM providers safety-check their models before release and empowers the public to hold them accountable for their use of data.

Installation

From pip:

pip install pandora-llm

From source:

git clone https://github.com/safr-ai-lab/pandora-llm.git
cd pandora-llm
pip install -e .
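After either install route, a quick way to confirm that the package is visible to the current Python environment is to query its distribution metadata. This is a small sketch using only the standard library; the helper name `installed_version` is an assumption for illustration, and the distribution name `pandora-llm` is taken from the pip command above.

```python
from importlib import metadata


def installed_version(dist_name="pandora-llm"):
    """Return the installed version string for dist_name, or None if it is not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


version = installed_version()
if version is None:
    print("pandora-llm is not installed in this environment")
else:
    print("pandora-llm version:", version)
```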

Quickstart

We maintain a collection of starter scripts in our codebase under experiments/. If you are creating a new attack, we recommend copying a starter script and using it as a template.

python experiments/mia/run_loss.py --model_name EleutherAI/pythia-70m-deduped --model_revision step98000 --num_samples 2000 --pack --seed 229
bash scripts/run_mia_baselines_olmo.sh
bash scripts/run_mia_baselines_pile.sh
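The name run_loss.py suggests a loss-based attack, in which a sample's average next-token loss under the model serves as the membership signal. The sketch below illustrates that idea only; it is an assumption about the general technique, not the script's actual code, and it does not use pandora_llm's API. The helpers `mean_token_nll` and `score_text_with_hf` are hypothetical names. The second helper relies on the HuggingFace `transformers` loading calls and reuses the model name and revision from the command above as defaults.

```python
import math


def mean_token_nll(logits, token_ids):
    """Average negative log-likelihood of each token given its prefix.

    logits[t] holds the unnormalized scores over the vocabulary produced after
    reading token_ids[: t + 1], so it is used to predict token_ids[t + 1].
    """
    if len(token_ids) < 2:
        raise ValueError("need at least two tokens to score a sequence")
    total = 0.0
    for t in range(len(token_ids) - 1):
        row = logits[t]
        peak = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_z = peak + math.log(sum(math.exp(x - peak) for x in row))
        total += log_z - row[token_ids[t + 1]]
    return total / (len(token_ids) - 1)


def score_text_with_hf(text, model_name="EleutherAI/pythia-70m-deduped",
                       revision="step98000"):
    """Hypothetical helper: score one string with a HuggingFace causal LM.

    Imports are local so the rest of this sketch runs without torch or
    transformers installed. Calling this function downloads the checkpoint.
    """
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name, revision=revision)
    model = AutoModelForCausalLM.from_pretrained(model_name, revision=revision)
    model.eval()
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0]  # shape: [sequence_length, vocab_size]
    return mean_token_nll(logits.tolist(), inputs["input_ids"][0].tolist())


# Sanity check on a toy vocabulary of 4 tokens: uniform logits give log(4) per token.
uniform_logits = [[0.0, 0.0, 0.0, 0.0]] * 3
print(mean_token_nll(uniform_logits, [1, 2, 3]), math.log(4))
```

A lower loss means the model finds the text less surprising, which a loss-based attack treats as evidence that the text was seen during training. Converting the full logits tensor to Python lists is slow for real vocabularies; it is done here only to keep the scoring function dependency-free.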

Contributing

We welcome contributions! Please submit pull requests on our GitHub repository.

Authors

This library was created by Jeffrey G. Wang, Jason Wang, Marvin Li, and Seth Neel.

