
Library for extracting and analyzing persona vectors


Persona Vectors


Extract persona-aligned activation vectors from language models and analyze how persona prompts move hidden states.

Warning: this project is currently very experimental 🚨

Overview

Given a set of personas and evaluation questions, this project:

  1. Formats each persona as a system prompt (short templated or long biography)
  2. Extracts hidden states at each layer with configurable token masking
  3. Averages masked hidden states across QA pairs and saves one persona-level vector per layer
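Steps 2 and 3 can be sketched in plain Python. This is a minimal illustration on nested lists, not the package's implementation: the function names masked_mean and persona_vector are hypothetical, and the real pipeline presumably operates on tensors rather than lists.

```python
from statistics import fmean


def masked_mean(hidden, mask):
    """Average the token vectors of one layer whose mask entry is truthy.

    hidden: list of token vectors (each a list of floats).
    mask:   list of 0/1 flags, one per token.
    """
    selected = [vec for vec, keep in zip(hidden, mask) if keep]
    if not selected:
        raise ValueError("mask selects no tokens")
    # Column-wise mean over the selected token vectors.
    return [fmean(col) for col in zip(*selected)]


def persona_vector(per_qa_hidden, per_qa_masks):
    """Average the masked means across QA pairs, giving one vector per layer.

    per_qa_hidden[q][layer] is a (num_tokens, hidden_size) nested list.
    per_qa_masks[q] is the token mask for QA pair q.
    """
    num_layers = len(per_qa_hidden[0])
    layers = []
    for layer in range(num_layers):
        pooled = [
            masked_mean(hidden[layer], mask)
            for hidden, mask in zip(per_qa_hidden, per_qa_masks)
        ]
        layers.append([fmean(col) for col in zip(*pooled)])
    return layers  # shape: (num_layers, hidden_size)
```

Each QA pair contributes equally to the result here, regardless of how many tokens its mask selects; weighting by token count would be a different design choice.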

The resulting vectors can be compared across layers (cosine similarity) and eventually used for steering experiments.
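As a rough sketch of the layer-wise comparison, the snippet below computes one cosine similarity per layer for two persona-level vectors stored as nested lists of shape (num_layers, hidden_size). The helper names are hypothetical and are not taken from the package.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def layerwise_cosine(vectors_a, vectors_b):
    """Return one similarity per layer for two (num_layers, hidden_size) lists."""
    if len(vectors_a) != len(vectors_b):
        raise ValueError("both inputs must have the same number of layers")
    return [cosine_similarity(a, b) for a, b in zip(vectors_a, vectors_b)]
```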

Repository Layout

persona-vectors/
├── notebooks/
│   ├── notebook_extract.py      # Extraction pipeline (primary working script)
│   ├── notebook_compare.py      # Compare Hub or local activation artifacts
│   └── notebook_steer.py        # Steering experiments
├── src/persona_vectors/
│   ├── activations.py           # Core extraction helpers
│   ├── analysis.py              # PCA / UMAP projections and scatter plots
│   ├── artifacts.py             # Local and Hugging Face activation artifact stores
│   ├── preview.py               # Token-mask preview helpers for CLI/UI rendering
│   ├── plots.py                 # Plotly figures for layer-wise analysis
│   ├── steering.py              # Steering vector computation and application
│   └── parser.py                # CLI argument parsing
├── artifacts/                   # Saved activations (gitignored)
├── docs/                        # Reference documentation
└── main.py                      # CLI entry point

Dataset loading (SynthPersonaDataset) and environment helpers come from the sibling persona-data package.

For local development, uncomment the path source in pyproject.toml and keep persona-data checked out next to this repo.

Installation

uv sync
cp .env.example .env

Python >=3.12 is required.

Quickstart

# Extract activations (run this first)
uv run python -m notebooks.notebook_extract

# Compare Hub artifacts, or local artifacts by uncommenting the local store
uv run python -m notebooks.notebook_compare

# Build interactive persona-vector PCA and similarity plots from saved activations
uv run python main.py analyze --model google/gemma-2-9b-it --variant biography --mask-strategy answer_mean

# Compute a steering vector from saved activations
uv run python main.py steer --persona-id <UUID> --model google/gemma-2-9b-it --layer 20

Streamlit App

The Streamlit UI lives in the sibling persona-ui repo.

How It Works

Notebooks

notebook_extract.py runs a small end-to-end extraction example:

  1. Load dataset questions and answers
  2. Build masks for the selected token spans
  3. Extract activations and average them across QA pairs
  4. Save the persona-level activation tensor to disk

notebook_compare.py uses HFActivationStore by default to load the published Hub dataset, compares shared persona vectors across variants, and runs PCA and similarity views. It includes commented lines for switching to local ActivationStore artifacts.

notebook_steer.py loads saved activations and computes a steering vector for a selected persona.
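This README does not say how the steering vector is computed. One common approach, shown here purely as an assumption, is to take the difference between a persona vector and a baseline vector (for example the baseline_assistant persona) at one layer and add a scaled copy of it to hidden states. The function names and the scale parameter are hypothetical.

```python
def steering_vector(persona_layers, baseline_layers, layer, scale=1.0):
    """Difference-of-means steering direction at one layer (assumed method).

    persona_layers and baseline_layers are (num_layers, hidden_size) lists.
    """
    persona = persona_layers[layer]
    baseline = baseline_layers[layer]
    if len(persona) != len(baseline):
        raise ValueError("persona and baseline must share hidden_size")
    return [scale * (p - b) for p, b in zip(persona, baseline)]


def apply_steering(hidden, vector):
    """Add the steering vector to every token vector of one layer."""
    return [[h + v for h, v in zip(token, vector)] for token in hidden]
```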

Saved Format

Each extraction produces:

artifacts/activations/<model_dir>/<mask_strategy>/<prompt_variant>/
├── manifest.json             # tensor shape, persona names, sample ids
└── <persona_id>.safetensors

<model_dir> is the model name with / replaced by __.

The manifest stores compact sample ids (qa.qid) instead of full question text, plus tensor shape fields used for validation. Each safetensors file contains a single activations tensor with shape (num_layers, hidden_size).
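The directory layout above can be reproduced with a small path helper. This is a sketch only: activation_dir and persona_file are hypothetical names, and the package may build these paths differently.

```python
from pathlib import Path


def activation_dir(root, model, mask_strategy, prompt_variant):
    """Build the directory for one (model, mask strategy, prompt variant)."""
    # The model name becomes a single path component: "/" is replaced by "__".
    model_dir = model.replace("/", "__")
    return Path(root) / "activations" / model_dir / mask_strategy / prompt_variant


def persona_file(root, model, mask_strategy, prompt_variant, persona_id):
    """Path of the safetensors file holding one persona's activations."""
    directory = activation_dir(root, model, mask_strategy, prompt_variant)
    return directory / f"{persona_id}.safetensors"
```

For example, activation_dir("artifacts", "google/gemma-2-9b-it", "answer_mean", "biography") resolves to artifacts/activations/google__gemma-2-9b-it/answer_mean/biography.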

CLI

The extract, analyze, and steer subcommands are implemented; the push subcommand is described under Publishing to the Hugging Face Hub.

# Extract activations
# Defaults to all supported variants: templated and biography.
python main.py extract --model google/gemma-2-2b-it

# Extract only the Assistant baseline
python main.py extract --model google/gemma-2-2b-it --persona-id baseline_assistant

# Re-run personas already present in the local manifest
python main.py extract --model google/gemma-2-2b-it --persona-id baseline_assistant --force

# Run remotely on NDIF. If the remote fast path OOMs, extraction automatically
# retries that persona/variant with layer-chunked traces.
python main.py extract --model google/gemma-2-9b-it --backend remote

# Analyze saved activations
python main.py analyze --model google/gemma-2-9b-it --variant biography --mask-strategy answer_mean --out ./plots

# Run steering (example)
python main.py steer --layer 10 --model "google/gemma-2-9b-it" --persona-id 005e1868-4e17-47e3-94fa-0d20e8d93662
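The remote retry behavior mentioned in the comments above (fall back to layer-chunked traces when the fast path runs out of memory) follows a general pattern that can be sketched as below. Everything here is an assumption for illustration: the function names, the chunk_size default, and the use of MemoryError as the failure signal, since a remote backend would likely report an out-of-memory condition differently.

```python
def layer_chunks(num_layers, chunk_size):
    """Split layer indices 0..num_layers-1 into consecutive ranges."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    return [
        range(start, min(start + chunk_size, num_layers))
        for start in range(0, num_layers, chunk_size)
    ]


def extract_with_fallback(fast_path, chunked_path, num_layers, chunk_size=8):
    """Try the all-layers path first; on MemoryError, retry chunk by chunk.

    fast_path():          returns a list with one entry per layer.
    chunked_path(layers): returns one entry per layer in the given range.
    """
    try:
        return fast_path()
    except MemoryError:
        results = []
        for layers in layer_chunks(num_layers, chunk_size):
            results.extend(chunked_path(layers))
        return results
```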

Publishing to the Hugging Face Hub

Saved activations can be packaged as a Hugging Face dataset and pushed to the Hub. Each (model, mask_strategy) pair is a dataset config, and each prompt variant is a split. Each row is one persona with a (num_layers, hidden_size) vector.
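The mapping from local artifacts to Hub configs and splits can be pictured with the grouping sketch below. The config naming scheme (model directory and mask strategy joined by "__") and the record fields are assumptions made for this example; the published dataset may name its configs differently.

```python
from collections import defaultdict


def group_for_hub(records):
    """Group persona records as {config: {split: [persona_id, ...]}}.

    records: iterable of dicts with the keys model, mask_strategy,
    variant, and persona_id (hypothetical field names).
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for rec in records:
        # Hypothetical config name: one config per (model, mask_strategy) pair.
        config = f"{rec['model'].replace('/', '__')}__{rec['mask_strategy']}"
        # Each prompt variant becomes a split; each persona becomes a row.
        grouped[config][rec["variant"]].append(rec["persona_id"])
    return {config: dict(splits) for config, splits in grouped.items()}
```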

# One-time: huggingface-cli login (or set HF_TOKEN in .env)
uv run python main.py push \
    --model google/gemma-2-9b-it \
    --repo implicit-personalization/synth-persona-vectors

Adding more personas later: re-run extract (it skips personas already in the local manifest unless --force is passed), then re-run main.py push. Python callers can use persona_vectors.hub.push_to_hub(...) directly.
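The skip-unless-forced rule amounts to a simple filter over the requested persona ids. The sketch below assumes the manifest can be reduced to a collection of already-extracted ids; personas_to_extract is a hypothetical helper, not part of the package.

```python
def personas_to_extract(requested, manifest_ids, force=False):
    """Return the persona ids that still need extraction, in request order.

    requested:    persona ids asked for on this run.
    manifest_ids: persona ids already recorded in the local manifest.
    force:        when True, re-run everything that was requested.
    """
    if force:
        return list(requested)
    done = set(manifest_ids)
    return [pid for pid in requested if pid not in done]
```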

scripts/extraction.sh extracts baseline_assistant plus the first N personas in one batch, then pushes to the Hub:

MODEL=google/gemma-2-9b-it N=100 BACKEND=remote VARIANT=templated scripts/extraction.sh

Loading an existing Hub dataset

from persona_vectors.artifacts import HFActivationStore

store = HFActivationStore(
    "implicit-personalization/synth-persona-vectors",
    "google/gemma-2-9b-it",
    mask_strategy="answer_mean",
)

available_variants = store.available_variants(["biography", "templated"])
variant = available_variants[0]
vectors = store.load(variant, "<UUID>")
persona_ids = store.list_personas([variant])

HFActivationStore is read-only, but exposes the same core methods as the local ActivationStore: load, available_variants, list_personas, and persona_names. Request variants in preference order when the published dataset does not have every local prompt variant yet.
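Requesting variants in preference order can be thought of as filtering the preferred list down to what is published while keeping its order. The helper below is a standalone illustration of that idea and is not the implementation of available_variants; the name variants_in_preference_order is hypothetical.

```python
def variants_in_preference_order(preferred, published):
    """Keep only the published variants, ordered as in `preferred`."""
    published_set = set(published)
    return [variant for variant in preferred if variant in published_set]


def first_available(preferred, published):
    """Return the most preferred published variant, or None if none match."""
    matches = variants_in_preference_order(preferred, published)
    return matches[0] if matches else None
```

A caller that prefers biography but can fall back to templated would pass ["biography", "templated"] and use the first match.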
