
Persona Vectors


Extract persona-aligned activation vectors from language models and analyze how persona prompts move hidden states.

> [!WARNING]
> This project is currently very experimental. 🚨

Overview

Given a set of personas and evaluation questions, this project:

  1. Formats each persona as a system prompt (short templated or long biography)
  2. Extracts hidden states at each layer with configurable token masking
  3. Averages masked hidden states across QA pairs and saves one persona-level vector per layer

The resulting vectors can be compared across layers (cosine similarity) and eventually used for steering experiments.
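The masked averaging in steps 2–3 can be sketched as follows. The array shapes match the saved format described later, but the helper names and signatures are illustrative, not the package's actual API:

```python
import numpy as np

def masked_layer_means(hidden_states: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Average hidden states over the masked token span, per layer.

    hidden_states: (num_layers, seq_len, hidden_size)
    mask: (seq_len,) boolean, True for tokens to keep (e.g. answer tokens)
    """
    kept = hidden_states[:, mask, :]   # (num_layers, n_kept, hidden_size)
    return kept.mean(axis=1)           # (num_layers, hidden_size)

def persona_vector(per_pair_means: list[np.ndarray]) -> np.ndarray:
    """Average the per-QA-pair layer means into one persona-level tensor."""
    return np.stack(per_pair_means).mean(axis=0)  # (num_layers, hidden_size)
```

Each QA pair contributes one `(num_layers, hidden_size)` tensor; averaging those yields the single persona-level vector per layer that gets saved.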

Repository Layout

persona-vectors/
├── notebooks/
│   ├── notebook_extract.py      # Extraction pipeline (primary working script)
│   ├── notebook_compare.py      # Load saved activations and compare variants
│   └── notebook_steer.py        # Steering experiments
├── src/persona_vectors/
│   ├── activations.py           # Core extraction helpers
│   ├── analysis.py              # PCA / UMAP projections and scatter plots
│   ├── artifacts.py             # Save/load/query activation artifact helpers
│   ├── plots.py                 # Plotly figures for layer-wise analysis
│   ├── steering.py              # Steering vector computation and application
│   └── parser.py                # CLI argument parsing
├── artifacts/                   # Saved activations (gitignored)
├── docs/                        # Reference documentation
└── main.py                      # CLI entry point

Dataset loading (SynthPersonaDataset) and environment helpers come from the sibling persona-data package.

For local development, uncomment the path source in pyproject.toml and keep persona-data checked out next to this repo.
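For example, uv's path-source override in pyproject.toml might look like this (the exact entry depends on how the dependency is declared; treat it as a sketch, not the file's actual contents):

```toml
[tool.uv.sources]
# Local development: resolve persona-data from the sibling checkout
persona-data = { path = "../persona-data", editable = true }
```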

Installation

uv sync
cp .env.example .env

Python >=3.12 is required.

Quickstart

# Extract activations (run this first)
uv run python -m notebooks.notebook_extract

# Load saved activations / compare variants
uv run python -m notebooks.notebook_compare

# Build interactive persona-mean PCA and similarity plots from saved activations
uv run python main.py analyze --model google/gemma-2-9b-it --variant biography --mask-strategy answer_mean

# Compute a steering vector from saved activations
uv run python main.py steer --persona-id <UUID> --model google/gemma-2-9b-it --layer 20

Streamlit App

The Streamlit UI lives in the sibling persona-ui repo.

How It Works

Notebooks

notebook_extract.py runs a small end-to-end extraction example:

  1. Load dataset questions and answers
  2. Build masks for the selected token spans
  3. Extract activations and average them across QA pairs
  4. Save the persona-level activation tensor to disk

notebook_compare.py uses ActivationStore to discover saved variants/personas, then compares shared persona means across variants.
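A minimal sketch of such a comparison, assuming two persona-level tensors of shape (num_layers, hidden_size); `layerwise_cosine` is a hypothetical helper, not part of the package:

```python
import numpy as np

def layerwise_cosine(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Per-layer cosine similarity between two (num_layers, hidden_size) tensors."""
    num = (a * b).sum(axis=-1)
    den = np.linalg.norm(a, axis=-1) * np.linalg.norm(b, axis=-1)
    return num / den  # shape: (num_layers,)
```

Plotting the result against layer index shows at which depths two variants of the same persona agree or diverge.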

notebook_steer.py loads saved activations and computes a steering vector for a selected persona.
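A common construction for such a vector, and a plausible sketch of what this step computes (the actual logic lives in src/persona_vectors/steering.py and may differ), is the difference of means between a persona and the assistant baseline at one layer:

```python
import numpy as np

def difference_of_means(persona_acts: np.ndarray,
                        baseline_acts: np.ndarray,
                        layer: int) -> np.ndarray:
    """Steering direction at `layer`: persona mean minus baseline mean.

    Both inputs have shape (num_layers, hidden_size); the result is a
    (hidden_size,) vector that can be added to the residual stream.
    """
    return persona_acts[layer] - baseline_acts[layer]
```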

Saved Format

Each extraction produces:

artifacts/activations/<model_dir>/<mask_strategy>/<prompt_variant>/
├── manifest.json             # tensor shape, persona names, sample ids
└── <persona_id>.safetensors

<model_dir> is the model name with / replaced by __.

The manifest stores compact sample ids (qa.qid) instead of full question text, plus tensor shape fields used for validation. Each safetensors file contains a single activations tensor with shape (num_layers, hidden_size).
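A sketch of how a consumer might validate a saved artifact against its manifest; the field names (num_layers, hidden_size) are assumptions about the manifest schema, not the documented format:

```python
import json
from pathlib import Path

def validate_artifact(artifact_dir: Path, persona_id: str,
                      expected_shape: tuple[int, int]) -> bool:
    """Check the manifest's recorded tensor shape and that the
    persona's safetensors file exists on disk."""
    manifest = json.loads((artifact_dir / "manifest.json").read_text())
    # Assumed field names -- inspect a real manifest.json for the schema.
    shape = (manifest["num_layers"], manifest["hidden_size"])
    tensor_file = artifact_dir / f"{persona_id}.safetensors"
    return shape == expected_shape and tensor_file.exists()
```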

CLI

Three subcommands are implemented: extract, analyze, and steer.

# Extract activations
# Defaults to all supported variants: templated and biography.
python main.py extract --model google/gemma-2-2b-it

# Extract only the Assistant baseline
python main.py extract --model google/gemma-2-2b-it --persona-id baseline_assistant

# Pick specific variants
python main.py extract --model google/gemma-2-2b-it --variants biography

# Re-run personas already present in the local manifest
python main.py extract --model google/gemma-2-2b-it --persona-id baseline_assistant --force

# Run remotely on NDIF. If the remote fast path OOMs, extraction automatically
# retries that persona/variant with layer-chunked traces.
python main.py extract --model google/gemma-2-9b-it --backend remote

# Analyze saved activations
python main.py analyze --model google/gemma-2-9b-it --variant biography --mask-strategy answer_mean --out ./plots

# Run steering (example)
python main.py steer --layer 10 --model "google/gemma-2-9b-it" --persona-id 005e1868-4e17-47e3-94fa-0d20e8d93662

# Load steering activations extracted with a non-default mask strategy
python main.py steer --layer 10 --model "google/gemma-2-9b-it" --persona-id <UUID> --mask-strategy answer_previous

Publishing to the Hugging Face Hub

Saved activations can be packaged as a Hugging Face dataset and pushed to the Hub, with one config per (model, mask_strategy) pair and templated / biography as splits. Each row is one persona with a (num_layers, hidden_size) vector.

# One-time: huggingface-cli login (or set HF_TOKEN in .env)
uv run python scripts/push_to_hf.py \
    --model google/gemma-2-9b-it \
    --repo implicit-personalization/synth-persona-vectors

To add more personas later, re-run extract (it skips personas already present in the local manifest; pass --force to re-run them), then re-run the push script.

scripts/extraction.sh extracts baseline_assistant plus the first N personas, then pushes to the Hub:

MODEL=google/gemma-2-9b-it N=100 BACKEND=remote VARIANT=templated scripts/extraction.sh

Loading the dataset elsewhere:

from datasets import load_dataset

ds = load_dataset(
    "implicit-personalization/synth-persona-vectors",
    "google__gemma-2-9b-it__answer_mean",
    split="biography",
)
row = ds.filter(lambda r: r["persona_id"] == "<UUID>")[0]
# row["vector"] is a (num_layers, hidden_size) list[list[float]]
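The config name above appears to combine the <model_dir> convention with the mask strategy; a helper to derive it (inferred from the artifact paths, not an exported function) could be:

```python
def hf_config_name(model: str, mask_strategy: str) -> str:
    """Build the Hub config name: model with '/' -> '__', plus mask strategy."""
    return f"{model.replace('/', '__')}__{mask_strategy}"
```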
