# Persona Vectors
Extract persona-aligned activation vectors from language models and analyze how persona prompts move hidden states.
> [!WARNING]
> This is currently very experimental 🚨
## Overview
Given a set of personas and evaluation questions, this project:
- Formats each persona as a system prompt (short `templated` or long `biography`)
- Extracts hidden states at each layer with configurable token masking
- Averages masked hidden states across QA pairs and saves one persona-level vector per layer
The resulting vectors can be compared across layers (cosine similarity) and eventually used for steering experiments.
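The masking-and-averaging step described above can be sketched in a few lines. This is a minimal NumPy illustration under assumed shapes, not the project's actual implementation (function names here are hypothetical):

```python
import numpy as np

def masked_mean(hidden_states: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Average hidden states over the masked token positions.

    hidden_states: (num_layers, seq_len, hidden_size)
    mask:          (seq_len,) boolean, True for tokens to keep
    Returns:       (num_layers, hidden_size)
    """
    kept = hidden_states[:, mask, :]  # (num_layers, n_kept, hidden_size)
    return kept.mean(axis=1)

def persona_vector(per_pair_vectors: list[np.ndarray]) -> np.ndarray:
    """Average per-QA-pair vectors into one persona-level vector per layer."""
    return np.stack(per_pair_vectors).mean(axis=0)
```

Each QA pair yields one `(num_layers, hidden_size)` array via `masked_mean`; `persona_vector` then collapses them into the single persona-level tensor that gets saved.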
## Repository Layout
```text
persona-vectors/
├── notebooks/
│   ├── notebook_extract.py     # Extraction pipeline (primary working script)
│   ├── notebook_compare.py     # Load saved activations and compare variants
│   ├── notebook_hf_compare.py  # Load Hub activations and run persona PCA
│   └── notebook_steer.py       # Steering experiments
├── src/persona_vectors/
│   ├── activations.py          # Core extraction helpers
│   ├── analysis.py             # PCA / UMAP projections and scatter plots
│   ├── artifacts.py            # Local and Hugging Face activation artifact stores
│   ├── preview.py              # Token-mask preview helpers for CLI/UI rendering
│   ├── plots.py                # Plotly figures for layer-wise analysis
│   ├── steering.py             # Steering vector computation and application
│   └── parser.py               # CLI argument parsing
├── artifacts/                  # Saved activations (gitignored)
├── docs/                       # Reference documentation
└── main.py                     # CLI entry point
```
Dataset loading (`SynthPersonaDataset`) and environment helpers come from the sibling `persona-data` package. For local development, uncomment the path source in `pyproject.toml` and keep `persona-data` checked out next to this repo.
## Installation

```sh
uv sync
cp .env.example .env
```

Python >=3.12 is required.
## Quickstart

```sh
# Extract activations (run this first)
uv run python -m notebooks.notebook_extract

# Load saved activations / compare variants
uv run python -m notebooks.notebook_compare

# Load an existing Hub dataset directly and run PCA/similarity views
uv run python -m notebooks.notebook_hf_compare

# Build interactive persona-vector PCA and similarity plots from saved activations
uv run python main.py analyze --model google/gemma-2-9b-it --variant biography --mask-strategy answer_mean

# Compute a steering vector from saved activations
uv run python main.py steer --persona-id <UUID> --model google/gemma-2-9b-it --layer 20
```
## Streamlit App

The Streamlit UI lives in the sibling `persona-ui` repo.
## How It Works

### Notebooks
`notebook_extract.py` runs a small end-to-end extraction example:

- Load dataset questions and answers
- Build masks for the selected token spans
- Extract activations and average them across QA pairs
- Save the persona-level activation tensor to disk

`notebook_compare.py` uses `ActivationStore` to discover saved variants/personas, then compares shared persona vectors across variants.

`notebook_hf_compare.py` uses `HFActivationStore` to load a published Hub dataset directly, then runs PCA and similarity views over the selected variant.

`notebook_steer.py` loads saved activations and computes a steering vector for a selected persona.
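Comparing shared persona vectors across variants reduces to per-layer cosine similarity between two `(num_layers, hidden_size)` tensors. A minimal sketch (the function name is an assumption, not the project's API):

```python
import numpy as np

def layerwise_cosine(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Cosine similarity per layer between two (num_layers, hidden_size) arrays.

    Returns a (num_layers,) array in [-1, 1].
    """
    num = (a * b).sum(axis=-1)
    denom = np.linalg.norm(a, axis=-1) * np.linalg.norm(b, axis=-1)
    return num / denom
```

Plotting this curve over the layer index shows at which depth two prompt variants of the same persona converge or diverge.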
### Saved Format

Each extraction produces:

```text
artifacts/activations/<model_dir>/<mask_strategy>/<prompt_variant>/
├── manifest.json              # tensor shape, persona names, sample ids
└── <persona_id>.safetensors
```

`<model_dir>` is the model name with `/` replaced by `__`.

The manifest stores compact sample ids (`qa.qid`) instead of full question text, plus tensor shape fields used for validation. Each safetensors file contains a single `activations` tensor with shape `(num_layers, hidden_size)`.
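The shape fields in the manifest make a simple load-time check possible. The sketch below is hypothetical: the field names `num_layers` and `hidden_size` are assumptions, not the project's actual manifest schema:

```python
import json
import numpy as np

def validate_activation(manifest_path: str, tensor: np.ndarray) -> None:
    """Check a loaded persona tensor against shape fields in manifest.json.

    Assumes the manifest stores `num_layers` and `hidden_size` (illustrative
    field names); raises ValueError on a mismatch.
    """
    with open(manifest_path) as f:
        manifest = json.load(f)
    expected = (manifest["num_layers"], manifest["hidden_size"])
    if tensor.shape != expected:
        raise ValueError(f"tensor shape {tensor.shape} != manifest {expected}")
```

A check like this catches stale artifacts, e.g. vectors extracted from a different model than the manifest describes.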
## CLI

`extract`, `analyze`, and `steer` are implemented.
```sh
# Extract activations
# Defaults to all supported variants: templated and biography.
python main.py extract --model google/gemma-2-2b-it

# Extract only the Assistant baseline
python main.py extract --model google/gemma-2-2b-it --persona-id baseline_assistant

# Re-run personas already present in the local manifest
python main.py extract --model google/gemma-2-2b-it --persona-id baseline_assistant --force

# Run remotely on NDIF. If the remote fast path OOMs, extraction automatically
# retries that persona/variant with layer-chunked traces.
python main.py extract --model google/gemma-2-9b-it --backend remote

# Analyze saved activations
python main.py analyze --model google/gemma-2-9b-it --variant biography --mask-strategy answer_mean --out ./plots

# Run steering (example)
python main.py steer --layer 10 --model "google/gemma-2-9b-it" --persona-id 005e1868-4e17-47e3-94fa-0d20e8d93662
```
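For intuition on what `steer` computes: one common formulation of a steering vector (not necessarily what `steering.py` does) is the difference between a persona vector and a baseline vector at a chosen layer, added to the residual stream with a scale factor:

```python
import numpy as np

def steering_vector(persona: np.ndarray, baseline: np.ndarray, layer: int) -> np.ndarray:
    """Difference of persona and baseline activations at one layer.

    persona, baseline: (num_layers, hidden_size)
    Returns:           (hidden_size,)
    """
    return persona[layer] - baseline[layer]

def apply_steering(hidden: np.ndarray, vec: np.ndarray, scale: float = 1.0) -> np.ndarray:
    """Add a scaled steering vector to hidden states of shape (..., hidden_size)."""
    return hidden + scale * vec
```

In practice `apply_steering` would run inside a forward hook at the chosen layer; the scale trades off steering strength against output coherence.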
## Publishing to the Hugging Face Hub

Saved activations can be packaged as a Hugging Face dataset and pushed to the Hub. Each `(model, mask_strategy)` pair is a dataset config, and each prompt variant is a split. Each row is one persona with a `(num_layers, hidden_size)` vector.
```sh
# One-time: huggingface-cli login (or set HF_TOKEN in .env)
uv run python scripts/push_to_hf.py \
    --model google/gemma-2-9b-it \
    --repo implicit-personalization/synth-persona-vectors
```
To add more personas later, re-run `extract` (it skips personas already in the local manifest unless `--force` is passed), then re-run the push script.
`scripts/extraction.sh` extracts `baseline_assistant` plus the first N personas in one batch, then pushes to the Hub:

```sh
MODEL=google/gemma-2-9b-it N=100 BACKEND=remote VARIANT=templated scripts/extraction.sh
```
### Loading an existing Hub dataset

```python
from persona_vectors.artifacts import HFActivationStore

store = HFActivationStore(
    "implicit-personalization/synth-persona-vectors",
    "google/gemma-2-9b-it",
    mask_strategy="answer_mean",
)

available_variants = store.available_variants(["biography", "templated"])
variant = available_variants[0]
vectors = store.load(variant, "<UUID>")
persona_ids = store.list_personas([variant])
```
`HFActivationStore` is read-only, but exposes the same core methods as the local `ActivationStore`: `load`, `available_variants`, `list_personas`, and `persona_names`. Request variants in preference order when the published dataset does not yet have every local prompt variant.
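The preference-order pattern behind `available_variants` looks roughly like this (a hypothetical standalone helper, not part of the library):

```python
def pick_variant(available: list[str], preferred: list[str]) -> str:
    """Return the first variant from `preferred` that is actually published.

    Raises ValueError when none of the preferred variants exist, rather than
    silently falling back to an arbitrary one.
    """
    for variant in preferred:
        if variant in available:
            return variant
    raise ValueError(f"none of {preferred} found in {available}")
```

For example, `pick_variant(["templated"], ["biography", "templated"])` falls back to `"templated"` when the Hub dataset has not published the `biography` split yet.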