Library for extracting and analyzing persona vectors
Persona Vectors
Extract persona-aligned activation vectors from language models and experiment with activation steering.
Warning: This project is currently very experimental. 🚨
Overview
Given a set of personas and evaluation questions, this project:
- Formats each persona as a system prompt (either a short templated prompt or a long biography)
- Extracts hidden states at every layer, optionally masking specific tokens
- Averages those hidden states across questions to produce one persona vector per layer
The resulting vectors can be compared across layers (via cosine similarity) and then used for steering experiments.
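The per-layer averaging and comparison described above can be sketched in plain Python. This is a minimal illustration, not the library's implementation: the nested-list layout hidden[question][layer][token], the boolean mask[question][token], and the helper names masked_token_mean, persona_vectors, and cosine are all hypothetical. The real code works on tensors produced by nnsight forward passes.

```python
import math
from typing import List, Sequence

# Hypothetical layout: hidden[q][layer][token] is one hidden-state vector,
# mask[q][token] marks which tokens of question q enter the average.
Vector = List[float]


def masked_token_mean(tokens: Sequence[Vector], mask: Sequence[bool]) -> Vector:
    """Average only the token vectors whose mask entry is True."""
    kept = [vec for vec, keep in zip(tokens, mask) if keep]
    if not kept:
        raise ValueError("mask selects no tokens")
    return [sum(column) / len(kept) for column in zip(*kept)]


def persona_vectors(hidden, mask) -> List[Vector]:
    """One vector per layer: masked mean over tokens, then mean over questions."""
    num_layers = len(hidden[0])
    vectors = []
    for layer in range(num_layers):
        per_question = [
            masked_token_mean(question[layer], question_mask)
            for question, question_mask in zip(hidden, mask)
        ]
        vectors.append([sum(col) / len(per_question) for col in zip(*per_question)])
    return vectors


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Two questions, two layers, two tokens per question, hidden size 2.
hidden = [
    [[[1.0, 0.0], [3.0, 0.0]], [[0.0, 1.0], [0.0, 1.0]]],
    [[[1.0, 0.0], [9.0, 9.0]], [[0.0, 2.0], [0.0, 2.0]]],
]
mask = [[True, True], [True, False]]  # drop the second token of question 2
vectors = persona_vectors(hidden, mask)
print(vectors)  # [[1.5, 0.0], [0.0, 1.5]]
print(cosine(vectors[0], vectors[1]))  # 0.0
```

Masking happens before the question average, so a masked-out token (here the [9.0, 9.0] outlier) never influences the persona vector.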
Repository Layout
persona-vectors/
├── notebooks/
│ ├── notebook_extract.py # Extract activations from model (minimal PoC)
│ ├── notebook_compare.py # Use ActivationStore to load saved activations and compare variants
│ └── notebook_steer.py # Steering experiments
├── src/persona_vectors/
│ ├── artifacts.py # ActivationStore and artifact path helpers
│ ├── activations.py # Core: extract_activations (nnsight forward passes)
│ ├── extraction.py # Orchestration for extraction runs
│ ├── plots.py # Layer-wise similarity plots (Plotly)
│ ├── steering.py # Steering vector computation and application
│ └── parser.py # CLI argument parsing
├── artifacts/ # Saved activations (gitignored)
├── docs/ # Reference documentation
└── main.py # CLI entry point (WIP)
Dataset loading (SynthPersonaDataset, PersonaGuessDataset) and environment
helpers are provided by the sibling persona-data package.
For local development, uncomment the path source in persona-vectors/pyproject.toml
and keep persona-data checked out next to this repo. The committed config pulls
persona-data from a git source, so this package also installs cleanly in isolated environments.
Installation
uv sync
cp .env.example .env
Quickstart
# Extract activations (run this first)
uv run python -m notebooks.notebook_extract
# Load saved activations / compare variants
uv run python -m notebooks.notebook_compare
# Compute a steering vector from saved activations
uv run python main.py steer --persona-id <UUID> --model google/gemma-2-9b-it --layer 20
Streamlit App
The Streamlit UI lives in the sibling persona-ui repo.
How It Works
Notebooks
notebook_extract.py runs the full flow end to end:
- Load dataset questions and answers
- Extract per-question activations
- Save them to disk
- Mask and average the selected token spans
notebook_compare.py loads saved activations via ActivationStore and compares variants.
notebook_steer.py loads saved activations and computes a steering vector for a
selected persona.
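This README does not say exactly how steering.py computes or applies the steering vector. As a hedged illustration only, one common form of activation steering adds a scaled, unit-normalized direction to the hidden state at the chosen layer. The function name add_steering and the coefficient alpha are hypothetical and are not part of this package's API.

```python
import math
from typing import List

Vector = List[float]


def add_steering(hidden_state: Vector, direction: Vector, alpha: float) -> Vector:
    """Hypothetical additive steering: hidden_state + alpha * direction / ||direction||."""
    norm = math.sqrt(sum(x * x for x in direction))
    if norm == 0.0:
        raise ValueError("steering direction must be non-zero")
    return [h + alpha * d / norm for h, d in zip(hidden_state, direction)]


# Push a 2-d hidden state three units along the persona direction.
print(add_steering([1.0, 0.0], [0.0, 2.0], alpha=3.0))  # [1.0, 3.0]
```

Normalizing the direction makes alpha read as a distance in activation space, and a negative alpha pushes away from the persona. In a real pipeline the addition would presumably happen inside the model's forward pass at the selected layer (for example the one passed via --layer).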
Saved Format
Each extraction produces:
artifacts/activations/<model_dir>/<prompt_variant>/<persona_id>/
├── activations.safetensors # Per-question hidden states
└── metadata.json # persona_id, persona_name, questions, n_questions, num_layers, hidden_size
<model_dir> is the model name with / replaced by __.
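The directory convention above can be expressed as a small path helper. The name activation_dir and its signature are hypothetical (the real helpers live in artifacts.py), and "templated" is used only as an example prompt-variant name. The layout and the "/" to "__" substitution come from this README.

```python
from pathlib import Path


def activation_dir(root: Path, model: str, prompt_variant: str, persona_id: str) -> Path:
    """Build <root>/activations/<model_dir>/<prompt_variant>/<persona_id>."""
    model_dir = model.replace("/", "__")  # google/gemma-2-9b-it -> google__gemma-2-9b-it
    return root / "activations" / model_dir / prompt_variant / persona_id


run_dir = activation_dir(
    Path("artifacts"),
    "google/gemma-2-9b-it",
    "templated",  # example variant name (assumption)
    "005e1868-4e17-47e3-94fa-0d20e8d93662",
)
print(run_dir / "activations.safetensors")
print(run_dir / "metadata.json")
```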
The metadata stores the question text directly, so load-time analysis no longer needs to re-resolve question IDs (qids) against the dataset. It also stores tensor shape fields (n_questions, num_layers, hidden_size) so that loaded activations can be validated.
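Load-time validation against metadata.json might look like the sketch below. The field names come from the metadata listing above, but the tensor layout (n_questions, num_layers, ..., hidden_size) and the function name check_metadata are assumptions for illustration; this README does not document the exact shape stored in activations.safetensors.

```python
import json
from typing import Sequence


def check_metadata(metadata: dict, shape: Sequence[int]) -> None:
    """Raise ValueError if metadata and an activation tensor shape disagree.

    Assumes (hypothetically) a tensor laid out as
    (n_questions, num_layers, ..., hidden_size).
    """
    if metadata["n_questions"] != len(metadata["questions"]):
        raise ValueError("n_questions does not match the stored question list")
    expected = (metadata["n_questions"], metadata["num_layers"])
    if tuple(shape[:2]) != expected:
        raise ValueError(f"leading dims {tuple(shape[:2])} != {expected}")
    if shape[-1] != metadata["hidden_size"]:
        raise ValueError(f"hidden size {shape[-1]} != {metadata['hidden_size']}")


# A real caller would presumably read this dict from metadata.json on disk.
metadata = json.loads(
    '{"persona_id": "p1", "persona_name": "Example", "questions": ["Q1?", "Q2?"],'
    ' "n_questions": 2, "num_layers": 4, "hidden_size": 8}'
)
check_metadata(metadata, shape=(2, 4, 8))  # consistent, so no exception
```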
CLI
extract and steer are implemented. analyze is parsed but still raises
NotImplementedError.
# Extract activations
python main.py extract --model google/gemma-2-2b-it
# Analyze saved activations (not implemented yet)
python main.py analyze --out ./plots --similarity cosine
# Run steering (example)
python main.py steer --layer 10 --model "google/gemma-2-9b-it" --persona-id 005e1868-4e17-47e3-94fa-0d20e8d93662
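For orientation, a subcommand layout matching the commands above could be wired with argparse as sketched here. This is not the contents of parser.py: only the subcommand names and the flags shown in the examples are taken from this README, while the defaults, types, and the build_parser name are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the extract / analyze / steer examples."""
    parser = argparse.ArgumentParser(prog="main.py")
    sub = parser.add_subparsers(dest="command", required=True)

    extract = sub.add_parser("extract", help="extract activations")
    extract.add_argument("--model", required=True)

    analyze = sub.add_parser("analyze", help="analyze saved activations (not implemented)")
    analyze.add_argument("--out", default="./plots")
    analyze.add_argument("--similarity", default="cosine")

    steer = sub.add_parser("steer", help="compute and apply a steering vector")
    steer.add_argument("--model", required=True)
    steer.add_argument("--layer", type=int, required=True)
    steer.add_argument("--persona-id", required=True)  # stored as args.persona_id
    return parser


args = build_parser().parse_args(
    ["steer", "--layer", "10", "--model", "google/gemma-2-9b-it",
     "--persona-id", "005e1868-4e17-47e3-94fa-0d20e8d93662"]
)
if args.command == "analyze":
    # Mirrors the README: analyze is parsed but not implemented.
    raise NotImplementedError("analyze is parsed but not implemented yet")
print(args.command, args.layer, args.persona_id)
```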
Download files
Source distribution: persona_vectors-0.1.2.tar.gz
Built distribution: persona_vectors-0.1.2-py3-none-any.whl
File details
Details for the file persona_vectors-0.1.2.tar.gz.
File metadata
- Download URL: persona_vectors-0.1.2.tar.gz
- Upload date:
- Size: 10.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: uv 0.11.5 (publish subcommand, CI on Ubuntu 24.04)
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 8f14c5839e619e6a5e3902e54d5335bd58e92dd5d4d1559c5aabc5417084aacd |
| MD5 | 138fb3fb6a8859cd42a8280e176b0b20 |
| BLAKE2b-256 | 86ce4bd6a69dd268ddb7eebf57e1d770a706483682c1aac77181502f94787b45 |
File details
Details for the file persona_vectors-0.1.2-py3-none-any.whl.
File metadata
- Download URL: persona_vectors-0.1.2-py3-none-any.whl
- Upload date:
- Size: 14.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: uv 0.11.5 (publish subcommand, CI on Ubuntu 24.04)
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | dda78fbdf0815bc49d4069b16f4b85526a775637b85a14ba1be87dfb5f1f280c |
| MD5 | 52b7e6adcbd71df9f11f54606c3243b8 |
| BLAKE2b-256 | 6da30e033b727f288564c166c9ef15a338119f52dea5fe2886970c34c03d951b |