Library for extracting and analyzing persona vectors


Persona Vectors

Extract persona-aligned activation vectors from language models and experiment with activation steering.

Warning: this project is highly experimental. 🚨

Overview

Given a set of personas and evaluation questions, this project:

Formats each persona as a system prompt (short templated or long biography)
Extracts hidden states at each layer, with optional masking of specific token spans
Averages those hidden states across questions to produce a persona vector per layer
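The averaging step can be written as a formula. This is a sketch that assumes each question's selected (unmasked) token span is mean-pooled first and the per-question results are then averaged with equal weight; the steps above do not spell out the pooling order. Here h_{\ell,t}^{(q)} is the hidden state at layer \ell and token t for question q, m_t^{(q)} is 1 for kept tokens and 0 for masked ones, and N is the number of questions:

```latex
\bar{h}_{\ell}^{(q)} = \frac{\sum_{t} m_{t}^{(q)}\, h_{\ell,t}^{(q)}}{\sum_{t} m_{t}^{(q)}},
\qquad
v_{\ell} = \frac{1}{N} \sum_{q=1}^{N} \bar{h}_{\ell}^{(q)}
```

Pooling all kept tokens across every question in a single mean would instead give longer answers more weight.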

The resulting per-layer vectors can be compared across layers using cosine similarity and used for steering experiments.
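A layer-wise comparison can be sketched in plain Python. This assumes the comparison pairs two sets of persona vectors (for example two personas, or the short and long prompt variants) layer by layer, each indexed as [layer][hidden_dim]; the function names are hypothetical and are not taken from plots.py.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def layerwise_similarity(vecs_a: Sequence[Vector], vecs_b: Sequence[Vector]) -> List[float]:
    """HYPOTHETICAL helper: one similarity score per layer."""
    if len(vecs_a) != len(vecs_b):
        raise ValueError("both inputs must have the same number of layers")
    return [cosine_similarity(a, b) for a, b in zip(vecs_a, vecs_b)]
```

Plotting the returned list against the layer index gives a simple layer-wise similarity curve.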

Repository Layout

persona-vectors/
├── notebooks/
│   ├── notebook_extract.py      # Extract activations from model (minimal PoC)
│   ├── notebook_compare.py      # Use ActivationStore to load saved activations and compare variants
│   └── notebook_steer.py        # Steering experiments
├── src/persona_vectors/
│   ├── artifacts.py             # ActivationStore and artifact path helpers
│   ├── activations.py           # Core: extract_activations (nnsight forward passes)
│   ├── extraction.py            # Orchestration for extraction runs
│   ├── plots.py                 # Layer-wise similarity plots (Plotly)
│   ├── steering.py              # Steering vector computation and application
│   └── parser.py                # CLI argument parsing
├── artifacts/                   # Saved activations (gitignored)
├── docs/                        # Reference documentation
└── main.py                      # CLI entry point (WIP)

Dataset loading (SynthPersonaDataset, PersonaGuessDataset) and environment helpers are provided by the sibling persona-data package.

For local development, uncomment the path source in persona-vectors/pyproject.toml and keep persona-data checked out next to this repo. The committed config points at persona-data's git repository, so this package also installs cleanly in isolated environments.

Installation

uv sync
cp .env.example .env

Quickstart

# Extract activations (run this first)
uv run python -m notebooks.notebook_extract

# Load saved activations / compare variants
uv run python -m notebooks.notebook_compare

# Compute a steering vector from saved activations
uv run python main.py steer --persona-id <UUID> --model google/gemma-2-9b-it --layer 20

Streamlit App

The Streamlit UI lives in the sibling persona-ui repo.

How It Works

Notebooks

notebook_extract.py runs the full flow end to end:

Load dataset questions and answers
Extract per-question activations
Save them to disk
Mask and average the selected token spans

notebook_compare.py loads saved activations via ActivationStore and compares variants.

notebook_steer.py loads saved activations and computes a steering vector for a selected persona.
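This section does not describe how steering.py computes or applies the vector. As an assumption, the sketch below shows the common additive form of activation steering, h' = h + alpha * v, for a single hidden state. The name apply_steering and the scale alpha are hypothetical. In a real model the addition would happen inside the forward pass at the chosen layer (the --layer flag), so that later layers see the modified state.

```python
from typing import List, Sequence


def apply_steering(hidden: Sequence[float], steering_vector: Sequence[float], alpha: float) -> List[float]:
    """ASSUMED additive steering: return hidden + alpha * steering_vector."""
    if len(hidden) != len(steering_vector):
        raise ValueError("hidden state and steering vector must have the same size")
    # A positive alpha pushes toward the persona direction; a negative one pushes away from it.
    return [h + alpha * v for h, v in zip(hidden, steering_vector)]
```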

Saved Format

Each extraction produces:

artifacts/activations/<model_dir>/<prompt_variant>/<persona_id>/
├── activations.safetensors   # Per-question hidden states
└── metadata.json             # persona_id, persona_name, questions, n_questions, num_layers, hidden_size

<model_dir> is the model name with / replaced by __.
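The directory naming rule can be sketched as follows. Both helpers are hypothetical illustrations of the layout above, not the actual path helpers in artifacts.py, and the prompt-variant name passed in is only a placeholder.

```python
from pathlib import Path
from typing import Union


def model_dir(model_name: str) -> str:
    """Turn a model name such as "org/model" into a directory-safe name."""
    # Each "/" becomes "__", as the saved-format description specifies.
    return model_name.replace("/", "__")


def activation_dir(root: Union[str, Path], model_name: str, prompt_variant: str, persona_id: str) -> Path:
    """HYPOTHETICAL helper mirroring <root>/activations/<model_dir>/<prompt_variant>/<persona_id>/."""
    return Path(root) / "activations" / model_dir(model_name) / prompt_variant / persona_id
```

For example, model_dir("google/gemma-2-9b-it") returns "google__gemma-2-9b-it".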

The metadata stores the question text directly, so load-time analysis no longer needs to re-resolve question IDs (qids) from the dataset. It also stores tensor shape fields (n_questions, num_layers, hidden_size) for validation at load time.
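A load-time check might look like the sketch below. It uses only the metadata fields listed above and assumes, as a hypothesis, that the tensor loaded from activations.safetensors has shape (n_questions, num_layers, hidden_size); the real layout may differ, for example by including a token axis. The function names are hypothetical.

```python
import json
from pathlib import Path
from typing import Any, Dict, Mapping, Sequence


def read_metadata(persona_dir: Path) -> Dict[str, Any]:
    """Read metadata.json from one <persona_id>/ artifact directory."""
    return json.loads((persona_dir / "metadata.json").read_text(encoding="utf-8"))


def validate_metadata(meta: Mapping[str, Any], tensor_shape: Sequence[int]) -> None:
    """Raise ValueError if the metadata and the loaded tensor disagree.

    ASSUMPTION: tensor_shape is (n_questions, num_layers, hidden_size).
    """
    if meta["n_questions"] != len(meta["questions"]):
        raise ValueError("n_questions does not match the stored question list")
    expected = (meta["n_questions"], meta["num_layers"], meta["hidden_size"])
    if tuple(tensor_shape) != expected:
        raise ValueError(f"tensor shape {tuple(tensor_shape)} != expected {expected}")
```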

CLI

The extract and steer subcommands are implemented. analyze is parsed but still raises NotImplementedError.

# Extract activations
python main.py extract --model google/gemma-2-2b-it

# Analyze saved activations (not implemented yet)
python main.py analyze --out ./plots --similarity cosine

# Run steering (example)
python main.py steer --layer 10 --model "google/gemma-2-9b-it" --persona-id 005e1868-4e17-47e3-94fa-0d20e8d93662
