Library for extracting and analyzing persona vectors

Persona Vectors

Extract persona-aligned activation vectors from language models and experiment with activation steering.

[!WARNING] This project is currently highly experimental 🚨

Overview

Given a set of personas and evaluation questions, this project:

  1. Formats each persona as a system prompt (short templated or long biography)
  2. Extracts hidden states at each layer (with optional masking of specific tokens)
  3. Averages those hidden states across questions to produce a persona vector per layer
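
Steps 2 and 3 can be sketched roughly as follows. This is a minimal illustration using plain Python lists rather than model tensors, not the code in activations.py. The function names (masked_token_mean, persona_vector), the [layers][tokens][hidden_size] layout, and the choice to average masked tokens before averaging across questions are all assumptions made for the example.

```python
from statistics import fmean


def masked_token_mean(hidden, mask):
    """Average the kept tokens of one question, layer by layer.

    hidden: nested list shaped [layers][tokens][hidden_size] (assumed layout)
    mask:   list of 0/1 flags, one per token; 1 means "keep this token"
    Returns a nested list shaped [layers][hidden_size].
    """
    kept = [t for t, keep in enumerate(mask) if keep]
    if not kept:
        raise ValueError("mask selects no tokens")
    hidden_size = len(hidden[0][0])
    return [
        [fmean(layer[t][d] for t in kept) for d in range(hidden_size)]
        for layer in hidden
    ]


def persona_vector(per_question):
    """Average per-question vectors into one vector per layer.

    per_question: list of nested lists, each shaped [layers][hidden_size]
    Returns a nested list shaped [layers][hidden_size].
    """
    n_layers = len(per_question[0])
    hidden_size = len(per_question[0][0])
    return [
        [fmean(q[layer_idx][d] for q in per_question) for d in range(hidden_size)]
        for layer_idx in range(n_layers)
    ]


# Toy data: one layer, hidden size 2, two questions of different lengths.
q1 = masked_token_mean([[[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]], mask=[0, 1, 1])
q2 = masked_token_mean([[[0.0, 0.0], [2.0, 1.0]]], mask=[1, 1])
vector = persona_vector([q1, q2])
```

Averaging within each question first means that long and short prompts contribute equally to the final persona vector; a real pipeline would do the same arithmetic on tensors.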

The resulting vectors can be compared across layers (cosine similarity) and eventually used for steering experiments.
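
As a rough illustration of the layer-wise comparison, the sketch below computes cosine similarity between two personas at each layer. It uses only the standard library; the function names and the [layers][hidden_size] list layout are assumptions for the example and are not taken from plots.py.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / norm


def layerwise_cosine(vectors_a, vectors_b):
    """Compare two personas layer by layer.

    vectors_a, vectors_b: nested lists shaped [layers][hidden_size]
    Returns one similarity value per layer.
    """
    if len(vectors_a) != len(vectors_b):
        raise ValueError("both personas must have the same number of layers")
    return [cosine_similarity(a, b) for a, b in zip(vectors_a, vectors_b)]


# Toy data: orthogonal at layer 0, parallel at layer 1.
persona_a = [[1.0, 0.0], [1.0, 1.0]]
persona_b = [[0.0, 1.0], [2.0, 2.0]]
similarities = layerwise_cosine(persona_a, persona_b)
```

Cosine similarity ignores vector magnitude, so it measures whether two personas point in the same direction at a given layer, not how strongly they are expressed.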

Repository Layout

persona-vectors/
├── notebooks/
│   ├── notebook_extract.py      # Extraction pipeline (primary working script)
│   ├── notebook_compare.py      # Load saved activations and compare variants
│   └── notebook_steer.py        # Steering experiments
├── src/persona_vectors/
│   ├── activations.py           # Core extraction helpers
│   ├── analysis.py              # PCA / UMAP projections and scatter plots
│   ├── artifacts.py             # Save/load/query activation artifact helpers
│   ├── plots.py                 # Layer-wise cosine similarity plots
│   ├── steering.py              # Steering vector computation and application
│   └── parser.py                # CLI argument parsing
├── artifacts/                   # Saved activations (gitignored)
├── docs/                        # Reference documentation
└── main.py                      # CLI entry point

Dataset loading (SynthPersonaDataset, PersonaGuessDataset) and environment helpers come from the sibling persona-data package.

For local development, uncomment the path source in pyproject.toml and keep persona-data checked out next to this repo.

Installation

uv sync
cp .env.example .env

Python >=3.12 is required.

Quickstart

# Extract activations (run this first)
uv run python -m notebooks.notebook_extract

# Load saved activations / compare variants
uv run python -m notebooks.notebook_compare

# Analyze saved activations (parsed, not implemented yet)
uv run python main.py analyze --out ./plots --similarity cosine

# Compute a steering vector from saved activations
uv run python main.py steer --persona-id <UUID> --model google/gemma-2-9b-it --layer 20

Streamlit App

The Streamlit UI lives in the sibling persona-ui repo.

How It Works

Notebooks

notebook_extract.py runs the full flow end to end:

  1. Load dataset questions and answers
  2. Extract per-question activations
  3. Save them to disk
  4. Mask and average the selected token spans

notebook_compare.py loads saved activations via ActivationStore and compares variants.

notebook_steer.py loads saved activations and computes a steering vector for a selected persona.
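
This README does not say how the steering vector is derived. One common approach, shown here purely as an assumption, is a difference of means: subtract the average of the other personas' vectors from the selected persona's vector at a given layer, then add a scaled copy of the result to a hidden state. The names steering_vector, apply_steering, and alpha are hypothetical and may not match steering.py.

```python
def steering_vector(target, others):
    """Difference-of-means direction at a single layer (assumed method).

    target: the selected persona's vector, a list of length hidden_size
    others: list of other personas' vectors at the same layer
    """
    if not others:
        raise ValueError("need at least one other persona as a baseline")
    # zip(*others) walks the vectors dimension by dimension.
    baseline = [sum(column) / len(others) for column in zip(*others)]
    return [t - b for t, b in zip(target, baseline)]


def apply_steering(hidden, vector, alpha=1.0):
    """Shift one hidden state along the steering direction by a factor alpha."""
    return [h + alpha * v for h, v in zip(hidden, vector)]


# Toy data with hidden size 2.
direction = steering_vector([2.0, 0.0], [[0.0, 0.0], [2.0, 2.0]])
steered = apply_steering([0.5, 0.5], direction, alpha=2.0)
```

In practice the addition would happen inside the model's forward pass at the chosen layer, and alpha would be tuned by experiment.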

Saved Format

Each extraction produces:

artifacts/activations/<model_dir>/<prompt_variant>/<persona_id>/
├── activations.safetensors   # Per-question hidden states
└── metadata.json            # persona_id, persona_name, questions, n_questions, num_layers, hidden_size

<model_dir> is the model name with / replaced by __.
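
The directory for one extraction can be built as in the sketch below. The helper name activation_dir and the variant name "short" are hypothetical, chosen for illustration; only the path layout and the / to __ substitution come from the description above.

```python
from pathlib import Path


def activation_dir(root, model_name, prompt_variant, persona_id):
    """Build <root>/activations/<model_dir>/<prompt_variant>/<persona_id>."""
    # Model names such as "org/model" contain a slash, which would create an
    # extra directory level, so it is replaced by a double underscore.
    model_dir = model_name.replace("/", "__")
    return Path(root) / "activations" / model_dir / prompt_variant / persona_id


path = activation_dir(
    "artifacts",
    "google/gemma-2-2b-it",
    "short",  # hypothetical variant name
    "005e1868-4e17-47e3-94fa-0d20e8d93662",
)
```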

The metadata stores the question text directly, so load-time analysis does not need to re-resolve question IDs (qids) from the dataset. It also stores tensor shape fields such as num_layers and hidden_size, which are used for validation at load time.
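
A load-time check along these lines might look like the following sketch. The key names come from the metadata.json listing above; the function name validate_metadata and the assumed tensor layout of (n_questions, num_layers, hidden_size) are illustrative guesses, and the real artifacts may carry extra dimensions.

```python
REQUIRED_KEYS = {
    "persona_id",
    "persona_name",
    "questions",
    "n_questions",
    "num_layers",
    "hidden_size",
}


def validate_metadata(metadata, tensor_shape):
    """Raise ValueError if the metadata disagrees with itself or the tensor.

    metadata:     dict parsed from metadata.json
    tensor_shape: shape of the loaded activations, assumed here to be
                  (n_questions, num_layers, hidden_size)
    """
    missing = REQUIRED_KEYS - metadata.keys()
    if missing:
        raise ValueError(f"metadata is missing keys: {sorted(missing)}")
    if metadata["n_questions"] != len(metadata["questions"]):
        raise ValueError("n_questions does not match the number of questions")
    expected = (
        metadata["n_questions"],
        metadata["num_layers"],
        metadata["hidden_size"],
    )
    if tuple(tensor_shape) != expected:
        raise ValueError(f"expected shape {expected}, got {tuple(tensor_shape)}")


# Toy metadata; the persona name and questions are made up for the example.
example_metadata = {
    "persona_id": "005e1868-4e17-47e3-94fa-0d20e8d93662",
    "persona_name": "Example Persona",
    "questions": ["What do you do for work?", "Where did you grow up?"],
    "n_questions": 2,
    "num_layers": 4,
    "hidden_size": 8,
}
validate_metadata(example_metadata, (2, 4, 8))  # passes silently
```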

CLI

extract and steer are implemented. analyze is parsed but still raises NotImplementedError.

# Extract activations
python main.py extract --model google/gemma-2-2b-it

# Analyze saved activations (not implemented yet)
python main.py analyze --out ./plots --similarity cosine

# Run steering (example)
python main.py steer --layer 10 --model "google/gemma-2-9b-it" --persona-id 005e1868-4e17-47e3-94fa-0d20e8d93662
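
The command structure above can be approximated with argparse subcommands. The sketch below is an assumption about how such a parser could be arranged, not a copy of parser.py; the flag names are taken from the examples above, while the defaults, the required settings, and the dispatch function are invented for illustration.

```python
import argparse


def build_parser():
    """Build a parser with extract, analyze, and steer subcommands."""
    parser = argparse.ArgumentParser(prog="main.py")
    sub = parser.add_subparsers(dest="command", required=True)

    extract = sub.add_parser("extract", help="extract activations")
    extract.add_argument("--model", required=True)

    analyze = sub.add_parser("analyze", help="analyze saved activations")
    analyze.add_argument("--out", default="./plots")
    analyze.add_argument("--similarity", default="cosine")

    steer = sub.add_parser("steer", help="compute a steering vector")
    steer.add_argument("--persona-id", required=True)
    steer.add_argument("--model", required=True)
    steer.add_argument("--layer", type=int, required=True)
    return parser


def dispatch(args):
    """Route parsed arguments; analyze is parsed but deliberately unimplemented."""
    if args.command == "analyze":
        raise NotImplementedError("analyze is parsed but not implemented yet")
    return args.command


# Parse an explicit argument list instead of sys.argv so the example is
# self-contained.
args = build_parser().parse_args(
    ["steer", "--layer", "10", "--model", "google/gemma-2-9b-it",
     "--persona-id", "005e1868-4e17-47e3-94fa-0d20e8d93662"]
)
```

Registering analyze in the parser before it is implemented lets its flags be validated early, and the NotImplementedError marks the gap clearly at run time.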
