Shared dataset loading and prompt formatting for implicit-personalization projects
persona-data
Shared dataset loading, prompt formatting, and environment utilities for the implicit-personalization projects.
What's in the box
- SynthPersonaDataset: persona profiles plus QA pairs (docs)
- PersonaGuessDataset: turn-based persona games (docs)
- NemotronPersonasFranceDataset / NemotronPersonasUSADataset: NVIDIA persona-only datasets (docs)
- Roleplay and multiple-choice prompt helpers (docs)
- Environment helpers: set_seed, get_device, get_artifacts_dir
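For readers unfamiliar with seeding helpers, here is a minimal sketch of what a function in the spirit of set_seed typically does. It is not the package's implementation: the name set_seed_sketch and its behaviour are assumptions for illustration, and the real set_seed may cover more or different libraries.

```python
import os
import random


def set_seed_sketch(seed: int) -> None:
    """Hypothetical seeding helper; NOT persona-data's actual set_seed."""
    # Seed Python's built-in pseudo-random generator.
    random.seed(seed)
    # Affects only subprocesses started later; the running interpreter's
    # hash seed is fixed at startup.
    os.environ["PYTHONHASHSEED"] = str(seed)
    # Seed NumPy and PyTorch only if they happen to be installed.
    try:
        import numpy as np

        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch

        torch.manual_seed(seed)
    except ImportError:
        pass


# Re-seeding with the same value reproduces the same draws.
set_seed_sketch(0)
first = [random.random() for _ in range(3)]
set_seed_sketch(0)
second = [random.random() for _ in range(3)]
```

Calling the helper once at the top of a script, before any data shuffling or model initialisation, is the usual pattern.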
Installation
Add as a uv git source in your project's pyproject.toml:
[project]
dependencies = ["persona-data"]
[tool.uv.sources]
persona-data = { git = "ssh://git@github.com/implicit-personalization/persona-data.git" }
For local development alongside other repos:
[tool.uv.sources]
persona-data = { path = "../persona-data", editable = true }
Then run uv sync.
Testing
uv run --with pytest pytest tests/test_datasets.py
The release workflow also runs tests/smoke_test.py against the built wheel and source distribution.
Package layout
src/persona_data/
├── synth_persona.py # SynthPersonaDataset, PersonaDataset, PersonaData, QAPair, Statement
├── persona_guess.py # PersonaGuessDataset, GameRecord, Turn
├── nemotron_personas.py # NemotronPersonasFranceDataset, NemotronPersonasUSADataset
├── prompts.py # format_prompt, format_mc_question, format_messages
└── environment.py # set_seed, get_device, get_artifacts_dir
Quick start
from persona_data.synth_persona import SynthPersonaDataset
from persona_data.prompts import format_messages, format_prompt

# NOTE: `tokenizer` is not created in this snippet and must exist beforehand
# (presumably a tokenizer with a chat template; this README does not say which).

dataset = SynthPersonaDataset()
persona = dataset[0]

# Build a system prompt from the persona's biography.
system_prompt = format_prompt(persona, "biography")
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": "Where did you grow up?"},
]
full_prompt, response_start_idx = format_messages(
    messages, tokenizer, add_generation_prompt=True
)

# Leakage-aware train/test split: FRQs for train, shared MCQs for test.
train_qa, test_qa = dataset.train_test_split(persona.id)
See the docs for full APIs.
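The leakage-aware split from the quick start can be illustrated with a small standalone sketch. Everything here is hypothetical: the QA dataclass, the kind field, and split_by_kind are stand-ins rather than the package's QAPair or train_test_split, and it assumes "shared MCQs" means the same multiple-choice questions are asked of every persona.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QA:
    """Hypothetical stand-in for a QA pair; not the package's QAPair."""

    persona_id: str
    question: str
    answer: str
    kind: str  # "frq" (free response) or "mcq" (multiple choice)


def split_by_kind(pairs, persona_id):
    """Return (train, test) for one persona: FRQs train, MCQs test."""
    own = [p for p in pairs if p.persona_id == persona_id]
    train = [p for p in own if p.kind == "frq"]
    test = [p for p in own if p.kind == "mcq"]
    return train, test


pairs = [
    QA("p1", "Where did you grow up?", "Lyon", "frq"),
    QA("p1", "What is your job?", "Baker", "frq"),
    QA("p1", "Which city are you from?", "A", "mcq"),
    QA("p2", "Which city are you from?", "C", "mcq"),
]
train, test = split_by_kind(pairs, "p1")

# No question text appears on both sides of the split.
overlap = {p.question for p in train} & {p.question for p in test}
```

Splitting by question type rather than at random keeps the evaluation questions out of the training data entirely, which is the point of calling the split leakage-aware.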
Used by
- persona-vectors — activation extraction and steering
- cues_attribution — section-level ablation attribution
- persona-2-lora — LoRA-based persona internalization
File details
Details for the file persona_data-0.5.2.tar.gz.
File metadata
- Download URL: persona_data-0.5.2.tar.gz
- Upload date:
- Size: 9.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: uv/0.11.14 (uv publish, run in CI on Ubuntu 24.04 "noble")
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7d8249d572a26b917d219eca3da0210160191b7dca935d609f4b94d5100fd6f1 |
| MD5 | 48d527d8e775988c3667a15f1a6ff24f |
| BLAKE2b-256 | 02e3d8f93f3d9a2532aa4ee12e7cd59b597ef48ede3c9ec82143912ef7c324b6 |
File details
Details for the file persona_data-0.5.2-py3-none-any.whl.
File metadata
- Download URL: persona_data-0.5.2-py3-none-any.whl
- Upload date:
- Size: 12.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: uv/0.11.14 (uv publish, run in CI on Ubuntu 24.04 "noble")
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 49280849ce3368e3d4ff638f9753bcdd8c9f158f7e5a036d4d058227c0a4829c |
| MD5 | 7b5af351e5ac5851010e405b507de00b |
| BLAKE2b-256 | f7a8f006940f642152133e32f82493da74c51dcd7622a718c62cb437115afa8e |