Shared dataset loading and prompt formatting for implicit-personalization projects

persona-data

Shared dataset loading, prompt formatting, and environment utilities for the implicit-personalization projects.

Overview

persona-data provides the common dataset and prompt helpers used across the persona projects:

  • SynthPersonaDataset for persona profiles plus QA pairs
  • PersonaGuessDataset for turn-based persona games
  • prompt helpers for roleplay and multiple-choice evaluation
  • environment helpers for seeds, devices, and artifact paths

Installation

Add persona-data as a uv git source in your project's pyproject.toml:

[project]
dependencies = ["persona-data"]

[tool.uv.sources]
persona-data = { git = "ssh://git@github.com/implicit-personalization/persona-data.git" }

Then run uv sync.

For local development alongside other repos, use an editable path source:

[tool.uv.sources]
persona-data = { path = "../persona-data", editable = true }

Package layout

src/persona_data/
├── __init__.py
├── synth_persona.py       # SynthPersonaDataset, PersonaDataset, PersonaData, QAPair, BiographySection
├── persona_guess.py       # PersonaGuessDataset, GameRecord, Turn
├── prompts.py             # format_roleplay_prompt, format_mc_question, format_messages
└── environment.py         # load_env, set_seed, get_device, get_artifacts_dir

Datasets

Each dataset lives in its own module, which defines the dataset's types and a loader. The loader downloads the data from Hugging Face and caches it under HF_HOME.
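Since the cache location comes from HF_HOME, it can be redirected before any loader runs. The following is a minimal sketch, assuming the loaders rely on the standard Hugging Face libraries, which read HF_HOME when they are first imported. The directory name is purely illustrative and not part of persona-data.

```python
import os
from pathlib import Path

# Hypothetical cache location; any writable directory works.
cache_dir = Path.home() / ".cache" / "persona-hf"

# Set HF_HOME before importing anything that pulls in the Hugging Face
# libraries, since they read the variable at import time.
# setdefault leaves an already-configured HF_HOME untouched.
os.environ.setdefault("HF_HOME", str(cache_dir))

hf_home = os.environ["HF_HOME"]
print(f"Hugging Face cache root: {hf_home}")

# Import the dataset only after the environment is configured, e.g.:
# from persona_data.synth_persona import SynthPersonaDataset
```

Setting the variable in the shell or in a .env file picked up by load_env() should work equally well, as long as it happens before the first dataset import.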

SynthPersona

from persona_data.synth_persona import SynthPersonaDataset

dataset = SynthPersonaDataset()

persona = dataset[0]
persona.name              # "Ethan Robinson"
persona.templated_view    # short attribute-based system prompt
persona.biography_view    # full biography text
persona.sections          # list of BiographySection

qa_pairs = dataset.get_qa(persona.id, type="implicit", difficulty=[1, 2])
questions = dataset.questions(persona.id, type="explicit")
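To make the filtering in get_qa and questions concrete, here is a standalone sketch of the same idea. It does not use the library: QAPairSketch, its field names, and filter_qa are hypothetical stand-ins, and the real QAPair type may be shaped differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QAPairSketch:
    # Hypothetical fields; the real QAPair may differ.
    persona_id: str
    type: str          # e.g. "implicit" or "explicit"
    difficulty: int
    question: str
    answer: str


def filter_qa(pairs, persona_id, type=None, difficulty=None):
    """Return the pairs for one persona, optionally narrowed by type and difficulty."""
    if isinstance(difficulty, int):
        difficulty = [difficulty]  # accept a single level as well as a list
    allowed = set(difficulty) if difficulty is not None else None
    return [
        p for p in pairs
        if p.persona_id == persona_id
        and (type is None or p.type == type)
        and (allowed is None or p.difficulty in allowed)
    ]


pairs = [
    QAPairSketch("p1", "implicit", 1, "What do you do on weekends?", "I go hiking."),
    QAPairSketch("p1", "implicit", 3, "What is your commute like?", "A short bike ride."),
    QAPairSketch("p1", "explicit", 1, "Where did you grow up?", "Little Rock, Arkansas."),
    QAPairSketch("p2", "implicit", 2, "What is your favorite food?", "Ramen."),
]

selected = filter_qa(pairs, "p1", type="implicit", difficulty=[1, 2])
explicit_questions = [p.question for p in filter_qa(pairs, "p1", type="explicit")]
```

Here selected contains only the first pair, and explicit_questions contains the single explicit question for persona p1.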

PersonaGuess

from persona_data.persona_guess import PersonaGuessDataset

games = PersonaGuessDataset()
game = games[0]
turns = games.get_qa(game.game_id, player="A")
questions = games.questions(game.game_id, player="B")

Prompt formatting

from persona_data.prompts import format_messages, format_roleplay_prompt

# `persona` comes from the SynthPersona example above.
system_prompt = format_roleplay_prompt(persona.biography_view)

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": "Where did you grow up?"},
    {"role": "assistant", "content": "I grew up in Little Rock, Arkansas."},
]
# `tokenizer` is a chat tokenizer you have already loaded for your model.
full_prompt, response_start_idx = format_messages(messages, tokenizer)

format_roleplay_prompt supports mode="roleplay" (default), mode="conversational", and mode="mc".

format_messages handles tokenizers that do not support the "system" role (for example Gemma 2) by merging system content into the first user message.
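The system-role fallback can be illustrated without a tokenizer. This is a rough sketch of the merging idea only: merge_system_into_user is a hypothetical helper, and the separator and edge-case handling are assumptions rather than what format_messages actually does.

```python
def merge_system_into_user(messages):
    """Fold system content into the first user message (illustrative only)."""
    system_parts = [m["content"] for m in messages if m["role"] == "system"]
    # Copy the remaining messages so the caller's list is not mutated.
    rest = [dict(m) for m in messages if m["role"] != "system"]
    if not system_parts:
        return rest

    system_text = "\n\n".join(system_parts)
    for message in rest:
        if message["role"] == "user":
            # Assumed separator: a blank line between system and user text.
            message["content"] = f"{system_text}\n\n{message['content']}"
            return rest

    # No user turn to merge into: send the system text as a user turn instead.
    return [{"role": "user", "content": system_text}] + rest


messages = [
    {"role": "system", "content": "You are Ethan Robinson."},
    {"role": "user", "content": "Where did you grow up?"},
    {"role": "assistant", "content": "I grew up in Little Rock, Arkansas."},
]
merged = merge_system_into_user(messages)
```

The merged list has two messages: a user turn that starts with the system text, followed by the unchanged assistant turn.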

For multiple-choice evaluation, use format_mc_question(qa) and mc_correct_letter(qa).
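As an illustration of what multiple-choice formatting involves, the sketch below lays out lettered options and looks up the letter of the correct answer. format_mc and correct_letter are hypothetical stand-ins that take plain strings; the library functions take a qa object whose fields are not documented here, and their output format may differ.

```python
import string


def format_mc(question, options):
    """Render a question followed by lettered options (A., B., C., ...)."""
    lines = [question]
    for letter, option in zip(string.ascii_uppercase, options):
        lines.append(f"{letter}. {option}")
    lines.append("Answer with a single letter.")  # assumed instruction text
    return "\n".join(lines)


def correct_letter(options, answer):
    """Return the letter whose option matches the correct answer."""
    return string.ascii_uppercase[options.index(answer)]


options = ["Austin, Texas", "Little Rock, Arkansas", "Tulsa, Oklahoma"]
prompt = format_mc("Where did you grow up?", options)
letter = correct_letter(options, "Little Rock, Arkansas")
```

With these inputs, letter is "B" and prompt has five lines: the question, three options, and the answer instruction.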

Environment helpers

from persona_data.environment import load_env, set_seed, get_device, get_artifacts_dir

load_env()            # loads .env from cwd (searches parent dirs)
set_seed(1337)        # sets random, numpy, and torch seeds
device = get_device() # cuda > mps > cpu
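For readers curious what such helpers typically do, here is a hedged sketch of seeding and device selection. set_seed_sketch and get_device_sketch are hypothetical names, not the library's implementation; NumPy and PyTorch are treated as optional so the snippet also runs without them.

```python
import random


def set_seed_sketch(seed):
    """Seed Python's RNG, plus NumPy and PyTorch when they are installed."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch
        torch.manual_seed(seed)
    except ImportError:
        pass


def get_device_sketch():
    """Pick a device name in the order cuda, then mps, then cpu."""
    try:
        import torch
    except ImportError:
        return "cpu"
    if torch.cuda.is_available():
        return "cuda"
    # Older PyTorch builds may not expose the mps backend at all.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


set_seed_sketch(1337)
first_draw = random.random()
set_seed_sketch(1337)
second_draw = random.random()  # same value as first_draw after reseeding
device_name = get_device_sketch()
```

Reseeding with the same value reproduces the same random draw, which is the property that makes experiment runs repeatable.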
