

persona-data


Shared dataset loading, prompt formatting, and environment utilities for the implicit-personalization projects.

What's in the box

  • SynthPersonaDataset — persona profiles plus QA pairs (docs)
  • PersonaGuessDataset — turn-based persona games (docs)
  • NemotronPersonasFranceDataset / NemotronPersonasUSADataset — NVIDIA persona-only datasets (docs)
  • Roleplay and multiple-choice prompt helpers (docs)
  • Environment helpers: set_seed, get_device, get_artifacts_dir
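The environment helpers are listed here by name only. As a rough illustration of what utilities of this kind commonly do, here is a minimal sketch. It is not the package's implementation: the `_sketch` function names, their bodies, the `ARTIFACTS_DIR` environment variable, and the default `artifacts` directory are all assumptions, and the real `set_seed`, `get_device`, and `get_artifacts_dir` may behave differently (for instance by also seeding NumPy or PyTorch).

```python
import os
import random
from pathlib import Path


def set_seed_sketch(seed: int) -> None:
    """Seed Python's built-in RNG (a real helper may also seed NumPy/PyTorch)."""
    random.seed(seed)


def get_device_sketch() -> str:
    """Return "cuda" if PyTorch is installed and sees a GPU, else "cpu"."""
    try:
        import torch  # optional dependency; assumed here, not required
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


def get_artifacts_dir_sketch(default: str = "artifacts") -> Path:
    """Resolve an output directory from a hypothetical ARTIFACTS_DIR variable and create it."""
    path = Path(os.environ.get("ARTIFACTS_DIR", default))
    path.mkdir(parents=True, exist_ok=True)
    return path
```

Centralizing these three concerns in one module means every project in the group seeds, selects a device, and writes outputs the same way.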

Installation

Add persona-data as a uv git source in your project's pyproject.toml:

[project]
dependencies = ["persona-data"]

[tool.uv.sources]
persona-data = { git = "ssh://git@github.com/implicit-personalization/persona-data.git" }

For local development alongside other repos:

[tool.uv.sources]
persona-data = { path = "../persona-data", editable = true }

Then run uv sync.

Testing

uv run --with pytest pytest tests/test_datasets.py

The release workflow also runs tests/smoke_test.py against the built wheel and source distribution.
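A smoke test of this kind usually just confirms that the installed distribution imports cleanly. The contents of tests/smoke_test.py are not shown in this README, so the following is only a sketch of the idea: the helper name `find_missing_modules` is hypothetical, and the module names are taken from the package layout section of this README.

```python
import importlib.util


def find_missing_modules(module_names: list[str]) -> list[str]:
    """Return the names from module_names that cannot be located for import."""
    missing = []
    for name in module_names:
        try:
            spec = importlib.util.find_spec(name)
        except ModuleNotFoundError:
            # Raised when a parent package (e.g. persona_data) is not installed.
            spec = None
        if spec is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    expected = [
        "persona_data.synth_persona",
        "persona_data.persona_guess",
        "persona_data.nemotron_personas",
        "persona_data.prompts",
        "persona_data.environment",
    ]
    missing = find_missing_modules(expected)
    print("missing modules:", missing or "none")
```

Running a check like this in an environment that contains only the built wheel or sdist catches packaging mistakes, such as a module left out of the distribution, that a test run from the source tree would not notice.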

Package layout

src/persona_data/
├── synth_persona.py       # SynthPersonaDataset, PersonaDataset, PersonaData, QAPair, Statement
├── persona_guess.py       # PersonaGuessDataset, GameRecord, Turn
├── nemotron_personas.py   # NemotronPersonasFranceDataset, NemotronPersonasUSADataset
├── prompts.py             # format_prompt, format_mc_question, format_messages
└── environment.py         # set_seed, get_device, get_artifacts_dir

Quick start

from persona_data.synth_persona import SynthPersonaDataset
from persona_data.prompts import format_messages, format_prompt

dataset = SynthPersonaDataset()
persona = dataset[0]

system_prompt = format_prompt(persona, "biography")
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": "Where did you grow up?"},
]
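# `tokenizer` is not defined above: it is assumed to be a chat-capable tokenizer
# (for example, one loaded with Hugging Face transformers) that you supply.
full_prompt, response_start_idx = format_messages(
    messages, tokenizer, add_generation_prompt=True
)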

# Leakage-aware train/test split: FRQs for train, shared MCQs for test.
train_qa, test_qa = dataset.train_test_split(persona.id)
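
The comment above describes the split only briefly. One plausible reading is that free-response questions (FRQs) go to training while multiple-choice questions (MCQs) are held out for testing, so the two sets never share an item. The sketch below illustrates that reading with a stand-in record type. The `QARecord` class, its `kind` field, and `split_by_question_type` are hypothetical; they are not the package's `QAPair` or `train_test_split`, whose actual behavior is documented in the docs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QARecord:
    """Stand-in QA record; `kind` is "frq" or "mcq" (hypothetical schema)."""
    persona_id: str
    question: str
    answer: str
    kind: str


def split_by_question_type(records, persona_id):
    """Split one persona's records: FRQs for training, MCQs for testing."""
    own = [r for r in records if r.persona_id == persona_id]
    train = [r for r in own if r.kind == "frq"]
    test = [r for r in own if r.kind == "mcq"]
    # The two sets are disjoint by construction, since each record has one kind.
    return train, test


records = [
    QARecord("p1", "Where did you grow up?", "Lyon", "frq"),
    QARecord("p1", "Which city is home? (A) Lyon (B) Paris", "A", "mcq"),
    QARecord("p2", "Where did you grow up?", "Austin", "frq"),
]
train_qa, test_qa = split_by_question_type(records, "p1")
```

Splitting by question type rather than at random keeps evaluation items in a fixed format and guarantees that no held-out question is seen during training.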

See the docs for full APIs.

