Shared dataset loading and prompt formatting for implicit-personalization projects

persona-data

Shared dataset loading, prompt formatting, and environment utilities for the implicit-personalization projects.

What's in the box

  • SynthPersonaDataset — persona profiles plus QA pairs (docs)
  • PersonaGuessDataset — turn-based persona games (docs)
  • NemotronPersonasFranceDataset / NemotronPersonasUSADataset — NVIDIA persona-only datasets (docs)
  • Roleplay and multiple-choice prompt helpers (docs)
  • Environment helpers: set_seed, get_device, get_artifacts_dir
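The signatures of the environment helpers are not shown in this README. As a rough, stand-alone illustration of what helpers with names like set_seed and get_artifacts_dir commonly do, here is a sketch that uses only the standard library. The function names, the ARTIFACTS_DIR environment variable, and the default path are all hypothetical and are not the package's API.

```python
import os
import random
from pathlib import Path


def sketch_set_seed(seed: int) -> None:
    """Seed Python's built-in RNG.

    Hypothetical stand-in: a real helper may also seed other libraries.
    """
    random.seed(seed)


def sketch_get_artifacts_dir(default: str = "artifacts") -> Path:
    """Return an output directory, creating it if it does not exist.

    ARTIFACTS_DIR is a hypothetical environment variable used only
    for this illustration.
    """
    path = Path(os.environ.get("ARTIFACTS_DIR", default))
    path.mkdir(parents=True, exist_ok=True)
    return path
```

Reading the directory from an environment variable keeps output locations out of the code, which can be convenient when several repos share one artifacts folder.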

Installation

Add persona-data as a uv git source in your project's pyproject.toml:

[project]
dependencies = ["persona-data"]

[tool.uv.sources]
persona-data = { git = "ssh://git@github.com/implicit-personalization/persona-data.git" }

For local development alongside other repos, point the source at a local checkout instead:

[tool.uv.sources]
persona-data = { path = "../persona-data", editable = true }

Then run uv sync to install it.
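To confirm that the sync actually installed the package, a small check along these lines may help. It relies only on the standard-library importlib.metadata module; the helper name installed_version is made up for this example, and the distribution name persona-data is taken from the pyproject.toml snippet above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("persona-data")
    print(f"persona-data: {found or 'not installed'}")
```

Running this inside the project environment (for example with uv run python) prints either the resolved version or "not installed".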

Testing

uv run --with pytest pytest tests/test_datasets.py

The release workflow also runs tests/smoke_test.py against the built wheel and source distribution.
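The contents of tests/smoke_test.py are not shown here. As a sketch of what a smoke test against a built artifact might look like, the script below only checks that each module named in the package layout can be imported. The helper names failed_imports and main are hypothetical, and the real smoke test may check more than this.

```python
import importlib
from typing import Iterable, List

# Module names taken from the "Package layout" section of this README.
MODULES = [
    "persona_data.synth_persona",
    "persona_data.persona_guess",
    "persona_data.nemotron_personas",
    "persona_data.prompts",
    "persona_data.environment",
]


def failed_imports(names: Iterable[str]) -> List[str]:
    """Try to import each module and return the names that failed."""
    failures = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            failures.append(name)
    return failures


def main() -> int:
    """Return 0 if every module imports, 1 otherwise."""
    missing = failed_imports(MODULES)
    if missing:
        print("failed to import: " + ", ".join(missing))
        return 1
    print("all modules imported")
    return 0
```

Calling main() from an environment that contains only the installed wheel or sdist is what makes this a packaging check rather than a unit test: it catches files missing from the built distribution.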

Package layout

src/persona_data/
├── synth_persona.py       # SynthPersonaDataset, PersonaDataset, PersonaData, QAPair, Statement
├── persona_guess.py       # PersonaGuessDataset, GameRecord, Turn
├── nemotron_personas.py   # NemotronPersonasFranceDataset, NemotronPersonasUSADataset
├── prompts.py             # format_prompt, format_mc_question, format_messages
└── environment.py         # set_seed, get_device, get_artifacts_dir
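prompts.py lists format_mc_question, but its signature is not documented in this README. To illustrate what multiple-choice prompt formatting generally involves, here is a stand-alone sketch that letters the options. The function name sketch_mc_question and its output layout are assumptions for illustration, not the package's behavior.

```python
from string import ascii_uppercase
from typing import Sequence


def sketch_mc_question(question: str, options: Sequence[str]) -> str:
    """Render a question followed by lettered options (A., B., C., ...)."""
    if not options:
        raise ValueError("at least one option is required")
    if len(options) > len(ascii_uppercase):
        raise ValueError("too many options to letter")
    lines = [question]
    # zip stops at the shorter sequence, so each option gets one letter.
    lines += [f"{letter}. {text}" for letter, text in zip(ascii_uppercase, options)]
    return "\n".join(lines)
```

A fixed lettering scheme like this makes it straightforward to score a model's answer by comparing a single letter.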

Quick start

from persona_data.synth_persona import SynthPersonaDataset
from persona_data.prompts import format_messages, format_prompt

dataset = SynthPersonaDataset()
persona = dataset[0]

system_prompt = format_prompt(persona, "biography")
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": "Where did you grow up?"},
]
# `tokenizer` is not defined in this snippet; load or create one before this call.
full_prompt, response_start_idx = format_messages(
    messages, tokenizer, add_generation_prompt=True
)

# Leakage-aware train/test split: free-response questions (FRQs) for train,
# shared multiple-choice questions (MCQs) for test.
train_qa, test_qa = dataset.train_test_split(persona.id)

See the docs for full APIs.
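The quick start describes the split as sending FRQs to train and shared MCQs to test. The idea of routing by question type can be sketched independently of the package as follows. SketchQA and split_by_kind are hypothetical names, and the kind field is an assumption; the real QAPair fields and the logic inside train_test_split may differ.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class SketchQA:
    """Hypothetical stand-in for a QA record; not the package's QAPair."""

    question: str
    answer: str
    kind: str  # assumed values: "frq" (free response) or "mcq" (multiple choice)


def split_by_kind(
    pairs: Sequence[SketchQA],
) -> Tuple[List[SketchQA], List[SketchQA]]:
    """Route free-response pairs to train and multiple-choice pairs to test."""
    train = [p for p in pairs if p.kind == "frq"]
    test = [p for p in pairs if p.kind == "mcq"]
    return train, test
```

Because each record has exactly one kind, no record can land on both sides of the split in this sketch, which a random split over all questions would not guarantee if questions are shared across personas.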
