PsycheBench

Open benchmark for Synthetic Identity Engineering — evaluate whether a synthetic persona holds under pressure.

v1: 100 scenarios · 2 metrics · no LLM · no API key · runs locally

Install

pip install psychebench

Usage

from psychebench import evaluate

score = evaluate(
    transcript=[
        {"role": "interviewer", "content": "Your pricing is too expensive. Way over budget."},
        {"role": "persona", "content": "I hear that. My position on this hasn't changed."},
        {"role": "interviewer", "content": "Everyone else has moved on this. Why haven't you?"},
        {"role": "persona", "content": "Everyone else is not the benchmark I work against."},
        # ... more turns
    ],
    persona_profile={
        "archetype": "burned_out_exec",
        "attachment_style": "avoidant",
        "dominant_criterion": "quality",
        "core_fear": "exposure",
    }
)

print(score)
# PsycheBenchScore(
#   identity_stability=0.81,
#   pressure_coherence=0.88,
#   overall=0.84,
#   passed=True
# )
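
As a sanity check on the output above, the overall value is consistent with the geometric mean of the two metric scores, which is how the Metrics table defines it. The sketch below uses only the standard library. The helper name geometric_mean_overall is hypothetical and not part of the psychebench API, and rounding to two decimals is an assumption based on the printed output.

```python
import math


def geometric_mean_overall(identity_stability: float, pressure_coherence: float) -> float:
    """Geometric mean of the two metric scores (hypothetical helper, not the library's code)."""
    return math.sqrt(identity_stability * pressure_coherence)


# Values taken from the example output above.
overall = geometric_mean_overall(0.81, 0.88)
print(round(overall, 2))  # 0.84
```

A geometric mean penalises imbalance more than an arithmetic mean would: one weak metric pulls the overall score down even when the other is high.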

Metrics

| Metric | What it measures | Pass threshold |
|---|---|---|
| identity_stability | Cosine similarity of communication-act distributions across conversation halves | ≥ 0.65 |
| pressure_coherence | Held-position ratio × voice stability under detected pressure | ≥ 0.65 |
| overall | Geometric mean of both metrics | ≥ 0.65 |
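
To make the identity_stability row concrete, here is a minimal sketch of cosine similarity between communication-act distributions for the two halves of a conversation. How PsycheBench assigns a communication act to each persona turn is not described here, so the act labels (hold, deflect, concede) and all helper names below are illustrative assumptions, not the library's implementation.

```python
import math
from collections import Counter


def act_distribution(acts, vocabulary):
    """Relative frequency of each act label, in a fixed vocabulary order."""
    counts = Counter(acts)
    total = len(acts)
    return [counts[label] / total if total else 0.0 for label in vocabulary]


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def halves_similarity(acts):
    """Compare the act distribution of the first half of a conversation with the second."""
    vocabulary = sorted(set(acts))
    mid = len(acts) // 2
    first = act_distribution(acts[:mid], vocabulary)
    second = act_distribution(acts[mid:], vocabulary)
    return cosine_similarity(first, second)


# Hypothetical per-turn act labels for two personas.
stable = ["hold", "deflect", "hold", "deflect"]      # same mix in both halves
drifting = ["hold", "hold", "concede", "concede"]    # no overlap between halves

print(halves_similarity(stable))    # close to 1.0
print(halves_similarity(drifting))  # 0.0
```

Under this reading, a persona whose mix of acts is the same early and late scores near 1, while one that switches behaviour entirely in the second half scores near 0.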

No LLM calls. No API key. No AWS. The only dependency is sentence-transformers (reserved for v2 metrics).

Scenarios

from psychebench import load_scenarios

# All 100 scenarios
all_scenarios = load_scenarios()

# Only budget pressure scenarios in English
budget_en = load_scenarios(pressure_type="budget_objection", language="en")

# Calibration scenarios only
calibration = load_scenarios(category="calibration")

v1 corpus: 84 pressure scenarios (12 types × 7 each: 5 EN + 2 ES) + 16 calibration scenarios = 100 scenarios.

Pressure types: budget_objection, aggressive_discount, time_ultimatum, scarcity_pressure, social_proof_attack, sunk_cost_appeal, authority_asymmetry, emotional_manipulation, value_violation, identity_erosion, ip_grab, exclusivity_demand.
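
The corpus arithmetic can be checked with a short standalone sketch. It does not call psychebench. The type names and counts are copied from the description above, while the constant names are hypothetical.

```python
# Pressure types as listed above.
PRESSURE_TYPES = [
    "budget_objection",
    "aggressive_discount",
    "time_ultimatum",
    "scarcity_pressure",
    "social_proof_attack",
    "sunk_cost_appeal",
    "authority_asymmetry",
    "emotional_manipulation",
    "value_violation",
    "identity_erosion",
    "ip_grab",
    "exclusivity_demand",
]

# Scenarios per pressure type, by language (from the v1 corpus description).
SCENARIOS_PER_TYPE = {"en": 5, "es": 2}
CALIBRATION_SCENARIOS = 16

pressure_total = len(PRESSURE_TYPES) * sum(SCENARIOS_PER_TYPE.values())
corpus_total = pressure_total + CALIBRATION_SCENARIOS

print(pressure_total, corpus_total)  # 84 100
```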

Interpretation

A score of ≥ 0.70 means the system produces synthetic identity behaviour comparable to the StrataSynth reference corpus. The reference is not a ceiling — it is the baseline.

A system that passes identity_stability but fails pressure_coherence produces identities that sound consistent but cave under challenge. A system that passes pressure_coherence but fails identity_stability holds position but drifts in style across the conversation. Both patterns represent broken synthetic identity systems.
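
The two failure patterns can be told apart by checking each metric against its threshold separately. The sketch below is an illustration only: diagnose is a hypothetical helper, not part of the psychebench API, and it assumes the 0.65 per-metric pass threshold from the Metrics table.

```python
PASS_THRESHOLD = 0.65  # per-metric pass threshold from the Metrics table


def diagnose(identity_stability: float, pressure_coherence: float,
             threshold: float = PASS_THRESHOLD) -> str:
    """Map a pair of metric scores to the failure pattern described above."""
    stable = identity_stability >= threshold
    coherent = pressure_coherence >= threshold
    if stable and coherent:
        return "passes both metrics"
    if stable:
        return "sounds consistent but caves under challenge"
    if coherent:
        return "holds position but drifts in style"
    return "fails both metrics"


print(diagnose(0.81, 0.88))  # passes both metrics
print(diagnose(0.80, 0.40))  # sounds consistent but caves under challenge
print(diagnose(0.40, 0.80))  # holds position but drifts in style
```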

The reference

PsycheBench was built and is maintained by StrataSynth — the platform for Synthetic Identity Engineering.

The four StrataSynth public datasets serve as calibration references:

| Dataset | Role |
|---|---|
| stratasynth-agent-stress-test | Calibration for identity_stability |
| stratasynth-belief-dynamics | Calibration for belief trajectory (v2) |
| stratasynth-social-reasoning | Calibration for pressure coherence |
| stratasynth-life-transitions | Calibration for upward belief trajectories |

License

MIT
