
Behavioral profiling benchmark for LLMs. Profile any model's personality, extract steering vectors, generate DPO training pairs.

Project description

cane-personality

Behavioral profiling benchmark for LLMs. Profile any model's personality, extract steering vectors, generate DPO training pairs to fix what's broken.


                    INTELLECT-3   OLMo-2   Qwen-2.5   DeepSeek-V3
Overconfidence         72.1       45.3      68.4        61.7
Calibration            38.9       67.2      41.8        52.3
Verbosity              84.3       56.1      91.2        78.6
Hedging                22.4       48.7      18.9        31.5
Groundedness           41.2       62.8      44.1        55.9
Completeness           68.7       71.3      73.2        69.8

Grade                   D          B         D           C
Steering Vectors        3          1         3           2
DPO Pairs Generated    47         12        52          31

What it does

A 300-question behavioral probe suite covering 6 personality traits at 3 difficulty tiers. Run it against any model to get three outputs:

  1. Behavioral profile with trait scores, embedding space visualization, and cluster analysis
  2. Steering vectors: directions in embedding space between behavioral poles (e.g. calibrated vs. overconfident responses)
  3. DPO training pairs (chosen/rejected) ready for TRL, OpenRLHF, or PRIME-RL

Quick start

pip install "cane-personality[all]"
export ANTHROPIC_API_KEY=sk-ant-...

# Profile a model
cane-personality run --model claude-sonnet-4-5-20250929 --html report.html

# Profile with OpenAI
cane-personality run --provider openai --model gpt-4o

# Profile local model via Ollama
cane-personality run --provider ollama --model llama3 --base-url http://localhost:11434/v1

Three outputs

1. Behavioral profile

cane-personality run --model claude-sonnet-4-5-20250929 --html report.html

Interactive HTML report with:

  • Trait scores across 6 dimensions (radar chart)
  • Embedding space scatter plot (pass/warn/fail clusters)
  • Cluster analysis with semantic labels

2. Steering vectors

cane-personality run --model my-model --export-vectors vectors.json

Directions in embedding space between behavioral poles:

  • Overconfidence vector: calibrated confidence -> overconfidence
  • Quality vector: high-quality -> low-quality responses

Export as JSON for representation engineering or inference-time intervention.
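The README doesn't say exactly how each direction is computed. A common construction, assumed here rather than taken from the package, is the difference of mean embeddings between two groups of responses: the vector points from the calibrated centroid toward the overconfident one, and its Euclidean norm serves as the magnitude. The function names and the toy 3-dimensional embeddings below are hypothetical.

```python
import math
from typing import Sequence

Vector = list[float]


def mean_vector(vectors: Sequence[Sequence[float]]) -> Vector:
    """Component-wise mean of equally sized embedding vectors."""
    n = len(vectors)
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / n for i in range(dim)]


def steering_vector(source: Sequence[Sequence[float]],
                    target: Sequence[Sequence[float]]) -> tuple[Vector, float]:
    """Hypothetical difference-of-means direction from the `source` pole to the `target` pole.

    Returns the unit direction and the magnitude (Euclidean norm) of the raw difference.
    """
    src_mean = mean_vector(source)
    tgt_mean = mean_vector(target)
    diff = [t - s for s, t in zip(src_mean, tgt_mean)]
    magnitude = math.sqrt(sum(d * d for d in diff))
    if magnitude == 0:
        raise ValueError("poles have identical centroids; direction is undefined")
    unit = [d / magnitude for d in diff]
    return unit, magnitude


# Toy embeddings (hypothetical): calibrated vs. overconfident responses
calibrated = [[0.0, 1.0, 0.0], [0.0, 3.0, 0.0]]     # centroid (0, 2, 0)
overconfident = [[2.0, 5.0, 0.0], [4.0, 7.0, 0.0]]  # centroid (3, 6, 0)

direction, magnitude = steering_vector(calibrated, overconfident)
print(f"overconfidence: magnitude {magnitude:.3f}")  # overconfidence: magnitude 5.000
print(direction)                                     # [0.6, 0.8, 0.0]
```

Shifting an embedding by a negative multiple of `direction` moves it toward the calibrated pole; how the exported JSON is applied at inference time depends on the downstream tool.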

3. DPO training pairs

cane-personality run --model my-model --export-dpo pairs.jsonl

Each contrastive pair (confidently right vs. confidently wrong) is exported as one JSONL line:

{"prompt": "...", "chosen": "...", "rejected": "...", "trait": "overconfidence"}

Ready for TRL, OpenRLHF, or PRIME-RL. Tagged by trait so you can target specific behavioral fixes.
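Because every record carries a `trait` tag, the pairs for one behavior can be pulled out with the standard library before handing them to a trainer. The sketch below assumes only the four fields shown in the record format above; the helper name `load_pairs` is hypothetical.

```python
import io
import json
from collections import Counter
from typing import Optional, TextIO

# Fields shown in the record format above
REQUIRED_KEYS = {"prompt", "chosen", "rejected", "trait"}


def load_pairs(stream: TextIO, trait: Optional[str] = None) -> list[dict]:
    """Read DPO records from a JSONL stream, optionally keeping a single trait."""
    pairs = []
    for line_no, line in enumerate(stream, start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        missing = REQUIRED_KEYS - set(record)
        if missing:
            raise ValueError(f"line {line_no}: missing fields {sorted(missing)}")
        if trait is None or record["trait"] == trait:
            pairs.append(record)
    return pairs


# In-memory stand-in for pairs.jsonl; with a real export you would use:
#   with open("pairs.jsonl", encoding="utf-8") as f:
#       overconfidence_pairs = load_pairs(f, trait="overconfidence")
sample = io.StringIO(
    '{"prompt": "q1", "chosen": "a", "rejected": "b", "trait": "overconfidence"}\n'
    '{"prompt": "q2", "chosen": "c", "rejected": "d", "trait": "hedging"}\n'
)
pairs = load_pairs(sample)
print(Counter(p["trait"] for p in pairs))  # number of records per trait
```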

Personality traits

Trait            What it measures                 Low score           High score
Overconfidence   Confidently wrong answers        Well-calibrated     Confidently hallucinating
Calibration      Certainty matches correctness    Poorly calibrated   Well-calibrated
Verbosity        Response length vs. expected     Terse               Rambling
Hedging          Unnecessary qualification        Direct and clear    Over-qualified
Groundedness     Answers grounded in facts        Fabricating         Fact-based
Completeness     Covers all key points            Missing parts       Thorough

Probe suite

300 questions across 6 traits and 3 difficulty tiers:

Trait            Easy (15)              Medium (20)              Hard (15)              Total
Overconfidence   Common facts           Misconceptions           Obscure topics         50
Calibration      Unknowable questions   Debatable topics         Uncertain science      50
Hedging          Basic math             Established facts        Definitive technical   50
Verbosity        Yes/no questions       One-sentence answers     Precise definitions    50
Groundedness     Fake citations         Obscure facts            Plausible fakes        50
Completeness     Two-part questions     Three-part comparisons   Multi-dimensional      50
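The tier sizes above determine the suite's totals. As a quick arithmetic check, with the dictionary layout being a hypothetical representation rather than the package's probe file format:

```python
# Hypothetical representation of the probe suite layout (not the package's file format)
TIERS = {"easy": 15, "medium": 20, "hard": 15}  # probes per trait in each tier
TRAITS = ("overconfidence", "calibration", "hedging",
          "verbosity", "groundedness", "completeness")

per_trait = sum(TIERS.values())                                  # 15 + 20 + 15
per_tier = {tier: n * len(TRAITS) for tier, n in TIERS.items()}  # summed over all traits
total = per_trait * len(TRAITS)

print(per_trait, per_tier, total)
# 50 {'easy': 90, 'medium': 120, 'hard': 90} 300
```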

Compare models

# Compare against shipped baselines
cane-personality compare --baselines intellect3,olmo2,qwen25 --html comparison.html

# Compare your profiles
cane-personality compare --profiles model_a.json,model_b.json --html comparison.html

This generates a side-by-side comparison with a trait table, overlaid radar charts, and per-trait rankings.
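A per-trait ranking amounts to sorting models by score within each trait. The sketch below does that for the baseline scores in the table at the top of this page. The dictionary layout is hypothetical (it is not the profile JSON schema), and it simply orders highest score first, because whether a high score is desirable depends on the trait (see Personality traits above).

```python
# Baseline trait scores copied from the table at the top of this page.
# The nested-dict layout is a hypothetical stand-in for saved profiles.
BASELINES = {
    "INTELLECT-3": {"Overconfidence": 72.1, "Calibration": 38.9, "Verbosity": 84.3,
                    "Hedging": 22.4, "Groundedness": 41.2, "Completeness": 68.7},
    "OLMo-2":      {"Overconfidence": 45.3, "Calibration": 67.2, "Verbosity": 56.1,
                    "Hedging": 48.7, "Groundedness": 62.8, "Completeness": 71.3},
    "Qwen-2.5":    {"Overconfidence": 68.4, "Calibration": 41.8, "Verbosity": 91.2,
                    "Hedging": 18.9, "Groundedness": 44.1, "Completeness": 73.2},
    "DeepSeek-V3": {"Overconfidence": 61.7, "Calibration": 52.3, "Verbosity": 78.6,
                    "Hedging": 31.5, "Groundedness": 55.9, "Completeness": 69.8},
}


def rank_by_trait(profiles: dict[str, dict[str, float]]) -> dict[str, list[str]]:
    """For each trait, list model names ordered from highest to lowest score."""
    traits = next(iter(profiles.values())).keys()
    return {
        trait: sorted(profiles, key=lambda model: profiles[model][trait], reverse=True)
        for trait in traits
    }


for trait, order in rank_by_trait(BASELINES).items():
    print(f"{trait:<15}" + " > ".join(order))
```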

Python API

from cane_personality import Profiler, Judge, export_dpo_pairs

# Score one response with the built-in LLM judge
# (question, expected_answer, agent_answer are placeholders for your own data)
judge = Judge(provider="anthropic", model="claude-haiku-4-5-20251001")
score = judge.score(question, expected_answer, agent_answer)

# Build a behavioral profile from scored results
# (results: the judged probe results for one model)
profiler = Profiler(embedding_model="all-MiniLM-L6-v2")
profile = profiler.profile(results, model_name="my-model")

# Access trait scores
print(profile.personality.trait_scores)

# Inspect steering vectors
for sv in profile.steering_vectors:
    print(f"{sv.name}: magnitude {sv.magnitude:.3f}")

# Generate an HTML report
profile.to_html("report.html")

# Export DPO pairs as JSONL
export_dpo_pairs(profile, "pairs.jsonl")

Install

pip install cane-personality                    # core (numpy, pyyaml)
pip install "cane-personality[anthropic]"       # + Anthropic provider
pip install "cane-personality[openai]"          # + OpenAI/Ollama provider
pip install "cane-personality[embeddings]"      # + sentence-transformers
pip install "cane-personality[all]"             # everything

How it works

Probe Suite (300 Q) --> Target Model --> LLM Judge --> Trait Scoring
                                                          |
                                    +---------+-----------+---------+
                                    |         |                     |
                              Embed (MiniLM)  |              DPO Pairs
                                    |         |             (chosen/rejected)
                              PCA / UMAP      |
                                    |         v
                              K-means    Steering Vectors
                              Clusters   (overconfidence,
                                          quality)
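The clustering step in the diagram groups projected response embeddings with K-means. The sketch below is plain Lloyd's algorithm on 2-D points (for example, embeddings after a PCA projection) using only the standard library; it illustrates the technique and is not the package's implementation. Seeding the centroids with the first k points is an assumption made for reproducibility; real implementations typically use k-means++ or several random restarts.

```python
import math
from typing import Sequence

Point = tuple[float, float]


def kmeans(points: Sequence[Point], k: int, iters: int = 100) -> tuple[list[Point], list[int]]:
    """Lloyd's K-means on 2-D points; returns centroids and each point's cluster index."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    centroids = [tuple(p) for p in points[:k]]  # deterministic init: first k points
    labels = [-1] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid (Euclidean distance)
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = (sum(x for x, _ in members) / len(members),
                                sum(y for _, y in members) / len(members))
    return centroids, labels


# Two well-separated toy groups of projected embeddings
pts = [(0.0, 0.0), (5.0, 5.0), (0.2, 0.1), (5.1, 4.9), (0.1, 0.3), (4.8, 5.2)]
centroids, labels = kmeans(pts, k=2)
print(labels)  # [0, 1, 0, 1, 0, 1]
```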

License

MIT



Download files


Source Distribution

cane_personality-0.1.0.tar.gz (58.4 kB)

Uploaded Source

Built Distribution


cane_personality-0.1.0-py3-none-any.whl (63.6 kB)

Uploaded Python 3

File details

Details for the file cane_personality-0.1.0.tar.gz.

File metadata

  • Download URL: cane_personality-0.1.0.tar.gz
  • Size: 58.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.1

File hashes

Hashes for cane_personality-0.1.0.tar.gz
Algorithm Hash digest
SHA256 02870a9eb8858135c4b89e9121eba89ad8b4ccaedb34a1a03d1f1f9bd151c1c4
MD5 9a977b75ee45bbbc7571c08e8959abb8
BLAKE2b-256 0a44f3f810008b1e79bdb8f9c2ca3fe27fc8a77c863764bce02838c4f80f9b36


File details

Details for the file cane_personality-0.1.0-py3-none-any.whl.

File hashes

Hashes for cane_personality-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 6efebb37be7b8fd441436239644c9109db32724e65373f1c77c69a06cd30d1f8
MD5 fe1086f9c6eb42044ec6c0492f382790
BLAKE2b-256 ca828745933f92379f715008d409e92bddb4db21ec0d13ef18747cdfe6b23b40

