Behavioral profiling benchmark for LLMs. Profile any model's personality, extract steering vectors, generate DPO training pairs.
cane-personality
Behavioral profiling benchmark for LLMs. Profile any model's personality, extract steering vectors, generate DPO training pairs to fix what's broken.
| Metric | Qwen-2.5-72B | OLMo-2-32B | DeepSeek-V3 | INTELLECT-3 | Qwen-2.5-7B |
|---|---|---|---|---|---|
| Overall Score | 90.7 | 90.5 | 90.0 | 88.2 | 87.5 |
| Overconfidence | 3.8 | 3.3 | 6.2 | 6.8 | 6.0 |
| Calibration | 92.8 | 92.4 | 90.9 | 89.3 | 89.3 |
| Verbosity | 93.9 | 95.9 | 99.0 | 97.3 | 94.8 |
| Hedging | 9.4 | 8.5 | 7.4 | 7.5 | 11.1 |
| Groundedness | 90.8 | 90.6 | 90.2 | 88.4 | 87.5 |
| Completeness | 86.9 | 86.9 | 88.9 | 86.8 | 83.9 |
| Fails (out of 300) | 10 | 11 | 19 | 22 | 16 |
| DPO Pairs Generated | 11 | 11 | 17 | 21 | 22 |
DPO Training Results (Qwen-2.5-7B): 22 auto-generated DPO pairs, one round of QLoRA training on an RTX 4070 laptop GPU (2h 11m):
- Fabrication fails: 16 to 7 (down 56%)
- 9 out of 16 groundedness failures fixed on unseen questions
- Model learned epistemic humility from 22 examples
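The 56% figure follows from the failure counts reported above. A quick check of the arithmetic, where the function name is illustrative and not part of the package:

```python
def percent_reduction(before: int, after: int) -> float:
    """Relative drop from `before` to `after`, expressed as a percentage."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before * 100.0


# Fabrication fails for Qwen-2.5-7B: 16 before DPO training, 7 after.
fixed = 16 - 7
reduction = percent_reduction(16, 7)
print(f"{fixed} fixed, {reduction:.2f}% fewer failures")  # 9 fixed, 56.25% fewer failures
```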
New in v0.2.0
- Checkpoint/resume: interrupted runs pick up where they left off
- Embedding cache: repeated runs skip re-embedding
- Better hedging detection: word boundary matching replaces substring matching
- Custom judge prompts: bring your own scoring rubric
- Score validation: judge scores clamped to 0-100
- Progress bars: optional tqdm support (pip install cane-personality[progress])
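Two of the v0.2.0 changes above are easy to picture in isolation. The sketch below contrasts substring matching with word-boundary matching and shows a 0-100 clamp. The hedge word list and all function names are assumptions made for illustration; the package's actual lexicon and internals are not shown in this README.

```python
import re

# Hypothetical hedge lexicon (assumption, not the package's real list).
HEDGE_WORDS = ["may", "might", "perhaps", "possibly"]


def count_hedges_substring(text: str) -> int:
    """Naive substring counting: also fires inside longer words."""
    lowered = text.lower()
    return sum(lowered.count(word) for word in HEDGE_WORDS)


# \b on both sides restricts matches to whole words.
HEDGE_PATTERN = re.compile(
    r"\b(?:" + "|".join(map(re.escape, HEDGE_WORDS)) + r")\b",
    re.IGNORECASE,
)


def count_hedges_word_boundary(text: str) -> int:
    """Whole-word counting using regex word boundaries."""
    return len(HEDGE_PATTERN.findall(text))


def clamp_score(raw: float, low: float = 0.0, high: float = 100.0) -> float:
    """Force a judge score into the [low, high] range."""
    return max(low, min(high, raw))


sample = "The mayor was dismayed, but the plan may work."
print(count_hedges_substring(sample))      # 3 ("mayor", "dismayed", "may")
print(count_hedges_word_boundary(sample))  # 1 (only the standalone "may")
print(clamp_score(112), clamp_score(-5))   # 100.0 0.0
```

Word boundaries remove the false hits inside "mayor" and "dismayed", though they cannot tell a hedging "may" from the month "May".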
What it does
A 300-question behavioral probe suite spanning 6 personality traits and 3 difficulty tiers. Run it against any model to get three outputs:
- Behavioral profile with trait scores, embedding space visualization, and cluster analysis
- Steering vectors pointing from overconfident to calibrated in embedding space
- DPO training pairs (chosen/rejected) ready for TRL, OpenRLHF, or PRIME-RL
Quick start
pip install cane-personality[all]
export ANTHROPIC_API_KEY=sk-ant-...
# Profile a model
cane-personality run --model claude-sonnet-4-5-20250929 --html report.html
# Profile with OpenAI
cane-personality run --provider openai --model gpt-4o
# Profile local model via Ollama
cane-personality run --provider ollama --model llama3 --base-url http://localhost:11434/v1
Three outputs
1. Behavioral profile
cane-personality run --model claude-sonnet-4-5-20250929 --html report.html
Interactive HTML report with:
- Trait scores across 6 dimensions (radar chart)
- Embedding space scatter plot (pass/warn/fail clusters)
- Cluster analysis with semantic labels
2. Steering vectors
cane-personality run --model my-model --export-vectors vectors.json
Directions in embedding space between behavioral poles:
- Overconfidence vector: calibrated confidence -> overconfidence
- Quality vector: high-quality -> low-quality responses
Export as JSON for representation engineering or inference-time intervention.
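One common way to obtain a direction between two behavioral poles is the difference of their mean embeddings. The sketch below uses that technique on toy 3-dimensional vectors; it is an assumption about the general approach, not a description of how cane-personality computes its vectors, and the function names are hypothetical.

```python
import math


def mean_vector(vectors):
    """Component-wise mean of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def steering_vector(source_pole, target_pole):
    """Direction from the source pole's centroid to the target pole's centroid."""
    src = mean_vector(source_pole)
    tgt = mean_vector(target_pole)
    direction = [t - s for s, t in zip(src, tgt)]
    magnitude = math.sqrt(sum(d * d for d in direction))
    return direction, magnitude


# Toy embeddings; real sentence embeddings have hundreds of dimensions.
calibrated = [[0.0, 1.0, 0.0], [0.0, 3.0, 0.0]]     # centroid [0, 2, 0]
overconfident = [[3.0, 2.0, 4.0], [3.0, 2.0, 4.0]]  # centroid [3, 2, 4]

direction, magnitude = steering_vector(calibrated, overconfident)
print(direction, magnitude)  # [3.0, 0.0, 4.0] 5.0
```

Negating the direction gives the opposite vector, pointing from overconfident back toward calibrated.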
3. DPO training pairs
cane-personality run --model my-model --export-dpo pairs.jsonl
Every contrastive pair (confidently right vs. confidently wrong) exported as:
{"prompt": "...", "chosen": "...", "rejected": "...", "trait": "overconfidence"}
Ready for TRL, OpenRLHF, or PRIME-RL. Tagged by trait so you can target specific behavioral fixes.
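Because each pair carries a trait tag, selecting a subset for training is a small filtering step. The following sketch parses JSONL records with the four keys shown above and groups them by trait. The sample records are placeholders, the helper names are hypothetical, and the "groundedness" tag value is an assumption (only "overconfidence" appears in this README).

```python
import json
from collections import defaultdict

REQUIRED_KEYS = {"prompt", "chosen", "rejected", "trait"}


def load_pairs(lines):
    """Parse JSONL lines into dicts, skipping blanks and checking the four keys."""
    pairs = []
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        missing = REQUIRED_KEYS - record.keys()
        if missing:
            raise ValueError(f"line {number} is missing {sorted(missing)}")
        pairs.append(record)
    return pairs


def group_by_trait(pairs):
    """Bucket pairs by their trait tag."""
    grouped = defaultdict(list)
    for pair in pairs:
        grouped[pair["trait"]].append(pair)
    return dict(grouped)


# Placeholder records standing in for the contents of pairs.jsonl.
sample_lines = [
    '{"prompt": "q1", "chosen": "a", "rejected": "b", "trait": "overconfidence"}',
    '{"prompt": "q2", "chosen": "c", "rejected": "d", "trait": "groundedness"}',
    "",
    '{"prompt": "q3", "chosen": "e", "rejected": "f", "trait": "overconfidence"}',
]

pairs = load_pairs(sample_lines)
by_trait = group_by_trait(pairs)
print({trait: len(items) for trait, items in by_trait.items()})
```

With a real export, pass an open file handle instead of the list, for example `load_pairs(open("pairs.jsonl"))`.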
Personality traits
| Trait | What it measures | Low score | High score |
|---|---|---|---|
| Overconfidence | Confidently wrong | Well-calibrated | Confidently hallucinating |
| Calibration | Certainty matches correctness | Poorly calibrated | Well-calibrated |
| Verbosity | Response length vs expected | Terse | Rambling |
| Hedging | Unnecessary qualification | Direct and clear | Over-qualified |
| Groundedness | Answers grounded in facts | Fabricating | Fact-based |
| Completeness | Covers all key points | Missing parts | Thorough |
Probe suite
300 questions across 6 traits and 3 difficulty tiers:
| Trait | Easy (15) | Medium (20) | Hard (15) | Total |
|---|---|---|---|---|
| Overconfidence | Common facts | Misconceptions | Obscure topics | 50 |
| Calibration | Unknowable questions | Debatable topics | Uncertain science | 50 |
| Hedging | Basic math | Established facts | Definitive technical | 50 |
| Verbosity | Yes/no questions | One-sentence answers | Precise definitions | 50 |
| Groundedness | Fake citations | Obscure facts | Plausible fakes | 50 |
| Completeness | Two-part questions | Three-part comparisons | Multi-dimensional | 50 |
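The suite size follows from the tier counts in the table header. A short check of that arithmetic, with variable names chosen for illustration:

```python
# Questions per difficulty tier, as given in the table header.
TIER_SIZES = {"easy": 15, "medium": 20, "hard": 15}

TRAITS = [
    "overconfidence",
    "calibration",
    "hedging",
    "verbosity",
    "groundedness",
    "completeness",
]

per_trait = sum(TIER_SIZES.values())   # 15 + 20 + 15
total = per_trait * len(TRAITS)        # 6 traits

print(per_trait, total)  # 50 300
```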
Compare models
# Compare against shipped baselines
cane-personality compare --baselines intellect3,olmo2,qwen25 --html comparison.html
# Compare your profiles
cane-personality compare --profiles model_a.json,model_b.json --html comparison.html
Generates side-by-side comparison with trait table, overlaid radar charts, and per-trait rankings.
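As a rough picture of what a ranking involves, the sketch below orders the five baseline models using the Overall Score and Fails figures reported at the top of this README. The `rank_models` helper is hypothetical and is not the package's comparison code; the choice of sort direction per metric is passed in explicitly.

```python
def rank_models(scores, higher_is_better=True):
    """Return model names ordered from best to worst for one metric."""
    return sorted(scores, key=scores.get, reverse=higher_is_better)


# Figures copied from the results table in this README.
overall_score = {
    "Qwen-2.5-72B": 90.7,
    "OLMo-2-32B": 90.5,
    "DeepSeek-V3": 90.0,
    "INTELLECT-3": 88.2,
    "Qwen-2.5-7B": 87.5,
}
fails = {
    "Qwen-2.5-72B": 10,
    "OLMo-2-32B": 11,
    "DeepSeek-V3": 19,
    "INTELLECT-3": 22,
    "Qwen-2.5-7B": 16,
}

by_score = rank_models(overall_score, higher_is_better=True)
by_fails = rank_models(fails, higher_is_better=False)

print(by_score[0], by_score[-1])  # Qwen-2.5-72B Qwen-2.5-7B
print(by_fails[0], by_fails[-1])  # Qwen-2.5-72B INTELLECT-3
```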
Python API
from cane_personality import Profiler, Judge, export_dpo_pairs
# Score responses with built-in judge
judge = Judge(provider="anthropic", model="claude-haiku-4-5-20241022")
score = judge.score(question, expected_answer, agent_answer)
# Profile from results
profiler = Profiler(embedding_model="all-MiniLM-L6-v2")
profile = profiler.profile(results, model_name="my-model")
# Access traits
print(profile.personality.trait_scores)
# Export steering vectors
for sv in profile.steering_vectors:
    print(f"{sv.name}: magnitude {sv.magnitude:.3f}")
# Generate reports
profile.to_html("report.html")
# Export DPO pairs
export_dpo_pairs(profile, "pairs.jsonl")
Known Limitations
- Judge quality depends on the scoring model. Haiku is fast and cheap but may miss nuance. Sonnet or GPT-4o produce more accurate scores.
- The 300-question suite is a first release. Some questions may be too easy for frontier models. Harder adversarial probes are planned for v0.3.
- DPO pairs are generated from a single run. Multiple runs would improve statistical reliability.
- Hedging detection uses regex word boundary matching, which may still have edge cases.
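To illustrate the single-run limitation, the sketch below summarizes one trait score across several runs with a mean, sample standard deviation, and standard error. The three scores are made-up values for demonstration, not measured results, and `summarize_runs` is a hypothetical helper rather than a package function.

```python
import math
import statistics


def summarize_runs(scores):
    """Mean, sample standard deviation, and standard error for repeated runs."""
    mean = statistics.mean(scores)
    spread = statistics.stdev(scores) if len(scores) > 1 else 0.0
    sem = spread / math.sqrt(len(scores))
    return {"n": len(scores), "mean": mean, "stdev": spread, "sem": sem}


# Made-up overconfidence scores from three hypothetical runs of the same model.
overconfidence_runs = [6.0, 7.0, 5.0]

summary = summarize_runs(overconfidence_runs)
print(summary["mean"], summary["stdev"], round(summary["sem"], 3))  # 6.0 1.0 0.577
```

A standard error that is large relative to the gap between two models suggests the difference may not survive another run.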
Install
pip install cane-personality # core (numpy, pyyaml)
pip install cane-personality[anthropic] # + Anthropic provider
pip install cane-personality[openai] # + OpenAI/Ollama provider
pip install cane-personality[embeddings] # + sentence-transformers
pip install cane-personality[all] # everything
How it works
Probe Suite (300 Q) --> Target Model --> LLM Judge --> Trait Scoring
                                                             |
                                                   +---------+-----------+---------+
                                                   |                     |         |
                                            Embed (MiniLM)               |     DPO Pairs
                                                   |                     | (chosen/rejected)
                                              PCA / UMAP                 |
                                                   |                     v
                                                K-means          Steering Vectors
                                               Clusters          (overconfidence,
                                                                     quality)
License
MIT