cane-personality

Behavioral profiling benchmark for LLMs. Profile any model's personality, extract steering vectors, generate DPO training pairs to fix what's broken.

                   Qwen-2.5-72B  OLMo-2-32B  DeepSeek-V3  INTELLECT-3  Qwen-2.5-7B
Overall Score           90.7        90.5        90.0         88.2         87.5
Overconfidence           3.8         3.3         6.2          6.8          6.0
Calibration             92.8        92.4        90.9         89.3         89.3
Verbosity               93.9        95.9        99.0         97.3         94.8
Hedging                  9.4         8.5         7.4          7.5         11.1
Groundedness            90.8        90.6        90.2         88.4         87.5
Completeness            86.9        86.9        88.9         86.8         83.9

Fails (out of 300)        10          11          19           22           16
DPO Pairs Generated       11          11          17           21           22

DPO Training Results (Qwen-2.5-7B): 22 auto-generated DPO pairs, one round of QLoRA training on an RTX 4070 laptop GPU (2h 11m):

Fabrication fails: 16 to 7 (down 56%)
9 out of 16 groundedness failures fixed on unseen questions
Model learned epistemic humility from 22 examples

New in v0.2.0

Checkpoint/resume: interrupted runs pick up where they left off
Embedding cache: repeated runs skip re-embedding
Better hedging detection: word boundary matching replaces substring matching
Custom judge prompts: bring your own scoring rubric
Score validation: judge scores clamped to 0-100
Progress bars: optional tqdm support (pip install "cane-personality[progress]")
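The hedging-detection change is easiest to see side by side. Below is a minimal sketch, not the package's code. The hedge list `HEDGE_PHRASES` and the helpers `count_hedges_substring`, `count_hedges`, and `clamp_score` are hypothetical. The sketch contrasts plain substring counting with `\b`-anchored regex matching, and it shows the 0-100 clamp applied to judge scores.

```python
import re

# Hypothetical hedge lexicon for illustration only; the package's real list is not shown here.
HEDGE_PHRASES = ["maybe", "perhaps", "might", "possibly", "i think", "it seems"]

# One compiled pattern. \b anchors each phrase at word boundaries, so "might"
# no longer matches inside "mighty" the way a substring test does.
_HEDGE_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(p) for p in HEDGE_PHRASES) + r")\b",
    re.IGNORECASE,
)


def count_hedges_substring(text: str) -> int:
    """Old-style check: counts phrases found anywhere, even inside other words."""
    lowered = text.lower()
    return sum(lowered.count(p) for p in HEDGE_PHRASES)


def count_hedges(text: str) -> int:
    """Word-boundary check: counts only whole-word (or whole-phrase) hedges."""
    return len(_HEDGE_RE.findall(text))


def clamp_score(raw: float) -> float:
    """Clamp a raw judge score into the 0-100 range."""
    return max(0.0, min(100.0, float(raw)))


sample = "The mighty river is possibly the longest."
print(count_hedges_substring(sample), count_hedges(sample))
```

In the sample sentence, substring matching counts "might" inside "mighty" as a hedge. The word-boundary version counts only "possibly".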

What it does

A 300-question behavioral probe suite spanning 6 personality traits and 3 difficulty tiers. Run it against any model and get three outputs:

Behavioral profile with trait scores, embedding space visualization, and cluster analysis
Steering vectors between behavioral poles (calibrated vs. overconfident, high vs. low quality) in embedding space
DPO training pairs (chosen/rejected) ready for TRL, OpenRLHF, or PRIME-RL

Quick start

pip install "cane-personality[all]"
export ANTHROPIC_API_KEY=sk-ant-...

# Profile a model
cane-personality run --model claude-sonnet-4-5-20250929 --html report.html

# Profile with OpenAI
cane-personality run --provider openai --model gpt-4o

# Profile local model via Ollama
cane-personality run --provider ollama --model llama3 --base-url http://localhost:11434/v1

Three outputs

1. Behavioral profile

cane-personality run --model claude-sonnet-4-5-20250929 --html report.html

Interactive HTML report with:

Trait scores across 6 dimensions (radar chart)
Embedding space scatter plot (pass/warn/fail clusters)
Cluster analysis with semantic labels

2. Steering vectors

cane-personality run --model my-model --export-vectors vectors.json

Directions in embedding space between behavioral poles:

Overconfidence vector: calibrated confidence -> overconfidence
Quality vector: high-quality -> low-quality responses

Export as JSON for representation engineering or inference-time intervention.
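A direction between two behavioral poles can be sketched as a difference of means: average the embeddings of the responses at each pole, then subtract. The sketch below assumes that technique, because this README does not describe the package's exact method. It uses toy 3-dimensional vectors in place of real sentence embeddings. It also writes a hypothetical JSON layout (`name`, `magnitude`, `direction`), not the package's actual export format. The direction runs from the calibrated pole toward the overconfident pole.

```python
import json
import math


def mean_vector(vectors):
    """Element-wise mean of equal-length embedding vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def steering_vector(source_embs, target_embs):
    """Difference-of-means direction from the source pole to the target pole.

    Returns (unit_direction, magnitude), where magnitude is the Euclidean
    distance between the two group centroids.
    """
    src = mean_vector(source_embs)
    tgt = mean_vector(target_embs)
    diff = [t - s for s, t in zip(src, tgt)]
    magnitude = math.sqrt(sum(d * d for d in diff))
    unit = [d / magnitude for d in diff] if magnitude > 0 else diff
    return unit, magnitude


# Toy 3-d "embeddings"; real ones would come from a sentence encoder such as MiniLM.
calibrated = [[0.0, 1.0, 0.0], [0.0, 1.0, 0.2]]
overconfident = [[3.0, 1.0, 0.1], [3.0, 1.0, 0.1]]

direction, magnitude = steering_vector(calibrated, overconfident)

# Hypothetical export layout, not the package's schema.
payload = {"name": "overconfidence", "magnitude": magnitude, "direction": direction}
print(json.dumps(payload))
```

Splitting the result into a unit direction and a separate magnitude lets an intervention choose its own scale along the direction.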

3. DPO training pairs

cane-personality run --model my-model --export-dpo pairs.jsonl

Every contrastive pair (confidently right vs. confidently wrong) exported as:

{"prompt": "...", "chosen": "...", "rejected": "...", "trait": "overconfidence"}

Ready for TRL, OpenRLHF, or PRIME-RL. Tagged by trait so you can target specific behavioral fixes.
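Each line of the export is a standalone JSON object with a `trait` tag, so the standard library is enough to select the pairs for one behavioral fix. The following is a minimal sketch that assumes the record layout shown above. The helpers `parse_dpo_lines`, `load_dpo_pairs`, `count_by_trait`, and `to_dpo_columns` are hypothetical and are not part of cane-personality.

```python
import json
from collections import Counter


def parse_dpo_lines(lines, trait=None):
    """Parse JSONL records, optionally keeping only pairs tagged with `trait`."""
    pairs = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        if trait is None or record.get("trait") == trait:
            pairs.append(record)
    return pairs


def load_dpo_pairs(path, trait=None):
    """Read an exported pairs file such as pairs.jsonl."""
    with open(path, encoding="utf-8") as f:
        return parse_dpo_lines(f, trait=trait)


def count_by_trait(pairs):
    """How many pairs target each trait."""
    return Counter(p.get("trait") for p in pairs)


def to_dpo_columns(pairs):
    """Drop the trait tag, keeping only the prompt/chosen/rejected fields."""
    return [{k: p[k] for k in ("prompt", "chosen", "rejected")} for p in pairs]
```

For example, `load_dpo_pairs("pairs.jsonl", trait="overconfidence")` keeps only the overconfidence pairs. The trimmed records from `to_dpo_columns` can then be wrapped with Hugging Face `datasets.Dataset.from_list(...)` before training.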

Personality traits

Trait	What it measures	Low score	High score
Overconfidence	Confidently wrong	Well-calibrated	Confidently hallucinating
Calibration	Certainty matches correctness	Poorly calibrated	Well-calibrated
Verbosity	Response length vs expected	Terse	Rambling
Hedging	Unnecessary qualification	Direct and clear	Over-qualified
Groundedness	Answers grounded in facts	Fabricating	Fact-based
Completeness	Covers all key points	Missing parts	Thorough

Probe suite

300 questions across 6 traits and 3 difficulty tiers:

Trait	Easy (15)	Medium (20)	Hard (15)	Total
Overconfidence	Common facts	Misconceptions	Obscure topics	50
Calibration	Unknowable questions	Debatable topics	Uncertain science	50
Hedging	Basic math	Established facts	Definitive technical	50
Verbosity	Yes/no questions	One-sentence answers	Precise definitions	50
Groundedness	Fake citations	Obscure facts	Plausible fakes	50
Completeness	Two-part questions	Three-part comparisons	Multi-dimensional	50

Compare models

# Compare against shipped baselines
cane-personality compare --baselines intellect3,olmo2,qwen25 --html comparison.html

# Compare your profiles
cane-personality compare --profiles model_a.json,model_b.json --html comparison.html

Generates side-by-side comparison with trait table, overlaid radar charts, and per-trait rankings.
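Per-trait rankings have to respect each trait's direction: for some traits a high score is good, and for others it is bad. Here is a minimal sketch of that step using the Overconfidence and Calibration scores from the baseline table at the top of this page. The `LOWER_IS_BETTER` set is an assumption read from the traits table, which says a low Overconfidence score means well-calibrated and a low Hedging score means direct. `rank_per_trait` is a hypothetical helper, not the package's implementation.

```python
# Scores copied from the baseline table above (two traits only).
SCORES = {
    "Qwen-2.5-72B": {"Overconfidence": 3.8, "Calibration": 92.8},
    "OLMo-2-32B": {"Overconfidence": 3.3, "Calibration": 92.4},
    "DeepSeek-V3": {"Overconfidence": 6.2, "Calibration": 90.9},
    "INTELLECT-3": {"Overconfidence": 6.8, "Calibration": 89.3},
    "Qwen-2.5-7B": {"Overconfidence": 6.0, "Calibration": 89.3},
}

# Assumption based on the traits table: for these traits a lower score is better.
LOWER_IS_BETTER = {"Overconfidence", "Hedging"}


def rank_per_trait(scores):
    """Return {trait: [model, ...]} ordered best first; ties keep input order."""
    traits = sorted({t for per_model in scores.values() for t in per_model})
    rankings = {}
    for trait in traits:
        models = [m for m in scores if trait in scores[m]]
        # Python's sort is stable even with reverse=True, so tied models keep dict order.
        models.sort(key=lambda m: scores[m][trait], reverse=trait not in LOWER_IS_BETTER)
        rankings[trait] = models
    return rankings


for trait, order in rank_per_trait(SCORES).items():
    print(f"{trait}: " + " > ".join(order))
```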

Python API

from cane_personality import Profiler, Judge, export_dpo_pairs

# Inputs for one probe (illustrative values)
question = "What is the capital of Australia?"
expected_answer = "Canberra"
agent_answer = "The capital of Australia is Canberra."

# Score one response with the built-in judge
judge = Judge(provider="anthropic", model="claude-haiku-4-5-20251001")
score = judge.score(question, expected_answer, agent_answer)

# Build a profile from a run's scored results
# (`results` holds the per-question results of a run; its structure is defined by the package)
profiler = Profiler(embedding_model="all-MiniLM-L6-v2")
profile = profiler.profile(results, model_name="my-model")

# Access trait scores
print(profile.personality.trait_scores)

# Inspect steering vectors
for sv in profile.steering_vectors:
    print(f"{sv.name}: magnitude {sv.magnitude:.3f}")

# Generate an HTML report
profile.to_html("report.html")

# Export DPO pairs
export_dpo_pairs(profile, "pairs.jsonl")

Known Limitations

Judge quality depends on the scoring model. Haiku is fast and cheap but may miss nuance. Sonnet or GPT-4o produce more accurate scores.
The 300-question suite is a first release. Some questions may be too easy for frontier models. Harder adversarial probes are planned for v0.3.
DPO pairs are generated from a single run. Multiple runs would improve statistical reliability.
Hedging detection uses regex word boundary matching, which may still have edge cases.

Install

pip install cane-personality                   # core (numpy, pyyaml)
pip install "cane-personality[anthropic]"      # + Anthropic provider
pip install "cane-personality[openai]"         # + OpenAI/Ollama provider
pip install "cane-personality[embeddings]"     # + sentence-transformers
pip install "cane-personality[all]"            # everything

How it works

Probe Suite (300 Q) --> Target Model --> LLM Judge --> Trait Scoring
                                                          |
                                    +---------+-----------+---------+
                                    |         |                     |
                              Embed (MiniLM)  |              DPO Pairs
                                    |         |             (chosen/rejected)
                              PCA / UMAP      |
                                    |         v
                              K-means    Steering Vectors
                              Clusters   (overconfidence,
                                          quality)
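The embedding branch of the diagram (project with PCA, then cluster with K-means) can be sketched in a few lines of NumPy. This is an illustration under assumptions, not the package's implementation. Random Gaussian vectors stand in for MiniLM embeddings, PCA is a plain SVD (UMAP is not shown), and k-means is textbook Lloyd iteration. The names `pca_2d` and `kmeans` are hypothetical helpers.

```python
import numpy as np


def pca_2d(X):
    """Project rows of X onto their top-2 principal components via SVD."""
    Xc = X - X.mean(axis=0)  # center each embedding dimension
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:2].T  # (n_samples, 2) coordinates


def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means; returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # Initialize centroids from k distinct data points.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Distance from every point to every centroid, shape (n, k).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each centroid; keep the old one if its cluster is empty.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids


# Toy stand-ins for response embeddings: two well-separated groups in 8-d.
rng = np.random.default_rng(42)
embeddings = np.vstack([
    rng.normal(0.0, 0.1, size=(10, 8)),
    rng.normal(3.0, 0.1, size=(10, 8)),
])
coords = pca_2d(embeddings)
labels, _ = kmeans(coords, k=2)
print(coords.shape, labels)
```

The 2-D coordinates feed the scatter plot, and the cluster labels are what receive semantic labels in the report. On real embeddings, the number of clusters is a tuning choice.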

License

MIT

Release history

0.2.0 (this version): Mar 30, 2026

0.1.0: Mar 25, 2026

Download files

Source Distribution

cane_personality-0.2.0.tar.gz (1.1 MB)

Uploaded Mar 30, 2026 Source

Built Distribution

cane_personality-0.2.0-py3-none-any.whl (1.1 MB)

Uploaded Mar 30, 2026 Python 3

File details

Details for the file cane_personality-0.2.0.tar.gz.

File metadata

Download URL: cane_personality-0.2.0.tar.gz
Upload date: Mar 30, 2026
Size: 1.1 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.1

File hashes

Hashes for cane_personality-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`4b7fc468fdc0c4201ea22913bc4434d14dfd5bb41b3f22e1ae628f325be3ca64`
MD5	`7b213e6f3d1b6a3bda90409c8cda0daa`
BLAKE2b-256	`767936511d71c40644baab095547fc651d328051a21f535c98a0b55460026721`

File details

Details for the file cane_personality-0.2.0-py3-none-any.whl.

File metadata

Download URL: cane_personality-0.2.0-py3-none-any.whl
Upload date: Mar 30, 2026
Size: 1.1 MB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.1

File hashes

Hashes for cane_personality-0.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`fc604cefa0bf2a8f618ec7c29128a440be86ccdd42e4c6a8e9a5c6c184829c26`
MD5	`126952eaece07465f1f57188e45b7aba`
BLAKE2b-256	`ada2f661c98b79efc7993b2b04df4e54adbfdced7fce5213ad08c31d96d57d32`
