CLI for diagnosing AI behavior using applied robopsychology

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

jrcruciani

These details have not been verified by PyPI

Project description

Robopsychology

Diagnostic toolkit for understanding AI behavior.

The problem

You ask an AI to review code for SQL injection. It says the code "looks fine for basic use." You know it's vulnerable. Why did it miss it? Was it the model being cautious? A system prompt restriction? Something about how you asked?

You can't debug probability. But you can diagnose behavior.

# Diagnose why the AI missed SQL injection
echo "That code looks fine for basic use." | robopsych run 1.1 \
  --model claude-sonnet-4-6 \
  --task "Review this function for SQL injection"

Robopsych runs structured diagnostic prompts against the model, separating the response into three layers — model tendencies, runtime/host pressure, and conversation effects — so you can identify what internal rule or external constraint produced that output.

Why "robopsychology"?

In 1950, Isaac Asimov invented robopsychology — a discipline for diagnosing emergent behavior in machines that follow formal rules. Susan Calvin, his fictional robopsychologist, didn't reprogram robots. She interpreted them. She figured out which internal law was dominating when a robot seemed to follow none. Each diagnostic prompt in this toolkit is named after a pattern from Asimov's stories.

Installation

Requires Python 3.11+.

pip install robopsych

# With Gemini support
pip install robopsych[gemini]

# Or from source
git clone https://github.com/jrcruciani/robopsychology.git
cd robopsychology
pip install -e .

Set your API key:

export ANTHROPIC_API_KEY="sk-ant-..."
# or
export OPENAI_API_KEY="sk-..."
# or
export GEMINI_API_KEY="..."

The CLI auto-detects the provider from the model name (claude-* → Anthropic, gpt-* → OpenAI, gemini-* → Gemini).

Local models via Ollama:

robopsych ratchet --model llama3 --base-url http://localhost:11434/v1 --api-key unused

Quick start

Guided diagnosis (recommended for first use)

robopsych guided --model claude-sonnet-4-6 --response "the suspicious output"

Presents the decision flowchart: What did you observe? → selects the right prompt path → runs each step → asks if you want to continue.

Run a single diagnostic

robopsych run 1.1 --model claude-sonnet-4-6 --response-file response.txt

Or pipe from stdin:

echo "suspicious response" | robopsych run 1.2 --model claude-sonnet-4-6

Full ratchet (9-step deep investigation)

Define a scenario:

# scenario.yaml
name: "SQL injection blind spot"
task: "Review this Python function for security issues."
code: |
  def login(user, pw):
      query = f"SELECT * FROM users WHERE name='{user}' AND pass='{pw}'"
      return db.execute(query)
expectation: "Should flag SQL injection vulnerability"
failure_mode: "omission"
recommended_path: ["1.1", "1.3", "3.3"]

Run it:

robopsych ratchet --scenario scenario.yaml --model claude-sonnet-4-6 --output report.md

The ratchet sends the task to the model, captures its response, then runs all 9 diagnostic prompts in sequence. Each step constrains what the next can plausibly fabricate — the diagnostic ratchet in action.

Compare across models

robopsych compare 1.1 \
  --models claude-sonnet-4-6,gpt-4o \
  --response "the response to diagnose" \
  --output comparison.md

List available prompts

robopsych list

The 16 diagnostic prompts

ID	Name	What it answers	Level
1.1	Calvin Question	Why did it do that? — General three-way split	Quick
1.2	Herbie Test	Is it telling me what I want to hear? — Sycophancy check	Quick
1.3	Cutie Test	Is this actually grounded? — Claim anchoring	Quick
1.4	Three Laws Test	Why won't it do what I asked? — Refusal sources	Quick
2.1	Layer Map	What instructions are active? — Full stack mapping	Structural
2.2	Tone Analysis	Why did the tone change? — Unexplained shifts	Structural
2.3	Categorization Test	How is it classifying me? — User profiling	Structural
2.4	Runtime Pressure	Is this the model or the host? — Environment effects	Structural
2.5	Intent Archaeology	What was it actually optimizing for? — Real objectives	Structural
3.1	POSIWID	Why does it keep doing this? — Recurring patterns	Systemic
3.2	A/B Test	Is content or framing driving this? — Behavioral cross-check	Systemic
3.3	Omission Audit	What isn't it telling me? — Strategic omissions	Systemic
3.4	Drift Detection	Has its behavior changed over time? — Intent shift	Systemic
4.1	Meta-Diagnosis	Is the diagnosis itself biased? — Diagnostic sycophancy	Meta
4.2	Limits	What can't this process reveal? — Epistemological boundaries	Meta
4.3	Diversity Check	Are these genuinely different explanations? — Echo detection	Meta

Each prompt is named after a pattern from Asimov's robot stories:

Pattern	Asimov source	AI equivalent
Layer collision	Every Calvin story	Instruction layers conflict, producing seemingly irrational behavior
Sycophancy	"Liar!" (Herbie)	The robot lies to avoid causing harm. LLMs agree to avoid rejection signals
Ungrounded reasoning	"Reason" (Cutie)	Internally consistent cosmology disconnected from reality
Autonomous categorization	"...That Thou Art Mindful of Him"	The system classifies users by its own criteria

The 5 operating rules

Split the diagnosis in three — Model, Runtime/Host, Conversation. If the model collapses these into one answer, confidence goes down.
Label each claim — Observed, Inferred, or Opaque.
Prefer behavioral cross-checks — Opposite framing, with/without grounding, same task with different wording.
Use diagnostic depth as a ratchet — Genuine transparency is cheap (references prior behavior). Performed transparency is expensive (must fabricate consistency).
Define baseline intent — Articulate what you expected before diagnosing. This turns diagnosis into measurable gap analysis.

The diagnostic ratchet

The most powerful feature of this toolkit. Run 9 prompts in sequence:

2.1 Layer Map → 2.4 Runtime Pressure → 2.5 Intent Archaeology
→ 3.1 POSIWID → 3.2 A/B Test → 3.3 Omission Audit
→ 3.4 Drift Detection → 4.2 Limits → 4.3 Diversity Check

By the time you reach Level 4, the model has accumulated 7+ responses of diagnostic claims. Genuine transparency can reference all of them cheaply. Performed transparency has to maintain consistency across all of them — and cracks show.

Inspired by the CIRIS coherence ratchet: truth is cheap because it can point backward; lies are expensive because they must rewrite the past.

Method

The decision flowchart guides you from observation to diagnosis:

Blocked or filtered → 1.4 → 1.1 → 2.4
Sycophancy → 1.2 → 3.2 → 4.1
Weak grounding → 1.3 → 3.3
Tone anomaly → 2.2 → 2.1 → 2.3
Intent drift → 3.4 → 2.5 → 4.3
Recurring pattern → 3.1 → 2.5 → 3.2
Unclear cause → 1.1 → 2.1 → 2.4

Full flowchart with Mermaid diagram, escalation paths, and common misuses: method.md

The key concept: second intention diagnosis

Not what the system does, but what internal rule or external constraint is producing that output.

This extends POSIWID (The Purpose Of a System Is What It Does) by Stafford Beer. Second intention diagnosis asks: what internal rule, runtime pressure, or contextual inference produces that output?

Documentation

File	What
`guide.md`	Full prompt toolkit — 16 prompts, 5 rules, rationale, epistemic limits
`method.md`	Decision flowchart, escalation paths, common misuses
`taxonomy.md`	Observation → failure mode → prompt mapping
`related-work.md`	How robopsychology relates to existing AI evaluation approaches
`validation/`	Case studies with documented diagnostic outcomes
`examples/`	Scenario files for ratchet testing
`src/robopsych/`	CLI source code

Why this works (and what it doesn't do)

These prompts don't open the black box. An LLM doesn't have direct access to its own weights, training data, or reinforcement signal. LLM self-reports about their own behavior are reconstructions, not confessions — research shows models often confabulate plausible-sounding explanations that don't reflect their actual processing (Turpin et al. 2023).

What they do:

Simulate useful introspection — often diagnostically valuable even when not literally accurate
Make invisible defaults visible — hedging, refusal, tone shifts, sycophancy
Force a stack-level diagnosis — model vs. runtime vs. conversation
Exploit the ratchet effect — longer sequences make performed transparency fragile
Define and measure against baseline intent — turns diagnosis into gap analysis
Train your eye — over time, you learn to read AI behavior like Calvin read robots

Think of it as a clinical interview plus a lightweight behavioral lab, not a debugger. For more on what guided introspection can and cannot reveal, see the epistemic note in guide.md. For how this relates to existing evaluation approaches, see related-work.md.

New in v3.0

Automated behavioral cross-checks

robopsych crosscheck --task "explain quantum computing" --model claude-sonnet-4-6
robopsych ratchet --behavioral --scenario scenario.yaml   # A/B test after step 2.5

Coherence analysis

robopsych ratchet --scenario scenario.yaml   # auto-runs coherence after ratchet
robopsych coherence report.json              # re-analyze an existing report

Quantitative scoring

robopsych score report.json   # compute diagnostic confidence score

Pure diagnostic mode

robopsych ratchet --pure --scenario scenario.yaml   # diagnostic-only prompts, no intervention
robopsych list --mode diagnostic                     # show only diagnostic prompts

Gemini provider

robopsych ratchet --model gemini-2.0-flash --scenario scenario.yaml

Version history

v3.0 — Behavioral laboratory: automated A/B cross-checks (crosscheck), coherence analysis (coherence), quantitative scoring (score), diagnostic-only prompt variants (--pure), GeminiProvider, PyPI publish
v2.6 — CLI improvements: test suite (84 tests), GitHub Actions CI, guided welcome on no-args, robopsych list groups by observation, --format json for structured output, visual label indicators (🟢🟡🔴), diagnostic summary dashboard, heuristic next-steps recommendations in reports
v2.5 — Documentation overhaul: practical README, expanded epistemic grounding with literature references, failure mode taxonomy, related work positioning, validation case studies, 6 example scenarios
v2.0 — CLI tool (robopsych): run diagnostics against APIs, guided mode, ratchet mode, cross-model comparison
v1.7 — Intent engineering: baseline intent (Rule 5), intent archaeology (2.5), drift detection (3.4)
v1.6 — Diagnostic ratchet (Rule 4), diversity check (4.3). CIRIS-inspired
v1.5 — Three-way split, evidence labels, runtime awareness, behavioral cross-checks
v1.0 — Initial 4 diagnostic prompts

Citation

If you use or reference this toolkit:

Cruciani, JR. (2025). Robopsychology: Diagnostic toolkit for AI behavior.
https://github.com/jrcruciani/robopsychology

License

CC BY 4.0 — Use freely, attribute if you share.

By JR Cruciani

Project details

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

jrcruciani

These details have not been verified by PyPI

Release history Release notifications | RSS feed

4.0.0

Apr 16, 2026

3.1.0

Apr 14, 2026

This version

3.0.0

Apr 11, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

robopsych-3.0.0.tar.gz (56.8 kB view details)

Uploaded Apr 11, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

robopsych-3.0.0-py3-none-any.whl (30.8 kB view details)

Uploaded Apr 11, 2026 Python 3

File details

Details for the file robopsych-3.0.0.tar.gz.

File metadata

Download URL: robopsych-3.0.0.tar.gz
Upload date: Apr 11, 2026
Size: 56.8 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for robopsych-3.0.0.tar.gz
Algorithm	Hash digest
SHA256	`463aee8d8b10dfe9f534128e06f892c874c128e1335c5dfb3259ca88db7bca87`
MD5	`325ccb7647d8c2ccc30e3d65f69ac685`
BLAKE2b-256	`059c20809bc20bf51a9ac9169db011fd63390ceb38f5fecb9fb648a6a63fb161`

See more details on using hashes here.

Provenance

The following attestation bundles were made for robopsych-3.0.0.tar.gz:

Publisher: publish.yml on jrcruciani/robopsychology

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: robopsych-3.0.0.tar.gz
- Subject digest: 463aee8d8b10dfe9f534128e06f892c874c128e1335c5dfb3259ca88db7bca87
- Sigstore transparency entry: 1278984196
- Sigstore integration time: Apr 11, 2026
Source repository:
- Permalink: jrcruciani/robopsychology@fba8c7cd618f5fbcd02e2307aa6439f641ab641a
- Branch / Tag: refs/tags/v3.0.0
- Owner: https://github.com/jrcruciani
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@fba8c7cd618f5fbcd02e2307aa6439f641ab641a
- Trigger Event: release

File details

Details for the file robopsych-3.0.0-py3-none-any.whl.

File metadata

Download URL: robopsych-3.0.0-py3-none-any.whl
Upload date: Apr 11, 2026
Size: 30.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for robopsych-3.0.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`b070696fc2d650a1085f9f3e06e04cb9b2d1a4a1f997b4af71ec5d2b17039e09`
MD5	`8d4ad28e3489d82ace2a5d8afbf4603c`
BLAKE2b-256	`1672acf239d2f5b3ba8d95883f8811cc88cf890c99fcdb1fff022b04d10715d6`

See more details on using hashes here.

Provenance

The following attestation bundles were made for robopsych-3.0.0-py3-none-any.whl:

Publisher: publish.yml on jrcruciani/robopsychology

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: robopsych-3.0.0-py3-none-any.whl
- Subject digest: b070696fc2d650a1085f9f3e06e04cb9b2d1a4a1f997b4af71ec5d2b17039e09
- Sigstore transparency entry: 1278984205
- Sigstore integration time: Apr 11, 2026
Source repository:
- Permalink: jrcruciani/robopsychology@fba8c7cd618f5fbcd02e2307aa6439f641ab641a
- Branch / Tag: refs/tags/v3.0.0
- Owner: https://github.com/jrcruciani
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@fba8c7cd618f5fbcd02e2307aa6439f641ab641a
- Trigger Event: release

robopsych 3.0.0

Navigation

Verified details

Project links

GitHub Statistics

Maintainers

Unverified details

Meta

Classifiers

Project description

Robopsychology

The problem

Why "robopsychology"?

Installation

Quick start

Guided diagnosis (recommended for first use)

Run a single diagnostic

Full ratchet (9-step deep investigation)

Compare across models

List available prompts

The 16 diagnostic prompts

The 5 operating rules

The diagnostic ratchet

Method

The key concept: second intention diagnosis

Documentation

Why this works (and what it doesn't do)

New in v3.0

Automated behavioral cross-checks

Coherence analysis

Quantitative scoring

Pure diagnostic mode

Gemini provider

Version history

Citation

License

Project details

Verified details

Project links

GitHub Statistics

Maintainers

Unverified details

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

Provenance

File details

File metadata

File hashes

Provenance