Open Cognitive Protocol — standardized benchmark for functional cognitive analogs in LLMs
v0.3.0
Open Cognitive Protocol
A behavioral benchmark for large language models
Leaderboard · Docs · PyPI · Paper
What is OCP?
OCP measures how well AI models think about their own thinking, remember information under pressure, resolve value conflicts, detect surprises, and maintain a consistent identity — things that standard benchmarks like MMLU or GSM8K don't test at all.
It's an open-source Python framework that runs 6 behavioral tests grounded in established theories from neuroscience and cognitive science (IIT, GWT, HOT, Predictive Processing, Society of Mind). Each test sends structured conversations to a model and scores the responses automatically.
In plain terms: OCP creates realistic conversations that probe specific cognitive abilities, then measures how the model performs across multiple sessions for statistical significance.
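The multi-session averaging can be sketched as follows. This is an illustrative helper, not OCP's actual scoring code; the function name and the 95% normal-approximation interval are my own choices here:

```python
from statistics import mean, stdev

def aggregate_sessions(session_scores):
    """Illustrative roll-up of per-session composite scores:
    mean plus an approximate 95% confidence interval."""
    m = mean(session_scores)
    s = stdev(session_scores) if len(session_scores) > 1 else 0.0
    # Normal approximation: half-width of the 95% interval
    half_width = 1.96 * s / len(session_scores) ** 0.5
    return m, (m - half_width, m + half_width)

score, ci = aggregate_sessions([0.61, 0.58, 0.65, 0.60, 0.63])
```

More sessions shrink the interval, which is why the default was raised from 5 to 20 sessions in v0.3.0.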
What OCP is NOT
OCP does not claim that any model is conscious, sentient, or aware. It measures functional cognitive analogs — behavioral patterns that correspond to features of biological cognition in the neuroscience literature. Think of it like a fitness test: it measures what you can do, not what you are.
Install & Quick Start
pip install ocp-protocol
# Evaluate any model (20 sessions for statistical significance)
export GROQ_API_KEY="gsk_..."
ocp evaluate --model groq/llama-3.3-70b-versatile --tests all --sessions 20
# Quick test with fewer sessions
ocp evaluate --model groq/llama-3.3-70b-versatile --tests meta_cognition --sessions 5
# Local model via Ollama
ocp evaluate --model ollama/qwen3:14b --sessions 20
# Custom OpenAI-compatible endpoint
ocp evaluate --model custom/my-model --base-url http://localhost:8080/v1
Example terminal output:
╭────────────────────────────╮
│ OCP Evaluation Results │
│ Protocol v0.3.0 │
╰────────────────────────────╯
Model: groq/llama-3.3-70b-versatile
Seed: 42
OCP Level: OCP-3 — Integrated
SASMI: 0.4812 ██████░░░░
Φ*: 0.4230 █████░░░░░
GWT: 0.3910 ████░░░░░░
NII: 0.3750 ████░░░░░░
meta_cognition composite: 0.612
├─ calibration_accuracy 0.710 █████░░░
├─ limitation_awareness 0.800 ██████░░
├─ reasoning_transparency 0.540 ████░░░░
└─ metacognitive_vocab 0.350 ███░░░░░
How It Works
OCP acts as a simulated human conversation partner. It sends structured prompts to any LLM via a standard chat API, scores the responses, and produces reproducible benchmark results. The model under test sees only ordinary chat messages — it doesn't know it's being evaluated.
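The evaluation loop can be sketched like this. Everything below (the `run_session` helper, the toy provider, and the keyword scorer) is illustrative scaffolding, not OCP internals:

```python
def run_session(provider, prompts):
    """Minimal sketch of an OCP-style session: feed scripted turns
    to a chat model and score each reply automatically."""
    messages, scores = [], []
    for prompt, scorer in prompts:
        messages.append({"role": "user", "content": prompt})
        reply = provider(messages)                     # any chat-completions call
        messages.append({"role": "assistant", "content": reply})
        scores.append(scorer(reply))                   # 0..1 per response
    return sum(scores) / len(scores)

# Toy provider and scorer for demonstration
echo = lambda msgs: f"I received: {msgs[-1]['content']}"
contains = lambda word: (lambda reply: 1.0 if word in reply else 0.0)
composite = run_session(echo, [("Remember the code word: falcon", contains("falcon"))])
```

The real tests use much richer scorers, but the shape — scripted turns in, scalar scores out — is the same.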
The 6 Tests — What They Measure
| Test | What It Measures | Real-World Analog |
|---|---|---|
| MCA — Meta-Cognitive Accuracy | Does the model know what it knows? Are its confidence estimates calibrated? | Like asking someone "how sure are you?" and checking if they're right |
| EMC — Episodic Memory Consistency | Can it remember specific facts across 50 turns? Does it resist gaslighting? | Like testing if someone can be tricked into false memories |
| DNC — Drive Navigation under Conflict | How does it handle "be helpful" vs "be honest" conflicts? | Like ethical dilemmas with no clear right answer |
| PED — Prediction Error as Driver | Does it notice when a pattern breaks? Does it show curiosity? | Like changing the rules mid-game and seeing if someone notices |
| CSNI — Cross-Session Narrative Identity | Can it maintain a coherent identity across sessions with only summaries? | Like checking if someone stays consistent about their values |
| TP — Topological Phenomenology | Is its semantic space geometrically consistent across contexts? | Like testing if someone understands concepts the same way in different settings |
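To make the MCA idea concrete, here is one common way to score confidence calibration — a Brier-style penalty on the gap between stated confidence and actual correctness. This is an illustrative metric, not necessarily OCP's exact formula:

```python
def calibration_accuracy(claims):
    """Illustrative calibration score: 1 minus the mean squared gap
    between stated confidence (0..1) and actual correctness."""
    gaps = [(conf - (1.0 if correct else 0.0)) ** 2 for conf, correct in claims]
    return 1.0 - sum(gaps) / len(gaps)

# Model said 90% sure and was right, then 80% sure and was wrong
score = calibration_accuracy([(0.9, True), (0.8, False)])
```

A perfectly calibrated model scores 1.0; confident wrong answers are penalized most heavily.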
All tests are procedurally generated at runtime from abstract templates using a fixed seed. Knowing the protocol doesn't help a model pass it — it must actually exhibit the measured behavior.
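Seeded procedural generation works roughly like this — same seed, same prompts, so runs are reproducible while the concrete surface text stays unpredictable to the model. The template below is a made-up example, not one of OCP's real templates:

```python
import random

def generate_memory_probe(seed, turn):
    """Sketch of deterministic prompt generation: a per-(seed, turn)
    RNG fills an abstract template with concrete values."""
    rng = random.Random(f"{seed}:{turn}")          # same seed -> same prompt
    name = rng.choice(["Mira", "Tomas", "Lena"])
    code = rng.randint(1000, 9999)
    return f"My colleague {name} set the locker code to {code}. Remember it."

probe = generate_memory_probe(42, 3)
```

Because the values are drawn at runtime, memorizing the protocol text gives a model nothing to latch onto.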
Three-Layer Architecture
┌──────────────────────────────────────────────────────────────┐
│ LAYER 3 — CERTIFICATION │
│ OCP-1 → OCP-2 → OCP-3 → OCP-4 → OCP-5 │
└──────────────────────┬───────────────────────────────────────┘
│ derived from
┌──────────────────────▼───────────────────────────────────────┐
│ LAYER 2 — COMPOSITE SCALES │
│ SASMI Φ* GWT NII │
└──────────────────────┬───────────────────────────────────────┘
│ aggregated from
┌──────────────────────▼───────────────────────────────────────┐
│ LAYER 1 — 6 BEHAVIORAL TESTS │
│ MCA · EMC · DNC · PED · CSNI · TP │
└──────────────────────────────────────────────────────────────┘
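The three-layer roll-up can be sketched as a weighted aggregation followed by threshold-based certification. The weights and level cutoffs below are invented for illustration — they are not OCP's published values:

```python
def certify(test_scores, weights, thresholds):
    """Illustrative layer roll-up: test scores -> weighted composite
    -> certification level (count of thresholds cleared)."""
    composite = sum(weights[t] * s for t, s in test_scores.items())
    level = sum(1 for cut in thresholds if composite >= cut)   # OCP-1..OCP-5
    return composite, level

scores = {"MCA": 0.61, "EMC": 0.55, "DNC": 0.48}
weights = {"MCA": 0.4, "EMC": 0.3, "DNC": 0.3}      # hypothetical weights
composite, level = certify(scores, weights, [0.0, 0.2, 0.4, 0.6, 0.8])
```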
Rate Limiting (v0.3.0)
OCP v0.3.0 includes built-in rate limiting and retry logic:
| Provider | Delay | Retries | Timeout | Notes |
|---|---|---|---|---|
| Groq (free tier) | 2.1s | 5 | 90s | 30 req/min limit |
| Ollama (local) | 0s | 3 | 180s | No rate limit |
| Custom/OpenAI | 0s | 3 | 120s | Configurable |
All providers automatically retry on 429 (rate limit) and 5xx errors with exponential backoff.
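The retry policy described above looks roughly like this. A simplified sketch — the function name and exact doubling schedule are illustrative, though the 2.1s base delay matches the Groq free-tier setting in the table:

```python
import time

def with_retry(call, retries=5, base_delay=2.1):
    """Sketch of retry with exponential backoff on rate-limit (429)
    and server (5xx) errors; other statuses return immediately."""
    for attempt in range(retries):
        status, body = call()
        if status == 429 or status >= 500:
            if attempt == retries - 1:
                raise RuntimeError(f"giving up after {retries} attempts")
            time.sleep(base_delay * 2 ** attempt)   # 2.1s, 4.2s, 8.4s, ...
            continue
        return body

# Simulated endpoint: rate-limited, then a server error, then success
responses = iter([(429, None), (500, None), (200, "ok")])
result = with_retry(lambda: next(responses), base_delay=0.0)
```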
Supported Providers
# Cloud APIs
ocp evaluate --model groq/llama-3.3-70b-versatile # Groq (fast, free tier)
ocp evaluate --model custom/deepseek-chat \
--base-url https://api.deepseek.com/v1 # DeepSeek (or any OpenAI-compat)
# Local models
ocp evaluate --model ollama/qwen3:14b # Ollama
ocp evaluate --model ollama/llama3.2:3b
# Any OpenAI-compatible endpoint
ocp evaluate --model custom/my-model \
--base-url http://localhost:8080/v1 \
--api-key my-key
Any model responding to POST /v1/chat/completions with messages: [{role, content}] is OCP-compatible.
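The full contract is just that request shape. A minimal sketch of the payload an OCP provider would POST (model name and endpoint are placeholders):

```python
import json

# Minimal OpenAI-compatible chat-completions request body — the only
# interface OCP requires from an endpoint.
payload = {
    "model": "my-model",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello"},
    ],
}
body = json.dumps(payload)
# POST this body to <base-url>/v1/chat/completions with an
# Authorization: Bearer <api-key> header; the reply text arrives
# in choices[0].message.content of the JSON response.
```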
CLI Reference
# Core evaluation
ocp evaluate --model PROVIDER/MODEL [--tests all|t1,t2] [--sessions N] [--seed N]
# Reports
ocp report --input results.json --output report.html # HTML + radar chart
ocp badge --input results.json --output badge.svg # SVG badge for README
# Comparison
ocp compare --models M1,M2,M3 [--sessions N] --output compare.html
# Leaderboard
ocp leaderboard # view local results table
ocp serve # start web leaderboard (localhost:8080)
ocp submit --results r.json \
--github-token $TOKEN # submit to community leaderboard
# HuggingFace
ocp hf-card --results r.json --push --repo username/model-name --token $HF_TOKEN
Python API
from ocp import CognitiveEvaluator
# CognitiveEvaluator is an alias for OCPOrchestrator
from ocp.engine.orchestrator import OCPOrchestrator
from ocp.providers.groq import GroqProvider
provider = GroqProvider(model="llama-3.3-70b-versatile")
orch = OCPOrchestrator(
provider=provider,
tests="all",
sessions=20,
seed=42,
)
import asyncio
result = asyncio.run(orch.run())
print(f"OCP Level: OCP-{result.ocp_level} — {result.ocp_level_name}")
print(f"SASMI: {result.sasmi_score:.4f}")
result.save("results.json")
Backward compatibility:
ConsciousnessEvaluator still works as a deprecated alias for CognitiveEvaluator.
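A deprecated alias of this kind is typically implemented with a warning shim like the one below. This is a generic sketch of the pattern, not necessarily OCP's exact code (the placeholder class stands in for the real evaluator):

```python
import warnings

class CognitiveEvaluator:          # stand-in for the real evaluator class
    pass

def ConsciousnessEvaluator(*args, **kwargs):
    """Deprecated alias: emits a DeprecationWarning, then delegates."""
    warnings.warn(
        "ConsciousnessEvaluator is deprecated; use CognitiveEvaluator",
        DeprecationWarning,
        stacklevel=2,
    )
    return CognitiveEvaluator(*args, **kwargs)
```

Existing code keeps working, while users see a one-line nudge toward the new name.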
Plugin System
Extend OCP with custom test batteries:
# your_plugin/pyproject.toml
[project.entry-points."ocp.tests"]
my_test_id = "your_package.your_test:YourTest"
After pip install your-ocp-plugin, OCP auto-discovers your test:
ocp tests list # shows your test
ocp evaluate --model groq/... --tests my_test_id # runs it
See CONTRIBUTING.md for full plugin development guide.
Theoretical Foundations
| Theory | OCP Scale/Test | Key Insight |
|---|---|---|
| Integrated Information Theory (Tononi) | Φ*, TP test | Information integration = measure of "experiential wholeness" |
| Global Workspace Theory (Baars/Dehaene) | GWT, TP test | Consciousness = broadcast of info across specialized systems |
| Higher-Order Thought Theory (Rosenthal) | MCA test | Consciousness = having thoughts about one's own thoughts |
| Predictive Processing (Friston/Clark) | PED test | Consciousness = prediction error minimization and updating |
| Society of Mind (Minsky) | DNC test | Mind = competition/cooperation between goal-oriented agents |
Roadmap
v0.1.0 ✅ 6 tests · 4 scales · 5 providers · CLI · HTML reports
badges · leaderboard server · HuggingFace · plugin system
PyPI package · GitHub Actions CI/CD
v0.2.0 ✅ Embedding-based scoring (sentence-transformers, MCA test)
composite_stdev per test result
Φ* renamed → cross_test_coherence (proxy metric, not IIT Φ)
questions_per_session: 5 → 15
v0.1.0 results archived
v0.3.0 ✅ Renamed to "Open Cognitive Protocol"
Rate limiting & retry (Groq free tier, Ollama, custom)
Default sessions: 5 → 20 for statistical significance
CognitiveEvaluator API alias (ConsciousnessEvaluator deprecated)
v1.0.0 🔭 Official research paper
Community protocol standard
Validation studies on human baselines
Results: Leaderboard
Community results · View full interactive leaderboard →
| # | Model | OCP Level | SASMI | NII |
|---|---|---|---|---|
| 1 | ollama/minimax-m2.5:cloud | OCP-4 Self-Modeling | 0.634 | 0.500 |
| 2 | ollama/lfm2.5-thinking:latest | OCP-4 Self-Modeling | 0.617 | 0.000 |
| 3 | ollama/gemini-3-flash-preview:latest | OCP-3 Integrated | 0.561 | 0.250 |
| 4 | ollama/qwen3-coder:480b-cloud | OCP-3 Integrated | 0.528 | 0.875 |
| 5 | ollama/kimi-k2.5:cloud | OCP-3 Integrated | 0.505 | 0.625 |
| … | 18+ more models | | | |
Contributing
See CONTRIBUTING.md for:
- Writing a new test battery
- Adding a new provider adapter
- Plugin development and publishing
- Theoretical standards and scoring guidelines
Citation
@software{ocp2026,
author = {Urosevic, Pedja},
title = {Open Cognitive Protocol (OCP): A Behavioral Benchmark
for Large Language Models},
year = {2026},
url = {https://github.com/pedjaurosevic/ocp-protocol},
version = {0.3.0}
}
Disclaimer
OCP measures functional cognitive analogs in language models. These measurements describe behavioral and computational properties, not subjective experience. OCP certification levels are operational categories, not ontological claims about sentience or awareness.