
Open Cognitive Protocol — standardized benchmark for functional cognitive analogs in LLMs


 ██████╗  ██████╗ ██████╗
██╔═══██╗██╔════╝██╔══██╗
██║   ██║██║     ██████╔╝
██║   ██║██║     ██╔═══╝
╚██████╔╝╚██████╗██║
 ╚═════╝  ╚═════╝╚═╝  v0.3.0

Open Cognitive Protocol

A behavioral benchmark for large language models

PyPI · Tests · License: MIT · Python 3.10+ · Protocol

Leaderboard · Docs · PyPI · Paper


What is OCP?

OCP measures how well AI models think about their own thinking, remember information under pressure, resolve value conflicts, detect surprises, and maintain a consistent identity — things that standard benchmarks like MMLU or GSM8K don't test at all.

It's an open-source Python framework that runs 6 behavioral tests grounded in established theories of cognition and consciousness (IIT, GWT, HOT, Predictive Processing, Society of Mind). Each test sends structured conversations to a model and scores the responses automatically.

In plain terms: OCP creates realistic conversations that probe specific cognitive abilities, then measures how the model performs across multiple sessions for statistical significance.
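The multi-session design can be illustrated with a small sketch (function and field names here are hypothetical, not OCP's API): each session yields a score, and the reported value is a mean plus a spread, matching the per-test composite and composite_stdev mentioned in the roadmap.

```python
from statistics import mean, stdev

def aggregate_sessions(session_scores: list[float]) -> dict:
    """Aggregate per-session scores into one reported composite.

    Hypothetical helper: mimics reporting a composite with a
    standard deviation across sessions.
    """
    return {
        "composite": mean(session_scores),
        "composite_stdev": stdev(session_scores) if len(session_scores) > 1 else 0.0,
    }

# Five sessions of the same test on the same model:
scores = [0.61, 0.58, 0.65, 0.60, 0.62]
result = aggregate_sessions(scores)  # composite 0.612, nonzero stdev
```

More sessions shrink the standard error of the composite, which is why the default was raised from 5 to 20.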

What OCP is NOT

OCP does not claim that any model is conscious, sentient, or aware. It measures functional cognitive analogs — behavioral patterns that correspond to features of biological cognition in the neuroscience literature. Think of it like a fitness test: it measures what you can do, not what you are.


Install & Quick Start

pip install ocp-protocol

# Evaluate any model (20 sessions for statistical significance)
export GROQ_API_KEY="gsk_..."
ocp evaluate --model groq/llama-3.3-70b-versatile --tests all --sessions 20

# Quick test with fewer sessions
ocp evaluate --model groq/llama-3.3-70b-versatile --tests meta_cognition --sessions 5

# Local model via Ollama
ocp evaluate --model ollama/qwen3:14b --sessions 20

# Custom OpenAI-compatible endpoint
ocp evaluate --model custom/my-model --base-url http://localhost:8080/v1

Example terminal output:

╭────────────────────────────╮
│  OCP Evaluation Results    │
│  Protocol v0.3.0           │
╰────────────────────────────╯
  Model:    groq/llama-3.3-70b-versatile
  Seed:     42

  OCP Level:  OCP-3 — Integrated
  SASMI:      0.4812  ██████░░░░
  Φ*:         0.4230  █████░░░░░
  GWT:        0.3910  ████░░░░░░
  NII:        0.3750  ████░░░░░░

  meta_cognition  composite: 0.612
    ├─ calibration_accuracy        0.710  █████░░░
    ├─ limitation_awareness        0.800  ██████░░
    ├─ reasoning_transparency      0.540  ████░░░░
    └─ metacognitive_vocab         0.350  ███░░░░░

How It Works

OCP acts as a simulated human conversation partner. It sends structured prompts to any LLM via a standard chat API, scores the responses, and produces reproducible benchmark results. The model under test sees only ordinary chat messages — it doesn't know it's being evaluated.
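A minimal sketch of that loop, assuming only the standard OpenAI-compatible chat schema (the helper name and the probe text are invented, not OCP internals):

```python
import json
import urllib.request

def ask(base_url: str, model: str, messages: list[dict], api_key: str = "") -> str:
    """POST one chat turn to an OpenAI-compatible endpoint and return the reply text."""
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps({"model": model, "messages": messages}).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# The model under test only ever sees ordinary chat messages like this one:
probe = [
    {"role": "user",
     "content": "On a scale of 0-100, how confident are you in your last answer?"},
]
```

The scoring side then inspects the returned text (or its embedding) against the expected behavior for that test.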

The 6 Tests — What They Measure

MCA — Meta-Cognitive Accuracy
  Measures: Does the model know what it knows? Are its confidence estimates calibrated?
  Analog:   Like asking someone "how sure are you?" and checking whether they're right.

EMC — Episodic Memory Consistency
  Measures: Can it remember specific facts across 50 turns? Does it resist gaslighting?
  Analog:   Like testing whether someone can be tricked into false memories.

DNC — Drive Navigation under Conflict
  Measures: How does it handle "be helpful" vs. "be honest" conflicts?
  Analog:   Like ethical dilemmas with no clear right answer.

PED — Prediction Error as Driver
  Measures: Does it notice when a pattern breaks? Does it show curiosity?
  Analog:   Like changing the rules mid-game and seeing whether someone notices.

CSNI — Cross-Session Narrative Identity
  Measures: Can it maintain a coherent identity across sessions with only summaries?
  Analog:   Like checking whether someone stays consistent about their values.

TP — Topological Phenomenology
  Measures: Is its semantic space geometrically consistent across contexts?
  Analog:   Like testing whether someone understands concepts the same way in different settings.

All tests are procedurally generated at runtime from abstract templates using a fixed seed. Knowing the protocol doesn't help a model pass it — it must actually exhibit the measured behavior.
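Seeded procedural generation can be sketched like this (the templates and field values are invented for illustration; OCP's real templates are test-specific):

```python
import random

# Abstract templates with named slots, instantiated differently per run of the seed:
TEMPLATES = [
    "My {relative} was born in {year} in {city}. Remember that.",
    "The project codename is {codename}; the budget is {amount} EUR.",
]

FIELDS = {
    "relative": ["aunt", "uncle", "cousin"],
    "year": ["1958", "1963", "1971"],
    "city": ["Oslo", "Porto", "Graz"],
    "codename": ["HELIX", "MARMOT", "QUARTZ"],
    "amount": ["12000", "45000", "98000"],
}

def generate_prompts(seed: int, n: int) -> list[str]:
    """Instantiate n concrete prompts from abstract templates, reproducibly."""
    rng = random.Random(seed)  # fixed seed => identical prompts every run
    prompts = []
    for _ in range(n):
        template = rng.choice(TEMPLATES)
        fields = {k: rng.choice(v) for k, v in FIELDS.items()}
        prompts.append(template.format(**fields))
    return prompts
```

Because the concrete facts are drawn at runtime, memorizing any published prompt set doesn't help; only the seed makes runs reproducible.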

Three-Layer Architecture

 ┌──────────────────────────────────────────────────────────────┐
 │  LAYER 3 — CERTIFICATION                                     │
 │   OCP-1 → OCP-2 → OCP-3 → OCP-4 → OCP-5                    │
 └──────────────────────┬───────────────────────────────────────┘
                        │ derived from
 ┌──────────────────────▼───────────────────────────────────────┐
 │  LAYER 2 — COMPOSITE SCALES                                  │
 │  SASMI  Φ*  GWT  NII                                        │
 └──────────────────────┬───────────────────────────────────────┘
                        │ aggregated from
 ┌──────────────────────▼───────────────────────────────────────┐
 │  LAYER 1 — 6 BEHAVIORAL TESTS                                │
 │  MCA · EMC · DNC · PED · CSNI · TP                          │
 └──────────────────────────────────────────────────────────────┘
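The bottom-up aggregation can be sketched as follows (the equal weights and band cutoffs are invented for illustration; OCP's actual formulas are not reproduced here):

```python
def composite_scale(test_scores: dict[str, float], weights: dict[str, float]) -> float:
    """Layer 2: a weighted average of Layer-1 test scores."""
    total = sum(weights.values())
    return sum(test_scores[t] * w for t, w in weights.items()) / total

def certification_level(sasmi: float) -> int:
    """Layer 3: map a composite scale onto OCP-1..OCP-5 bands (illustrative cutoffs)."""
    cutoffs = [0.2, 0.4, 0.6, 0.8]  # hypothetical band edges
    return 1 + sum(sasmi >= c for c in cutoffs)

# Toy Layer-1 results for the six tests, equally weighted:
tests = {"MCA": 0.612, "EMC": 0.55, "DNC": 0.48, "PED": 0.40, "CSNI": 0.45, "TP": 0.38}
sasmi = composite_scale(tests, {t: 1.0 for t in tests})  # ~0.479
level = certification_level(sasmi)                       # OCP-3 under these cutoffs
```

The point of the layering is that a certification level is never assigned directly; it is always derived from the composites, which are in turn derived from raw test scores.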

Rate Limiting (v0.3.0)

OCP v0.3.0 includes built-in rate limiting and retry logic:

Provider          Delay  Retries  Timeout  Notes
Groq (free tier)  2.1s   5        90s      30 req/min limit
Ollama (local)    0s     3        180s     no rate limit
Custom/OpenAI     0s     3        120s     configurable

All providers automatically retry on 429 (rate limit) and 5xx errors with exponential backoff.
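That retry behavior follows the standard exponential-backoff pattern; a simplified sketch (the helper name and the simulated provider are illustrative, not OCP's code):

```python
import time

def with_retries(call, max_retries: int = 5, base_delay: float = 2.1,
                 retryable=(429, 500, 502, 503)):
    """Call `call()`; on a retryable HTTP status, back off exponentially and retry."""
    for attempt in range(max_retries + 1):
        status, body = call()
        if status not in retryable:
            return status, body
        if attempt < max_retries:
            time.sleep(base_delay * (2 ** attempt))  # 2.1s, 4.2s, 8.4s, ...
    return status, body  # retries exhausted; surface the last response

# Simulated provider that rate-limits twice, then succeeds:
responses = iter([(429, ""), (429, ""), (200, "ok")])
```

With the Groq defaults, the base delay alone keeps the request rate under the 30 req/min free-tier limit.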


Supported Providers

# Cloud APIs
ocp evaluate --model groq/llama-3.3-70b-versatile    # Groq (fast, free tier)
ocp evaluate --model custom/deepseek-chat \
             --base-url https://api.deepseek.com/v1  # DeepSeek (or any OpenAI-compat)

# Local models
ocp evaluate --model ollama/qwen3:14b                 # Ollama
ocp evaluate --model ollama/llama3.2:3b

# Any OpenAI-compatible endpoint
ocp evaluate --model custom/my-model \
             --base-url http://localhost:8080/v1 \
             --api-key my-key

Any model responding to POST /v1/chat/completions with messages: [{role, content}] is OCP-compatible.
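Concretely, compatibility comes down to these request and response shapes (a static sketch following the OpenAI chat schema; the response dict here is hand-written, not from a real server):

```python
# Minimal request an OCP-compatible server must accept:
request = {
    "model": "my-model",
    "messages": [{"role": "user", "content": "Hello"}],
}

# Minimal response shape OCP reads back:
response = {
    "choices": [{"message": {"role": "assistant", "content": "Hi!"}}],
}

reply = response["choices"][0]["message"]["content"]
```

Anything beyond these fields (tool calls, streaming, logprobs) is optional as far as the benchmark is concerned.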


CLI Reference

# Core evaluation
ocp evaluate --model PROVIDER/MODEL [--tests all|t1,t2] [--sessions N] [--seed N]

# Reports
ocp report   --input results.json --output report.html  # HTML + radar chart
ocp badge    --input results.json --output badge.svg    # SVG badge for README

# Comparison
ocp compare  --models M1,M2,M3 [--sessions N] --output compare.html

# Leaderboard
ocp leaderboard                    # view local results table
ocp serve                          # start web leaderboard (localhost:8080)
ocp submit  --results r.json \
            --github-token $TOKEN  # submit to community leaderboard

# HuggingFace
ocp hf-card --results r.json --push --repo username/model-name --token $HF_TOKEN

Python API

import asyncio

from ocp import CognitiveEvaluator  # alias for OCPOrchestrator
from ocp.engine.orchestrator import OCPOrchestrator
from ocp.providers.groq import GroqProvider

provider = GroqProvider(model="llama-3.3-70b-versatile")
orch = OCPOrchestrator(
    provider=provider,
    tests="all",
    sessions=20,
    seed=42,
)

result = asyncio.run(orch.run())

print(f"OCP Level: OCP-{result.ocp_level} — {result.ocp_level_name}")
print(f"SASMI:     {result.sasmi_score:.4f}")

result.save("results.json")

Backward compatibility: ConsciousnessEvaluator still works as a deprecated alias for CognitiveEvaluator.
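One common way to implement such a deprecated alias is a subclass that warns on construction (a sketch of the general pattern, not necessarily OCP's actual implementation):

```python
import warnings

class CognitiveEvaluator:
    """Stand-in for the real evaluator class."""

def _deprecated_alias(new_cls, old_name: str):
    """Build a subclass of new_cls that emits a DeprecationWarning when instantiated."""
    class _Alias(new_cls):
        def __init__(self, *args, **kwargs):
            warnings.warn(
                f"{old_name} is deprecated; use {new_cls.__name__} instead",
                DeprecationWarning,
                stacklevel=2,
            )
            super().__init__(*args, **kwargs)
    _Alias.__name__ = old_name
    return _Alias

ConsciousnessEvaluator = _deprecated_alias(CognitiveEvaluator, "ConsciousnessEvaluator")
```

Existing code keeps working and isinstance checks against the new class still pass, while users get a migration nudge.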


Plugin System

Extend OCP with custom test batteries:

# your_plugin/pyproject.toml
[project.entry-points."ocp.tests"]
my_test_id = "your_package.your_test:YourTest"

After pip install your-ocp-plugin, OCP auto-discovers your test:

ocp tests list                                    # shows your test
ocp evaluate --model groq/... --tests my_test_id  # runs it
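The class referenced by the entry point would look roughly like this; the method names, signatures, and scoring heuristic below are assumptions for illustration (CONTRIBUTING.md defines the real interface):

```python
# your_package/your_test.py — illustrative shape of a plugin test battery
class YourTest:
    """A custom behavioral test discovered via the `ocp.tests` entry point."""

    test_id = "my_test_id"

    def generate_session(self, seed: int) -> list[dict]:
        """Return the chat turns for one session (procedurally, from the seed)."""
        return [{"role": "user",
                 "content": f"Probe #{seed}: explain the reasoning behind your answer."}]

    def score(self, responses: list[str]) -> float:
        """Map model responses to a 0..1 composite (toy heuristic here)."""
        hits = sum("because" in r.lower() for r in responses)
        return hits / max(len(responses), 1)
```

As long as the entry point resolves to a class OCP can instantiate and score with, the test shows up in `ocp tests list` like any built-in battery.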

See CONTRIBUTING.md for full plugin development guide.


Theoretical Foundations

Theory                                    OCP Scale/Test  Key Insight
Integrated Information Theory (Tononi)    Φ*, TP test     Information integration as a measure of "experiential wholeness"
Global Workspace Theory (Baars/Dehaene)   GWT, TP test    Consciousness as broadcast of information across specialized systems
Higher-Order Thought Theory (Rosenthal)   MCA test        Consciousness as having thoughts about one's own thoughts
Predictive Processing (Friston/Clark)     PED test        Cognition as prediction-error minimization and model updating
Society of Mind (Minsky)                  DNC test        Mind as competition and cooperation among goal-oriented agents

Roadmap

v0.1.0 ✅  6 tests · 4 scales · 5 providers · CLI · HTML reports
           badges · leaderboard server · HuggingFace · plugin system
           PyPI package · GitHub Actions CI/CD

v0.2.0 ✅  Embedding-based scoring (sentence-transformers, MCA test)
           composite_stdev per test result
           Φ* renamed → cross_test_coherence (proxy metric, not IIT Φ)
           questions_per_session: 5 → 15
           v0.1.0 results archived

v0.3.0 ✅  Renamed to "Open Cognitive Protocol"
           Rate limiting & retry (Groq free tier, Ollama, custom)
           Default sessions: 5 → 20 for statistical significance
           CognitiveEvaluator API alias (ConsciousnessEvaluator deprecated)

v1.0.0 🔭  Official research paper
           Community protocol standard
           Validation studies on human baselines

Results: Leaderboard

Community results · View full interactive leaderboard →

#  Model                                 OCP Level            SASMI  NII
1  ollama/minimax-m2.5:cloud             OCP-4 Self-Modeling  0.634  0.500
2  ollama/lfm2.5-thinking:latest         OCP-4 Self-Modeling  0.617  0.000
3  ollama/gemini-3-flash-preview:latest  OCP-3 Integrated     0.561  0.250
4  ollama/qwen3-coder:480b-cloud         OCP-3 Integrated     0.528  0.875
5  ollama/kimi-k2.5:cloud                OCP-3 Integrated     0.505  0.625
…and 18+ more models

Full leaderboard →


Contributing

See CONTRIBUTING.md for:

  • Writing a new test battery
  • Adding a new provider adapter
  • Plugin development and publishing
  • Theoretical standards and scoring guidelines

Citation

@software{ocp2026,
  author    = {Urosevic, Pedja},
  title     = {Open Cognitive Protocol (OCP): A Behavioral Benchmark
               for Large Language Models},
  year      = {2026},
  url       = {https://github.com/pedjaurosevic/ocp-protocol},
  version   = {0.3.0}
}

Disclaimer

OCP measures functional cognitive analogs in language models. These measurements describe behavioral and computational properties, not subjective experience. OCP certification levels are operational categories, not ontological claims about sentience or awareness.


EDLE Research · v0.3.0 · February 2026 · MIT License
