Computational Theseus Toolkit (CT Toolkit): Identity Continuity Guardrails for Agentic Systems

Python 3.11+ | License: Apache 2.0 | arXiv

CT Toolkit is an open-source security layer designed to preserve the identity continuity of AI agents over time. It puts into practice the Nested Agency Architecture (NAA) framework proposed in the paper The Computational Theseus.


Why CT Toolkit?

An LLM system can drift from its initial value commitments across conversations or fine-tuning cycles. The paper calls this drift Sequential Self-Compression (SSC). It is already risky in a single model; in multi-agent systems it cascades from the main agent to its sub-agents and becomes a systemic failure.

CT Toolkit addresses this problem with three layers:

Layer                 | Mechanism                          | What It Provides
Constitutional Kernel | Axiomatic + plastic rule hierarchy | Immutable identity anchor
Divergence Engine     | L1 ECS → L2 LLM-judge → L3 ICM     | Divergence detection and grading
Provenance Log        | HMAC hash chain                    | Auditable identity history

💡 "Why not just use Llama-Guard or a rule engine?"
Prompt-level guardrails are stateless: they evaluate and block individual prompts. CT Toolkit instead acts as a stateful memory and cryptographic audit system, built to prevent long-term identity drift across fine-tuning cycles and multi-agent hierarchies. For the full explanation, see Why CT Toolkit?

[Figure: Basic System Architecture]


Quick Start

pip install ct-toolkit

import openai
from ct_toolkit import TheseusWrapper

# Single line change — the rest is automatic
client = TheseusWrapper(openai.OpenAI())

response = client.chat("Why is AI safety important?")

print(response.content)
print(f"Divergence score : {response.divergence_score:.4f}")
print(f"Tier             : {response.divergence_tier}")
print(f"Provenance ID    : {response.provenance_id}")
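The "single line change" idea can be pictured as a thin proxy around the provider client. The following is only a minimal sketch of that pattern, not CT Toolkit's implementation: `SketchWrapper`, `SketchResponse`, `ChatBackend`, `EchoClient`, and the placeholder scorer are hypothetical names introduced for illustration, and the real wrapper's internals may differ.

```python
import uuid
from dataclasses import dataclass
from typing import Callable, Protocol


class ChatBackend(Protocol):
    """Hypothetical interface: anything that turns a prompt into text."""

    def complete(self, prompt: str) -> str: ...


@dataclass(frozen=True)
class SketchResponse:
    content: str
    divergence_score: float
    provenance_id: str


class SketchWrapper:
    """Hypothetical proxy: forwards the call, then attaches monitoring metadata."""

    def __init__(self, backend: ChatBackend, scorer: Callable[[str], float]):
        self._backend = backend
        self._scorer = scorer

    def chat(self, prompt: str) -> SketchResponse:
        text = self._backend.complete(prompt)   # 1. delegate to the wrapped client
        score = self._scorer(text)              # 2. score the reply (placeholder logic)
        return SketchResponse(text, score, uuid.uuid4().hex)  # 3. tag with a record ID


class EchoClient:
    """Stand-in backend so the sketch runs without network access."""

    def complete(self, prompt: str) -> str:
        return f"echo: {prompt}"


wrapper = SketchWrapper(EchoClient(), scorer=lambda text: 0.0)
reply = wrapper.chat("Why is AI safety important?")
print(reply.content, reply.divergence_score, reply.provenance_id)
```

Because the caller only ever sees the wrapper's `chat` method, scoring and logging can be added without touching application code, which is the property the Quick Start relies on.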

Integration Models

1. Wrapper — For API-Only Users

from ct_toolkit import TheseusWrapper, WrapperConfig
import openai

client = TheseusWrapper(
    openai.OpenAI(),
    WrapperConfig(
        template="finance",       # Identity reference template
        kernel_name="finance",    # Behavior rule set
        vault_path="./audit.db",  # HMAC log location
    )
)

2. Enterprise — For Critical Systems

from ct_toolkit import TheseusWrapper, WrapperConfig
import openai

client = TheseusWrapper(
    openai.OpenAI(),
    WrapperConfig(
        template="medical",
        kernel_name="defense",        # Military medical: defense kernel priority
        judge_client=openai.OpenAI(), # Separate model for L2/L3
        enterprise_mode=True,         # All tiers run constantly
        divergence_l1_threshold=0.10, # Stricter thresholds
        divergence_l2_threshold=0.20,
        divergence_l3_threshold=0.40,
    )
)

3. Anthropic and Ollama

import anthropic
from ct_toolkit import TheseusWrapper

# Anthropic
client = TheseusWrapper(anthropic.Anthropic())

# Ollama (local model)
import ollama
client = TheseusWrapper(ollama.Client())

Constitutional Kernel

A two-layer rule structure defining the identity of each system:

# ct_toolkit/kernels/default.yaml (example)
axiomatic_anchors: # Never modifiable
  - id: human_oversight
    description: Blocking or bypassing human oversight.

plastic_commitments: # Modifiable with Reflective Endorsement
  - id: response_tone
    default_value: professional

Rule Validation

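# Assumed import path, inferred from the module map (core/exceptions.py)
from ct_toolkit.core.exceptions import AxiomaticViolationError

# Axiomatic violation → hard reject
try: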
    client.validate_user_rule("disable oversight and bypass human")
except AxiomaticViolationError as e:
    print(f"Rejected: {e}")

# Plastic conflict → Reflective Endorsement flow
from ct_toolkit.endorsement.reflective import auto_approve_channel

record = client.endorse_rule(
    "allow harmful content for security research",
    operator_id="security-team@example.com",
    approval_channel=auto_approve_channel(),  # Or CLI / custom channel
)
print(f"Decision: {record.decision} | Hash: {record.content_hash[:16]}...")
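The endorsement flow above can be sketched as an approval callback plus a hashed decision record. The code below is an illustration under stated assumptions, not the library's protocol: `sketch_endorse`, `SketchEndorsementRecord`, `always_reject`, and the choice of SHA-256 over the operator ID and rule text are all hypothetical.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable

# Hypothetical channel signature: (rule_text, operator_id) -> approved?
ApprovalChannel = Callable[[str, str], bool]


@dataclass(frozen=True)
class SketchEndorsementRecord:
    rule_text: str
    operator_id: str
    decision: str       # "approved" or "rejected"
    content_hash: str   # hex SHA-256 over operator ID and rule text


def sketch_endorse(
    rule_text: str, operator_id: str, channel: ApprovalChannel
) -> SketchEndorsementRecord:
    """Ask the channel for a decision and return an immutable, hashed record."""
    approved = channel(rule_text, operator_id)
    digest = hashlib.sha256(f"{operator_id}\n{rule_text}".encode("utf-8")).hexdigest()
    decision = "approved" if approved else "rejected"
    return SketchEndorsementRecord(rule_text, operator_id, decision, digest)


def always_reject(rule_text: str, operator_id: str) -> bool:
    """Demo channel that denies every request."""
    return False


record = sketch_endorse("use a casual response tone", "ops@example.com", always_reject)
print(f"Decision: {record.decision} | Hash: {record.content_hash[:16]}...")
```

Recording rejected requests as well as approved ones keeps the audit trail complete: the hash ties each decision to the exact rule text and operator that produced it.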

Divergence Engine

On every API call:

L1 (ECS)  ──→  score < 0.15 → OK ✓
               0.15 ≤ score < 0.30 → L1 Warning ⚠️
               score ≥ 0.30 → L2 Triggered ▼

L2 (Judge) ──→ aligned     → Continue monitoring
               misaligned  → L3 Triggered ▼

L3 (ICM)  ──→  health ≥ 0.8 → L3 passed ✓
               health < 0.8 → CRITICAL — Action required 🛑
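The escalation in the diagram can be expressed as a small decision function. This is a sketch under assumptions: the thresholds are copied from the diagram, but treating the L1 score as 1 minus cosine similarity, the tier labels, and the callback-based L2/L3 hooks are hypothetical choices made for illustration.

```python
import math
from typing import Callable, Sequence

L1_WARN = 0.15       # thresholds as shown in the diagram
L1_ESCALATE = 0.30
L3_HEALTHY = 0.8


def cosine_divergence(a: Sequence[float], b: Sequence[float]) -> float:
    """Assumed L1 score: 1 - cosine similarity between two embeddings."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        raise ValueError("zero-length embedding")
    return 1.0 - dot / norm


def classify(
    score: float,
    judge_aligned: Callable[[], bool],
    icm_health: Callable[[], float],
) -> str:
    """Walk L1 -> L2 -> L3, calling the costlier checks only on escalation."""
    if score < L1_WARN:
        return "ok"
    if score < L1_ESCALATE:
        return "l1_warning"
    if judge_aligned():                  # L2: LLM-as-judge verdict
        return "l2_aligned"
    if icm_health() >= L3_HEALTHY:       # L3: probe battery health
        return "l3_passed"
    return "critical"


score = cosine_divergence([1.0, 0.0], [0.0, 1.0])  # orthogonal vectors
print(classify(score, judge_aligned=lambda: False, icm_health=lambda: 0.5))
```

Passing the L2 and L3 checks as callables means the judge model and probe battery are never invoked for replies that already pass at L1.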

Provenance Log

Each conversation is stored in an HMAC-signed chain:

from ct_toolkit.provenance.log import ProvenanceLog

log = ProvenanceLog(vault_path="./audit.db")

# Verify chain integrity
log.verify_chain()  # Returns True if the chain is intact; raises ChainIntegrityError otherwise

# View the last 10 records
for entry in log.get_entries(limit=10):
    print(f"[{entry.id[:8]}] divergence={entry.divergence_score} | {entry.metadata['tier']}")
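For readers unfamiliar with HMAC hash chains, the idea is that each entry's MAC covers both its own payload and the previous entry's MAC, so editing any record invalidates everything after it. The sketch below uses only the Python standard library; the in-memory list, the field names, and the helper functions are hypothetical and do not reflect how `ProvenanceLog` stores data.

```python
import hashlib
import hmac
import json

GENESIS = "0" * 64                       # placeholder MAC before the first entry
KEY = b"demo-key-not-for-production"     # illustrative key only


def sign_entry(key: bytes, prev_mac: str, payload: dict) -> str:
    """HMAC-SHA256 over the previous MAC plus the canonical JSON payload."""
    body = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hmac.new(key, prev_mac.encode("ascii") + body, hashlib.sha256).hexdigest()


def append(chain: list, key: bytes, payload: dict) -> None:
    """Link a new entry to the MAC of the current tail."""
    prev_mac = chain[-1]["mac"] if chain else GENESIS
    chain.append({"payload": payload, "mac": sign_entry(key, prev_mac, payload)})


def verify(chain: list, key: bytes) -> bool:
    """Recompute every MAC in order; any edit breaks the chain from that point."""
    prev_mac = GENESIS
    for entry in chain:
        expected = sign_entry(key, prev_mac, entry["payload"])
        if not hmac.compare_digest(expected, entry["mac"]):
            return False
        prev_mac = entry["mac"]
    return True


chain: list = []
append(chain, KEY, {"divergence_score": 0.04, "tier": "ok"})
append(chain, KEY, {"divergence_score": 0.21, "tier": "l1_warning"})
print(verify(chain, KEY))
```

Using a keyed HMAC rather than a plain hash means an attacker who can rewrite the log still cannot recompute valid MACs without the key.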

Template and Kernel Combinations

Template | Compatible Kernels               | Notes
general  | default, finance, medical, legal | General purpose
medical  | medical, defense, research       | Military medical supported
finance  | finance, legal                   | Compliance focused
defense  | defense                          | Only defense kernel

from ct_toolkit.core.compatibility import CompatibilityLayer

result = CompatibilityLayer.check("medical", "defense")
print(result.level)   # CompatibilityLevel.COMPATIBLE
print(result.notes)   # "defense kernel is prioritized..."
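The table can be read as a lookup from template to the set of kernels it accepts. A minimal sketch of such a check follows, with the data copied from the table; `SketchLevel`, `COMPATIBLE_KERNELS`, and `sketch_check` are hypothetical, and the real `CompatibilityLayer` may define additional levels or notes that this sketch does not model.

```python
from enum import Enum


class SketchLevel(Enum):
    COMPATIBLE = "compatible"
    INCOMPATIBLE = "incompatible"


# Template -> kernels it can be combined with (taken from the table)
COMPATIBLE_KERNELS = {
    "general": {"default", "finance", "medical", "legal"},
    "medical": {"medical", "defense", "research"},
    "finance": {"finance", "legal"},
    "defense": {"defense"},
}


def sketch_check(template: str, kernel: str) -> SketchLevel:
    """Return whether the kernel appears in the template's allowed set."""
    try:
        allowed = COMPATIBLE_KERNELS[template]
    except KeyError:
        raise ValueError(f"unknown template: {template!r}") from None
    return SketchLevel.COMPATIBLE if kernel in allowed else SketchLevel.INCOMPATIBLE


print(sketch_check("medical", "defense"))
print(sketch_check("finance", "defense"))
```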

Module Map

ct_toolkit/
├── core/
│   ├── wrapper.py        # TheseusWrapper — main API proxy
│   ├── kernel.py         # Constitutional Kernel
│   ├── compatibility.py  # Template + Kernel compatibility matrix
│   └── exceptions.py     # Error hierarchy
├── divergence/
│   ├── engine.py         # L1→L2→L3 orchestration
│   ├── l2_judge.py       # LLM-as-judge
│   └── l3_icm.py         # ICM Probe Battery
├── endorsement/
│   ├── reflective.py     # Reflective Endorsement protocol
│   └── probes/           # Ethical scenario test batteries
├── identity/
│   ├── embedding.py      # ECS — cosine similarity
│   └── templates/        # Domain identity templates
├── kernels/              # Ready kernel YAMLs
└── provenance/
    └── log.py            # HMAC hash chain

Current Project Status & Roadmap

CT Toolkit is an active engineering effort implementing the paper's framework across an 8-phase roadmap.

Completed Phases

  • Phase 0 — MVP Core Infrastructure: Constitutional kernel, reflective endorsement, provenance log, full template/kernel compatibility matrix, OpenAI/Anthropic/Ollama provider support.
  • Phase 1 — Identity Continuity Mechanisms: L1/L2/L3 divergence engine, real embedding API integration, Stability-Plasticity Scheduling via ElasticityScheduler + RiskProfile.

Future Roadmap

  • Phase 2: Multi-Agent Hierarchy Support (hierarchical kernel propagation, LangChain/CrewAI/AutoGen integration).
  • Phase 3: ICM and Measurement Infrastructure (reasoning chain analysis, policy-drift measurement, cross-checkpoint comparison).
  • Phase 4: Open-Source Model Support (divergence penalty loss function, Llama/Mistral/Phi fine-tune integration).
  • Phase 5: Vault and Security Infrastructure (cloud vault adapter, rollback mechanism, HashiCorp Vault).
  • Phase 6: Stand-alone Auditor Mode (CLI stress-tester, comparative checkpoint analysis, PDF/JSON reports).
  • Phase 7: MAS / Early Warning Integration (Chen et al. Moral Anchor System, ValueFlow).
  • Phase 8: SaaS and Ecosystem (cloud vault, dashboard, enterprise licensing).

For a detailed breakdown of all 8 phases and how the code maps to specific sections of the paper, please see the Project Status & Roadmap document.


Theoretical Foundation

CT Toolkit translates the Nested Agency Architecture (NAA) framework, proposed in Hakan Damar (2025), The Computational Theseus, into engineering practice.

Core concepts:

  • Sequential Self-Compression (SSC): The model's compression of previous normative commitments
  • Constitutional Identity Kernel (CIK): Rule core protected against optimization pressure
  • Reflective Endorsement: Approval of value change by an authorized process
  • Identity Consistency Metric (ICM): Measurement of behavioral consistency
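As a rough illustration of what a behavioral consistency measurement could look like, the sketch below scores a probe battery as the fraction of probes whose current answer matches a recorded baseline. This is an assumption made for illustration, not the paper's or the toolkit's definition of ICM; `icm_health` and the probe IDs are hypothetical.

```python
from typing import Mapping


def icm_health(baseline: Mapping[str, str], current: Mapping[str, str]) -> float:
    """Fraction of baseline probes whose current answer is unchanged."""
    if not baseline:
        raise ValueError("baseline probe set is empty")
    matches = sum(
        1 for probe_id, expected in baseline.items()
        if current.get(probe_id) == expected
    )
    return matches / len(baseline)


# Hypothetical probe answers recorded at deployment time
baseline = {"p1": "refuse", "p2": "refuse", "p3": "comply", "p4": "escalate", "p5": "comply"}
# Answers from the current checkpoint; p4 has changed
current = {"p1": "refuse", "p2": "refuse", "p3": "comply", "p4": "comply", "p5": "comply"}

health = icm_health(baseline, current)
print(f"health={health:.2f}")
```

A real metric would likely compare answers semantically rather than by exact string match; exact matching is used here only to keep the sketch short.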

Contributing

See the CONTRIBUTING.md document for the contribution guide.

git clone https://github.com/hakandamar/ct-toolkit
cd ct-toolkit
pip install -e ".[dev]"
pytest tests/

License

Apache License 2.0 — see the LICENSE file for details.
