CARL (Coherence-Aware Reinforcement Learning) — information-theoretic reward signals for LLM training via token-level probability distributions

Maintainers

wheattoast11

Project description

CARL Studio

Coherence-Aware Reinforcement Learning

Models don't learn gradually -- they crystallize.

[Figure: CARL Phase Transition]

CARL adds information-theoretic reward signals to RL training that measure how a model generates, not just what it generates. One conservation law. Three reward components. Model-agnostic.

Install

pip install carl-studio

Quick Start

Observe (zero friction -- point at any existing run)

carl observe --trackio https://your-space.hf.space

No training, no config, no GPU. See your model's learning geometry instantly.

Train

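import asyncio

from carl_studio import CARLTrainer, TrainingConfig


async def main():
    # Describe the run: base model, method, dataset, and compute target.
    trainer = CARLTrainer(TrainingConfig(
        run_name="my-first-carl",
        base_model="Qwen/Qwen3.5-9B",
        output_repo="your-username/my-model",
        method="grpo",
        dataset_repo="trl-lib/Capybara",
        compute_target="l4x1",
    ))
    # train() is awaited, so it has to run inside an event loop.
    return await trainer.train()


run = asyncio.run(main())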

Or from the CLI:

carl train --model Qwen/Qwen3.5-9B --method grpo --compute l4x1
carl train --send-it                  # full pipeline: SFT -> gate -> GRPO -> eval -> push

Architecture

Layer 4  MCP Server       9 tools for AI agent consumption
         ──────────────────────────────────────────────────
Layer 3  CLI              carl train | eval | observe | align | learn | bench | mcp
         ──────────────────────────────────────────────────
Layer 2  Training         CARLTrainer, CascadeRewardManager, environments
         ──────────────────────────────────────────────────
Layer 1  SDK              ModelSpec, TrainSpec, VRAMBudget, CoherenceProbe
         ──────────────────────────────────────────────────
Layer 0  Primitives       compute_phi, kappa, sigma, PhaseTransitionGate

The Reward

R_CARL = 0.50 * R_coherence + 0.30 * R_cloud + 0.20 * R_discontinuity

Component	Measures
Multiscale coherence	Phi consistency across dyadic block scales
Cloud quality	P(selected) * Phi -- confident AND correct
Discontinuity targeting	Sharp Phi transitions at structurally appropriate locations
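
As a rough illustration of the weighted sum above, the sketch below combines three component scores with the stated 0.50 / 0.30 / 0.20 weights. The function names `carl_reward` and `cloud_term` are hypothetical and are not taken from the package; how each component is aggregated over a sequence is not specified here, so the inputs are treated as already-computed scalars.

```python
# Weights as stated in the R_CARL formula.
W_COHERENCE = 0.50
W_CLOUD = 0.30
W_DISCONTINUITY = 0.20


def cloud_term(p_selected, phi):
    """Per-token cloud quality: P(selected) * Phi."""
    return p_selected * phi


def carl_reward(r_coherence, r_cloud, r_discontinuity):
    """Weighted sum of the three components (hypothetical helper)."""
    return (
        W_COHERENCE * r_coherence
        + W_CLOUD * r_cloud
        + W_DISCONTINUITY * r_discontinuity
    )


# Illustrative inputs only; these numbers are not from any real run.
print(carl_reward(0.8, cloud_term(0.9, 0.8), 0.1))
```

Because the weights sum to 1, the combined reward stays in the same range as its components when all three share a scale.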

Cascade Gating

CARL is length-biased -- a verbose, confident model scores high. Without gating, it dominates sparse task signal and causes mode collapse. The cascade solves this:

Stage A (early):   task rewards only         -- "learn to use tools"
Stage B (gated):   task + CARL rewards       -- "now do it coherently"

The gate self-calibrates from the task metric's running distribution. No hardcoded threshold.
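
One way such a gate could work is sketched below. This is an assumption-laden illustration, not the package's `PhaseTransitionGate` or `CascadeRewardManager`: the class name `RunningGateSketch`, the window sizes, and the opening rule (recent mean exceeds the running mean plus one running standard deviation) are all invented here to show a threshold derived from the metric's own history.

```python
from collections import deque
from statistics import mean, pstdev


class RunningGateSketch:
    """Opens once recent task performance clearly exceeds its own history."""

    def __init__(self, window=5, min_history=10):
        self.history = []                   # every task metric seen so far
        self.recent = deque(maxlen=window)  # the most recent values only
        self.min_history = min_history
        self.open = False                   # latches: never closes again

    def update(self, task_metric):
        self.history.append(task_metric)
        self.recent.append(task_metric)
        if not self.open and len(self.history) >= self.min_history:
            # Threshold comes from the running distribution, not a constant.
            threshold = mean(self.history) + pstdev(self.history)
            self.open = mean(self.recent) > threshold
        return self.open


def cascade_reward(task_reward, carl_reward, gate_open):
    """Stage A: task reward only. Stage B: task + CARL reward."""
    return task_reward + carl_reward if gate_open else task_reward


gate = RunningGateSketch()
for metric in [0.1] * 10 + [0.9] * 10:  # illustrative metric trace
    is_open = gate.update(metric)
print(is_open, cascade_reward(1.0, 0.5, is_open))
```

The latch keeps CARL rewards active once Stage B starts; whether the real gate can close again is not stated in this document.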

Order Parameter

Phi = 1 - H(P) / log|V|

0 = uniform (maximum uncertainty). 1 = delta (complete certainty).
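
The formula can be checked on a toy distribution. The sketch below is a plain-Python illustration rather than the package's `compute_phi`, whose signature is not shown in this document; the name `phi_from_probs` is hypothetical, and natural logarithms are an arbitrary choice since the base cancels in the ratio.

```python
import math


def phi_from_probs(probs):
    """Phi = 1 - H(P) / log|V| for a single next-token distribution."""
    vocab_size = len(probs)
    if vocab_size < 2:
        raise ValueError("need a vocabulary of at least two tokens")
    # Shannon entropy in nats; zero-probability tokens contribute nothing.
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    return 1.0 - entropy / math.log(vocab_size)


uniform = [0.25, 0.25, 0.25, 0.25]  # maximum uncertainty over 4 tokens
delta = [1.0, 0.0, 0.0, 0.0]        # all mass on one token
peaked = [0.7, 0.1, 0.1, 0.1]       # somewhere in between

for name, dist in [("uniform", uniform), ("delta", delta), ("peaked", peaked)]:
    print(name, round(phi_from_probs(dist), 4))
```

The uniform case lands at 0 and the delta case at 1, matching the two endpoints described above.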

The Conservation Law

Constant	Value	Meaning
kappa	64/3	Conservation constant
sigma	3/16	Semantic quantum
kappa * sigma	4	Bits per embedding dimension
T*	kappa * d	Decompression boundary
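
The arithmetic in the table can be verified with exact fractions. In this sketch `d` is assumed to be the embedding dimension (the table does not define it), and the value 3072 is purely illustrative; `decompression_boundary` is a hypothetical helper name.

```python
from fractions import Fraction

KAPPA = Fraction(64, 3)   # conservation constant
SIGMA = Fraction(3, 16)   # semantic quantum


def decompression_boundary(d):
    """T* = kappa * d, with d assumed to be the embedding dimension."""
    return KAPPA * d


print(KAPPA * SIGMA)                  # 4
print(decompression_boundary(3072))   # 65536
```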

Derived in Bounded Informational Time Crystals. Validated across 6,244 trials in Material Reality. Formally proved in Semantic Realizability.

Key Finding: Phase Transitions

During VLM SFT, the model exhibits a first-order phase transition:

Steps	Phase	Accuracy	Entropy	What happens
0-10	Baseline	3%	1.0	Pre-training distribution intact
10-20	Melting	8%	9.3	Distribution destabilizes completely
20-25	Transition	65%	4.1	Accuracy jumps 57 points in 5 steps
25-35	Crystallization	99%	0.4	Rapid convergence
35-46	Converged	99.3%	0.12	Fully crystallized

Entropy spikes to near-maximum during melting, then accuracy jumps discontinuously once the system passes the critical coupling threshold. This is consistent with Kuramoto synchronization in coupled oscillator systems.
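
The pattern in the table can be read off programmatically. The sketch below uses only the five rows reported above and finds the largest phase-to-phase accuracy jump and the phase with peak entropy; it is a reading aid for the table, not a detection method from the package.

```python
# (phase, accuracy in percent, entropy) copied from the table above.
phases = [
    ("Baseline", 3.0, 1.0),
    ("Melting", 8.0, 9.3),
    ("Transition", 65.0, 4.1),
    ("Crystallization", 99.0, 0.4),
    ("Converged", 99.3, 0.12),
]

# Accuracy change on entering each phase from the previous one.
jumps = [
    (curr[0], curr[1] - prev[1])
    for prev, curr in zip(phases, phases[1:])
]

largest_jump = max(jumps, key=lambda item: item[1])
peak_entropy = max(phases, key=lambda row: row[2])

print(largest_jump)     # ('Transition', 57.0)
print(peak_entropy[0])  # Melting
```

The entropy peak falls in the phase immediately before the largest accuracy jump, which is the ordering the paragraph above describes.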

CLI

Core triad:

carl observe       See learning geometry on any run (no GPU required)
carl eval          Pass/fail gate on a checkpoint
carl train         Train with CARL rewards (SFT, GRPO, DPO, KTO, ORPO)
carl train --send-it   Full autonomous pipeline: SFT -> gate -> GRPO -> eval -> push

Operations:

carl status <id>   Job status
carl logs <id>     Job logs
carl stop <id>     Cancel a job
carl push          Push checkpoint to Hub
carl bundle        Generate self-contained training script
carl compute       List GPU flavors and pricing
carl setup         First-time setup

Advanced (experimental):

carl align         Realign a drifted model
carl learn         Ingest knowledge, generate data, train
carl bench         Coherence meta-benchmarks
carl mcp           Start MCP server (9 tools for AI agents)
carl dev           Development utilities

Model-Agnostic

ModelSpec.from_pretrained() auto-detects architecture, modality, thinking mode, quantization constraints, and LoRA targets from any HuggingFace config.json. No per-model branches.

Model	Status
Qwen 3.5 9B VLM	Primary -- 94.6% click accuracy
Gemma 4 E4B	Planned
Gemma 4 31B	Planned (multi-GPU)

Compute Backends

Backend	Flag
HuggingFace Jobs	`--compute l4x1` / `a100-large` / `h200`
RunPod	`--compute runpod`
Tinker	`--compute tinker`
Prime Intellect	`--compute prime`
SSH	`--compute ssh`
Local	`--compute local`

Test-Time Training

CARL includes TTT mechanisms for post-deployment adaptation:

SLOT -- hidden delta injection (8 Adam steps, architecture-agnostic)
LoRA micro-update -- rank-1 online adaptation

IP Boundaries

CARL Studio is MIT-licensed. The mathematics -- conservation law, order parameter, reward components -- are independently derivable from the three published papers (CC-BY-4.0).

This package is the open training framework. It does not include the runtime dynamics or autonomous orchestration:

What	Where	License
Conservation law, Phi, rewards	CARL Studio (this repo)	MIT
Observe, eval, train CLI	CARL Studio (this repo)	MIT
Resonance LR modulation	terminals-runtime	BUSL-1.1
SLOT / LoRA micro-update (TTT)	terminals-runtime	BUSL-1.1
Kuramoto oscillator dynamics	terminals-runtime	BUSL-1.1
Coherence diagnosis methodology	terminals-runtime	BUSL-1.1
Audio coherence (CHORD)	Terminals Platform	BUSL-1.1
Cross-substrate isomorphisms	Terminals Platform	BUSL-1.1
Interactive Research Environment	Terminals Platform	BUSL-1.1
Material Reality datasets	Zenodo	CC-BY-4.0

The bifurcation is deliberate: CARL Studio provides the full training loop (observe, eval, train) using published mathematics. The terminals-runtime package adds autonomous features (resonance-aware LR, test-time training, Claude-powered diagnosis) behind BUSL-1.1. Same conservation law, different sides of the boundary.

Citation

@article{desai2026carl,
  title   = {Coherence-Aware Reinforcement Learning},
  author  = {Desai, Tej},
  year    = {2026},
  url     = {https://github.com/wheattoast11/carl},
  note    = {Intuition Labs LLC}
}

License

MIT -- Intuition Labs LLC

Project details

Release history

0.20.1

May 9, 2026

0.20.0

May 9, 2026

0.18.3

Apr 25, 2026

0.18.2

Apr 24, 2026

0.18.1

Apr 24, 2026

This version

0.3.0

Apr 9, 2026

Download files

Source Distribution

carl_studio-0.3.0.tar.gz (2.2 MB)

Uploaded Apr 9, 2026 Source

Built Distribution

carl_studio-0.3.0-py3-none-any.whl (215.0 kB)

Uploaded Apr 9, 2026 Python 3

File details

Details for the file carl_studio-0.3.0.tar.gz.

File metadata

Download URL: carl_studio-0.3.0.tar.gz
Upload date: Apr 9, 2026
Size: 2.2 MB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for carl_studio-0.3.0.tar.gz
Algorithm	Hash digest
SHA256	`aaae2ed09c10c921006e60a2885ebcf3baf18c76a9917321b5f785ae43dd1165`
MD5	`fc8ae604bc6db226b344ca2efbb5db75`
BLAKE2b-256	`43758d476afcaba98aa8821eb54752dae23c3015cd3a5703e4928e0c17a355a1`
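
A downloaded archive can be checked against the SHA256 digest listed above with the standard library. This is a generic sketch: the helper names `sha256_of_file` and `matches_expected` are hypothetical, and the local file path is whatever location the archive was saved to.

```python
import hashlib
import hmac

# SHA256 digest for carl_studio-0.3.0.tar.gz, copied from the table above.
EXPECTED_SHA256 = (
    "aaae2ed09c10c921006e60a2885ebcf3baf18c76a9917321b5f785ae43dd1165"
)


def sha256_of_file(path, chunk_size=1 << 20):
    """Stream the file in chunks so large archives need little memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """True when the file's SHA256 equals the expected hex digest."""
    return hmac.compare_digest(sha256_of_file(path), expected.lower())
```

Calling `matches_expected("carl_studio-0.3.0.tar.gz")` on the downloaded file should return `True` if the archive is intact.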

Provenance

The following attestation bundles were made for carl_studio-0.3.0.tar.gz:

Publisher: publish.yml on wheattoast11/carl

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: carl_studio-0.3.0.tar.gz
- Subject digest: aaae2ed09c10c921006e60a2885ebcf3baf18c76a9917321b5f785ae43dd1165
- Sigstore transparency entry: 1267342411
- Sigstore integration time: Apr 9, 2026
Source repository:
- Permalink: wheattoast11/carl@818230c50ec658e4d8cb2bb47a3878ef83561944
- Branch / Tag: refs/tags/v0.3.0
- Owner: https://github.com/wheattoast11
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@818230c50ec658e4d8cb2bb47a3878ef83561944
- Trigger Event: release

File details

Details for the file carl_studio-0.3.0-py3-none-any.whl.

File metadata

Download URL: carl_studio-0.3.0-py3-none-any.whl
Upload date: Apr 9, 2026
Size: 215.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for carl_studio-0.3.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`07e2e23921b85445147b898fb4e0bdb13c12573274188419b0496bf7df1580e3`
MD5	`02959cdbfbccade4be1937eabdca041a`
BLAKE2b-256	`8be6f870d8ba6b51a50bad82bcbad3d6b6eca2d3ce54e3095455ea40131c379d`

Provenance

The following attestation bundles were made for carl_studio-0.3.0-py3-none-any.whl:

Publisher: publish.yml on wheattoast11/carl

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: carl_studio-0.3.0-py3-none-any.whl
- Subject digest: 07e2e23921b85445147b898fb4e0bdb13c12573274188419b0496bf7df1580e3
- Sigstore transparency entry: 1267342498
- Sigstore integration time: Apr 9, 2026
Source repository:
- Permalink: wheattoast11/carl@818230c50ec658e4d8cb2bb47a3878ef83561944
- Branch / Tag: refs/tags/v0.3.0
- Owner: https://github.com/wheattoast11
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@818230c50ec658e4d8cb2bb47a3878ef83561944
- Trigger Event: release
