Skip to main content

Multi-model AI orchestration platform. Plugin any LLM. Ship better code.

Project description

CRTX

Multi-model AI orchestration with adversarial verification.

Quick StartHow It WorksThe ArbiterSupported ModelsCommandsContributing

python 3.12+ license Apache 2.0 PyPI version


What is CRTX?

Most AI coding tools send your prompt to one model and hope for the best. CRTX sends it to multiple models and makes them argue about it.

Here's what actually happens when you run a task: an Architect designs the approach. An Implementer writes the code. A Refactorer cleans it up. A Verifier checks it. And then an independent Arbiter — running on a different model than the one that wrote the code — reviews everything and decides if it's good enough to ship.

If it's not? The Arbiter sends it back with specific feedback. The pipeline runs again. No human intervention required.

The result is code that's been debated, reviewed, and stress-tested by multiple AI models before you ever see it.

Quick Start

pip install crtx
crtx setup        # configure your API keys
crtx demo         # 60-second guided first run

Or jump straight in:

crtx run "Build a REST API with authentication and rate limiting"

That's it. CRTX handles model selection, stage routing, cost optimization, and cross-model review automatically.

How It Works

Every task flows through a pipeline of specialized stages. Each stage can be assigned to a different AI model based on what it's best at.

Sequential mode (the default) chains four stages together:

  1. Architect — Designs the approach, defines file structure, picks patterns
  2. Implement — Writes the actual code based on the architect's plan
  3. Refactor — Cleans up the implementation: better names, fewer bugs, tighter logic
  4. Verify — Reviews the final output for correctness, edge cases, and test coverage

Each stage receives the output of the previous one. The Architect's plan feeds the Implementer. The Implementer's code feeds the Refactorer. Context accumulates — nothing gets lost between stages.

CRTX also supports parallel mode (all models solve independently, then cross-review and merge the best approach) and debate mode (models write position papers, rebuttals, and final arguments before a judge picks the winner).

The Arbiter

This is the thing that makes CRTX different from just chaining API calls together.

The Arbiter is an independent reviewer that uses a different model than the one that generated the code. It's adversarial by design — its job is to find problems, not to agree.

It returns one of four verdicts:

  • APPROVE — Code meets the spec, no issues found
  • FLAG — Minor concerns, but acceptable to ship
  • REJECT — Significant issues, sends structured feedback back to the pipeline for retry
  • HALT — Critical problems, stops the pipeline immediately

The Arbiter enforces a confidence floor: if a model says "APPROVE" but its confidence score is below 0.50, CRTX automatically downgrades it to FLAG. Low-confidence approvals are meaningless.

You can control how much review you want with --arbiter off|final_only|bookend|full. The default is bookend — the Arbiter reviews the Architect's plan and the Verifier's final output.

Smart Routing

Not every model is good at everything. CRTX knows this.

The routing engine assigns models to stages based on fitness scores, task type, and your chosen strategy:

  • quality_first — Best model for each stage regardless of cost
  • cost_optimized — Cheapest model that meets a minimum quality threshold
  • speed_first — Fastest model per stage
  • hybrid (default) — Quality-first for critical stages (refactor, verify), cost-optimized for everything else

Cross-stage diversity is enforced: no single model gets assigned more than 2 stages. This prevents monoculture — you want different perspectives reviewing the code, not the same model grading its own homework.

Auto-Fallback

If a provider goes down mid-pipeline (rate limit, timeout, outage), CRTX automatically substitutes the next best model and keeps going. No manual intervention, no restart required. A 5-minute cooldown prevents hammering a struggling provider.

Apply Mode

Generated code doesn't have to stay in the terminal. CRTX can write it directly to your project:

crtx run "Add WebSocket support to the chat server" --apply

This gives you an interactive diff preview where you select which files to write. Add --confirm to skip the preview and write immediately.

Safety features: git branch protection (won't write to main/master), conflict detection via SHA-256 checksums, AST-aware patching, and automatic rollback if post-apply tests fail.

Streaming Display

Pipeline output streams in real-time, token by token. You'll see syntax-highlighted code blocks as they're generated, with a pinned status bar at the bottom showing stage progress, running cost, and token count.

Stage indicators update live: ○ pending → ◉ active → ● complete → ⚠ fallback → ✗ failed.

Context Injection

CRTX can scan your project and inject relevant code into the pipeline:

crtx run "Write tests for the auth module" --context .

It uses AST-aware Python analysis to extract class signatures, function definitions, and import graphs — then selects the most relevant files within a configurable token budget. Your models see your actual code patterns, not generic examples.

Supported Models

CRTX works with any model supported by LiteLLM — that's 100+ providers. Out of the box, it's configured for:

Provider Models
Anthropic Claude Opus 4, Sonnet 4
OpenAI GPT-4o, o3
Google Gemini 2.5 Pro, Flash
xAI Grok

Add any LiteLLM-compatible model in ~/.crtx/config.toml.

Commands

Command What it does
crtx run "task" Run a pipeline
crtx demo Guided first-run experience
crtx review-code Multi-model code review on files or git diffs
crtx improve Review → improve pipeline with cross-model consensus
crtx repl Interactive shell with session history
crtx setup API key configuration
crtx models List available models with fitness scores
crtx estimate "task" Cost estimate before running
crtx sessions Browse past runs
crtx replay <id> Re-display a previous session
crtx dashboard Real-time web dashboard

Presets

Don't want to think about configuration? Use a preset:

crtx run "task" --preset fast       # Sequential, streaming, cost-optimized
crtx run "task" --preset thorough   # Full arbiter, quality-first routing
crtx run "task" --preset cheap      # Minimum cost, speed routing
crtx run "task" --preset explore    # Parallel mode, all models
crtx run "task" --preset debate     # Debate mode with judgment

Architecture

┌──────────────┐     ┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│   Architect  │ ──→ │  Implementer │ ──→ │  Refactorer  │ ──→ │   Verifier   │
│  (Claude)    │     │  (GPT-4o)    │     │  (Claude)    │     │    (o3)      │
└──────────────┘     └──────────────┘     └──────────────┘     └──────────────┘
                                                                       │
                                                                       ▼
                                                               ┌──────────────┐
                                                               │   Arbiter    │
                                                               │  (Gemini)    │
                                                               └──────────────┘
                                                                       │
                                                              APPROVE / REJECT

The Arbiter always runs on a different model than the generators. Cross-model review catches errors that self-review misses.

Philosophy

Evidence over claims. The Arbiter doesn't trust self-reported confidence. It verifies independently.

Diversity over consensus. Multiple models with different training data and different failure modes produce better results than one model reviewing its own work.

Safety by default. Apply mode previews before writing. Git branches are protected. Tests run after apply. Rollback is automatic.

Transparency over magic. Every routing decision, every token cost, every arbiter verdict is logged and visible. crtx sessions shows you exactly what happened and why.

Contributing

Contributions are welcome. Fork the repo, create a branch, and submit a PR.

The test suite has 1,045 tests — run them with pytest. Linting is ruff check ..

License

Apache 2.0. See LICENSE for details.


Built by TriadAI

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

crtx-0.2.0.tar.gz (1.1 MB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

crtx-0.2.0-py3-none-any.whl (274.1 kB view details)

Uploaded Python 3

File details

Details for the file crtx-0.2.0.tar.gz.

File metadata

  • Download URL: crtx-0.2.0.tar.gz
  • Upload date:
  • Size: 1.1 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.0

File hashes

Hashes for crtx-0.2.0.tar.gz
Algorithm Hash digest
SHA256 fe13cc06823712b901fe4e2af0cb6b08c83dee5dbf35937960b38170a932fa2a
MD5 201c06f0619859410fec558f4796d9ee
BLAKE2b-256 866f46f7fa793f0356545a9d62ba8efac5ebb44e3720a819a10ebaaf3e0b5af1

See more details on using hashes here.

File details

Details for the file crtx-0.2.0-py3-none-any.whl.

File metadata

  • Download URL: crtx-0.2.0-py3-none-any.whl
  • Upload date:
  • Size: 274.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.0

File hashes

Hashes for crtx-0.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 1567208595830e1ec51697ea2e27ae75122dce429a569bb738eadb26acd4145a
MD5 f52ace5ce3595991f92a0cc1718d4d9f
BLAKE2b-256 ced46b34a14d95778a1e2bf2e0eb9d75a08709be242f91dddb2e3d9f321d160a

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page