
Multi-model AI orchestration platform. Plug in any LLM. Ship better code.


CRTX

Multi-model AI orchestration with adversarial verification.

Quick Start · How It Works · The Arbiter · Supported Models · Commands · Contributing

Python 3.12+ · Apache 2.0 license · Available on PyPI


What is CRTX?

Most AI coding tools send your prompt to one model and hope for the best. CRTX sends it to multiple models and makes them argue about it.

Here's what actually happens when you run a task: an Architect designs the approach. An Implementer writes the code. A Refactorer cleans it up. A Verifier checks it. And then an independent Arbiter — running on a different model than the one that wrote the code — reviews everything and decides if it's good enough to ship.

If it's not? The Arbiter sends it back with specific feedback. The pipeline runs again. No human intervention required.

The result is code that's been debated, reviewed, and stress-tested by multiple AI models before you ever see it.

Quick Start

pip install crtx
crtx setup        # configure your API keys
crtx demo         # 60-second guided first run

Or jump straight in:

crtx run "Build a REST API with authentication and rate limiting"

That's it. CRTX handles model selection, stage routing, cost optimization, and cross-model review automatically.

How It Works

Every task flows through a pipeline of specialized stages. Each stage can be assigned to a different AI model based on what it's best at.

Sequential mode (the default) chains four stages together:

  1. Architect — Designs the approach, defines file structure, picks patterns
  2. Implement — Writes the actual code based on the architect's plan
  3. Refactor — Cleans up the implementation: better names, fewer bugs, tighter logic
  4. Verify — Reviews the final output for correctness, edge cases, and test coverage

Each stage receives the output of the previous one. The Architect's plan feeds the Implementer. The Implementer's code feeds the Refactorer. Context accumulates — nothing gets lost between stages.
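The sequential flow is essentially a fold over the four stages, with each stage's output appended to an ever-growing context. A minimal sketch of that idea — `call_model`, the prompt format, and the stage names as dict keys are all illustrative, not CRTX's actual internals:

```python
# Illustrative sketch of sequential stage chaining. `call_model` is a
# hypothetical stand-in for a real LLM API call.
def call_model(model: str, prompt: str) -> str:
    return f"[{model} output for: {prompt[:40]}]"

STAGES = ["architect", "implement", "refactor", "verify"]

def run_sequential(task: str, assignments: dict[str, str]) -> str:
    context = task
    for stage in STAGES:
        # Each stage sees the accumulated context, so nothing is lost
        # between stages.
        output = call_model(assignments[stage], f"{stage}: {context}")
        context = f"{context}\n--- {stage} ---\n{output}"
    return context
```

The key property is that context only ever accumulates: the Verifier sees the task, the plan, the code, and the refactor, not just the last stage's output.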

CRTX also supports parallel mode (all models solve independently, then cross-review and merge the best approach) and debate mode (models write position papers, rebuttals, and final arguments before a judge picks the winner).

The Arbiter

This is the thing that makes CRTX different from just chaining API calls together.

The Arbiter is an independent reviewer that uses a different model than the one that generated the code. It's adversarial by design — its job is to find problems, not to agree.

It returns one of four verdicts:

  • APPROVE — Code meets the spec, no issues found
  • FLAG — Minor concerns, but acceptable to ship
  • REJECT — Significant issues, sends structured feedback back to the pipeline for retry
  • HALT — Critical problems, stops the pipeline immediately

The Arbiter enforces a confidence floor: if a model says "APPROVE" but its confidence score is below 0.50, CRTX automatically downgrades it to FLAG. Low-confidence approvals are meaningless.
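The confidence-floor rule is simple to state precisely. A sketch of the downgrade logic as described above (function name is illustrative):

```python
# Low-confidence approvals are meaningless: an APPROVE verdict with
# confidence below 0.50 is downgraded to FLAG. Other verdicts pass through.
def apply_confidence_floor(verdict: str, confidence: float) -> str:
    if verdict == "APPROVE" and confidence < 0.50:
        return "FLAG"
    return verdict
```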

You can control how much review you want with --arbiter off|final_only|bookend|full. The default is bookend — the Arbiter reviews the Architect's plan and the Verifier's final output.

Smart Routing

Not every model is good at everything. CRTX knows this.

The routing engine assigns models to stages based on fitness scores, task type, and your chosen strategy:

  • quality_first — Best model for each stage regardless of cost
  • cost_optimized — Cheapest model that meets a minimum quality threshold
  • speed_first — Fastest model per stage
  • hybrid (default) — Quality-first for critical stages (refactor, verify), cost-optimized for everything else

Cross-stage diversity is enforced: no single model gets assigned more than 2 stages. This prevents monoculture — you want different perspectives reviewing the code, not the same model grading its own homework.
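One way to picture the diversity rule: greedily pick the highest-fitness model per stage, but cap any single model at two stages. This is a sketch under that assumption — the fitness numbers and function names are made up, not CRTX's real scores:

```python
from collections import Counter

# Greedy per-stage assignment with a cross-stage diversity cap: no model
# may take more than `max_per_model` stages, so the best model overall
# still can't grade its own homework everywhere.
def route(stages, fitness, max_per_model=2):
    counts = Counter()
    plan = {}
    for stage in stages:
        ranked = sorted(fitness[stage], key=fitness[stage].get, reverse=True)
        for model in ranked:
            if counts[model] < max_per_model:
                plan[stage] = model
                counts[model] += 1
                break
    return plan
```

Even if one model tops the fitness table for every stage, it gets at most two of the four, and the remaining stages fall to the next-best candidates.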

Auto-Fallback

If a provider goes down mid-pipeline (rate limit, timeout, outage), CRTX automatically substitutes the next best model and keeps going. No manual intervention, no restart required. A 5-minute cooldown prevents hammering a struggling provider.
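The fallback behavior can be sketched as a ranked list plus a per-model failure timestamp — the class and method names here are illustrative, with the 300-second window matching the 5-minute cooldown above:

```python
import time

# Sketch of provider fallback with a cooldown: a failed model is skipped
# until COOLDOWN seconds have passed, so a struggling provider isn't hammered.
class FallbackRouter:
    COOLDOWN = 300  # seconds (5 minutes)

    def __init__(self, ranked_models):
        self.ranked_models = ranked_models
        self.failed_at = {}  # model -> monotonic time of last failure

    def mark_failed(self, model):
        self.failed_at[model] = time.monotonic()

    def next_available(self):
        now = time.monotonic()
        for model in self.ranked_models:
            last = self.failed_at.get(model)
            if last is None or now - last >= self.COOLDOWN:
                return model
        raise RuntimeError("all providers are cooling down")
```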

Apply Mode

Generated code doesn't have to stay in the terminal. CRTX can write it directly to your project:

crtx run "Add WebSocket support to the chat server" --apply

This gives you an interactive diff preview where you select which files to write. Add --confirm to skip the preview and write immediately.

Safety features: git branch protection (won't write to main/master), conflict detection via SHA-256 checksums, AST-aware patching, and automatic rollback if post-apply tests fail.
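The checksum-based conflict detection amounts to: fingerprint each file when the diff preview is shown, and refuse to write if the file on disk has changed since. A minimal sketch with the stdlib (function names are illustrative):

```python
import hashlib

# Record a file's SHA-256 at preview time; before writing, recompute and
# compare. A mismatch means the file changed underneath you — a conflict.
def sha256_of(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def safe_to_apply(current_contents: bytes, checksum_at_preview: str) -> bool:
    return sha256_of(current_contents) == checksum_at_preview
```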

Streaming Display

Pipeline output streams in real-time, token by token. You'll see syntax-highlighted code blocks as they're generated, with a pinned status bar at the bottom showing stage progress, running cost, and token count.

Stage indicators update live: ○ pending → ◉ active → ● complete → ⚠ fallback → ✗ failed.

Context Injection

CRTX can scan your project and inject relevant code into the pipeline:

crtx run "Write tests for the auth module" --context .

It uses AST-aware Python analysis to extract class signatures, function definitions, and import graphs — then selects the most relevant files within a configurable token budget. Your models see your actual code patterns, not generic examples.
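The signature-extraction part of this can be done entirely with the stdlib `ast` module. A sketch of the idea — CRTX's actual analysis is richer (import graphs, relevance ranking, token budgets), and this helper name is hypothetical:

```python
import ast

# Walk a module's AST and pull out class names and function signatures —
# the kind of compact summary that fits a token budget better than raw source.
def extract_signatures(source: str) -> list[str]:
    tree = ast.parse(source)
    sigs = []
    for node in ast.walk(tree):
        if isinstance(node, ast.ClassDef):
            sigs.append(f"class {node.name}")
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            args = ", ".join(a.arg for a in node.args.args)
            sigs.append(f"def {node.name}({args})")
    return sigs
```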

Supported Models

CRTX works with any model supported by LiteLLM — that's 100+ providers. Out of the box, it's configured for:

Provider Models
Anthropic Claude Opus 4, Sonnet 4
OpenAI GPT-4o, o3
Google Gemini 2.5 Pro, Flash
xAI Grok

Add any LiteLLM-compatible model in ~/.crtx/config.toml.

Commands

Command What it does
crtx run "task" Run a pipeline
crtx demo Guided first-run experience
crtx review-code Multi-model code review on files or git diffs
crtx improve Review → improve pipeline with cross-model consensus
crtx repl Interactive shell with session history
crtx setup API key configuration
crtx models List available models with fitness scores
crtx estimate "task" Cost estimate before running
crtx sessions Browse past runs
crtx replay <id> Re-display a previous session
crtx dashboard Real-time web dashboard

Presets

Don't want to think about configuration? Use a preset:

crtx run "task" --preset fast       # Sequential, streaming, cost-optimized
crtx run "task" --preset thorough   # Full arbiter, quality-first routing
crtx run "task" --preset cheap      # Minimum cost, speed routing
crtx run "task" --preset explore    # Parallel mode, all models
crtx run "task" --preset debate     # Debate mode with judgment

Architecture

┌──────────────┐     ┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│   Architect  │ ──→ │  Implementer │ ──→ │  Refactorer  │ ──→ │   Verifier   │
│  (Claude)    │     │  (GPT-4o)    │     │  (Claude)    │     │    (o3)      │
└──────────────┘     └──────────────┘     └──────────────┘     └──────────────┘
                                                                       │
                                                                       ▼
                                                               ┌──────────────┐
                                                               │   Arbiter    │
                                                               │  (Gemini)    │
                                                               └──────────────┘
                                                                       │
                                                              APPROVE / REJECT

The Arbiter always runs on a different model than the generators. Cross-model review catches errors that self-review misses.

Philosophy

Evidence over claims. The Arbiter doesn't trust self-reported confidence. It verifies independently.

Diversity over consensus. Multiple models with different training data and different failure modes produce better results than one model reviewing its own work.

Safety by default. Apply mode previews before writing. Git branches are protected. Tests run after apply. Rollback is automatic.

Transparency over magic. Every routing decision, every token cost, every arbiter verdict is logged and visible. crtx sessions shows you exactly what happened and why.

Contributing

Contributions are welcome. Fork the repo, create a branch, and submit a PR.

The test suite has 1,045 tests; run them with pytest. For linting, run ruff check . from the repo root.

License

Apache 2.0. See LICENSE for details.


Built by TriadAI



Download files

Download the file for your platform.

Source distribution: crtx-0.1.1.tar.gz (749.2 kB)
  Uploaded via twine/6.2.0 (CPython/3.14.0) · Trusted Publishing: No
  SHA256       fab6be4d3137ffd990d36243163b57d7375ecdd823fd55c2b5e93ea80821ee0b
  MD5          3f49dd884daee50b7d8292f50c6582cf
  BLAKE2b-256  93768bd975cda59a0d481be082fb6b66677d9102db1bb91b3f38725fe509ffc5

Built distribution: crtx-0.1.1-py3-none-any.whl (233.9 kB, Python 3)
  Uploaded via twine/6.2.0 (CPython/3.14.0) · Trusted Publishing: No
  SHA256       5d14060eb95a0fea6f7602cf32ba601a1816bdaa89bca3b537c34dd2528f5e1f
  MD5          4195587aa58ba7ec2c484e4c986c5c7e
  BLAKE2b-256  f913372ee21f0a794bdf0558f4f92e2d4d53601187d355e9f51625fac75a48e6
