Multi-model AI development intelligence — learns your codebase, gets smarter every session

CRTX

Multi-model AI that learns your codebase.

Quick Start | Getting Started | How It Works | The Arbiter | Supported Models | Architecture | Contributing

Python 3.12+ License Any LLM


The Problem

You paste code into an AI model. It looks right. You ship it. Then you find the hallucinated import, the missed edge case, the pattern violation that cascades through your codebase.

Single-model code generation has a blind-spot problem. Every model has biases, gaps, and failure modes, and the same model that wrote the bug can't reliably find it.

The Fix

CRTX routes your coding task through multiple AI models in specialized roles, with an independent referee that catches mistakes before they reach your codebase.

Task -> [Architect] -> [Implementer] -> [Refactor] -> [Verify] -> Production Code
             |              |              |             |
         Arbiter         Arbiter        Arbiter       Arbiter
      (different model reviews each stage)

Each model does what it's best at. A different model checks the work. The code that survives is production-ready.


Quick Start

pip install crtx
crtx setup          # Interactive API key configuration
crtx                # Launch interactive session

That's it. crtx setup walks you through API key configuration with live validation. crtx launches an interactive session with a branded terminal UI, real-time pipeline status, and a persistent REPL.


Getting Started

First-Time Setup

crtx setup

Interactive wizard that prompts for API keys (Anthropic, OpenAI, Google, xAI), validates each key against its provider, and saves them to ~/.crtx/keys.env. You need at least one provider configured. For parallel and debate modes, you'll need at least two.

crtx setup --check    # Validate existing keys without re-prompting
crtx setup --reset    # Clear saved keys and reconfigure

Interactive Session (REPL)

crtx

Launches the interactive REPL. The REPL maintains session state — set your mode, routing strategy, and arbiter depth once, then run multiple tasks without repeating flags.

crtx ▸ mode parallel
  Mode set to parallel

crtx ▸ route quality_first
  Route set to quality_first

crtx ▸ Build a REST API with JWT authentication and rate limiting
  # → Interactive config screen → real-time pipeline display → completion summary

crtx ▸ status
  Mode:    parallel
  Route:   quality_first
  Arbiter: bookend

Type help for all commands, exit or Ctrl+C to quit.

Direct Execution

# Run with interactive config screen (choose mode/route/arbiter before launch)
crtx run "Build a REST API with JWT authentication and rate limiting"

# Run with explicit flags (skips config screen)
crtx run "Build a REST API" --mode sequential --route hybrid --arbiter bookend

When you run without explicit flags, CRTX shows an interactive config screen where you can cycle through modes, routing strategies, and arbiter settings with single keypresses before confirming. With explicit flags, the pipeline starts immediately.


How It Works

CRTX uses a sequential pipeline with four stages. Each stage is handled by whichever model scores highest for that role:

| Stage | Role | What It Does |
| --- | --- | --- |
| Architect | Design the solution | Produces a technical scaffold: file structure, interfaces, data models, dependency map. |
| Implement | Write the code | Takes the scaffold and produces complete, working implementation with error handling. |
| Refactor | Improve and test | Restructures for clarity, adds edge case handling, writes comprehensive test suite. |
| Verify | Validate everything | Reviews the complete output for correctness, security, and pattern compliance. |

Models don't just hand off and move on — any model can suggest improvements outside its assigned role. The Architect can flag an implementation concern. The Implementer can propose a structural change. Suggestions are tracked, evaluated, and either accepted or escalated to consensus.
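To make the hand-off concrete, here is a minimal sketch of a four-stage sequential run in which each stage receives the previous stage's output. The function name `run_sequential` and the stand-in handlers are hypothetical, not CRTX's actual API, and the cross-role suggestion tracking described above is left out.

```python
from typing import Callable

# Stage names taken from the table above.
STAGES = ("architect", "implement", "refactor", "verify")


def run_sequential(
    task: str, handlers: dict[str, Callable[[str], str]]
) -> list[tuple[str, str]]:
    """Run each stage in order, feeding it the previous stage's output."""
    current = task
    history: list[tuple[str, str]] = []
    for stage in STAGES:
        current = handlers[stage](current)
        history.append((stage, current))
    return history


# Stand-in handlers: a real handler would call a model; these only tag the text.
handlers = {
    stage: (lambda text, s=stage: f"{text} -> {s}") for stage in STAGES
}

history = run_sequential("task", handlers)
for stage, output in history:
    print(f"{stage:10s} {output}")
```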


The Arbiter

The Arbiter is what makes CRTX fundamentally different from running the same prompt through multiple models.

It's an independent referee. The Arbiter never writes code. It never proposes architecture. Its only job is to find what's wrong with other models' work.

It's always a different model. If Claude wrote the code, GPT-4o or Grok arbitrates. If Gemini designed the architecture, Claude checks it. The system enforces this automatically, so the same model never grades its own work.
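The "never grades its own work" rule can be sketched as a simple exclusion filter. `pick_arbiter` is a hypothetical helper, and raising an error when no other model is configured is an assumption; the README does not say what CRTX does in that case.

```python
def pick_arbiter(generator: str, candidates: list[str]) -> str:
    """Return the first candidate that is not the model being reviewed."""
    eligible = [model for model in candidates if model != generator]
    if not eligible:
        # Assumption: with no second model there is nobody independent to review.
        raise ValueError("no independent model available to arbitrate")
    return eligible[0]


# Illustrative model names only.
models = ["claude", "gpt", "gemini"]
print(pick_arbiter("claude", models))
print(pick_arbiter("gpt", models))
```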

It assumes there are bugs. The Arbiter's prompt starts from skepticism: "Assume there are errors until proven otherwise." This inverts the typical AI review pattern where models default to "looks good" and hedge with minor suggestions.

It can stop the pipeline. Four verdicts:

| Verdict | Action |
| --- | --- |
| APPROVE | Continue. Output is sound. |
| FLAG | Continue, but inject warnings for the next stage to address. |
| REJECT | Re-run this stage with structured feedback. Max 2 retries. |
| HALT | Stop everything. Present analysis for human decision. |

When the Arbiter rejects, it doesn't just say "this is wrong." It provides structured feedback with severity, category, exact location, evidence, and a suggested fix — all injected into the retry prompt so the generating model knows exactly what to address.
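The verdict handling can be pictured as a small retry loop. The sketch below is illustrative only: `run_stage`, `Feedback`, and `PipelineHalted` are hypothetical names, the feedback fields mirror the five listed above, and halting once the two retries are used up is an assumption, since the README does not say what happens then.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Verdict(Enum):
    APPROVE = "APPROVE"
    FLAG = "FLAG"
    REJECT = "REJECT"
    HALT = "HALT"


@dataclass(frozen=True)
class Feedback:
    severity: str
    category: str
    location: str
    evidence: str
    suggested_fix: str


class PipelineHalted(Exception):
    """Raised when a human decision is needed."""


MAX_RETRIES = 2  # "Max 2 retries" from the verdict table

Generate = Callable[[str, list[Feedback]], str]
Review = Callable[[str], tuple[Verdict, list[Feedback]]]


def run_stage(
    stage_input: str, generate: Generate, review: Review
) -> tuple[str, list[Feedback]]:
    """Return (output, warnings for the next stage) or raise PipelineHalted."""
    feedback: list[Feedback] = []
    for _attempt in range(1 + MAX_RETRIES):
        output = generate(stage_input, feedback)
        verdict, notes = review(output)
        if verdict is Verdict.APPROVE:
            return output, []
        if verdict is Verdict.FLAG:
            return output, notes  # continue, carrying warnings forward
        if verdict is Verdict.HALT:
            raise PipelineHalted("arbiter halted the pipeline")
        feedback = notes  # REJECT: retry with the structured feedback
    # Assumption: exhausting the retries escalates to a human.
    raise PipelineHalted("stage still rejected after retries")


# Demo with made-up data: the first draft is rejected, the retry is approved.
issue = Feedback("critical", "imports", "app.py:1",
                 "imported module does not exist", "use an installed library")


def demo_generate(stage_input: str, feedback: list[Feedback]) -> str:
    return "fixed draft" if feedback else "first draft"


def demo_review(output: str) -> tuple[Verdict, list[Feedback]]:
    if output == "first draft":
        return Verdict.REJECT, [issue]
    return Verdict.APPROVE, []


result, warnings = run_stage("Build a REST API", demo_generate, demo_review)
print(result, warnings)
```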

Configurable Review Depth

Not every task needs full review. Choose your safety level:

crtx run "..." --arbiter full       # Review every stage (critical features)
crtx run "..." --arbiter bookend    # Review architecture + final output (default)
crtx run "..." --arbiter final      # Review final output only (prototypes)
crtx run "..." --arbiter off        # No review (rapid iteration)

Or in the REPL: arbiter full sets the depth for all subsequent tasks in the session.
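One way to read the four depths is as a mapping from depth to the stages that get reviewed. The sketch below assumes "architecture + final output" means the Architect and Verify stages; `reviewed_stages` is a hypothetical helper, not part of the CRTX CLI.

```python
STAGES = ("architect", "implement", "refactor", "verify")


def reviewed_stages(depth: str) -> tuple[str, ...]:
    """Map an --arbiter depth to the stages the Arbiter would review."""
    if depth == "full":
        return STAGES
    if depth == "bookend":
        # Assumption: first and last stage of the pipeline.
        return (STAGES[0], STAGES[-1])
    if depth == "final":
        return (STAGES[-1],)
    if depth == "off":
        return ()
    raise ValueError(f"unknown arbiter depth: {depth}")


for depth in ("full", "bookend", "final", "off"):
    print(depth, reviewed_stages(depth))
```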


Supported Models

CRTX is model-agnostic. Any LLM that supports chat completions works. Add a new model by adding a TOML entry — no code changes required.

Pre-Configured Providers

| Provider | Models | Best At |
| --- | --- | --- |
| Anthropic | Claude Opus, Sonnet, Haiku | Refactoring, verification, nuanced review |
| OpenAI | GPT-4o, o3-mini | Fast implementation, broad language support |
| Google | Gemini 2.5 Pro, Flash | Architecture, large context reasoning |
| xAI | Grok 4, Grok 3 | Independent analysis, alternative perspectives |

Adding Models

# config/models.toml
[models.deepseek-v3]
provider = "deepseek"
model = "deepseek-chat"
roles = ["implement", "refactor"]
cost_per_1k_input = 0.0001
cost_per_1k_output = 0.0002

DeepSeek, Llama, Mistral, Ollama (local), vLLM (self-hosted) — if LiteLLM supports it, CRTX supports it.


Presets

Instead of specifying --mode, --route, and --arbiter on every command, use a preset:

| Preset | Mode | Route | Arbiter | Use Case |
| --- | --- | --- | --- | --- |
| balanced (default) | sequential | hybrid | bookend | Standard development. Best cost/quality balance. |
| fast | sequential | speed-first | off | Rapid iteration. Cheapest models, no review. |
| cheap | sequential | cost-optimized | off | Budget-conscious. Lowest cost above fitness threshold. |
| thorough | sequential | quality-first | full | Maximum quality. Best models, every stage reviewed. |
| explore | parallel | hybrid | bookend | Fan out to 3+ models, cross-review, synthesize the best. |
| debate | debate | quality-first | full | Structured debate. Best for architecture decisions and tradeoffs. |

crtx run "Build a REST API" --preset explore
crtx run "Build a REST API" --preset fast

# Override any part of a preset
crtx run "Build a REST API" --preset explore --arbiter full

In the REPL:

crtx [balanced] ▸ preset explore
  Mode set to parallel, route hybrid, arbiter bookend

crtx [explore] ▸ preset fast
  Mode set to sequential, route speed-first, arbiter off

No preset flag defaults to balanced. If you manually change mode/route/arbiter after selecting a preset, the prompt shows the current settings instead of a preset name.
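Preset resolution with per-flag overrides amounts to a dictionary merge. The sketch below copies the preset values from the table; `resolve_settings` is a hypothetical helper rather than CRTX's real implementation.

```python
from typing import Optional

PRESETS = {
    "balanced": {"mode": "sequential", "route": "hybrid", "arbiter": "bookend"},
    "fast": {"mode": "sequential", "route": "speed-first", "arbiter": "off"},
    "cheap": {"mode": "sequential", "route": "cost-optimized", "arbiter": "off"},
    "thorough": {"mode": "sequential", "route": "quality-first", "arbiter": "full"},
    "explore": {"mode": "parallel", "route": "hybrid", "arbiter": "bookend"},
    "debate": {"mode": "debate", "route": "quality-first", "arbiter": "full"},
}


def resolve_settings(
    preset: str = "balanced", **overrides: Optional[str]
) -> dict[str, str]:
    """Start from a preset, then apply any explicitly given flags."""
    settings = dict(PRESETS[preset])
    for key, value in overrides.items():
        if key not in settings:
            raise KeyError(f"unknown setting: {key}")
        if value is not None:  # None means the flag was not passed
            settings[key] = value
    return settings


print(resolve_settings())
print(resolve_settings("explore", arbiter="full"))
```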


Pipeline Modes

| Mode | How It Works | Best For |
| --- | --- | --- |
| Sequential (default) | Architect → Implement → Refactor → Verify, each building on the last | Standard development, most tasks |
| Parallel | All models solve independently, cross-review, score, merge best approach | Complex problems with multiple valid solutions |
| Debate | Position papers → rebuttals → final arguments → judgment | Architectural decisions, tradeoff analysis |

crtx run "..." --mode sequential   # Default
crtx run "..." --mode parallel     # Fan-out + consensus
crtx run "..." --mode debate       # Structured debate

Or in the REPL: mode parallel sets the mode for all subsequent tasks in the session.


Smart Routing

CRTX assigns models to pipeline roles based on fitness benchmarks — each model is scored on how well it performs as Architect, Implementer, Refactorer, and Verifier. Four routing strategies let you optimize for what matters:

| Strategy | Behavior |
| --- | --- |
| quality-first | Best model per role regardless of cost |
| cost-optimized | Cheapest model above fitness threshold |
| speed-first | Lowest-latency models preferred |
| hybrid (default) | Quality for critical stages, cost-optimized for early stages |

crtx run "..." --route hybrid          # Default
crtx run "..." --route quality-first   # Max quality
crtx estimate "..." --compare-routes   # Compare costs

Or in the REPL: route quality_first sets the strategy for all subsequent tasks.
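The three single-criterion strategies can be sketched as different selection rules over per-role model profiles. Everything below is illustrative: the `ModelProfile` fields, the scores, and the 0.7 threshold are made-up values, and `hybrid` is omitted because the README does not specify which stages count as critical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelProfile:
    name: str
    fitness: float      # hypothetical 0-1 score for one role
    cost_per_1k: float  # hypothetical price per 1,000 tokens
    latency_s: float    # hypothetical typical response time in seconds


def choose_model(
    candidates: list[ModelProfile], strategy: str, fitness_threshold: float = 0.7
) -> ModelProfile:
    """Pick one model for a role according to the routing strategy."""
    if strategy == "quality-first":
        return max(candidates, key=lambda m: m.fitness)
    if strategy == "cost-optimized":
        eligible = [m for m in candidates if m.fitness >= fitness_threshold]
        if not eligible:
            raise ValueError("no model meets the fitness threshold")
        return min(eligible, key=lambda m: m.cost_per_1k)
    if strategy == "speed-first":
        return min(candidates, key=lambda m: m.latency_s)
    raise ValueError(f"unsupported strategy in this sketch: {strategy}")


candidates = [
    ModelProfile("model-a", fitness=0.95, cost_per_1k=0.015, latency_s=8.0),
    ModelProfile("model-b", fitness=0.80, cost_per_1k=0.002, latency_s=3.0),
    ModelProfile("model-c", fitness=0.60, cost_per_1k=0.0005, latency_s=1.0),
]

for strategy in ("quality-first", "cost-optimized", "speed-first"):
    print(strategy, choose_model(candidates, strategy).name)
```

Note that `model-c` is the cheapest overall but falls below the threshold, so `cost-optimized` skips it.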


Configuration

API Keys

The recommended way to configure API keys is crtx setup, which validates keys and saves them for future sessions. Keys are loaded in this order (highest priority first):

  1. Environment variables: ANTHROPIC_API_KEY, OPENAI_API_KEY, GOOGLE_API_KEY, XAI_API_KEY
  2. ~/.crtx/keys.env: User-level keys saved by crtx setup
  3. .env in current directory: Project-level overrides

You only need keys for the providers you want to use. At least one provider must be configured.
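The lookup order listed above can be sketched as a first-match search across three sources. This is not CRTX's loader: `parse_env_file` and `resolve_keys` are hypothetical helpers, and the `KEY=value` file format is an assumption about what `keys.env` and `.env` contain.

```python
import tempfile
from pathlib import Path
from typing import Mapping

KEY_NAMES = ("ANTHROPIC_API_KEY", "OPENAI_API_KEY", "GOOGLE_API_KEY", "XAI_API_KEY")


def parse_env_file(path: Path) -> dict[str, str]:
    """Read KEY=value lines, ignoring blanks and # comments."""
    if not path.is_file():
        return {}
    values: dict[str, str] = {}
    for raw in path.read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("\"'")
    return values


def resolve_keys(
    environ: Mapping[str, str], user_file: Path, project_file: Path
) -> dict[str, str]:
    """Take each key from the first source that defines it, in the order above."""
    sources = [dict(environ), parse_env_file(user_file), parse_env_file(project_file)]
    resolved: dict[str, str] = {}
    for name in KEY_NAMES:
        for source in sources:
            if source.get(name):
                resolved[name] = source[name]
                break
    return resolved


# Demo with throwaway files and fake key values.
with tempfile.TemporaryDirectory() as tmp:
    user_file = Path(tmp) / "keys.env"
    project_file = Path(tmp) / ".env"
    user_file.write_text("OPENAI_API_KEY=user-openai\nXAI_API_KEY=user-xai\n")
    project_file.write_text("XAI_API_KEY=project-xai\nGOOGLE_API_KEY=project-google\n")
    demo = resolve_keys({"OPENAI_API_KEY": "env-openai"}, user_file, project_file)

print(demo)
```

In real use the first argument would be `os.environ` and the files would be `Path.home() / ".crtx" / "keys.env"` and `Path(".env")`.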

# Recommended: interactive setup with validation
crtx setup

# Or set environment variables directly
export ANTHROPIC_API_KEY=sk-ant-...
export OPENAI_API_KEY=sk-...

Pipeline Defaults

Pipeline defaults (mode, routing strategy, arbiter depth, timeout) are configured in config/defaults.toml. These can be overridden per-run via CLI flags or the interactive config screen.


CLI Commands

| Command | Description |
| --- | --- |
| crtx | Launch interactive session (REPL mode) |
| crtx setup | Configure API keys interactively |
| crtx setup --check | Validate existing API keys |
| crtx setup --reset | Clear keys and reconfigure |
| crtx run | Run a full pipeline on a task |
| crtx plan | Expand a rough idea into a structured task spec |
| crtx estimate | Estimate cost before running |
| crtx review | Multi-model PR review (CI/CD integration) |
| crtx review-code | Multi-model review of existing code files |
| crtx improve | Multi-model improvement of existing code |
| crtx models list | Show registered models with fitness scores |
| crtx models check | Verify API key connectivity |
| crtx config show | Display current pipeline configuration |
| crtx sessions list | Browse past pipeline runs |
| crtx sessions show | View full session details |
| crtx dashboard | Launch real-time browser visualization |

# Interactive session — persistent state, branded UI
crtx

# Run a task with interactive config screen
crtx run "Add WebSocket support to the existing Express server"

# Run with explicit flags (skips config screen)
crtx run "Add WebSocket support" --mode sequential --route hybrid --arbiter bookend

# Plan first, then run
crtx plan "Build a data processing pipeline" --run

# Review a PR diff
crtx review --diff changes.patch --fail-on critical

# Review existing code with multiple models
crtx review-code src/middleware.py --preset thorough

# Improve existing code
crtx improve src/rate_limiter/ --focus "error handling, type safety"

# Launch the real-time dashboard
crtx dashboard --port 8420

The CLI uses Rich for a premium terminal experience — branded ASCII art, interactive config screens, real-time pipeline status with stage-by-stage progress, color-coded Arbiter verdicts, and a post-completion summary with export actions.


Review & Improve Existing Code

CRTX doesn't just generate code — it can review and improve code you've already written.

Multi-Model Review

Have 3+ models independently review your code, cross-check each other's findings, and produce a ranked report:

crtx review-code src/middleware.py
crtx review-code src/rate_limiter/ --preset thorough

Each model finds bugs, security issues, and design problems independently. Then they review each other's findings — agreeing, disagreeing, and catching what others missed. Issues found by multiple models rank highest. Single-source findings are flagged as lower confidence.
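The ranking rule (issues reported by more models rank higher, single-source issues get lower confidence) can be sketched as a count-and-sort. `rank_findings` is a hypothetical helper, and matching findings by identical string IDs is a simplifying assumption; the README does not describe how CRTX decides that two models found the same issue.

```python
from collections import defaultdict


def rank_findings(
    findings_by_model: dict[str, set[str]]
) -> list[tuple[str, list[str], str]]:
    """Return (finding, reporting models, confidence), most-corroborated first."""
    reporters: dict[str, list[str]] = defaultdict(list)
    for model, findings in findings_by_model.items():
        for finding in findings:
            reporters[finding].append(model)
    # Sort by number of reporters (descending), then by name for stable output.
    ranked = sorted(reporters.items(), key=lambda item: (-len(item[1]), item[0]))
    return [
        (finding, sorted(models),
         "corroborated" if len(models) > 1 else "lower confidence")
        for finding, models in ranked
    ]


# Made-up findings from three illustrative reviewers.
findings = {
    "model_a": {"sql-injection", "missing-timeout"},
    "model_b": {"sql-injection"},
    "model_c": {"sql-injection", "unused-import"},
}

report = rank_findings(findings)
for finding, models, confidence in report:
    print(f"{finding:16s} {len(models)} model(s)  {confidence}")
```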

Multi-Model Improve

Have 3+ models each produce an improved version of your code, vote on the best, and synthesize:

crtx improve src/middleware.py
crtx improve src/rate_limiter/ --focus "error handling, type safety"

Like parallel mode, but starting from your existing code instead of a task description. The Arbiter reviews the final improvement against your original. You see a diff before anything is written.


CRTX supports domain-specific verification rules that the Arbiter checks in addition to general code quality:

# config/domain/my_rules.toml
[rules.schema_consistency]
description = "All database models must use integer primary keys"
severity = "critical"
pattern = "UUIDField|uuid4"
action = "reject"

[rules.test_coverage]
description = "Every new service must have corresponding test file"
severity = "warning"

We use CRTX to build a financial services platform — our custom rules enforce schema patterns, threading conventions, and audit trail requirements specific to our domain. You can do the same for yours.


How We Use It

We built CRTX because we needed it. Our team uses CRTX as the primary development workflow for a financial services operating system with 2,900+ tests. Every new feature, every module, every refactor goes through the pipeline. The Arbiter has caught schema mismatches, hallucinated dependencies, over-engineered abstractions, and integration failures — all before code review.

CRTX isn't a research project. It's a production tool that we bet our own codebase on every day.


Cost

CRTX adds model calls, which cost tokens. Here's what a typical task looks like:

| Configuration | Est. Cost per Task | Use Case |
| --- | --- | --- |
| No Arbiter | ~$4.30 | Rapid iteration |
| Final Only | ~$5.10 | Prototyping |
| Bookend (default) | ~$5.80 | Standard development |
| Full Arbiter | ~$7.30 | Critical features |

At the default Bookend depth, the Arbiter adds about $1.50 per task over running with no review. At ~15 tasks/week, that comes to about $90/month. One production bug it catches pays for a year of reviews.
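The monthly figure can be reproduced from the table with a few lines of arithmetic. The sketch assumes a four-week month, which is what makes the numbers come out to $90; `monthly_arbiter_overhead` is a hypothetical helper.

```python
# Estimated cost per task in USD, copied from the table above.
COST_PER_TASK = {"off": 4.30, "final": 5.10, "bookend": 5.80, "full": 7.30}


def monthly_arbiter_overhead(
    depth: str, tasks_per_week: int, weeks_per_month: int = 4
) -> float:
    """Extra monthly spend attributable to the Arbiter at a given depth."""
    per_task = COST_PER_TASK[depth] - COST_PER_TASK["off"]
    return per_task * tasks_per_week * weeks_per_month


for depth in ("final", "bookend", "full"):
    print(depth, round(monthly_arbiter_overhead(depth, tasks_per_week=15), 2))
```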


Documentation

| Document | Description |
| --- | --- |
| Architecture | Core pipeline design, consensus protocol, technology stack |
| Model-Agnostic System | Plugin architecture, LiteLLM adapter, dynamic role assignment |
| Arbiter Layer | Independent review system, verdicts, feedback injection |
| Build Spec | MVP scope, day-by-day build plan, technical decisions |

Architecture

triad/
├── cli.py                  # Typer + Rich terminal interface
├── cli_display.py          # Branded UI: logos, config screen, live display, summary
├── repl.py                 # Interactive REPL with session state
├── orchestrator.py         # Pipeline engine (sequential, parallel, debate)
├── planner.py              # Task planner (crtx plan)
├── providers/              # LiteLLM adapter + model registry
├── routing/                # Fitness-based model-to-role assignment
├── arbiter/                # Independent adversarial review engine
├── consensus/              # Cross-domain suggestions + voting protocol
├── context/                # AST-aware codebase scanner + context builder
├── persistence/            # SQLite session storage + export
├── ci/                     # Multi-model PR review for CI/CD
├── dashboard/              # Real-time WebSocket visualization (optional)
├── schemas/                # Pydantic v2 models (all data contracts)
├── prompts/                # Jinja2 role prompt templates
├── output/                 # File writer + Markdown report renderer
└── config/                 # TOML configuration (models, defaults, routing)

Contributing

We welcome contributions! Please read CONTRIBUTING.md before submitting a PR.

Important: All contributors must sign our Contributor License Agreement before their first PR can be merged. This is handled automatically via CLA Assistant — you'll be prompted when you open your first PR.


License

Apache 2.0 — see LICENSE for details.


Built by TriadAI — Every session smarter than the last.
