
Multi-model AI orchestration platform. Plug in any LLM. Ship better code.


CRTX

Generate. Test. Fix. Review. One command, verified output.

Quick Start · The Problem · The Loop · Benchmarks · How It Works · Commands · Supported Models

Python 3.12+ · Apache 2.0 license · PyPI


What is CRTX?

CRTX is an AI development intelligence tool that generates, tests, fixes, and reviews code automatically. One command in, verified output out.

It works with any model — Claude, GPT, Gemini, Grok, DeepSeek — and picks the right one for each task. You don't configure pipelines or choose models. You describe what you want and CRTX handles the rest.

crtx loop "Build a REST API with FastAPI, SQLite, search and pagination"

The Problem

Single AI models generate code that looks correct but often has failing tests, broken imports, and missed edge cases. Developers spend 10–30 minutes per generation debugging and fixing AI output before it actually works.

Multi-model pipelines cost 10–15x more without meaningfully improving quality. Four models reviewing each other's prose doesn't catch a broken import statement.

The issue isn't the model. It's the lack of verification. Nobody runs the code before handing it to you.

The Loop

CRTX solves this with the Loop: Generate → Test → Fix → Review.

  1. Generate — The best model for the task writes the code
  2. Test — CRTX runs the code locally: AST parse, import check, pyflakes, pytest, entry point execution
  3. Fix — Failures feed back to the model with structured error context for targeted fixes
  4. Review — An independent Arbiter (always a different model) reviews the final output

Every output is tested before you see it. If tests fail, CRTX fixes them. If the fix cycle stalls, three escalation tiers activate before giving up. If the Arbiter rejects the code, one more fix cycle runs.

The result: code that passes its own tests, has been reviewed by a second model, and comes with a verification report.
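In code, the Loop's control flow looks roughly like this. The functions below are toy stand-ins for the model calls and the quality gate, not CRTX internals — they exist only to show the Generate → Test → Fix → Review sequence:

```python
# Illustrative control flow of the Loop (toy stand-ins, not CRTX internals).

def generate(task):
    # Stand-in generator: returns code with a deliberate bug.
    return "def add(a, b): return a - b"

def run_checks(code):
    # Stand-in quality gate: reports a failure until the bug is fixed.
    return [] if "a + b" in code else ["test_add failed: expected a + b"]

def apply_fix(code, failures):
    # Stand-in fixer: applies the targeted fix suggested by the failure.
    return code.replace("a - b", "a + b")

def arbiter_review(code):
    # Stand-in Arbiter: approve anything that passes the gate.
    return "APPROVE" if not run_checks(code) else "REJECT"

def loop(task, max_fix_iterations=3):
    code = generate(task)                    # 1. Generate
    for _ in range(max_fix_iterations):
        failures = run_checks(code)          # 2. Test
        if not failures:
            break
        code = apply_fix(code, failures)     # 3. Fix, with error context
    return {"code": code, "verdict": arbiter_review(code)}  # 4. Review

result = loop("add two numbers")
print(result["verdict"])  # APPROVE
```

The real Loop adds escalation tiers and a post-rejection fix cycle on top of this skeleton, but the shape — generate once, then iterate test/fix until the gate is clean, then hand off to an independent reviewer — is the same.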

Benchmarks

Same 12 prompts, same scoring rubric. CRTX Loop vs. single models vs. multi-model debate:

Condition            Avg Score   Min   Spread   Avg Dev Time   Cost
Single Sonnet        94%         92%   4 pts    10 min         $0.36
Single o3            81%         54%   41 pts   4 min          $0.44
Multi-model Debate   88%         75%   25 pts   9 min          $5.59
CRTX Loop            99%         98%   2 pts    2 min          $1.80

Dev Time = estimated developer minutes to get the output to production (based on test failures, import errors, and entry point issues). Spread = max score minus min score across all prompts.

The Loop scores higher, more consistently, with less post-generation work than any other condition — at a fraction of the cost of multi-model pipelines.

Run the benchmark yourself:

crtx benchmark --quick

How It Works

  ┌─────────┐    ┌──────────┐    ┌─────────┐    ┌─────────┐    ┌─────────┐    ┌─────────┐
  │  Route  │ ─→ │ Generate │ ─→ │  Test   │ ─→ │   Fix   │ ─→ │ Review  │ ─→ │ Present │
  └─────────┘    └──────────┘    └─────────┘    └─────────┘    └─────────┘    └─────────┘
       │                              │              │
       │                              └──────────────┘
       │                               ↑ loop until pass
       │
       ├── simple  → fast model, 2 fix iterations
       ├── medium  → balanced model, 3 fix iterations
       └── complex → best model, 5 fix iterations + architecture debate

Route — Classifies your prompt by complexity (simple/medium/complex) and selects the model, fix budget, and timeout tier.
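As a rough illustration, routing can be pictured as a classifier plus a tier table. The heuristic and timeout values below are made up for the sketch (CRTX's actual classifier is not documented here); the fix budgets match the diagram above:

```python
# Toy complexity router mirroring the tiers in the diagram above.
# The classifier heuristic and timeout values are illustrative only.

TIERS = {
    "simple":  {"model": "fast",     "fix_iterations": 2, "timeout_s": 120},
    "medium":  {"model": "balanced", "fix_iterations": 3, "timeout_s": 300},
    "complex": {"model": "best",     "fix_iterations": 5, "timeout_s": 900},
}

def classify(prompt: str) -> str:
    # Crude proxy: longer, multi-requirement prompts are harder.
    requirements = prompt.count(",") + prompt.count(" and ") + 1
    if requirements <= 1 and len(prompt) < 60:
        return "simple"
    return "complex" if requirements >= 4 else "medium"

def route(prompt: str) -> dict:
    tier = classify(prompt)
    return {"tier": tier, **TIERS[tier]}

print(route("Build a REST API with FastAPI, SQLite, search and pagination"))
```

With this toy heuristic, the REST API prompt from the top of the page lands in the complex tier (best model, 5 fix iterations), while "Reverse a string" would route to the fast model with a 2-fix budget.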

Generate — Produces source files and test files. If no tests are generated, a second call creates comprehensive pytest tests so the fix cycle always has something to verify against.

Test — Five-stage local quality gate: AST parse → import check → pyflakes → pytest → entry point execution. Per-file pytest fallback on collection failures.
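A minimal sketch of the gate's first two stages, using only the standard library. The pyflakes, pytest, and entry-point stages are left as comments because they shell out to external tools; the function names here are illustrative, not CRTX's:

```python
# Sketch of a staged local quality gate in the spirit of the five stages above.

import ast

def check_syntax(source: str) -> list[str]:
    # Stage 1: AST parse -- does the file even parse?
    try:
        ast.parse(source)
        return []
    except SyntaxError as exc:
        return [f"line {exc.lineno}: {exc.msg}"]

def check_imports(source: str) -> list[str]:
    # Stage 2: execute top-level code in a scratch namespace so a broken
    # import surfaces immediately (a real gate would sandbox this).
    try:
        exec(compile(source, "<generated>", "exec"), {})
        return []
    except Exception as exc:
        return [f"{type(exc).__name__}: {exc}"]

def quality_gate(source: str) -> list[str]:
    # Stages run in order; stop at the first stage that fails.
    for stage in (check_syntax, check_imports):
        failures = stage(source)
        if failures:
            return failures
    # Stages 3-5 (pyflakes, pytest, entry-point run) would shell out here.
    return []

print(quality_gate("import definitely_missing_module_xyz"))  # reports the broken import
```

Running the stages in order matters: there is no point invoking pytest on a file that does not parse, and a broken import explains most downstream test collection failures.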

Fix — Feeds structured test failures back to the model for targeted fixes. Detects phantom API references (tests importing functions that don't exist in source) and pytest collection failures.
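Phantom-API detection is a static check: find names the tests import from the generated module that the module never defines. A hedged sketch of the idea (not CRTX's implementation):

```python
# Sketch of phantom-API detection: names imported by the tests that the
# generated source never defines. Illustrative only.

import ast

def defined_names(source: str) -> set[str]:
    # Collect top-level function, class, and variable names from the source.
    names = set()
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            names.add(node.name)
        elif isinstance(node, ast.Assign):
            names.update(t.id for t in node.targets if isinstance(t, ast.Name))
    return names

def phantom_imports(test_source: str, module: str, source: str) -> set[str]:
    # Every `from <module> import X` in the tests where X is not defined.
    available = defined_names(source)
    phantoms = set()
    for node in ast.walk(ast.parse(test_source)):
        if isinstance(node, ast.ImportFrom) and node.module == module:
            phantoms.update(a.name for a in node.names if a.name not in available)
    return phantoms

src = "def search(q): ...\ndef paginate(items, page): ..."
tests = "from api import search, paginate, delete_all"
print(phantom_imports(tests, "api", src))  # {'delete_all'}
```

Catching this before pytest runs turns an opaque collection-time ImportError into a precise instruction for the fix cycle: either implement `delete_all` or drop the phantom test.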

Three-tier gap closing — When the normal fix cycle can't resolve failures:

  • Tier 1 — Diagnose then fix: "analyze the root cause without writing code," then feed the diagnosis back for a targeted fix
  • Tier 2 — Minimal context retry: strip context to only the failing test and its source file, fresh perspective
  • Tier 3 — Second opinion: escalate to a different model with the primary model's diagnosis

Review — An independent Arbiter (always a different model than the generator) reviews for logic errors, security issues, and design problems. On REJECT, triggers one more fix cycle and retests.
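The "always a different model" invariant is simple to state in code. A sketch, with a made-up preference list (the model names are placeholders, not CRTX's actual ranking):

```python
# Sketch of Arbiter selection: the reviewer must differ from the generator.
# The preference order is a made-up placeholder.

PREFERENCE = ["claude-sonnet", "gpt-4o", "gemini-2.5-pro", "deepseek-r1"]

def pick_arbiter(generator: str) -> str:
    # First model in the preference order that is not the generator.
    for model in PREFERENCE:
        if model != generator:
            return model
    raise RuntimeError("need at least two configured models for Arbiter review")

print(pick_arbiter("claude-sonnet"))  # gpt-4o
print(pick_arbiter("gpt-4o"))         # claude-sonnet
```

This is also why configuring more than one provider matters: with a single model available, independent review is impossible.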

Present — Final results with verification report, file list, and cost breakdown.

Key Features

Smart routing — Classifies prompts by complexity and picks the right model, fix budget, and timeout for each task. Simple tasks get fast models. Complex tasks get the best model plus an architecture debate.

Three-tier gap closing — When fixes stall, CRTX escalates: root cause diagnosis, minimal context retry, then a second opinion from a different model. Most stuck cases resolve at tier 1 or 2.

Independent Arbiter review — Every run gets reviewed by a model that didn't write the code. Cross-model review catches errors that self-review misses. Skip with --no-arbiter.

Verified scoring — Every output is tested locally before you see it. The verification report shows exactly which checks passed, how many tests ran, and estimated developer time to production.

Auto-fallback — If a provider goes down mid-run (rate limit, timeout, outage), CRTX substitutes the next best model and keeps going. A 5-minute cooldown prevents hammering a struggling provider.

Apply mode — Write generated code directly to your project with --apply. Interactive diff preview, git branch protection, conflict detection, AST-aware patching, and automatic rollback if post-apply tests fail.

Context injection — Scan your project and inject relevant code into the generation prompt with --context . (the dot is the directory to scan). AST-aware Python analysis extracts class signatures, function definitions, and import graphs within a configurable token budget.
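A sketch of what AST-aware extraction under a token budget can look like, using the standard library's ast module. The whitespace-token budget and function names are simplifications for illustration, not CRTX's extractor:

```python
# Sketch of AST-aware context extraction: pull top-level signatures from a
# source file, stopping when a crude whitespace-token budget is exhausted.

import ast

def signatures(source: str) -> list[str]:
    # Reduce each top-level definition to a one-line signature stub.
    sigs = []
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            args = ", ".join(a.arg for a in node.args.args)
            sigs.append(f"def {node.name}({args}): ...")
        elif isinstance(node, ast.ClassDef):
            sigs.append(f"class {node.name}: ...")
    return sigs

def build_context(source: str, token_budget: int = 50) -> str:
    # Greedily pack signatures until the budget would be exceeded.
    picked, used = [], 0
    for sig in signatures(source):
        cost = len(sig.split())  # crude token proxy
        if used + cost > token_budget:
            break
        picked.append(sig)
        used += cost
    return "\n".join(picked)

src = "class Store: ...\ndef search(db, q, page): ...\ndef paginate(rows, size): ..."
print(build_context(src))
```

Signatures rather than full bodies are the point: the model sees what exists and how to call it, at a fraction of the token cost of the raw files.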

Quick Start

pip install crtx
crtx setup        # configure your API keys

Then run:

crtx loop "Build a CLI password generator with strength validation and clipboard support"

Commands

Command                What it does
crtx loop "task"       Generate, test, fix, and review code (default)
crtx run "task"        Run a multi-model pipeline (sequential/parallel/debate)
crtx benchmark         Run the built-in benchmark suite
crtx repl              Interactive shell with session history
crtx review-code       Multi-model code review on files or git diffs
crtx improve           Review → improve pipeline with cross-model consensus
crtx setup             API key configuration
crtx models            List available models with fitness scores
crtx estimate "task"   Cost estimate before running
crtx sessions          Browse past runs
crtx replay <id>       Re-display a previous session
crtx dashboard         Real-time web dashboard

Supported Models

CRTX works with any model supported by LiteLLM — that's 100+ providers. Out of the box, it's configured for:

Provider    Models
Anthropic   Claude Opus 4, Sonnet 4
OpenAI      GPT-4o, o3
Google      Gemini 2.5 Pro, Flash
xAI         Grok
DeepSeek    DeepSeek R1

Add any LiteLLM-compatible model in ~/.crtx/config.toml.

API Key Setup

Run crtx setup to configure your keys interactively, or set them as environment variables:

export ANTHROPIC_API_KEY=sk-ant-...
export OPENAI_API_KEY=sk-...
export GEMINI_API_KEY=...
export XAI_API_KEY=xai-...
export DEEPSEEK_API_KEY=sk-...

CRTX only needs one provider to work. More providers means more model diversity for routing and Arbiter review.

Contributing

Contributions are welcome. Fork the repo, create a branch, and submit a PR.

The test suite has 1,096 tests — run them with pytest. Lint with ruff check . (the trailing dot is the path argument).

License

Apache 2.0. See LICENSE for details.


Built by TriadAI
