
AI Operating System for Coding: routes queries across 30–50+ LLMs with GPU scheduling, LRU caching, evaluation, and feedback-driven routing.

Project description

🧠 Cortex-Engine

An AI Operating System for coding: it routes queries across 30–50+ models and manages GPU scheduling, caching, evaluation, and continuous feedback.

models = processes   |   GPU = CPU   |   router = scheduler   |   kernel = control plane

Architecture

User Query
    │
    ▼
┌─────────────────────────────────────────────────────────────────┐
│                  Cortex-Engine API (FastAPI)                    │
│                                                                 │
│  POST /inference                                                │
│       │                                                         │
│       ▼                                                         │
│  ┌──────────────────────────────────────────────────────────┐   │
│  │           Model Orchestrator (central brain)             │   │
│  │                                                          │   │
│  │  ① Cache Check ──────────► Redis LRU Cache               │   │
│  │       │ miss                                             │   │
│  │       ▼                                                  │   │
│  │  ② Router Engine ────────► Cluster Detection             │   │
│  │       │                    Model Selection               │   │
│  │       ▼                    Confidence Score              │   │
│  │  ③ Scheduler ────────────► GPU Assignment                │   │
│  │       │                    Priority Queue                │   │
│  │       ▼                                                  │   │
│  │  ④ Model Worker ─────────► vLLM / Triton (Phase 2)       │   │
│  │       │                                                  │   │
│  │       ▼                                                  │   │
│  │  ⑤ Evaluator ────────────► Static Analysis               │   │
│  │       │                    LLM Grading (Phase 2)         │   │
│  │       ▼                                                  │   │
│  │  ⑥ Feedback Log ─────────► Redis (rolling 10k)           │   │
│  └──────────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────────┘
         │
         ▼
    Model Registry (Redis Hash)
    7 seed models → 50+ in Phase 3
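The six numbered steps in the diagram can be condensed into a toy pipeline. This is an illustrative sketch, not the project's orchestrator.py: all names are hypothetical, routing is reduced to a single keyword check, and inference is stubbed (as it is in Phase 1).

```python
# Illustrative sketch of the orchestrator's six-step flow (names hypothetical).
import hashlib


def orchestrate(query: str, cache: dict, registry: dict) -> dict:
    key = hashlib.sha256(query.encode()).hexdigest()

    # 1. Cache check: return immediately on a hit.
    if key in cache:
        return {"output": cache[key], "cached": True}

    # 2. Route: pick a cluster, then a model from the registry.
    cluster = "testing" if "pytest" in query.lower() else "fallback"
    model = registry.get(cluster, "phi-3-mini")

    # 3-4. Schedule + infer (stubbed: Phase 1 has no real model workers).
    output = f"[{model}] response to: {query}"

    # 5. Evaluate (stub: non-empty output counts as success).
    success = bool(output.strip())

    # 6. Feedback: store the result so the next identical query is a cache hit.
    cache[key] = output
    return {"output": output, "model_used": model, "cached": False,
            "success": success}
```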

Latency Targets

Component        Target
Router           < 50 ms
Cache check      < 5 ms
Scheduling       < 10 ms
Total overhead   < 100 ms
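The < 5 ms cache-check budget is realistic because a lookup is a single hashed-key read. A minimal in-process sketch of an LRU cache with an eviction log (the real cache_manager.py is Redis-backed; the class and attribute names here are assumptions):

```python
# Minimal in-process stand-in for the Redis LRU response cache (illustrative).
import hashlib
from collections import OrderedDict


class LRUResponseCache:
    def __init__(self, capacity: int = 1024):
        self.capacity = capacity
        self._store: OrderedDict = OrderedDict()
        self.evictions: list = []            # mirrors the eviction log

    @staticmethod
    def _key(query: str) -> str:
        return hashlib.sha256(query.encode()).hexdigest()

    def get(self, query: str):
        key = self._key(query)
        if key not in self._store:
            return None
        self._store.move_to_end(key)         # mark as most recently used
        return self._store[key]

    def put(self, query: str, response: str) -> None:
        key = self._key(query)
        self._store[key] = response
        self._store.move_to_end(key)
        if len(self._store) > self.capacity:
            evicted, _ = self._store.popitem(last=False)   # drop LRU entry
            self.evictions.append(evicted)
```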

Project Structure

cortex_engine/
├── main.py                  ← FastAPI app, lifespan, middleware, health/metrics
├── config.py                ← Pydantic Settings (env-based)
├── dependencies.py          ← All Depends() providers
│
├── models/
│   └── schemas.py           ← All Pydantic request/response models + enums
│
├── services/
│   ├── registry.py          ← Redis-backed model registry (7 seed models)
│   ├── router.py            ← Keyword/heuristic router with tiebreak scoring
│   ├── cache_manager.py     ← LRU response cache + warm pool + eviction log
│   ├── scheduler.py         ← Priority queue scheduler (Redis sorted sets)
│   ├── evaluator.py         ← Static analysis + LLM-grade stub
│   ├── feedback.py          ← Rolling feedback log + accuracy stats
│   └── orchestrator.py      ← Central brain: routes → schedules → infers → evaluates
│
├── routers/
│   ├── inference.py         ← POST /inference, GET /inference/route-preview
│   └── api.py               ← /registry CRUD, /admin (queue/cache/feedback)
│
├── tests/
│   ├── conftest.py          ← Async fixtures with fakeredis (no real Redis needed)
│   ├── test_registry.py     ← 11 tests
│   ├── test_router.py       ← 11 tests
│   ├── test_cache.py        ← 7 tests
│   ├── test_evaluator.py    ← 6 tests
│   ├── test_scheduler.py    ← 7 tests
│   ├── test_orchestrator.py ← 11 tests (integration)
│   └── test_feedback.py     ← 4 tests  (57 total, all passing)
│
├── Dockerfile
├── docker-compose.yml
├── requirements.txt
├── pytest.ini
└── .env.example

Quickstart

Option A: Local (needs Redis)

# 1. Clone & install
git clone <repo>
cd cortex_engine
pip install -r requirements.txt

# 2. Start Redis
docker run -d -p 6379:6379 redis:7.4-alpine

# 3. Configure
cp .env.example .env

# 4. Run
uvicorn main:app --reload --port 8000

Option B: Docker Compose (recommended)

docker compose up --build
# Optional: include Redis Commander UI
docker compose --profile dev up --build

Run Tests (no Redis needed)

pip install -r requirements.txt
pytest tests/ -v

API Reference

POST /inference

Route a query to the best model and get a response.

// Request
{
  "query": "Write a pytest test for a function that reverses a string",
  "preferred_type": null,       // optional: debugging|testing|explanation|...
  "preferred_model": null,      // optional: override routing entirely
  "max_tokens": 2048,
  "temperature": 0.2,
  "evaluate": true
}

// Response
{
  "request_id": "a1b2c3d4e5f6",
  "output": "...",
  "model_used": "starcoder2-7b",
  "route": {
    "cluster": "testing",
    "selected_model": "starcoder2-7b",
    "confidence": 0.72,
    "fallback_models": ["phi-3-mini"],
    "routing_latency_ms": 1.4
  },
  "evaluation": {
    "success": true,
    "score": 0.8,
    "method": "static_analysis",
    "details": "has_content=✓; has_code=✓; no_apology=✓; syntax_ok=✓; keyword_coverage=✓"
  },
  "total_latency_ms": 14.2,
  "cached": false,
  "tokens_used": 87
}
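The `details` string in the response names the individual checks. A sketch of what such a static-analysis evaluator might look like (the check names come from the response above, but the scoring logic and thresholds here are assumptions, not the project's evaluator.py):

```python
# Illustrative static-analysis evaluator; check names mirror the response's
# `details` field, while the scoring rules below are assumed.
import ast
import re


def evaluate(output: str, query: str) -> dict:
    code_blocks = re.findall(r"```(?:python)?\n(.*?)```", output, re.DOTALL)

    checks = {
        "has_content": bool(output.strip()),
        "has_code": bool(code_blocks),
        "no_apology": "sorry" not in output.lower(),
    }

    # syntax_ok: every fenced Python block must parse.
    try:
        for block in code_blocks:
            ast.parse(block)
        checks["syntax_ok"] = True
    except SyntaxError:
        checks["syntax_ok"] = False

    # keyword_coverage: at least half the query's longer words reappear.
    words = {w for w in query.lower().split() if len(w) > 3}
    hits = sum(1 for w in words if w in output.lower())
    checks["keyword_coverage"] = not words or hits / len(words) >= 0.5

    score = sum(checks.values()) / len(checks)
    return {"success": score >= 0.6, "score": round(score, 2),
            "method": "static_analysis", "checks": checks}
```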

GET /inference/route-preview?query=...

Dry-run: see which model would be selected without running inference.

GET /registry/

List all registered models with status and metadata.

POST /registry/

Register a new model.

PATCH /registry/{model_name}/status

Set model status: available | loading | busy | offline

GET /admin/queue

Current GPU queue depth, running jobs, per-GPU load counts.

GET /admin/cache

Cache hit rate, warm pool, recent evictions.

GET /admin/feedback

Routing accuracy stats and recent feedback log.

GET /health

Liveness check โ€” Redis connectivity + model counts.

GET /metrics

Full system metrics: routing accuracy, cache hit rate, GPU loads, queue depth.
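Calling these endpoints from Python might look like the following standard-library-only sketch. The payload fields mirror the `POST /inference` example above; the `build_payload`/`post_inference` helper names and the localhost:8000 base URL are assumptions.

```python
# Hypothetical client for the API above, standard library only.
import json
import urllib.request

BASE = "http://localhost:8000"   # assumed: server started via uvicorn on port 8000


def build_payload(query: str, **overrides) -> dict:
    """Build an /inference request body with the documented defaults."""
    payload = {"query": query, "preferred_type": None, "preferred_model": None,
               "max_tokens": 2048, "temperature": 0.2, "evaluate": True}
    payload.update(overrides)
    return payload


def post_inference(query: str, **overrides) -> dict:
    req = urllib.request.Request(
        f"{BASE}/inference",
        data=json.dumps(build_payload(query, **overrides)).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:   # network call: needs a live server
        return json.loads(resp.read())
```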


Cluster → Model Routing

Cluster        Trigger Keywords                               Models
debugging      error, exception, traceback, crash, fix bug    codellama-7b
testing        pytest, jest, unit test, mock, assert          starcoder2-7b
explanation    explain, what is, how does, document           mistral-7b-instruct
refactoring    refactor, optimize, clean up, simplify         qwen-coder-14b
python         python, django, fastapi, flask, .py            qwen-coder-7b, qwen-coder-14b
general_code   code, class, implement, build                  deepseek-coder-6.7b
fallback       everything else                                phi-3-mini
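The table can be approximated with a few lines of keyword matching. This is an illustrative reimplementation, not the project's router.py: the keyword lists come from the table, but the hit counting and the confidence formula are invented for the sketch, and the tiebreak scoring the real router applies is omitted.

```python
# Toy keyword router built from the cluster table above (confidence formula assumed).
CLUSTERS = {
    "debugging":    ["error", "exception", "traceback", "crash", "fix bug"],
    "testing":      ["pytest", "jest", "unit test", "mock", "assert"],
    "explanation":  ["explain", "what is", "how does", "document"],
    "refactoring":  ["refactor", "optimize", "clean up", "simplify"],
    "python":       ["python", "django", "fastapi", "flask", ".py"],
    "general_code": ["code", "class", "implement", "build"],
}


def detect_cluster(query: str):
    """Return (cluster, confidence) by counting keyword hits; else fallback."""
    q = query.lower()
    best, hits = "fallback", 0
    for cluster, keywords in CLUSTERS.items():
        n = sum(1 for kw in keywords if kw in q)
        if n > hits:
            best, hits = cluster, n
    # Naive confidence: more matching keywords -> higher confidence.
    confidence = min(1.0, 0.4 + 0.2 * hits) if hits else 0.3
    return best, confidence
```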

Seed Models (Phase 1 MVP)

Model                 Cluster        Size   Latency   GPU
qwen-coder-7b         python         7B     250 ms    gpu-0
qwen-coder-14b        python         14B    400 ms    gpu-1
deepseek-coder-6.7b   general_code   6.7B   220 ms    gpu-0
codellama-7b          debugging      7B     270 ms    gpu-1
mistral-7b-instruct   explanation    7B     300 ms    gpu-2
starcoder2-7b         testing        7B     260 ms    gpu-2
phi-3-mini            fallback       3.8B   150 ms    gpu-3

Roadmap

Phase 1 (now: MVP)

  • FastAPI kernel with all core services
  • Redis-backed model registry (7 models)
  • Keyword/heuristic router with cluster detection
  • LRU response cache + warm pool + eviction log
  • Priority queue GPU scheduler (Redis sorted sets)
  • Static analysis evaluator
  • Rolling feedback system + accuracy tracking
  • 57 tests (all passing, no real Redis needed)
  • Docker Compose stack
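The priority-queue GPU scheduler listed above can be sketched with heapq standing in for Redis sorted sets (ZADD/ZPOPMIN semantics). The class and method names are hypothetical, and least-loaded GPU assignment is an assumption about the assignment policy.

```python
# Priority-queue scheduler sketch: heapq in place of a Redis sorted set.
import heapq
import itertools


class GPUScheduler:
    def __init__(self, gpus: list):
        self._queue: list = []            # (priority, seq, job_id) heap entries
        self._seq = itertools.count()     # insertion counter for FIFO tiebreak
        self._load = {gpu: 0 for gpu in gpus}

    def submit(self, job_id: str, priority: int = 10) -> None:
        # Lower number = higher priority, like a low ZSCORE in a sorted set.
        heapq.heappush(self._queue, (priority, next(self._seq), job_id))

    def dispatch(self):
        """Pop the highest-priority job and assign the least-loaded GPU."""
        if not self._queue:
            return None
        _, _, job_id = heapq.heappop(self._queue)
        gpu = min(self._load, key=self._load.get)
        self._load[gpu] += 1
        return job_id, gpu
```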

Phase 2

  • Swap inference stub → real vLLM HTTP calls
  • Embedding-based routing (BGE/E5 + FAISS/Qdrant)
  • 15 models across more language clusters
  • Ray Serve for distributed model workers
  • LLM judge model for evaluation (phi-3-mini)

Phase 3

  • 30–50+ models
  • Self-improving router (daily retraining from feedback)
  • Cost-aware routing (balance quality vs. $/token)
  • Multi-model collaboration (chain models)
  • PostgreSQL for persistent metadata
  • Reinforcement learning router

Environment Variables

Variable                Default                    Description
REDIS_URL               redis://localhost:6379/0   Redis connection URL
REDIS_MAX_CONNECTIONS   50                         Connection pool size
PORT                    8000                       API server port
WORKERS                 1                          Uvicorn worker count
CACHE_TTL_SECONDS       3600                       Response cache TTL
ENABLE_EVALUATION       true                       Run evaluator on outputs
ENABLE_FEEDBACK         true                       Log routing feedback
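config.py uses Pydantic Settings, but the same env-based pattern can be shown without the dependency. This dataclass stand-in reads the variables and defaults from the table above; the field names are assumptions.

```python
# Dependency-free stand-in for the Pydantic Settings config (field names assumed).
import os
from dataclasses import dataclass, field


def _env(name: str, default: str) -> str:
    return os.environ.get(name, default)


@dataclass
class Settings:
    redis_url: str = field(
        default_factory=lambda: _env("REDIS_URL", "redis://localhost:6379/0"))
    redis_max_connections: int = field(
        default_factory=lambda: int(_env("REDIS_MAX_CONNECTIONS", "50")))
    port: int = field(default_factory=lambda: int(_env("PORT", "8000")))
    workers: int = field(default_factory=lambda: int(_env("WORKERS", "1")))
    cache_ttl_seconds: int = field(
        default_factory=lambda: int(_env("CACHE_TTL_SECONDS", "3600")))
    enable_evaluation: bool = field(
        default_factory=lambda: _env("ENABLE_EVALUATION", "true").lower() == "true")
    enable_feedback: bool = field(
        default_factory=lambda: _env("ENABLE_FEEDBACK", "true").lower() == "true")
```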

Tech Stack

Layer                   Tech
API                     FastAPI + Uvicorn
Cache / State           Redis 7 (asyncio)
Scheduling              Redis sorted sets
Config                  Pydantic Settings
Testing                 pytest-asyncio + fakeredis
Containers              Docker + Docker Compose
Phase 2 Serving         vLLM + Triton
Phase 2 Orchestration   Ray Serve + Kubernetes
Phase 2 Vector DB       FAISS / Qdrant
Phase 3 Metadata        PostgreSQL + SQLAlchemy

Project details


Download files


Source Distribution

cortex_engine-0.1.0.tar.gz (20.0 kB)

Uploaded Source

Built Distribution


cortex_engine-0.1.0-py3-none-any.whl (27.0 kB)

Uploaded Python 3

File details

Details for the file cortex_engine-0.1.0.tar.gz.

File metadata

  • Download URL: cortex_engine-0.1.0.tar.gz
  • Upload date:
  • Size: 20.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.24 {"installer":{"name":"uv","version":"0.9.24","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"macOS","version":null,"id":null,"libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}

File hashes

Hashes for cortex_engine-0.1.0.tar.gz

Algorithm     Hash digest
SHA256        55d0022cd262515b336830d4802ea402dac2c3e6b2d6b8867aff97ec3d37f0d6
MD5           5e342434f61411a09866bc92e4e73f10
BLAKE2b-256   db886402f8ffdbb90cee8c5496f5cd8454632e4c3525cce0fff537846ec15eeb


File details

Details for the file cortex_engine-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: cortex_engine-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 27.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.24 {"installer":{"name":"uv","version":"0.9.24","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"macOS","version":null,"id":null,"libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}

File hashes

Hashes for cortex_engine-0.1.0-py3-none-any.whl

Algorithm     Hash digest
SHA256        c3788a69a72266723942c93f1e68416b3ca9939c425682f7f63d50f0f0f13de5
MD5           b9be03618a740277e01d6e319877f956
BLAKE2b-256   bf3af594ea764e3d2254bfab7c92a066405c52cf9e488cc69b44e52f7e9774e2

