# 🧠 Cortex-Engine

An AI Operating System for coding: routes queries across 30–50+ models and manages GPU scheduling, caching, evaluation, and continuous feedback.

> `models = processes | GPU = CPU | router = scheduler | kernel = control plane`
## Architecture
```
User Query
    │
    ▼
Cortex-Engine API (FastAPI)
    POST /inference
        │
        ▼
Model Orchestrator (central brain)
    ① Cache Check ────────► Redis LRU Cache
        │ miss
        ▼
    ② Router Engine ──────► Cluster Detection
        │                   Model Selection
        ▼                   Confidence Score
    ③ Scheduler ──────────► GPU Assignment
        │                   Priority Queue
        ▼
    ④ Model Worker ───────► vLLM / Triton (Phase 2)
        │
        ▼
    ⑤ Evaluator ──────────► Static Analysis
        │                   LLM Grading (Phase 2)
        ▼
    ⑥ Feedback Log ───────► Redis (rolling 10k)
        │
        ▼
Model Registry (Redis Hash)
7 seed models → 50+ in Phase 3
```
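The six numbered steps above can be sketched as a single in-process pipeline. This is an illustrative stand-in, not the real implementation: the actual services live in `services/`, are async, and are Redis-backed; all names here (`Orchestrator.handle`, `route`, `schedule`) are hypothetical simplifications that just show the control flow.

```python
# Miniature sketch of the orchestrator flow: cache check -> route ->
# schedule -> infer (stubbed) -> evaluate -> cache the result.
from dataclasses import dataclass, field


@dataclass
class Orchestrator:
    cache: dict = field(default_factory=dict)  # stands in for the Redis LRU cache

    def route(self, query: str) -> str:
        # Toy router: the real one scores keyword clusters (see router.py).
        return "starcoder2-7b" if "test" in query else "phi-3-mini"

    def schedule(self, model: str) -> str:
        # Toy scheduler: the real one uses a Redis sorted-set priority queue.
        return "gpu-2" if model == "starcoder2-7b" else "gpu-3"

    def handle(self, query: str) -> dict:
        # 1. Cache check
        if query in self.cache:
            return {**self.cache[query], "cached": True}
        # 2. Route, 3. Schedule
        model = self.route(query)
        gpu = self.schedule(model)
        # 4. Infer (stubbed; vLLM/Triton in Phase 2)
        output = f"[{model}@{gpu}] response"
        # 5. Evaluate (static analysis in the real system)
        score = 1.0 if output else 0.0
        # 6. Log/cache the result
        result = {"model_used": model, "output": output,
                  "score": score, "cached": False}
        self.cache[query] = result
        return result
```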
## Latency Targets
| Component | Target |
|---|---|
| Router | < 50 ms |
| Cache check | < 5 ms |
| Scheduling | < 10 ms |
| Total overhead | < 100 ms |
## Project Structure
```
cortex_engine/
├── main.py               # FastAPI app, lifespan, middleware, health/metrics
├── config.py             # Pydantic Settings (env-based)
├── dependencies.py       # All Depends() providers
│
├── models/
│   └── schemas.py        # All Pydantic request/response models + enums
│
├── services/
│   ├── registry.py       # Redis-backed model registry (7 seed models)
│   ├── router.py         # Keyword/heuristic router with tiebreak scoring
│   ├── cache_manager.py  # LRU response cache + warm pool + eviction log
│   ├── scheduler.py      # Priority queue scheduler (Redis sorted sets)
│   ├── evaluator.py      # Static analysis + LLM-grade stub
│   ├── feedback.py       # Rolling feedback log + accuracy stats
│   └── orchestrator.py   # Central brain: routes → schedules → infers → evals
│
├── routers/
│   ├── inference.py      # POST /inference, GET /inference/route-preview
│   └── api.py            # /registry CRUD, /admin (queue/cache/feedback)
│
├── tests/
│   ├── conftest.py           # Async fixtures with fakeredis (no real Redis needed)
│   ├── test_registry.py      # 11 tests
│   ├── test_router.py        # 11 tests
│   ├── test_cache.py         # 7 tests
│   ├── test_evaluator.py     # 6 tests
│   ├── test_scheduler.py     # 7 tests
│   ├── test_orchestrator.py  # 11 tests (integration)
│   └── test_feedback.py      # 4 tests (57 total, all pass)
│
├── Dockerfile
├── docker-compose.yml
├── requirements.txt
├── pytest.ini
└── .env.example
```
## Quickstart

### Option A: Local (needs Redis)

```bash
# 1. Clone & install
git clone <repo>
cd cortex_engine
pip install -r requirements.txt

# 2. Start Redis
docker run -d -p 6379:6379 redis:7.4-alpine

# 3. Configure
cp .env.example .env

# 4. Run
uvicorn main:app --reload --port 8000
```

### Option B: Docker Compose (recommended)

```bash
docker compose up --build

# Optional: include Redis Commander UI
docker compose --profile dev up --build
```
### Run Tests (no Redis needed)

```bash
pip install -r requirements.txt
pytest tests/ -v
```
## API Reference
### POST /inference
Route a query to the best model and get a response.
```jsonc
// Request
{
  "query": "Write a pytest test for a function that reverses a string",
  "preferred_type": null,   // optional: debugging|testing|explanation|...
  "preferred_model": null,  // optional: override routing entirely
  "max_tokens": 2048,
  "temperature": 0.2,
  "evaluate": true
}
```

```jsonc
// Response
{
  "request_id": "a1b2c3d4e5f6",
  "output": "...",
  "model_used": "starcoder2-7b",
  "route": {
    "cluster": "testing",
    "selected_model": "starcoder2-7b",
    "confidence": 0.72,
    "fallback_models": ["phi-3-mini"],
    "routing_latency_ms": 1.4
  },
  "evaluation": {
    "success": true,
    "score": 0.8,
    "method": "static_analysis",
    "details": "has_content=✓; has_code=✓; no_apology=✓; syntax_ok=✓; keyword_coverage=✓"
  },
  "total_latency_ms": 14.2,
  "cached": false,
  "tokens_used": 87
}
```
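A minimal client sketch for this endpoint, assuming the server runs on `localhost:8000` as in the Quickstart. The payload fields come from the request schema above; `build_payload` and `post_inference` are hypothetical helper names, not part of the project.

```python
# Build a POST /inference payload and send it with the standard library.
import json
import urllib.request


def build_payload(query: str, max_tokens: int = 2048,
                  temperature: float = 0.2) -> dict:
    """Assemble a request body matching the /inference schema."""
    return {
        "query": query,
        "preferred_type": None,   # let the router pick a cluster
        "preferred_model": None,  # let the router pick a model
        "max_tokens": max_tokens,
        "temperature": temperature,
        "evaluate": True,
    }


def post_inference(query: str, base_url: str = "http://localhost:8000") -> dict:
    req = urllib.request.Request(
        f"{base_url}/inference",
        data=json.dumps(build_payload(query)).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:  # requires a running server
        return json.load(resp)


if __name__ == "__main__":
    result = post_inference("Write a pytest test for a string reverser")
    print(result["model_used"], result["route"]["cluster"])
```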
### GET /inference/route-preview?query=...
Dry-run: see which model would be selected without running inference.

### GET /registry/
List all registered models with status and metadata.

### POST /registry/
Register a new model.

### PATCH /registry/{model_name}/status
Set model status: available | loading | busy | offline

### GET /admin/queue
Current GPU queue depth, running jobs, per-GPU load counts.

### GET /admin/cache
Cache hit rate, warm pool, recent evictions.

### GET /admin/feedback
Routing accuracy stats and recent feedback log.

### GET /health
Liveness check: Redis connectivity + model counts.

### GET /metrics
Full system metrics: routing accuracy, cache hit rate, GPU loads, queue depth.
## Cluster → Model Routing

| Cluster | Trigger Keywords | Models |
|---|---|---|
| debugging | error, exception, traceback, crash, fix bug | codellama-7b |
| testing | pytest, jest, unit test, mock, assert | starcoder2-7b |
| explanation | explain, what is, how does, document | mistral-7b-instruct |
| refactoring | refactor, optimize, clean up, simplify | qwen-coder-14b |
| python | python, django, fastapi, flask, .py | qwen-coder-7b, qwen-coder-14b |
| general_code | code, class, implement, build | deepseek-coder-6.7b |
| fallback | everything else | phi-3-mini |
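The table above implies a simple keyword router, which can be sketched as follows. This is a guess at the mechanism in `services/router.py`: the keyword lists and model mapping come from the table, but the scoring weights, tiebreak rule (first cluster in table order wins), and confidence formula are assumptions.

```python
# Keyword/heuristic router sketch: score clusters by keyword hits,
# fall back to phi-3-mini when nothing matches.
CLUSTERS = {
    "debugging":    (["error", "exception", "traceback", "crash", "fix bug"],
                     ["codellama-7b"]),
    "testing":      (["pytest", "jest", "unit test", "mock", "assert"],
                     ["starcoder2-7b"]),
    "explanation":  (["explain", "what is", "how does", "document"],
                     ["mistral-7b-instruct"]),
    "refactoring":  (["refactor", "optimize", "clean up", "simplify"],
                     ["qwen-coder-14b"]),
    "python":       (["python", "django", "fastapi", "flask", ".py"],
                     ["qwen-coder-7b", "qwen-coder-14b"]),
    "general_code": (["code", "class", "implement", "build"],
                     ["deepseek-coder-6.7b"]),
}


def route(query: str) -> tuple[str, str, float]:
    """Return (cluster, model, confidence) for a query."""
    q = query.lower()
    best, best_hits = "fallback", 0
    for cluster, (keywords, _) in CLUSTERS.items():
        hits = sum(1 for kw in keywords if kw in q)
        if hits > best_hits:  # strict '>' means earlier clusters win ties
            best, best_hits = cluster, hits
    if best == "fallback":
        return "fallback", "phi-3-mini", 0.3
    model = CLUSTERS[best][1][0]
    confidence = min(1.0, 0.4 + 0.2 * best_hits)  # assumed formula
    return best, model, confidence
```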
## Seed Models (Phase 1 MVP)

| Model | Cluster | Size | Latency | GPU |
|---|---|---|---|---|
| qwen-coder-7b | python | 7B | 250 ms | gpu-0 |
| qwen-coder-14b | python | 14B | 400 ms | gpu-1 |
| deepseek-coder-6.7b | general_code | 6.7B | 220 ms | gpu-0 |
| codellama-7b | debugging | 7B | 270 ms | gpu-1 |
| mistral-7b-instruct | explanation | 7B | 300 ms | gpu-2 |
| starcoder2-7b | testing | 7B | 260 ms | gpu-2 |
| phi-3-mini | fallback | 3.8B | 150 ms | gpu-3 |
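For illustration, the same seed data can be held in a plain dict shaped like the Redis hash the registry uses. The real registry (`services/registry.py`) stores these fields in Redis; this in-memory version, with the hypothetical helper `models_for_cluster`, just shows the record shape and a typical lookup.

```python
# In-memory stand-in for the Redis-hash model registry, seeded from the table.
SEED_MODELS = {
    "qwen-coder-7b":       {"cluster": "python",       "size": "7B",
                            "latency_ms": 250, "gpu": "gpu-0", "status": "available"},
    "qwen-coder-14b":      {"cluster": "python",       "size": "14B",
                            "latency_ms": 400, "gpu": "gpu-1", "status": "available"},
    "deepseek-coder-6.7b": {"cluster": "general_code", "size": "6.7B",
                            "latency_ms": 220, "gpu": "gpu-0", "status": "available"},
    "codellama-7b":        {"cluster": "debugging",    "size": "7B",
                            "latency_ms": 270, "gpu": "gpu-1", "status": "available"},
    "mistral-7b-instruct": {"cluster": "explanation",  "size": "7B",
                            "latency_ms": 300, "gpu": "gpu-2", "status": "available"},
    "starcoder2-7b":       {"cluster": "testing",      "size": "7B",
                            "latency_ms": 260, "gpu": "gpu-2", "status": "available"},
    "phi-3-mini":          {"cluster": "fallback",     "size": "3.8B",
                            "latency_ms": 150, "gpu": "gpu-3", "status": "available"},
}


def models_for_cluster(cluster: str, registry: dict = SEED_MODELS) -> list[str]:
    """Available models serving a cluster, fastest first."""
    candidates = [
        (meta["latency_ms"], name)
        for name, meta in registry.items()
        if meta["cluster"] == cluster and meta["status"] == "available"
    ]
    return [name for _, name in sorted(candidates)]
```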
## Roadmap

### Phase 1 (now: MVP)
- FastAPI kernel with all core services
- Redis-backed model registry (7 models)
- Keyword/heuristic router with cluster detection
- LRU response cache + warm pool + eviction log
- Priority queue GPU scheduler (Redis sorted sets)
- Static analysis evaluator
- Rolling feedback system + accuracy tracking
- 57 tests (all passing, no real Redis needed)
- Docker Compose stack
### Phase 2
- Swap the inference stub for real vLLM HTTP calls
- Embedding-based routing (BGE/E5 + FAISS/Qdrant)
- 15 models across more language clusters
- Ray Serve for distributed model workers
- LLM judge model for evaluation (phi-3-mini)

### Phase 3
- 30–50+ models
- Self-improving router (daily retraining from feedback)
- Cost-aware routing (balance quality vs. $/token)
- Multi-model collaboration (chain models)
- PostgreSQL for persistent metadata
- Reinforcement learning router
## Environment Variables

| Variable | Default | Description |
|---|---|---|
| REDIS_URL | redis://localhost:6379/0 | Redis connection URL |
| REDIS_MAX_CONNECTIONS | 50 | Connection pool size |
| PORT | 8000 | API server port |
| WORKERS | 1 | Uvicorn worker count |
| CACHE_TTL_SECONDS | 3600 | Response cache TTL |
| ENABLE_EVALUATION | true | Run evaluator on outputs |
| ENABLE_FEEDBACK | true | Log routing feedback |
## Tech Stack
| Layer | Tech |
|---|---|
| API | FastAPI + Uvicorn |
| Cache / State | Redis 7 (asyncio) |
| Scheduling | Redis sorted sets |
| Config | Pydantic Settings |
| Testing | pytest-asyncio + fakeredis |
| Containers | Docker + Docker Compose |
| Phase 2 Serving | vLLM + Triton |
| Phase 2 Orchestration | Ray Serve + Kubernetes |
| Phase 2 Vector DB | FAISS / Qdrant |
| Phase 3 Metadata | PostgreSQL + SQLAlchemy |