# 🧠 Cortex-Engine

An AI Operating System for coding: routes queries across 30–50+ models and manages GPU scheduling, caching, evaluation, and continuous feedback.

> `models = processes | GPU = CPU | router = scheduler | kernel = control plane`
## Architecture
```
User Query
    │
    ▼
Cortex-Engine API (FastAPI)
    POST /inference
        │
        ▼
Model Orchestrator (central brain)
    ① Cache Check ────────► Redis LRU Cache
        │ miss
        ▼
    ② Router Engine ──────► Cluster Detection
        │                   Model Selection
        ▼                   Confidence Score
    ③ Scheduler ──────────► GPU Assignment
        │                   Priority Queue
        ▼
    ④ Model Worker ───────► vLLM / Triton (Phase 2)
        │
        ▼
    ⑤ Evaluator ──────────► Static Analysis
        │                   LLM Grading (Phase 2)
        ▼
    ⑥ Feedback Log ───────► Redis (rolling 10k)
        │
        ▼
Model Registry (Redis Hash)
7 seed models → 50+ in Phase 3
```
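The six numbered steps above can be sketched as a single in-process pipeline. This is an illustrative stand-in, not the real implementation: the actual services live in `services/`, are async, and are Redis-backed; all names here (`Orchestrator.handle`, `route`, `schedule`) are hypothetical simplifications that just show the control flow.

```python
# Miniature sketch of the orchestrator flow: cache check -> route ->
# schedule -> infer (stubbed) -> evaluate -> cache the result.
from dataclasses import dataclass, field


@dataclass
class Orchestrator:
    cache: dict = field(default_factory=dict)  # stands in for the Redis LRU cache

    def route(self, query: str) -> str:
        # Toy router: the real one scores keyword clusters (see router.py).
        return "starcoder2-7b" if "test" in query else "phi-3-mini"

    def schedule(self, model: str) -> str:
        # Toy scheduler: the real one uses a Redis sorted-set priority queue.
        return "gpu-2" if model == "starcoder2-7b" else "gpu-3"

    def handle(self, query: str) -> dict:
        # 1. Cache check
        if query in self.cache:
            return {**self.cache[query], "cached": True}
        # 2. Route, 3. Schedule
        model = self.route(query)
        gpu = self.schedule(model)
        # 4. Infer (stubbed; vLLM/Triton in Phase 2)
        output = f"[{model}@{gpu}] response"
        # 5. Evaluate (static analysis in the real system)
        score = 1.0 if output else 0.0
        # 6. Log/cache the result
        result = {"model_used": model, "output": output,
                  "score": score, "cached": False}
        self.cache[query] = result
        return result
```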
## Latency Targets
| Component | Target |
|---|---|
| Router | < 50 ms |
| Cache check | < 5 ms |
| Scheduling | < 10 ms |
| Total overhead | < 100 ms |
## Project Structure
```
cortex_engine/
├── main.py               # FastAPI app, lifespan, middleware, health/metrics
├── config.py             # Pydantic Settings (env-based)
├── dependencies.py       # All Depends() providers
│
├── models/
│   └── schemas.py        # All Pydantic request/response models + enums
│
├── services/
│   ├── registry.py       # Redis-backed model registry (7 seed models)
│   ├── router.py         # Keyword/heuristic router with tiebreak scoring
│   ├── cache_manager.py  # LRU response cache + warm pool + eviction log
│   ├── scheduler.py      # Priority queue scheduler (Redis sorted sets)
│   ├── evaluator.py      # Static analysis + LLM-grade stub
│   ├── feedback.py       # Rolling feedback log + accuracy stats
│   └── orchestrator.py   # Central brain: routes → schedules → infers → evals
│
├── routers/
│   ├── inference.py      # POST /inference, GET /inference/route-preview
│   └── api.py            # /registry CRUD, /admin (queue/cache/feedback)
│
├── tests/
│   ├── conftest.py           # Async fixtures with fakeredis (no real Redis needed)
│   ├── test_registry.py      # 11 tests
│   ├── test_router.py        # 11 tests
│   ├── test_cache.py         # 7 tests
│   ├── test_evaluator.py     # 6 tests
│   ├── test_scheduler.py     # 7 tests
│   ├── test_orchestrator.py  # 11 tests (integration)
│   └── test_feedback.py      # 4 tests (57 total, all pass)
│
├── Dockerfile
├── docker-compose.yml
├── requirements.txt
├── pytest.ini
└── .env.example
```
## Quickstart

### Option A: Local (needs Redis)

```bash
# 1. Clone & install
git clone <repo>
cd cortex_engine
pip install -r requirements.txt

# 2. Start Redis
docker run -d -p 6379:6379 redis:7.4-alpine

# 3. Configure
cp .env.example .env

# 4. Run
uvicorn main:app --reload --port 8000
```

### Option B: Docker Compose (recommended)

```bash
docker compose up --build

# Optional: include Redis Commander UI
docker compose --profile dev up --build
```
### Run Tests (no Redis needed)

```bash
pip install -r requirements.txt
pytest tests/ -v
```
## API Reference
### POST /inference
Route a query to the best model and get a response.
```jsonc
// Request
{
  "query": "Write a pytest test for a function that reverses a string",
  "preferred_type": null,   // optional: debugging|testing|explanation|...
  "preferred_model": null,  // optional: override routing entirely
  "max_tokens": 2048,
  "temperature": 0.2,
  "evaluate": true
}
```

```jsonc
// Response
{
  "request_id": "a1b2c3d4e5f6",
  "output": "...",
  "model_used": "starcoder2-7b",
  "route": {
    "cluster": "testing",
    "selected_model": "starcoder2-7b",
    "confidence": 0.72,
    "fallback_models": ["phi-3-mini"],
    "routing_latency_ms": 1.4
  },
  "evaluation": {
    "success": true,
    "score": 0.8,
    "method": "static_analysis",
    "details": "has_content=✓; has_code=✓; no_apology=✓; syntax_ok=✓; keyword_coverage=✓"
  },
  "total_latency_ms": 14.2,
  "cached": false,
  "tokens_used": 87
}
```
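A minimal client sketch for this endpoint, assuming the server runs on `localhost:8000` as in the Quickstart. The payload fields come from the request schema above; `build_payload` and `post_inference` are hypothetical helper names, not part of the project.

```python
# Build a POST /inference payload and send it with the standard library.
import json
import urllib.request


def build_payload(query: str, max_tokens: int = 2048,
                  temperature: float = 0.2) -> dict:
    """Assemble a request body matching the /inference schema."""
    return {
        "query": query,
        "preferred_type": None,   # let the router pick a cluster
        "preferred_model": None,  # let the router pick a model
        "max_tokens": max_tokens,
        "temperature": temperature,
        "evaluate": True,
    }


def post_inference(query: str, base_url: str = "http://localhost:8000") -> dict:
    req = urllib.request.Request(
        f"{base_url}/inference",
        data=json.dumps(build_payload(query)).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:  # requires a running server
        return json.load(resp)


if __name__ == "__main__":
    result = post_inference("Write a pytest test for a string reverser")
    print(result["model_used"], result["route"]["cluster"])
```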
### GET /inference/route-preview?query=...
Dry-run: see which model would be selected without running inference.

### GET /registry/
List all registered models with status and metadata.

### POST /registry/
Register a new model.

### PATCH /registry/{model_name}/status
Set model status: available | loading | busy | offline

### GET /admin/queue
Current GPU queue depth, running jobs, per-GPU load counts.

### GET /admin/cache
Cache hit rate, warm pool, recent evictions.

### GET /admin/feedback
Routing accuracy stats and recent feedback log.

### GET /health
Liveness check: Redis connectivity + model counts.

### GET /metrics
Full system metrics: routing accuracy, cache hit rate, GPU loads, queue depth.
## Cluster → Model Routing

| Cluster | Trigger Keywords | Models |
|---|---|---|
| debugging | error, exception, traceback, crash, fix bug | codellama-7b |
| testing | pytest, jest, unit test, mock, assert | starcoder2-7b |
| explanation | explain, what is, how does, document | mistral-7b-instruct |
| refactoring | refactor, optimize, clean up, simplify | qwen-coder-14b |
| python | python, django, fastapi, flask, .py | qwen-coder-7b, qwen-coder-14b |
| general_code | code, class, implement, build | deepseek-coder-6.7b |
| fallback | everything else | phi-3-mini |
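The table above implies a simple keyword router, which can be sketched as follows. This is a guess at the mechanism in `services/router.py`: the keyword lists and model mapping come from the table, but the scoring weights, tiebreak rule (first cluster in table order wins), and confidence formula are assumptions.

```python
# Keyword/heuristic router sketch: score clusters by keyword hits,
# fall back to phi-3-mini when nothing matches.
CLUSTERS = {
    "debugging":    (["error", "exception", "traceback", "crash", "fix bug"],
                     ["codellama-7b"]),
    "testing":      (["pytest", "jest", "unit test", "mock", "assert"],
                     ["starcoder2-7b"]),
    "explanation":  (["explain", "what is", "how does", "document"],
                     ["mistral-7b-instruct"]),
    "refactoring":  (["refactor", "optimize", "clean up", "simplify"],
                     ["qwen-coder-14b"]),
    "python":       (["python", "django", "fastapi", "flask", ".py"],
                     ["qwen-coder-7b", "qwen-coder-14b"]),
    "general_code": (["code", "class", "implement", "build"],
                     ["deepseek-coder-6.7b"]),
}


def route(query: str) -> tuple[str, str, float]:
    """Return (cluster, model, confidence) for a query."""
    q = query.lower()
    best, best_hits = "fallback", 0
    for cluster, (keywords, _) in CLUSTERS.items():
        hits = sum(1 for kw in keywords if kw in q)
        if hits > best_hits:  # strict '>' means earlier clusters win ties
            best, best_hits = cluster, hits
    if best == "fallback":
        return "fallback", "phi-3-mini", 0.3
    model = CLUSTERS[best][1][0]
    confidence = min(1.0, 0.4 + 0.2 * best_hits)  # assumed formula
    return best, model, confidence
```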
## Seed Models (Phase 1 MVP)

| Model | Cluster | Size | Latency | GPU |
|---|---|---|---|---|
| qwen-coder-7b | python | 7B | 250 ms | gpu-0 |
| qwen-coder-14b | python | 14B | 400 ms | gpu-1 |
| deepseek-coder-6.7b | general_code | 6.7B | 220 ms | gpu-0 |
| codellama-7b | debugging | 7B | 270 ms | gpu-1 |
| mistral-7b-instruct | explanation | 7B | 300 ms | gpu-2 |
| starcoder2-7b | testing | 7B | 260 ms | gpu-2 |
| phi-3-mini | fallback | 3.8B | 150 ms | gpu-3 |
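For illustration, the same seed data can be held in a plain dict shaped like the Redis hash the registry uses. The real registry (`services/registry.py`) stores these fields in Redis; this in-memory version, with the hypothetical helper `models_for_cluster`, just shows the record shape and a typical lookup.

```python
# In-memory stand-in for the Redis-hash model registry, seeded from the table.
SEED_MODELS = {
    "qwen-coder-7b":       {"cluster": "python",       "size": "7B",
                            "latency_ms": 250, "gpu": "gpu-0", "status": "available"},
    "qwen-coder-14b":      {"cluster": "python",       "size": "14B",
                            "latency_ms": 400, "gpu": "gpu-1", "status": "available"},
    "deepseek-coder-6.7b": {"cluster": "general_code", "size": "6.7B",
                            "latency_ms": 220, "gpu": "gpu-0", "status": "available"},
    "codellama-7b":        {"cluster": "debugging",    "size": "7B",
                            "latency_ms": 270, "gpu": "gpu-1", "status": "available"},
    "mistral-7b-instruct": {"cluster": "explanation",  "size": "7B",
                            "latency_ms": 300, "gpu": "gpu-2", "status": "available"},
    "starcoder2-7b":       {"cluster": "testing",      "size": "7B",
                            "latency_ms": 260, "gpu": "gpu-2", "status": "available"},
    "phi-3-mini":          {"cluster": "fallback",     "size": "3.8B",
                            "latency_ms": 150, "gpu": "gpu-3", "status": "available"},
}


def models_for_cluster(cluster: str, registry: dict = SEED_MODELS) -> list[str]:
    """Available models serving a cluster, fastest first."""
    candidates = [
        (meta["latency_ms"], name)
        for name, meta in registry.items()
        if meta["cluster"] == cluster and meta["status"] == "available"
    ]
    return [name for _, name in sorted(candidates)]
```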
## Roadmap

### Phase 1 (now: MVP)
- FastAPI kernel with all core services
- Redis-backed model registry (7 models)
- Keyword/heuristic router with cluster detection
- LRU response cache + warm pool + eviction log
- Priority queue GPU scheduler (Redis sorted sets)
- Static analysis evaluator
- Rolling feedback system + accuracy tracking
- 57 tests (all passing, no real Redis needed)
- Docker Compose stack
### Phase 2
- Swap the inference stub for real vLLM HTTP calls
- Embedding-based routing (BGE/E5 + FAISS/Qdrant)
- 15 models across more language clusters
- Ray Serve for distributed model workers
- LLM judge model for evaluation (phi-3-mini)

### Phase 3
- 30–50+ models
- Self-improving router (daily retraining from feedback)
- Cost-aware routing (balance quality vs. $/token)
- Multi-model collaboration (chain models)
- PostgreSQL for persistent metadata
- Reinforcement learning router
## Environment Variables

| Variable | Default | Description |
|---|---|---|
| REDIS_URL | redis://localhost:6379/0 | Redis connection URL |
| REDIS_MAX_CONNECTIONS | 50 | Connection pool size |
| PORT | 8000 | API server port |
| WORKERS | 1 | Uvicorn worker count |
| CACHE_TTL_SECONDS | 3600 | Response cache TTL |
| ENABLE_EVALUATION | true | Run evaluator on outputs |
| ENABLE_FEEDBACK | true | Log routing feedback |
## Tech Stack
| Layer | Tech |
|---|---|
| API | FastAPI + Uvicorn |
| Cache / State | Redis 7 (asyncio) |
| Scheduling | Redis sorted sets |
| Config | Pydantic Settings |
| Testing | pytest-asyncio + fakeredis |
| Containers | Docker + Docker Compose |
| Phase 2 Serving | vLLM + Triton |
| Phase 2 Orchestration | Ray Serve + Kubernetes |
| Phase 2 Vector DB | FAISS / Qdrant |
| Phase 3 Metadata | PostgreSQL + SQLAlchemy |