
AI-powered CI/CD runner intelligence for GitLab — priority-aware routing with carbon-conscious scheduling


🎯 RunnerIQ

Less noise. More signal. Zero alert fatigue.

Most decisions are instant. AI handles the hard ones. Advisory mode lets teams build trust before granting autonomy.

pipeline status coverage report License: MIT Python 3.10+ version tests GitLab Duo

1,171+ tests · 8 focused modules · 1 unified flow · Carbon-aware routing
Addresses GitLab's 10-year-old runner scheduling issue (1,008+ comments)

Wiki · Architecture · Contributing · Timeline


What is RunnerIQ?

Less noise. More signal. Zero alert fatigue.

RunnerIQ is an intelligent CI/CD operations layer that filters the noise so your team only sees what matters. Built on GitLab's Duo Agent Platform.

The Problem

  • 🔴 Pipeline fails at 2 AM → nobody notices until standup
  • 🔴 10 "lint failed" alerts flood Slack → real failures get buried
  • 🔴 Flaky tests trigger investigations → 30 min wasted, it was transient
  • 🔴 Every failure looks the same → no severity, no context, no routing

How RunnerIQ Fixes It

You run one command. RunnerIQ handles the rest.

  1. Noisy alerts → RunnerIQ filters flaky tests, groups duplicates, routes by severity
  2. Pipeline fails → RunnerIQ diagnoses the root cause in ~20 seconds
  3. Runner selection → RunnerIQ recommends the optimal runner with carbon cost comparison

Most decisions are instant and free. AI kicks in only when genuine reasoning is needed — failure triage, anomaly explanation, carbon-aware trade-offs.

💡 Under the hood: 8 focused modules, a rules-first scoring engine, and a 4-stage noise reduction pipeline. See Architecture →

Key Differentiators

  • ✅ Noise reduction first — 4-stage pipeline filters flaky tests, groups duplicates, routes by severity before anything reaches your team
  • ✅ Pipeline failure autopilot — AI-powered root cause analysis in ~20 seconds, posted directly on your issue or MR
  • ✅ Instant decisions, zero API cost — 85% of routing decisions are deterministic rules (<100ms, $0)
  • ✅ AI for the 15% that need real thinking — Claude handles genuine toss-ups, failure triage, and carbon trade-offs
  • ✅ Starts as recommendations, earns trust over time — Advisory → Supervised → Autonomous
  • ✅ Non-blocking by design — RunnerIQ never replaces GitLab's scheduler, so there's nothing to fail over
  • ✅ Bonus: carbon-aware routing — real-time electricity grid data makes your CI/CD greener (MCP + Electricity Maps API)
  • ✅ Works WITH good tagging, not instead of it
  • ✅ 1,171+ tests passing across 8 modules with 60%+ coverage enforced in CI
  • ✅ Intelligent Orchestration Flow — AI-powered 4-module pipeline published to GitLab Duo Agent Platform

🔇 Noise Reduction — Built-In, Not Bolted On

RunnerIQ's alerting pipeline filters noise at 4 levels before anything reaches your team:

| Stage | Module | What It Does | Noise Reduced |
|---|---|---|---|
| 1 | FlakyDetector | Detects fail→pass retry patterns | ~30% of false alerts |
| 2 | SuppressionEngine | Rule-based filtering (allow_failure, experimental branches) | ~25% more |
| 3 | AlertGrouper | Batches similar alerts in 15-min windows | 10 alerts → 1 notification |
| 4 | NotificationRouter | Routes by severity with cooldown dedup | Right channel, right time |

Result: Only actionable alerts reach your team. Everything else is logged, grouped, or digested.
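The 4-stage chain above can be sketched in a few lines of Python. This is an illustrative mock, not RunnerIQ's actual API: the `Alert` fields and the `reduce_noise` helper are invented for the example.

```python
from dataclasses import dataclass

@dataclass
class Alert:
    job: str
    severity: str             # "critical" | "warning" | "info"
    allow_failure: bool       # suppressed in stage 2
    retried_and_passed: bool  # flaky signal for stage 1

def reduce_noise(alerts):
    # Stage 1 (FlakyDetector): drop fail->pass retry patterns
    alerts = [a for a in alerts if not a.retried_and_passed]
    # Stage 2 (SuppressionEngine): drop allow_failure jobs
    alerts = [a for a in alerts if not a.allow_failure]
    # Stage 3 (AlertGrouper): batch similar alerts together
    groups = {}
    for a in alerts:
        groups.setdefault((a.job, a.severity), []).append(a)
    # Stage 4 (NotificationRouter): one notification per group
    return [{"job": job, "severity": sev, "count": len(batch)}
            for (job, sev), batch in groups.items()]

alerts = [
    Alert("lint", "warning", False, True),    # flaky: filtered in stage 1
    Alert("test", "critical", False, False),
    Alert("test", "critical", False, False),  # duplicate: grouped in stage 3
]
print(reduce_noise(alerts))  # [{'job': 'test', 'severity': 'critical', 'count': 2}]
```

Three raw alerts become a single critical notification; the flaky lint alert never reaches anyone.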


🚨 Pipeline Failure Autopilot

"Save the Claude calls for things that actually need natural language reasoning like incident triage or root cause analysis." — Useful-Process9033, r/devops

That's exactly what we built.

When your pipeline breaks, AI reads the job logs, fetches recent commit diffs, correlates error messages with code changes, classifies the failure type, and posts a structured diagnosis report directly on your issue or MR. No human needs to read 500 lines of log output. Verified live in Session #3021351.

v4.6.0 breakthrough: The agent now posts structured Markdown diagnosis reports directly inline on issues/MRs and autonomously creates follow-up tasks with labels when it identifies gaps — demonstrated live in Task #193, which was created by the agent itself.

| Scenario | Handler | Latency | Cost |
|---|---|---|---|
| Pipeline failed with cryptic error | AI (Autopilot) | ~20s | ~$0.01 |
| Classify failure: config error vs dependency issue | AI (Autopilot) | ~20s | ~$0.01 |
| "Is this a flaky test or a real regression?" | AI (Autopilot) | ~20s | ~$0.01 |
| Job duration spiked 3× — expected or anomaly? | AI (context) | ~2-3s | ~$0.003 |

๐Ÿƒ Intelligent Runner Recommendations

"If GitLab would prioritize jobs on protected branches I'd be so happy" — SchlaWiener4711, r/devops

RunnerIQ scores each runner on speed, fit, capacity, and carbon cost. When the top-2 runners score within 15%, AI breaks the tie by weighing carbon intensity, historical reliability, and workload patterns.

The recommendation tells your team: "For this deploy job, runner-docker-large in FR (58 gCO₂/kWh) is optimal — 75% capacity available, exact tag match, and 83% lower carbon than runner-docker-medium in DE (340 gCO₂/kWh)."

| Scenario | Handler | Latency | Cost |
|---|---|---|---|
| Runner A: 20% load, Runner B: 90% load | Rules engine | <100ms | $0 |
| Standard deploy to tagged production runner | Rules engine | <100ms | $0 |
| Two runners score within 15% of each other | AI (toss-up) | ~2-3s | ~$0.003 |
| "Which runner minimizes CO₂ for this lint job?" | AI (carbon MCP) | ~2-3s | ~$0.003 |
| "Why did RunnerIQ recommend runner-gpu-2?" | AI (explain) | ~2-3s | ~$0.003 |

The 10-Year Problem, Solved

GitLab #14976 asked for runner priority. 1,008+ comments later, no solution exists. RunnerIQ delivers:

  • Priority scoring — Production deploys score higher than lint jobs (configurable YAML rules)
  • Protected branch boost — Jobs on main/production automatically get priority (exactly what SchlaWiener4711 asked for)
  • Intelligent recommendations — Not just priority, but which specific runner is optimal and why
  • Failure diagnosis — When pipelines break, AI-powered root cause analysis in ~20 seconds
  • Carbon awareness — Every recommendation includes environmental impact data
  • Trust progression — Starts as recommendations. Earns trust over time.

๐Ÿ› ๏ธ CLI Tools

# Start monitoring — zero config needed
runneriq run

# Health check — verify your setup (5 checks: GitLab API, Anthropic, carbon, config, tests)
runneriq doctor

# Explain why a job was assigned to a specific runner
runneriq explain <job_id>

# View the audit trail of all decisions
runneriq audit

# Emergency: remove all RunnerIQ-managed tags
runneriq reset-tags

🌱 Bonus: Carbon-Aware Routing

Competing for the Eco-Friendly Agents prize ($3K)

RunnerIQ includes real-time carbon intensity data from Electricity Maps API, enabling carbon-aware runner selection. This is a bonus capability — the core value is noise reduction and failure diagnosis.

  • 4 MCP tools for carbon data (zone intensity, forecast, optimal window, comparison)
  • CO₂ savings tracker: FIFO vs intelligent routing comparison
  • Priority-weighted carbon scoring in runner recommendations

Why it's separate: User research showed sustainability isn't yet a buying decision driver. We built it because it's technically interesting and prize-eligible, but it's not the reason you'd deploy RunnerIQ.

Priority-Based Carbon Weights

| Priority | Carbon Weight | Rationale |
|---|---|---|
| CRITICAL | 5% | Speed is everything. Carbon is a tiebreaker only. |
| HIGH | 20% | Prefer green runner if <10% slower. |
| MEDIUM | 35% | Accept up to 20% speed trade-off for green. |
| LOW | 50% | Carbon is the primary factor. Check forecast for deferral. |
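A minimal sketch of how these weights might blend a speed score with a carbon score. The `blended_score` helper and the runner numbers are hypothetical; only the weight table itself is from RunnerIQ.

```python
# Priority weights from the table above
CARBON_WEIGHT = {"CRITICAL": 0.05, "HIGH": 0.20, "MEDIUM": 0.35, "LOW": 0.50}

def blended_score(speed_score, carbon_score, priority):
    # Higher-priority jobs weight speed more; lower-priority jobs weight carbon more
    w = CARBON_WEIGHT[priority]
    return (1 - w) * speed_score + w * carbon_score

fast_dirty = (90, 40)   # fast runner on a coal-heavy grid (speed, carbon scores)
slow_green = (70, 95)   # slower runner on a low-carbon grid

for priority in ("CRITICAL", "LOW"):
    winner = max([fast_dirty, slow_green],
                 key=lambda r: blended_score(r[0], r[1], priority))
    print(priority, "->", "fast_dirty" if winner == fast_dirty else "slow_green")
# CRITICAL picks the fast runner; LOW picks the green one.
```

The same pair of runners flips depending on job priority, which is exactly the behavior the table describes.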

MCP Carbon Tools

| Tool | Purpose | Called When |
|---|---|---|
| get_fleet_carbon_summary() | Fleet-wide carbon ranking (greenest first) | Every tie-break decision (first call) |
| estimate_job_carbon_cost() | CO₂ estimate per runner: Power (kW) × Duration (h) × Intensity | Top 2-3 candidate runners |
| get_carbon_forecast() | Forecast + deferral recommendation | LOW/MEDIUM jobs in high-carbon zones |
| get_carbon_intensity_now() | Real-time intensity for a single zone | On-demand lookups |
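The `estimate_job_carbon_cost()` row in the table reduces to a single multiplication. A sketch with illustrative values (the 0.5 kWh job and the two grid intensities mirror the FIFO comparison later on this page):

```python
def estimate_job_carbon_cost(power_kw, duration_h, intensity_g_per_kwh):
    """grams CO2 = power (kW) x duration (h) x grid intensity (gCO2eq/kWh)"""
    return power_kw * duration_h * intensity_g_per_kwh

# 0.5 kW job running 1 h on the German grid (340 gCO2/kWh) vs the French grid (58):
print(estimate_job_carbon_cost(0.5, 1.0, 340))  # 170.0
print(estimate_job_carbon_cost(0.5, 1.0, 58))   # 29.0
```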

Carbon Metrics

| Metric | Description |
|---|---|
| CO₂ saved today | Grams saved vs. FIFO baseline |
| Green routing rate | % of jobs routed to low-carbon runners |
| Jobs deferred | Jobs shifted to cleaner electricity windows |
| Carbon per pipeline | CO₂ footprint breakdown by runner/zone |
| Fleet avg intensity | Weighted average gCO₂eq/kWh across fleet |

Carbon Dashboard

A single-file HTML dashboard at localhost:PORT/dashboard with 3 screens:

  • Fleet Map: Runner cards with carbon intensity badges (🟢/🟡/🔴) and utilization
  • Savings Tracker: CO₂ saved today/week, green routing rate, 30-day trend
  • 24h Forecast Heatmap: Runners × hours grid, best batch job windows highlighted

Carbon Quick Start

# Option 1: Demo mode (no API key needed)
export CARBON_DEMO_MODE=true
python -m runneriq
# Open localhost:PORT/dashboard

# Option 2: Live data (free Electricity Maps token)
export ELECTRICITY_MAPS_TOKEN=your_token_here
export RUNNER_ZONE_runner_1=DE          # Germany
export RUNNER_ZONE_runner_2=DK-DK1      # Denmark (wind-heavy)
export RUNNER_ZONE_runner_3=FR           # France (nuclear, low carbon)
export RUNNER_ZONE_runner_4=US-CAL-CISO  # California (solar peaks midday)
python -m runneriq

Demo mode uses hardcoded intensities that show dramatic contrast: DE=340 (red), FR=58 (green), DK-DK1=95 (green), US-CAL-CISO=210 (amber).

Carbon Source Files

| File | Description |
|---|---|
| src/carbon/models.py | 6 dataclasses: CarbonIntensity, CarbonForecast, DeferDecision, etc. |
| src/carbon/electricity_maps_client.py | API client with caching, retry, triple fallback, demo mode |
| src/carbon/mcp_server.py | CarbonMCPTools: 4 tools + Anthropic tool definitions |
| src/carbon/co2_tracker.py | CO2SavingsTracker with file persistence |
| src/carbon/settings.py | All carbon env vars + demo config |
| src/carbon/dashboard.py | Flask blueprint: 4 API endpoints + HTML serving |
| src/carbon/dashboard.html | Self-contained dashboard (dark theme, auto-refresh) |

Getting Started

RunnerIQ works out of the box with sensible defaults:

  • Zero config needed — runneriq run starts monitoring immediately
  • Advisory mode by default — recommends, never acts without permission
  • Customize when ready — YAML config for priority rules, alert routing, suppression rules

Advanced features (carbon routing, custom scoring weights, PagerDuty integration) are available but never required.


For judges: "RunnerIQ starts with the #1 DevOps pain point: alert fatigue. Our 4-stage noise reduction pipeline filters flaky tests, deduplicates alerts, and routes by severity — before any AI is called. When pipelines actually break, the Autopilot diagnoses root cause in ~20 seconds. 85% of routing decisions are instant and free. Carbon-aware routing is built in. We built what the community asked for."

For the critics: "You said 'save Claude for incident triage and root cause analysis.' We did. Our Pipeline Failure Autopilot reads job logs, correlates commits, classifies failures, and recommends fixes — in ~20 seconds. The rules engine handles scheduling. AI handles reasoning. And every recommendation includes the carbon cost of FIFO vs. intelligent routing."

For enterprise users: "Predictable costs. Instant decisions, zero API cost for 85% of routing. AI-powered failure diagnosis. Carbon impact tracking. Full audit trail. Advisory by default — your team stays in control."


📊 Simulated Impact Report

The following projections use realistic fleet parameters. Actual results depend on fleet size, job mix, and runner configuration.

Scenario: Mid-Size Team (10 Runners, ~200 Jobs/Day)

| Metric | Without RunnerIQ | With RunnerIQ | Improvement |
|---|---|---|---|
| Avg. job queue wait | ~45s | ~12s | 73% reduction |
| Runner idle time | ~35% | ~12% | 66% reduction |
| Failed job retries (wrong runner) | ~8/day | ~1/day | 87% reduction |
| Carbon per pipeline | 2.0 gCO₂e | 1.4 gCO₂e | 30% reduction |
| Monthly carbon savings | — | ~120 gCO₂e | ~1.4 kgCO₂e/year |

How the Savings Break Down

  1. Queue optimization — Jobs are matched to the right runner immediately, not round-robin'd to whatever's free
  2. Carbon-aware routing — When two runners score within 15%, AI picks the one in the lower-carbon region/time-zone
  3. Failure prevention — Tag matching + capacity checks prevent "job assigned to incompatible runner" failures
  4. Idle reduction — Workload balancing spreads jobs across the fleet instead of overloading hot runners

📊 Project Stats

| Metric | Value |
|---|---|
| Tests | 1,165+ passing |
| Modules | 5 focused modules (monitor, analyzer, assigner, optimizer, alerting) |
| Merged MRs | 135+ |
| Agent Tools | 10 (P0 + P1 + P2) |
| CLI Commands | 5 (run, doctor, explain, audit, reset-tags) |
| MCP Tools | 4 (carbon routing) |
| Decision Split | ~85% instant (<100ms) / ~15% AI (~2-3s) |
| Language | Python 3.10+ (96.4%) |
| Carbon Data | Real-time via Electricity Maps API |

Architecture

The architecture diagram and decision flow below show the full technical picture. For most users, the Getting Started section above is all you need.

flowchart LR
    Fail["Pipeline fails"] --> Diag["🧠 AI diagnoses\n~20s, 5 tools"]
    Diag --> Report["Structured report\nposted on Issue/MR"]

    Job["Job needs\nrunner"] --> Score["Score all\ncompatible runners"]
    Score --> Gap{"Top-2 margin\n> 15%?"}
    Gap -- "Yes (85%)" --> Rules["✅ Rules recommend\n< 100ms, $0"]
    Gap -- "No (15%)" --> Claude["🧠 AI reasons\n~2-3s, with carbon"]
    Rules --> Rec["Advisory recommendation\n+ carbon comparison"]
    Claude --> Rec
    Rec --> Team["Team decides\n(or auto-apply\nwith --execute)"]

    style Rules fill:#dcfce7,stroke:#22c55e
    style Claude fill:#fef3c7,stroke:#f59e0b
    style Diag fill:#fef3c7,stroke:#f59e0b
    style Report fill:#dbeafe,stroke:#3b82f6

Pipeline Failure Autopilot (top flow): When a pipeline fails, AI analyzes failing jobs, reads log traces, correlates with recent commits, and posts a structured diagnosis report directly on the issue or MR. Triggerable from any comment via @ai-runneriq-intelligent-orchestration-gitlab-ai-hackathon.

Intelligent Runner Routing (bottom flow): For job assignment, the rules engine handles 85% of decisions instantly. AI is called only for genuine toss-ups (runners within 15% margin), where it weighs carbon intensity, historical reliability, and workload patterns.

Full system architecture diagram
flowchart TB
    Pipeline["🔄 GitLab CI/CD Pipeline"] --> RunnerIQ

    subgraph RunnerIQ["RunnerIQ (Non-Blocking Layer)"]
        direction TB
        FC["FlowController + RunContext"]
        FC --> A1["🔍 Module 1: Monitor\nTrack runner fleet"]
        A1 --> A2["📊 Module 2: Analyzer\nScore jobs 0-100"]
        A2 --> A3["🎯 Module 3: Assigner\nRules 85% + AI 15%"]
        A3 --> A4["⚡ Module 4: Optimizer\nPerformance + Carbon"]
    end

    subgraph Orchestration["🆕 Orchestration Flow (v4.6.0)"]
        Inline["→ Posts inline reports on issues"]
        AutoTask["→ Auto-creates follow-up tasks"]
    end

    A4 --> Orchestration
    RunnerIQ --> Fallback["If RunnerIQ is down → GitLab native FIFO takes over"]

    style RunnerIQ fill:#f0f4ff,stroke:#4a6cf7,stroke-width:2px
    style Orchestration fill:#ecfdf5,stroke:#22c55e,stroke-width:2px
    style Fallback fill:#fef3c7,stroke:#f59e0b
    style Pipeline fill:#e0e7ff,stroke:#6366f1
Decision flow (Module 3: Smart Assigner)
flowchart LR
    Job["Job arrives"] --> Score["Score all\ncompatible runners"]
    Score --> Check{"How many\nrunners?"}
    Check -- "0" --> Queue["Queue job"]
    Check -- "1" --> Direct["Direct assign\n< 10ms"]
    Check -- "2+" --> Margin{"Top-2 margin\n> 15%?"}
    Margin -- "Yes" --> Rules["Rules assign\n< 100ms"]
    Margin -- "No" --> Budget{"Token budget\navailable?"}
    Budget -- "Yes" --> Claude["AI\n~2-3s"]
    Budget -- "No" --> Rules

    style Claude fill:#fef3c7,stroke:#f59e0b
    style Rules fill:#dcfce7,stroke:#22c55e
    style Direct fill:#dcfce7,stroke:#22c55e
Scoring algorithm
TOTAL_SCORE = CAPACITY(30%) + TAG_MATCH(25%) + CARBON(25%) + HISTORY(20%)

If top runner leads by >15% → Rules assign instantly ($0, <100ms)
If within 15% margin → AI breaks tie with context (~$0.003, ~2-3s)
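The scoring line and the 15% margin gate above can be sketched as follows. `WEIGHTS`, `choose`, and the runner data are illustrative, not the actual RunnerScorerV2 code:

```python
# Weights from the TOTAL_SCORE line above
WEIGHTS = {"capacity": 0.30, "tag_match": 0.25, "carbon": 0.25, "history": 0.20}

def total_score(factors):
    return sum(WEIGHTS[k] * factors[k] for k in WEIGHTS)

def choose(runners, margin=0.15):
    ranked = sorted(runners, key=lambda r: total_score(r[1]), reverse=True)
    top, second = total_score(ranked[0][1]), total_score(ranked[1][1])
    if top - second > margin * top:
        return ranked[0][0], "rules"   # clear winner: no AI call needed
    return ranked[0][0], "ai"          # toss-up: AI would break the tie

runners = [
    ("runner-a", {"capacity": 90, "tag_match": 100, "carbon": 80, "history": 70}),
    ("runner-b", {"capacity": 40, "tag_match": 100, "carbon": 60, "history": 50}),
]
print(choose(runners))  # ('runner-a', 'rules')
```

Here runner-a leads by well over 15%, so the rules engine answers instantly and no tokens are spent.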
Target metrics
| Metric | Target |
|---|---|
| Routing recommendations by rules | 85–90% |
| Routing recommendations by AI | 10–15% |
| Rules recommendation latency | < 100ms |
| AI recommendation latency | < 3s |
| Pipeline diagnosis latency | < 30s |
| Daily AI API cost | ~$0.50–$1.50 |
| Carbon savings vs. FIFO baseline | Tracked per job |

The Carbon Argument: FIFO vs. Intelligent Routing

GitLab's FIFO scheduler doesn't consider where runners are located or what the local electricity grid looks like. It picks the first available runner, regardless of carbon intensity.

RunnerIQ recommends the runner that balances performance AND carbon:

FIFO picks randomly:
  Job → runner-DE (340 gCO₂/kWh) — coal-heavy grid
  Cost: 0.5 kWh × 340 = 170g CO₂

RunnerIQ recommends:
  Job → runner-FR (58 gCO₂/kWh) — nuclear, low carbon
  Cost: 0.5 kWh × 58 = 29g CO₂
  Savings: 141g CO₂ per job (83% reduction)

Multiply by hundreds of jobs per day across a fleet, and the impact is significant. RunnerIQ doesn't force the routing — it shows your team the carbon cost of each option and recommends the greener path.


The 5 Modules

Module 1: Runner Monitor 🔍

Polls the GitLab Runner API every 30 seconds. Maintains real-time state for every runner: status (online/offline/paused), active jobs, capacity, tags, and utilization. Detects state changes (runners going offline, stuck jobs >30min) and outputs structured JSON for downstream modules. Uses per-endpoint caching with stale-cache fallback on API errors.

Source: src/agent1_monitor/ — gitlab_client.py, runner_monitor.py, main.py

Module 2: Job Analyzer 📊

Extracts job metadata from pipelines and calculates a priority score (0-100) using configurable YAML rules. Scores are a weighted combination of branch priority (main=100, feature=50), user role (maintainer=100, guest=25), and job type (deploy=100, lint=40). Classifies urgency as CRITICAL/HIGH/MEDIUM/LOW. Supports bonuses (manual trigger +10, retry +5), penalties (allow_failure -10), and SLA escalation (LOW jobs auto-promote to MEDIUM after 5 minutes). Non-production branches are capped at 75.

Source: src/agent2_analyzer/ — job_analyzer.py, priority.py, history.py, priority_config.yaml
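The scoring rules above can be sketched as follows. The real rules live in priority_config.yaml; the lookup tables and the `priority_score` helper below are invented for illustration (equal factor weights are assumed, and SLA escalation is omitted):

```python
# Illustrative rule tables (values taken from the description above)
BRANCH = {"main": 100, "feature": 50}
ROLE = {"maintainer": 100, "guest": 25}
JOB_TYPE = {"deploy": 100, "lint": 40}

def priority_score(branch, role, job_type,
                   manual=False, retry=False, allow_failure=False):
    # Weighted combination of the three factors (equal weights assumed here)
    score = (BRANCH[branch] + ROLE[role] + JOB_TYPE[job_type]) / 3
    score += 10 if manual else 0         # manual trigger bonus
    score += 5 if retry else 0           # retry bonus
    score -= 10 if allow_failure else 0  # allow_failure penalty
    if branch != "main":                 # non-production branches capped at 75
        score = min(score, 75)
    return max(0, min(100, round(score)))

print(priority_score("main", "maintainer", "deploy"))                  # 100
print(priority_score("feature", "guest", "lint", allow_failure=True))  # 28
```

A production deploy by a maintainer hits the ceiling; an allow_failure lint job on a feature branch barely registers, which is the spread the classifier then buckets into CRITICAL/HIGH/MEDIUM/LOW.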

Module 3: Smart Assigner 🧠

The AI decision engine. Receives runner states from Module 1 and prioritized jobs from Module 2. Scores each compatible runner on a 0-100 scale using four weighted factors:

| Factor | Weight | Description |
|---|---|---|
| Inverse utilization | 40% | Idle runners score higher |
| Tag match quality | 20% | Exact match = 100, superset = partial |
| Capacity headroom | 20% | (max - active) / max × 100 |
| Historical performance | 20% | Duration ratio vs. fleet average |

When the top-2 runners score within 15%, AI is called for nuanced trade-off analysis. A TokenBudgetTracker enforces a daily cap (default 50K tokens/day) with automatic fallback to rules.
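A minimal sketch of the daily-cap-with-fallback behavior described above; the `TokenBudget` class below is assumed, not the actual TokenBudgetTracker interface:

```python
class TokenBudget:
    """Daily token cap; when exhausted, callers fall back to the rules engine."""
    def __init__(self, daily_cap=50_000):
        self.daily_cap = daily_cap
        self.used = 0

    def try_spend(self, tokens):
        if self.used + tokens > self.daily_cap:
            return False      # over budget: caller uses rules instead
        self.used += tokens
        return True

budget = TokenBudget(daily_cap=1_000)
engine = "ai" if budget.try_spend(800) else "rules"    # fits the budget
engine2 = "ai" if budget.try_spend(800) else "rules"   # would exceed it
print(engine, engine2)  # ai rules
```

The key property is that exhausting the budget degrades to deterministic rules rather than blocking the decision.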

Trust model: Advisory (default) → Supervised → Autonomous. Starts as recommendations. Earns trust over time. Every decision produces an immutable AuditEntry. Anomaly detection flags CRITICAL jobs on overloaded runners.

Source: src/agent3_assigner/ — smart_assigner.py, runner_scorer.py, claude_client.py, trust_model.py, hybrid_engine.py, priority_queue.py

Module 4: Performance Optimizer 📈

Tracks historical metrics per runner and generates weekly Markdown reports. Calculates a composite performance score (0-100) per runner:

| Component | Weight | Formula |
|---|---|---|
| Throughput | 25% | log₂(jobs + 1) × 12, capped at 100 |
| Speed | 30% | (fleet_avg / runner_avg) × 50 |
| Reliability | 30% | (1 - failure_rate) × 100 |
| Utilization | 15% | Bell curve, optimal at 50-80% |
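A sketch of the composite score from the table. The three stated formulas are applied verbatim; the exact bell-curve shape for utilization is not documented here, so the one below (full marks in the 50-80% band, linear falloff outside) is an assumption:

```python
import math

def performance_score(jobs, runner_avg_s, fleet_avg_s, failure_rate, utilization):
    throughput = min(100, math.log2(jobs + 1) * 12)      # 25% weight
    speed = min(100, (fleet_avg_s / runner_avg_s) * 50)  # 30% weight
    reliability = (1 - failure_rate) * 100               # 30% weight
    # Assumed bell curve: full marks at 50-80% utilization, linear falloff
    if 0.5 <= utilization <= 0.8:
        util = 100
    else:
        dist = min(abs(utilization - 0.5), abs(utilization - 0.8))
        util = max(0, 100 - dist * 200)
    return round(0.25 * throughput + 0.30 * speed
                 + 0.30 * reliability + 0.15 * util, 1)

# Busy, fast, reliable runner at 60% utilization:
print(performance_score(jobs=120, runner_avg_s=240,
                        fleet_avg_s=300, failure_rate=0.02,
                        utilization=0.60))  # 83.9
```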

Detects four issue types with actionable recommendations:

| Issue | Threshold | Severity | Action |
|---|---|---|---|
| Slow runner | > 2× fleet avg duration | ⚠️ Warning | Upgrade or retire |
| Underutilized | < 20% utilization | ℹ️ Info | Consolidate or decommission |
| High failure rate | > 5% failure rate | 🔴 Critical | Investigate infrastructure |
| Bottleneck | > 90% utilization | ⚠️ Warning | Add parallel runner |

Weekly reports include: summary with week-over-week deltas, top performers, needs-attention with inline recommendations, cost analysis, and a runner details table.

Source: src/agent4_optimizer/ — optimizer.py, performance_scorer.py, metrics_collector.py, report_generator.py, models.py

Module 5: Alerting 🔇

The noise reduction pipeline. Filters, groups, and routes alerts through 4 stages before anything reaches your team. FlakyDetector identifies fail→pass retry patterns (~30% false alert reduction). SuppressionEngine applies rule-based filtering for allow_failure jobs and experimental branches (~25% more). AlertGrouper batches similar alerts in configurable time windows (10 alerts → 1 notification). NotificationRouter delivers to the right channel with severity-based routing and cooldown deduplication.

Source: src/alerting/ — flaky_detector.py, alert_grouper.py, suppression_engine.py, notification_router.py, models.py, config_schema.py
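The fail→pass heuristic can be illustrated in one function (field names are assumed, not the actual flaky_detector.py API):

```python
def is_flaky(attempts):
    """attempts: job statuses in retry order, e.g. ['failed', 'success']."""
    return "failed" in attempts and attempts[-1] == "success"

print(is_flaky(["failed", "success"]))  # True: suppress the alert
print(is_flaky(["failed", "failed"]))   # False: real failure, alert
print(is_flaky(["success"]))            # False: nothing to suppress
```

A job that failed and then passed on retry is treated as flaky noise; a job that stays failed is a real signal and proceeds to the later stages.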

Module Tools (v2.1)

Each module now has expanded capabilities beyond core scheduling:

| Module | Core Tools | v2.1 Expansion | MR |
|---|---|---|---|
| Module 1 (Monitor) | Runner status, capacity | GetPipelineErrors, GetJobLogs, CiLinter | !103, !106 |
| Module 2 (Analyzer) | Job scoring, pipeline analysis | GetMergeRequest, ListMergeRequestDiffs | !106 |
| Module 3 (Assigner) | Assignment, tag manipulation | CreateIssue, CreateIssueNote, GitLabUserSearch | !104, !107 |
| Shared | — | GetProject, GetCurrentUser | !105 |

10 tools total, 32 tests. All non-blocking — if any tool fails, the module falls back to existing behavior. Tools are standalone with constructor injection for easy testing and zero coupling to the core pipeline.


Action Bridge: Advisory to Action

RunnerIQ defaults to advisory mode (recommend only). When you're ready, the Action Bridge lets it influence job routing by dynamically adding runneriq: prefixed tags to runners via the GitLab API.

flowchart LR
    A3["Module 3 decides:<br/>Job X to Runner B"] --> Check{"execute flag?"}
    Check -- "No (default)" --> Advisory["Log recommendation<br/>no changes"]
    Check -- "Yes (--execute)" --> Tag["TagManager.add_tag<br/>runneriq:preferred-X"]
    Tag --> TTL["Auto-revert<br/>after 5 min"]
    Tag --> A4["Module 4 reports:<br/>tags applied, success rate"]

    style Advisory fill:#dcfce7,stroke:#22c55e
    style Tag fill:#fef3c7,stroke:#f59e0b

Safety Guarantees

| Layer | Protection |
|---|---|
| Dry-run default | No tag changes unless --execute is passed |
| Tag namespace | Only manages runneriq: prefixed tags, never touches user-defined tags |
| Auto-revert | Every tag change has a TTL (default 5 min) and auto-reverts |
| Kill switch | runneriq reset-tags removes all RunnerIQ-managed tags instantly |
| Audit trail | Every action logged: runner, tag, reason, revert timestamp |
| Graceful fallback | If tag manipulation fails, falls back to advisory-only mode |
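A toy sketch of the namespace, TTL auto-revert, and kill-switch guarantees. The real TagManager calls the GitLab API; this in-memory `TagManagerSketch` only illustrates the behavior:

```python
import time

class TagManagerSketch:
    def __init__(self, ttl_s=300):          # default 5-minute TTL
        self.ttl_s = ttl_s
        self.tags = {}                      # tag -> expiry timestamp

    def add_tag(self, tag):
        # Namespace guarantee: only runneriq: tags are ever managed
        assert tag.startswith("runneriq:"), "refusing to touch user-defined tags"
        self.tags[tag] = time.monotonic() + self.ttl_s

    def active_tags(self):
        # Lazy expiry: tags past their TTL auto-revert
        now = time.monotonic()
        self.tags = {t: exp for t, exp in self.tags.items() if exp > now}
        return set(self.tags)

    def reset_tags(self):                   # the kill switch
        self.tags.clear()

mgr = TagManagerSketch(ttl_s=0.01)
mgr.add_tag("runneriq:preferred-42")
time.sleep(0.02)
print(mgr.active_tags())  # set(): the tag auto-reverted after its TTL
```

Even if the process that applied a tag dies, the TTL bounds how long routing influence can persist.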

Usage

# Advisory mode (default) — recommend only, zero side effects
runneriq run

# Action mode — apply routing tags to runners via GitLab API
runneriq run --execute

# Emergency reset — remove ALL runneriq: tags from all runners
runneriq reset-tags

# View audit trail of all tag changes
runneriq audit

Trust Progression

Advisory (default) → Supervised (--execute, human reviews) → Autonomous (future: auto-execute after proven reliability)

Module 4's weekly report includes an Action Bridge section when --execute is used: tags applied, success rate, and per-job recovery events.

Source: src/action_bridge/tag_manager.py (17 tests), src/agent3_assigner/smart_assigner.py (wiring), src/agent4_optimizer/optimizer.py (reporting)


๐Ÿ” RunnerIQ Intelligent Orchestration โ€” AI Flow

RunnerIQ includes an Intelligent Orchestration flow published to the GitLab Duo Agent Platform. It is the single entry point for all demo and submission interactions, combining pipeline diagnosis, job analysis, runner assignment, and performance optimization in one 4-module pipeline.

โš ๏ธ Pivot note: The original standalone diagnosis flow (flows/diagnosis.yml โ†’ @ai-runneriq-intelligent-orchestration-gitlab-ai-hackathon) encounters a gRPC 16:Forbidden by auth provider error due to hackathon platform auth constraints. The orchestration flow (flows/runneriq.yml โ†’ @ai-runneriq-intelligent-orchestration-gitlab-ai-hackathon) works correctly and includes all diagnosis capabilities plus the full 4-module pipeline. See #189 for details.

How It Works

graph LR
    A[Trigger from Issue/MR] --> B["Module 1: Monitor\nget_project, get_pipeline_errors,\nget_job_logs, ci_linter"]
    B --> C["Module 2: Analyzer\nget_merge_request,\nlist_merge_request_diffs"]
    C --> D["Module 3: Assigner\ncreate_issue, create_issue_note,\ngitlab_user_search"]
    D --> E["Module 4: Optimizer\nPerformance report"]
    E --> F[Structured Orchestration Report]

  1. Trigger — Mention @ai-runneriq-intelligent-orchestration-gitlab-ai-hackathon in any issue or MR comment, or select "RunnerIQ Intelligent Orchestration" in Duo Chat
  2. Monitor (Module 1) — Diagnoses pipeline failures, fetches failing jobs, reads log traces, validates CI config
  3. Analyze (Module 2) — Scores and prioritizes pending jobs by branch type, stage, MR context, and duration
  4. Assign (Module 3) — Routes jobs to optimal runners using rules-first + AI toss-up engine
  5. Optimize (Module 4) — Generates performance report with fleet utilization, throughput, and recommendations

Quick Start

From an Issue or MR comment:

@ai-runneriq-intelligent-orchestration-gitlab-ai-hackathon Diagnose the latest pipeline failure for this project

From Duo Chat:

  1. Open GitLab Duo Chat
  2. Select "RunnerIQ Intelligent Orchestration" flow
  3. Ask: "Diagnose the failing pipeline in project 79476480"

Tools Used (across 4 modules)

| Module | Tools | Purpose |
|---|---|---|
| Monitor | get_project, get_pipeline_errors, get_job_logs, ci_linter, get_current_user | Pipeline diagnostics and CI validation |
| Analyzer | get_merge_request, list_merge_request_diffs, get_project, get_current_user | Job priority scoring with MR context |
| Assigner | create_issue, create_issue_note, gitlab_user_search, get_project, get_current_user | Runner assignment and team notification |
| Optimizer | get_project, get_current_user | Performance reporting |

Example Output

## RunnerIQ Orchestration Report
**Project:** RunnerIQ (ID: 79476480)
**Pipeline:** #345 — FAILED

### Module 1: Pipeline Diagnosis
- **Classification:** dependency issue
- **Root Cause:** pip-audit found CVE-2024-XXXX in requests==2.31.0
- **Recommendation:** Upgrade requests to >=2.32.0

### Module 2: Job Priority Analysis
- 3 pending jobs scored: deploy (95), test (60), lint (40)

### Module 3: Runner Assignment
- DECISION: rules_engine | runner=runner-fr-large | reason=exact tag match, 75% capacity, lowest carbon

### Module 4: Performance Summary
- Fleet utilization: 55% | Green routing rate: 72%
- Top recommendation: Consolidate underutilized runner-c-small

What the Agent Does Autonomously

| Capability | How It Works |
|---|---|
| Inline diagnosis reports | Posts full Markdown reports directly on issues/MRs |
| Follow-up task creation | Auto-creates labeled tasks when it identifies gaps |
| Decision transparency | Every recommendation includes score breakdown + reasoning |
| Carbon comparison | Shows CO₂ cost of FIFO vs. intelligent routing |

Platform Architecture Alignment

| Layer | GitLab Definition | RunnerIQ Implementation |
|---|---|---|
| Tool | Exposes data; no reasoning | GitLab REST API, Claude API, scoring engine |
| Agent | Autonomous task performer | 4 specialists (Monitor, Analyzer, Assigner, Optimizer) |
| Flow | Orchestrates agents | FlowController with RunContext shared state |

Flow Definition

The orchestration flow is defined in flows/runneriq.yml and published as @ai-runneriq-intelligent-orchestration-gitlab-ai-hackathon on the GitLab Duo Agent Platform. It chains 4 modules (Monitor → Analyzer → Assigner → Optimizer) with context passing between each stage. Performance: ~30–90 second execution, stable WebSocket, tested across 20+ sessions.


RunnerIQ vs Alternatives

Based on feedback from 8 DevOps engineers on r/devops, here's how RunnerIQ compares to suggested alternatives:

| Approach | Solves Capacity? | Solves Priority? | Cost | Complexity | Best For |
|---|---|---|---|---|---|
| GitLab native FIFO | ❌ | ❌ | Free | None | Single runner setups |
| Semantic tagging (deploy, prod) | Partial | ❌ | Free | Low | Small fleets with predictable workloads |
| Dedicated runner pools | ✅ | Partial | High (idle runners) | Medium | Teams with budget for dedicated infra |
| EKS + Karpenter autoscaling | ✅ | ❌ | Variable | High (K8s expertise) | Cloud-native teams |
| On-demand provisioning | ✅ | Partial | Medium | High | Teams with K8s/cloud infra skills |
| RunnerIQ | ❌ (fixed fleet) | ✅ | ~$5-10/mo | Low (Python + API key) | Fixed fleet teams wanting priority routing |
| RunnerIQ v2.0 + Karpenter | ✅ | ✅ | TBD | Medium | Cloud-native teams wanting cost-optimized scaling |

Key insight: Dedicated pools and autoscaling solve capacity. RunnerIQ solves priority. These are different problems. See the full comparison table for 11 dimensions across 8 approaches.

RunnerIQ vs GitLab Runner Autoscaling

| Aspect | Autoscaling | RunnerIQ | Combined |
|---|---|---|---|
| Problem solved | Capacity — "Do I have enough runners?" | Intelligence — "Which runner gets which job?" | Both |
| How it works | Spins up/down runners based on demand | Scores and routes jobs to optimal existing runners | RunnerIQ routes intelligently within an autoscaled fleet |
| Priority handling | ❌ FIFO within each tag pool | ✅ Priority scoring (0-100) based on branch, stage, context | ✅ |
| Cost optimization | Reduces idle runner costs | Reduces wasted compute by better matching | Both |
| Setup | Runner config (docker-machine, fleeting) | Standalone sidecar + API token | Independent |

They're complementary, not competing. Autoscaling ensures you have the right number of runners. RunnerIQ ensures each runner gets the right job. A production hotfix still waits behind lint checks in an autoscaled FIFO queue — RunnerIQ fixes that.

Current limitation: RunnerIQ has zero autoscaling awareness today. It treats all runners as static entities. Autoscaling-aware scheduling (detecting scale-up/down events, coordinating with fleeting or docker-machine) is on the v2.0 roadmap.


Technical Deep-Dive

Answers to the 5 most common architecture questions, verified against the codebase.

Tag-Aware Routing

RunnerIQ uses GitLab runner tags as a hard gate before any scoring begins. If a job requires tags that a runner doesn't have, that runner is excluded entirely — no exceptions.

After tag filtering, tag match quality contributes 20% to the overall runner score (configurable).

How it works:

  1. Job requires tags [docker, gpu]
  2. Runner A has tags [docker, gpu, linux] → ✅ passes gate (superset match)
  3. Runner B has tags [docker, shell] → ❌ excluded (missing gpu)
  4. Remaining runners scored by: utilization (40%), tag match (20%), capacity (20%), history (20%)

Configuration (runneriq.example.yaml):

runneriq:
  assigner:
    scoring:
      weights:
        utilization: 0.40
        tag_match: 0.20
        capacity: 0.20
        history: 0.20

Code: src/agent3_assigner/runner_scorer.py — RunnerScorerV2.score_runners() implements the tag gate (required_tags.issubset(runner_tags)) and weighted scoring.
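The gate itself is a one-line subset check, mirroring `required_tags.issubset(runner_tags)`. An illustrative version with made-up runner data:

```python
def eligible_runners(required_tags, runners):
    required = set(required_tags)
    # A runner passes only if it carries every required tag
    return [name for name, tags in runners.items() if required.issubset(tags)]

runners = {
    "runner-a": {"docker", "gpu", "linux"},   # superset: passes the gate
    "runner-b": {"docker", "shell"},          # missing gpu: excluded
}
print(eligible_runners(["docker", "gpu"], runners))  # ['runner-a']
```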

Scope

| Component | Scope | API Endpoint |
|---|---|---|
| Runner discovery (Module 1) | Instance-level — sees all runners visible to your API token | GET /runners |
| Pipeline analysis (Module 2) | Project-level — analyzes pipelines for one project | GET /projects/{id}/pipelines |
| Job assignment (Module 3) | Project-level — routes jobs within the configured project | Project-scoped |

Current limitation: RunnerIQ requires GITLAB_PROJECT_ID and analyzes one project at a time. For multi-project setups, run one RunnerIQ instance per project.

v2.0 roadmap: Group-level pipeline support (GITLAB_GROUP_ID) to analyze all projects in a group with a single instance.

Integration Architecture

RunnerIQ runs as a standalone sidecar process alongside your GitLab instance. It does not modify your .gitlab-ci.yml or intercept GitLab's native scheduler.

┌──────────────┐     polls      ┌──────────────┐
│  RunnerIQ    │ ──────────────→│  GitLab API  │
│  (sidecar)   │    every 30s   │              │
└──────┬───────┘                └──────────────┘
       │
       ▼
  Advisory recommendations
  (logged for human review)

How to run:

# Full pipeline (Monitor โ†’ Analyze โ†’ Assign)
make run-pipeline

# Or individual modules
make run-monitor
make run-analyzer
make run-assigner

RunnerIQ is non-blocking by design. If RunnerIQ is down, removed, or misconfigured, your CI/CD runs exactly as it does today. Zero impact.
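The non-blocking sidecar loop can be sketched like this. `fetch_runners` and `recommend` are hypothetical stand-ins for the real modules; the point is the error handling: failures are logged and retried, never propagated into CI/CD.

```python
# Minimal sketch of an advisory polling sidecar (illustrative names only).
import logging
import time

log = logging.getLogger("runneriq")

def poll_forever(fetch_runners, recommend, interval_s=30, max_cycles=None):
    """Poll on an interval, log advisory recommendations, swallow all errors."""
    cycles = 0
    while max_cycles is None or cycles < max_cycles:
        try:
            runners = fetch_runners()             # e.g. GET /runners via API token
            for advice in recommend(runners):
                log.info("advisory: %s", advice)  # logged for human review
        except Exception:
            # Non-blocking by design: errors are logged, never raised,
            # so GitLab's native scheduler is unaffected.
            log.exception("poll cycle failed; will retry")
        cycles += 1
        if max_cycles is None or cycles < max_cycles:
            time.sleep(interval_s)
```

`max_cycles` exists only so the sketch can be exercised in tests; the real sidecar would run indefinitely.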

v2.0 roadmap: Webhook-based event-driven integration (Pipeline and Job hooks via Flask/FastAPI) for sub-second response times, with polling retained as a consistency fallback.

Supported Runner Types

RunnerIQ works with all GitLab runner types — Docker, Shell, Kubernetes, custom — because it communicates exclusively through the GitLab REST API. No host-level agent or special runner configuration required.

| Metric | Source | What it measures |
|---|---|---|
| Runner status | GET /runners/{id} | Online/offline/paused |
| Active jobs | GET /runners/{id}/jobs?status=running | Current workload |
| Utilization | Calculated: active_jobs / max_jobs | Logical capacity usage |

Current limitation: Utilization is job-based, not resource-based. RunnerIQ knows "3 of 4 job slots are full" but not "CPU is at 90%." This is sufficient for job routing decisions but doesn't capture hardware-level bottlenecks.
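The job-slot utilization from the table above is a one-liner; this sketch just makes the semantics explicit (a runner with 3 of 4 slots busy is 75% utilized even if its CPU is idle):

```python
# Logical (job-based) utilization, as described above. Not a hardware metric.
def utilization(active_jobs: int, max_jobs: int) -> float:
    if max_jobs <= 0:
        return 0.0  # defensive: treat unknown capacity as idle
    return active_jobs / max_jobs
```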

v2.0 roadmap: Host-level metrics via Prometheus integration for CPU, memory, and disk-aware scheduling.


Installation

Prerequisites

  • Python 3.10+
  • git
  • (Optional) Anthropic API key for AI decisions

Self-Diagnostic

# Verify your setup (5 checks: GitLab API, Anthropic, carbon, config, tests)
runneriq doctor

Try It on GitLab (Duo Agent Platform)

Mention the agent on any issue or MR:

@ai-runneriq-intelligent-orchestration-gitlab-ai-hackathon Diagnose the latest failing pipeline

The agent posts a structured diagnosis report directly on the issue.

Option 1: pip install (recommended)

git clone https://gitlab.com/gitlab-ai-hackathon/participants/11553323.git
cd asifdotpy

python -m venv .venv
source .venv/bin/activate

pip install -e .          # Install RunnerIQ + all dependencies
pip install -e ".[dev]"   # Also install dev tools (pytest, mypy, black)

cp .env.example .env      # Copy environment template

Option 2: Via Makefile

git clone https://gitlab.com/gitlab-ai-hackathon/participants/11553323.git
cd asifdotpy
make setup        # Creates venv, installs deps + dev tools, runs tests
make setup-quick  # Same but skips the test suite

Option 3: Manual with requirements.txt

git clone https://gitlab.com/gitlab-ai-hackathon/participants/11553323.git
cd asifdotpy

python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt   # Install all dependencies
cp .env.example .env              # Copy environment template

Verify Installation

make demo                 # Run the demo (mock mode)
make test                 # Run all tests
make typecheck            # 100% strict compliance on src/

Configuration

Environment Variables

The setup script auto-copies .env.example to .env. Edit it with your credentials:

$EDITOR .env
| Variable | Required | Description |
|---|---|---|
| GITLAB_URL | Yes | Your GitLab instance URL |
| GITLAB_TOKEN | Yes | Personal access token with api scope |
| GITLAB_PROJECT_ID | Yes | Numeric project ID |
| ANTHROPIC_API_KEY | For AI mode | Anthropic API key (Module 3 hybrid/claude_only) |
| RUNNERIQ_LOG_LEVEL | No | DEBUG, INFO, WARNING, ERROR (default: INFO) |
| RUNNERIQ_POLL_INTERVAL | No | Runner polling interval in seconds (default: 30) |
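Using the variables in the table above, a filled-in .env might look like this (all values are placeholders; substitute your own):

```shell
# .env — placeholder values, never commit real credentials
GITLAB_URL=https://gitlab.example.com
GITLAB_TOKEN=glpat-xxxxxxxxxxxxxxxxxxxx
GITLAB_PROJECT_ID=12345
ANTHROPIC_API_KEY=sk-ant-placeholder   # only needed for hybrid/claude_only mode
RUNNERIQ_LOG_LEVEL=INFO
RUNNERIQ_POLL_INTERVAL=30
```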

YAML Priority Rules

RunnerIQ also uses a YAML config for priority rules. Copy and customize:

cp runneriq.example.yaml runneriq.yaml

Priority Rules (Module 2)

runneriq:
  analyzer:
    priority:
      branch_weights:
        main: 100          # Production branches get highest priority
        "hotfix/*": 95     # Hotfixes are near-production
        develop: 75        # Development branch
        "feature/*": 50    # Feature branches
        default: 40        # Unknown branches

      user_role_weights:
        maintainer: 100
        developer: 75
        guest: 25

      job_type_weights:
        deploy: 100        # Deploys are highest priority
        build: 75
        test: 50
        lint: 40

      manual_trigger_bonus: 10
      non_production_cap: 75  # Feature branches capped at 75
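A sketch of how these weighted rules could combine into one priority score. How RunnerIQ actually combines the weights is not shown in this README, so the simple average below is an assumption; the glob matching, manual bonus, and non-production cap mirror the config keys above.

```python
# Illustrative priority scoring; the averaging formula is an assumption.
from fnmatch import fnmatch

BRANCH_WEIGHTS = {"main": 100, "hotfix/*": 95, "develop": 75, "feature/*": 50}
JOB_TYPE_WEIGHTS = {"deploy": 100, "build": 75, "test": 50, "lint": 40}

def branch_weight(branch: str, default: int = 40) -> int:
    # fnmatch handles glob patterns like "hotfix/*"
    for pattern, weight in BRANCH_WEIGHTS.items():
        if fnmatch(branch, pattern):
            return weight
    return default

def priority(branch: str, job_type: str, manual: bool = False, cap: int = 75) -> float:
    score = (branch_weight(branch) + JOB_TYPE_WEIGHTS.get(job_type, 40)) / 2
    if manual:
        score += 10  # manual_trigger_bonus
    # non_production_cap: everything except main/hotfix is capped
    if branch != "main" and not fnmatch(branch, "hotfix/*"):
        score = min(score, cap)
    return score
```

Under this sketch, a deploy on main scores 100 while the same deploy on a feature branch is capped at 75.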

Decision Engine (Module 3)

runneriq:
  assigner:
    decision_engine:
      mode: hybrid              # hybrid | rules_only | claude_only
      margin_threshold: 0.15    # Use AI when top-2 within 15%
      daily_token_budget: 50000 # Max AI tokens/day (0 = unlimited)
    trust_model:
      mode: advisory            # advisory | supervised | autonomous
      supervised_threshold: HIGH
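The margin_threshold rule above can be sketched as follows (an illustration, not RunnerIQ's internals: if the top two runner scores are within 15% of each other, the decision is a toss-up and escalates to AI; otherwise rules decide instantly):

```python
# Hypothetical sketch of the hybrid routing rule driven by margin_threshold.
def route_decision(scored, margin_threshold=0.15):
    """scored: iterable of (runner_id, score). Returns ("rules"|"ai", runner_id)."""
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    if len(ranked) == 1:
        return ("rules", ranked[0][0])
    top, second = ranked[0], ranked[1]
    # Relative margin between the top two candidates.
    if top[1] > 0 and (top[1] - second[1]) / top[1] <= margin_threshold:
        return ("ai", top[0])  # genuine toss-up: hand off to the AI path
    return ("rules", top[0])   # clear winner: deterministic, free, instant
```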

Usage

Run Modules

make run-monitor      # Module 1: Runner Monitor
make run-analyzer     # Module 2: Job Analyzer
make run-assigner     # Module 3: Smart Assigner (mock mode)
make run-optimizer    # Module 4: Performance Optimizer (mock mode)
make run-pipeline     # Full integration pipeline
make demo             # Live demo script

Run make help to see all 24 available targets.

Orchestrator (Unified Pipeline)

# Full 4-module pipeline (requires GITLAB_* env vars)
runneriq run

# Preview without executing assignments
runneriq run --dry-run

# Monitor + Analyze only (skip assignment and optimization)
runneriq run --mode analyze-only

# Mock mode (no API credentials needed, great for demos)
runneriq run --mock

# JSON output for scripting
runneriq run --mock --output json

# Or via Makefile
make run-orchestrator

Individual Modules (Advanced)

source .venv/bin/activate
export PYTHONPATH=src

python -m agent1_monitor.main
python -m agent2_analyzer.main --pipeline-id 12345 --project-id 67890
python -m agent3_assigner.main --mock
python -m agent4_optimizer.main --mock --output-format markdown

Module 4 Report Output

make run-optimizer

Produces:

# RunnerIQ Performance Report
**Week of:** Feb 12-19, 2026

## Summary
- Total jobs: 532
- Avg completion time: 4.9 minutes (↓ 2.1 min from last week)
- Critical job delay: 0.8 minutes
- Runner utilization: 55%

## Top Performers 🏆
1. **runner-a-large**: 139 jobs, 2.4min avg, 99% uptime
2. **runner-d-medium**: 195 jobs, 3.6min avg, 98% uptime

## Needs Attention ⚠️
- **runner-c-small**: 52 jobs, 9.8min avg (2x slower than average)
  - Recommendation: Upgrade to medium specs or retire

## Cost Analysis 💰
- Total compute cost: $127 (↓ $213 from manual management)
- Cost per job: $0.15 (↓ from $0.41)
- Projected monthly savings: $340

API Reference

GitLab APIs Used

| Endpoint | Module | Purpose |
|---|---|---|
| GET /api/v4/runners | 1 | List all runners |
| GET /api/v4/runners/:id | 1 | Runner details |
| GET /api/v4/runners/:id/jobs | 1, 4 | Runner job history |
| GET /api/v4/projects/:id/pipelines/:pid | 2 | Pipeline metadata |
| GET /api/v4/projects/:id/pipelines/:pid/jobs | 2 | Job list |
| GET /api/v4/projects/:id/repository/branches | 2 | Branch info |
| POST /api/v4/projects/:id/issues | 4 | Create report issues |

Anthropic Claude API

# Used by Module 3 for toss-up decisions only
import os

import anthropic

client = anthropic.Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1000,
    messages=[{"role": "user", "content": prompt}],
)
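The daily_token_budget setting gates these calls. ClaudeClient's internals aren't shown in this README, so the following is only a sketch of the enforcement idea: check the budget before calling the API, and fall back to rules when it's exhausted.

```python
# Hypothetical sketch of daily token budget enforcement.
class TokenBudget:
    def __init__(self, daily_limit: int):
        self.daily_limit = daily_limit  # 0 = unlimited, per the config above
        self.used = 0

    def allow(self, estimated_tokens: int) -> bool:
        """True if an AI call with this estimated cost fits today's budget."""
        if self.daily_limit == 0:
            return True
        return self.used + estimated_tokens <= self.daily_limit

    def record(self, tokens: int) -> None:
        """Record actual tokens consumed by a completed call."""
        self.used += tokens
```

When `allow()` returns False, the assigner would take the rules-only path instead of calling the API (see Graceful Degradation below).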

Key Python Classes

| Class | Module | Description |
|---|---|---|
| RunnerMonitor | agent1_monitor | Real-time runner state tracking |
| JobAnalyzer | agent2_analyzer | Priority scoring and urgency classification |
| SmartAssigner | agent3_assigner | Hybrid rules + AI assignment engine |
| TrustModel | agent3_assigner | Advisory/Supervised/Autonomous trust tiers |
| RunnerScorerV2 | agent3_assigner | 4-factor runner scoring with margin calculation |
| ClaudeClient | agent3_assigner | Claude API with token budget enforcement |
| PriorityQueue | agent3_assigner | SLA-aware job queue with auto-escalation |
| FlowController | orchestrator | Unified 4-module pipeline orchestrator |
| RunContext | orchestrator | Shared state passed through module pipeline |
| PerformanceOptimizer | agent4_optimizer | Module 4 pipeline orchestrator |
| PerformanceScorer | agent4_optimizer | Composite 0-100 scoring with issue detection |
| ReportGenerator | agent4_optimizer | Weekly Markdown report renderer |
| MetricsCollector | agent4_optimizer | Runner metrics aggregation |
| ElectricityMapsClient | carbon | Carbon intensity API with triple fallback + cache |
| CarbonMCPTools | carbon | 4 MCP tools for AI carbon-aware routing |
| CO2SavingsTracker | carbon | File-persisted CO2 savings + green routing rate |

Security

RunnerIQ takes security seriously. Full details in SECURITY.md.

  • Credentials: All secrets (GITLAB_TOKEN, ANTHROPIC_API_KEY) via environment variables only. No hardcoded tokens.
  • CI scanning: SAST, Secret Detection, and Dependency Scanning templates in every pipeline
  • Bandit: Python-specific security linting blocks merge on medium+ severity findings
  • Advisory mode by default: RunnerIQ recommends but never acts without explicit --execute opt-in
  • Auto-revert: All tag changes include automatic rollback on failure
  • Audit trail: Every assignment decision logged with full context
  • Local scan: make security runs Bandit locally before pushing

Test Suite

RunnerIQ has 1,171+ tests across all modules, enforced by CI with 60%+ coverage.

make test             # Run all tests
make test-cov         # Run with coverage report (fail under 60%)
make test-agent1      # Module 1 tests only
make test-agent2      # Module 2 tests only
make test-agent3      # Module 3 tests only
make test-agent4      # Module 4 tests only
make test-integration # Integration and e2e tests

Test Distribution

| Module | Test Files | Key Coverage |
|---|---|---|
| Module 1 | test_gitlab_client.py, test_runner_monitor.py | API caching, stale fallback, state change detection |
| Module 2 | test_priority.py, test_history.py, test_job_analyzer.py, test_gitlab_client.py | Priority overrides, non-production cap, env variable handling |
| Module 3 | test_smart_assigner.py, test_smart_assigner_v2.py, test_hybrid_engine.py, test_claude_integration.py, test_priority_queue.py, test_trust_mode.py, test_integration_e2e.py | Hybrid routing, token budget, trust tiers, anomaly detection, SLA escalation |
| Module 4 | test_performance_scorer.py, test_metrics_collector.py, test_models.py | Composite scoring, all 4 issue detectors, fleet average, edge cases |
| Integration | test_full_pipeline.py, test_pipeline_integration.py, test_performance.py | End-to-end pipeline, Module 1→2→3 flow, performance benchmarks |
| E2E / Config | test_agent2_e2e.py, test_config_validation.py, test_error_handling.py, test_priority_scoring.py, test_integration.py | Schema validation, error recovery, priority algorithm, config edge cases |
| Module Tools | test_agent1_tools.py, test_agent2_tools.py, test_agent3_tools.py, test_agent3_user_search.py, test_shared_tools.py | All 10 v2.1 tools: diagnostics, MR context, issue management, user search |
| Alerting | test_flaky_detector.py, test_alert_grouper.py, test_suppression_engine.py, test_notification_router.py, test_config_schema.py | 4-stage noise reduction: flaky detection, grouping, suppression, routing |
| Action Bridge E2E | test_action_bridge_e2e.py | Full advisory→action flow: tag add → verify → revert → Module 4 reports |
| Smoke | test_smoke.py | All module imports verified (alerting, core, carbon, orchestrator) |

CI Pipeline

| Stage | Jobs | Blocking? |
|---|---|---|
| Lint | black:format, flake8:lint, mypy:typecheck, pylint:analysis | Yes (except pylint) |
| Test | test:unit, test:integration, test:count-check | Yes |
| Carbon Tests | test_carbon_client, test_carbon_mcp_tools, test_carbon_routing_integration (32 tests) | Yes |
| Coverage | coverage:report (≥ 60%) | Yes |
| Security | security:bandit, GitLab SAST, Secret Detection, Dependency Scanning | Yes (bandit); Advisory (safety) |

Project Structure

flows/                            # GitLab Duo Agent Platform flows
├── runneriq.yml              # Intelligent Orchestration (public, WORKING — primary entry point)
├── diagnosis.yml             # Pipeline Failure Diagnosis (public, gRPC auth error — see #189)
├── README.md                 # Flow architecture docs
├── test-01-*.yml             # Test flows (private)
├── test-02-*.yml
├── test-03-*.yml
└── test-04-*.yml
src/
├── agent1_monitor/           # 🔍 Runner Monitor
│   ├── gitlab_client.py      #    GitLab API client with caching
│   ├── runner_monitor.py     #    State tracking & change detection
│   ├── main.py               #    CLI entry point
│   └── tests/
├── agent2_analyzer/          # 📊 Job Analyzer
│   ├── job_analyzer.py       #    Pipeline analysis orchestrator
│   ├── priority.py           #    Priority scoring engine
│   ├── history.py            #    Historical duration estimation
│   ├── priority_config.yaml  #    Configurable rules
│   └── tests/
├── agent3_assigner/          # 🧠 Smart Assigner
│   ├── smart_assigner.py     #    Main orchestrator (3-path routing)
│   ├── runner_scorer.py      #    4-factor runner scoring
│   ├── claude_client.py      #    Claude API + token budget
│   ├── trust_model.py        #    Advisory/Supervised/Autonomous
│   ├── hybrid_engine.py      #    Hybrid decision engine
│   ├── priority_queue.py     #    SLA-aware priority queue
│   └── tests/
├── agent4_optimizer/         # 📈 Performance Optimizer
│   ├── optimizer.py          #    Full pipeline orchestrator
│   ├── performance_scorer.py #    Composite scoring + issue detection
│   ├── metrics_collector.py  #    Runner metrics aggregation
│   ├── report_generator.py   #    Weekly Markdown reports
│   ├── models.py             #    Data models (RunnerMetrics, etc.)
│   └── tests/
├── orchestrator/             # 🎯 Unified Pipeline Orchestrator
│   ├── cli.py                #    CLI entry point (runneriq run)
│   ├── flow_controller.py    #    4-module pipeline with graceful degradation
│   ├── run_context.py        #    Shared state dataclass
│   └── tests/
├── carbon/                   # 🌍 Carbon-Aware Routing
│   ├── models.py             #    6 dataclasses (CarbonIntensity, etc.)
│   ├── electricity_maps_client.py  # API client + cache + triple fallback
│   ├── mcp_server.py         #    4 MCP tools for AI
│   ├── co2_tracker.py        #    CO2 savings tracker (file-persisted)
│   ├── settings.py           #    Carbon env vars + demo config
│   ├── dashboard.py          #    Flask API (4 endpoints)
│   └── dashboard.html        #    Single-file HTML dashboard
├── alerting/                 # 🔇 Noise Reduction Alerting
│   ├── flaky_detector.py     #    Fail→pass retry pattern detection
│   ├── alert_grouper.py      #    Time-window alert batching
│   ├── suppression_engine.py #    Rule-based alert filtering
│   ├── notification_router.py #   Severity-based channel routing
│   ├── models.py             #    Alert, AlertGroup, SuppressionResult, etc.
│   ├── config_schema.py      #    YAML config validation
│   └── tests/
├── common/                   #    Shared utilities
│   ├── logging_config.py     #    Structured JSON logging
│   ├── config_validator.py   #    YAML config validation
│   └── benchmarks.py         #    Performance measurement
├── config/                   #    Centralized configuration
│   └── runneriq_config.py    #    Config loader + validation
└── integration/              #    Cross-agent integration
    ├── full_pipeline.py      #    End-to-end pipeline runner
    └── tests/

Contributing

See CONTRIBUTING.md for full details. Quick summary:

  1. Branch from main: git checkout -b feat/your-feature
  2. Format: make format
  3. Lint: make lint
  4. Type check: make typecheck
  5. Test: make test
  6. All checks: make check (runs lint + typecheck + test)
  7. Commit: Use conventional commits (feat:, fix:, docs:, test:)
  8. Clean up: make clean
  9. MR: All CI checks must pass before merge

Tech Stack

| Component | Technology |
|---|---|
| Platform | GitLab Duo Agent Platform |
| AI Model | Anthropic Claude (Sonnet) |
| Language | Python 3.10+ |
| Package Manager | uv (fast, Rust-based) |
| APIs | GitLab REST API v4 |
| Config | YAML (priority rules, thresholds) |
| Testing | pytest + pytest-cov |
| CI/CD | GitLab CI (lint → test → coverage → security) |
| Logging | Unified setup_logging() with RotatingFileHandler, JSON structured output |

Graceful Degradation

RunnerIQ degrades gracefully through multiple layers. Because it is advisory and non-blocking, there is no single point of failure.

| Layer | Trigger | Behavior | Latency Impact |
|---|---|---|---|
| Full system | Everything working | Rules + AI hybrid scoring | ~3ms (rules) / ~2-3s (AI) |
| Rules-only | AI API unavailable or token budget exhausted | Deterministic rules scoring, zero AI API calls | ~3ms |
| Stale cache | GitLab API errors | Last-known runner state used for scoring | ~3ms |
| Passthrough | RunnerIQ down or crashed | GitLab native FIFO scheduler continues unaffected | 0ms (RunnerIQ not in path) |

Key design principle: RunnerIQ is advisory. It recommends assignments but never blocks or intercepts GitLab's scheduler. There is no fallback mechanism that could itself fail, because RunnerIQ is never in the critical path.

Code references:

  • AI → rules fallback: src/agent3_assigner/smart_assigner.py → _decide_single_job()
  • Stale cache on API errors: src/agent1_monitor/gitlab_client.py → get_stale()
  • Module 3 failure handling: src/integration/full_pipeline.py → _run_agent3()
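The AI → rules layer reduces to a simple pattern. This is a sketch of the idea behind `_decide_single_job()`, not its actual code: try the AI path, and on any failure fall back to the deterministic rules score so a decision is always produced.

```python
# Illustrative AI-to-rules fallback; function names are hypothetical.
def decide(job, runners, ai_decide, rules_decide):
    """Always returns a decision: AI if available, deterministic rules otherwise."""
    try:
        return ("ai", ai_decide(job, runners))
    except Exception:
        # AI unavailable (API error, budget exhausted): degrade to rules-only.
        # The decision path never fails outright.
        return ("rules", rules_decide(job, runners))
```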

Performance Targets

| Metric | Target |
|---|---|
| Runner polling | Every 30 seconds |
| Job analysis | < 5 seconds |
| Rules-based assignment | < 100ms |
| AI decision | < 3 seconds |
| Total assignment latency | < 20 seconds |

Community Validation

  • 1,008+ comments across 3 GitLab issues (10 years unsolved)
  • 11 DevOps engineers validated on r/devops across 2 posts:
    • Engineering Manager, EKS fleet operator, first user-side pain confirmation
    • eltear1: 5 technical deep-dive questions on tag routing, scope, integration, runner types, autoscaling (drove 5 new README sections)
    • ArieHein: vision alignment on agents replacing CI/CD DSLs, MCP as task execution layer
  • 0 competitors in the intelligent runner orchestration space

Upstream Contribution

During development, debugging custom flow YAML configurations uncovered several undocumented runtime behaviours — including silent WebSocket closures caused by the inputs string format passing schema validation but failing at runtime. These findings were shared with the GitLab team and resulted in gitlab-org/gitlab#591567, where the AI Catalog team is now actively working on improved error messaging for misconfigured flows.


⚠️ Known Platform Limitations

RunnerIQ is built on the GitLab Duo Agent Platform, which is new and evolving. We document these limitations transparently, not as complaints, but to help users understand current boundaries and future possibilities.

API Integration Points (Future Enablement)

RunnerIQ's Smart Assigner currently operates in Hybrid Mode, accepting runner data via context/JSON. When the Duo Agent Platform expands to include these endpoints, it switches to Live Mode with zero code changes:

| Endpoint | Purpose in RunnerIQ | Priority | Status |
|---|---|---|---|
| GET /projects/{id}/runners | Discover available runners in the fleet | 🔴 Critical | Requires Maintainer role |
| GET /runners/{id} | Runner details: tags, status, capacity, region | 🔴 Critical | Requires Maintainer role |
| GET /runners/{id}/jobs | Current workload per runner (for balancing) | 🟡 High | Requires Maintainer role |
| GET /projects/{id}/jobs?scope[]=pending | Pending job queue (what needs assignment) | 🔴 Critical | Available |
| GET /projects/{id}/jobs/{job_id} | Job requirements: tags, resource needs | 🟡 High | Available |
| POST /projects/{id}/jobs/{job_id}/play | Execute the assignment decision | 🟢 Nice-to-have | Advisory mode works without this |

Hybrid Mode vs. Live Mode

TODAY (Hybrid Mode):
  User provides runner JSON in context → Smart Assigner reasons over it → Recommendation

FUTURE (Live Mode):
  Smart Assigner calls Runner API → Gets live fleet state → Reasons over it → Assignment

The decision logic is identical. Only the data source changes. This is by design.
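One way to structure such a zero-change swap is to make the assigner depend only on a fleet-fetching callable, so Hybrid Mode (JSON from context) and a future Live Mode (Runner API) plug into identical decision logic. This is a sketch of the pattern, not RunnerIQ's actual code; all names are illustrative.

```python
# Illustrative data-source abstraction: swap the source, keep the logic.
import json

def assign(fetch_fleet, decide):
    """Decision logic is identical in both modes; only fetch_fleet differs."""
    return decide(fetch_fleet())

def hybrid_source(context_json: str):
    """TODAY: runner fleet JSON supplied by the user via context."""
    return lambda: json.loads(context_json)

# FUTURE: a live_source would call GET /projects/{id}/runners instead,
# returning the same list-of-runners shape, with no change to assign().
```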


๐Ÿ”„ Platform Constraints & Our Pivot

RunnerIQ's Smart Assigner was designed to call Runner API endpoints (GET /projects/:id/runners, GET /runners/:id, GET /runners/:id/jobs) for live fleet data. During development, we discovered these endpoints require Maintainer-level access, which is beyond the Developer role available to hackathon participants.

What we did

  1. Built the full scoring engine — tag match (40%), capacity (30%), performance (30%), with carbon-aware tiebreaking via AI
  2. Pivoted to a hybrid model — the Smart Assigner accepts runner fleet data via context/JSON instead of calling live APIs
  3. Kept the decision logic identical — only the data source changed, not the intelligence
  4. Designed for zero-change upgrade — when the Duo Agent Platform expands Runner API access to Developer role, RunnerIQ switches to live fleet management with no code changes

Why this matters

GitLab's runner queue uses FIFO (first-in, first-out) scheduling — a 10-year-old problem with 1,008+ comments. RunnerIQ's scoring engine solves this by matching the right job to the right runner instantly. 85% of decisions are handled by the rules engine (free, <100ms). AI handles the 15% that need genuine reasoning — failure triage, anomaly explanation, and carbon-aware trade-offs when runners score within a 15% margin.

The triage brain is built and tested. The integration point is ready. The platform will catch up.

Validated by GitLab team

"it's good you found a way to still show the value by using simulated data!" — Mattias Michaux, GitLab (source)

Bonus: Our debugging contributed back to GitLab

During this pivot, our flow debugging findings were adopted by GitLab as an official issue: gitlab-org/gitlab#591567. See #123 for details.


License

MIT License — Copyright (c) 2026 Md Asif Iqbal


Built for the GitLab AI Hackathon 2026

RunnerIQ: Less noise. More signal. Zero alert fatigue. And greener CI/CD while you're at it.

