Skip to main content

MCP server that turns 50K-line CI logs into focused failure context for AI coding agents.

Project description

ci-log-intelligence

Stop dumping 50,000-line CI logs into your AI coding agent. This MCP server reads the logs for the agent and returns a few hundred tokens of focused, typed failure context — so the agent can debug your CI without flooding its context window.

CI Python License: MIT

The problem

You ask Claude / Codex / Copilot to fix a failing CI build. The agent runs gh run view --log, gets back 60,000 lines of pytest output, and pastes the whole thing into its context. Now:

  • The actual failure is buried somewhere on line 47,892.
  • Your context window is ~80% spent on log output before any work begins.
  • Every tool call after this costs more because the cached context is enormous.
  • The agent's reasoning quality drops because the relevant signal is diluted.

After a few of these, your conversation either OOMs the context or gets too expensive to be useful.

What this does

ci-log-intelligence is an MCP server (also usable as a CLI / Python library) that sits between the agent and the CI logs. You give it a GitHub URL — a PR, a workflow run, or a single job — and it does the heavy reading in its own process:

PR / run / job URL  →  fetch logs  →  parse  →  11 detector plugins  →  typed failure records
                                                                              │
                                                                              ▼
                                                                      a few hundred tokens
                                                                      of focused context
                                                                      back to your agent

You get back a structured response: a ranked list of typed FailureRecords (hash_mismatch, build_error_rust, pytest_fail, go_test_fail, …), each with the test name / file path / error code / log excerpt that's actually relevant — not 50K lines of npm install output.

Three MCP tools, designed to explore-then-drill

Rather than one omnibus call that returns a fixed payload, the server exposes three tools that map onto how an agent actually wants to work:

Tool When to use Approximate response size
list_failed_jobs(ci_url) First call. Cheap map of failed jobs with classifications + the failure types present in each. No per-block content. ~200–500 tokens
analyze_ci_failure(ci_url, top_k=3, failure_types=None, …) Get the top-K typed failure records with content. Filterable by detector (failure_types=["hash_mismatch"]). ~1–4K tokens
get_block(ci_url, block_index, surround=5) Drill into a specific block. Returns full content with in_block / is_anchor flags. per-block

Results are cached per (repo, run_id, job_id). A second call against the same URL skips the GitHub fetch, the parse, and the reducer entirely.

Quick start

Install

pip install ci-log-intelligence

Or from source:

git clone https://github.com/kuldeep0020/ci-log-intelligence.git
cd ci-log-intelligence
pip install -e .

Authenticate with GitHub

The fetcher prefers the local gh CLI; falls back to a GITHUB_TOKEN env var.

gh auth login         # preferred
# or
export GITHUB_TOKEN=ghp_…

Wire up your MCP client

Claude Code (CLI) — one command, available in every project:

claude mcp add ci-log-intelligence --scope user -- ci-log-intelligence-mcp
claude mcp list   # confirm it shows up

Claude Desktop — add to your claude_desktop_config.json (macOS: ~/Library/Application Support/Claude/claude_desktop_config.json; Windows: %APPDATA%\Claude\claude_desktop_config.json):

{
  "mcpServers": {
    "ci-log-intelligence": {
      "command": "ci-log-intelligence-mcp",
      "args": []
    }
  }
}

Fully quit and relaunch Claude Desktop after editing the file.

Codex — this repo includes .codex/config.toml; open the repo in Codex and run /mcp to confirm ci-log-intelligence is listed.

VS Code / GitHub Copilot — this repo includes .vscode/mcp.json; open the repo in VS Code with Copilot agent mode enabled.

See INSTALL.md for full setup instructions including troubleshooting, environment variables, HTTP transport, and other MCP clients.

A 30-second demo

In your AI agent, after wiring up the MCP server:

"The build at https://github.com/me/myrepo/actions/runs/12345 failed. Can you fix it?"

The agent now has three tools available. A reasonable trace:

agent  →  list_failed_jobs("https://github.com/me/myrepo/actions/runs/12345")

server →  {
            "jobs": [
              {
                "job_name": "postgres-test (bundling)",
                "block_count": 3,
                "failure_types_present": ["hash_mismatch", "generic"],
                "classifications": {"root_cause": 1, "symptom": 2},
                "job_url": "…/runs/12345/jobs/678"
              }
            ],
            "metadata": {"failed_jobs": 1, "total_runs_analyzed": 1}
          }

agent  →  analyze_ci_failure(
             ci_url="…/runs/12345",
             failure_types=["hash_mismatch"]
          )

server →  {
            "root_cause": {
              "summary": "Run 12345 job postgres-test (bundling) root_cause at lines 1058-1062: ...",
              "log_excerpt": "common.go:1058: file hashes don't match for ...\n--- FAIL: TestRunSetPartial (45.3s)\n…",
              "has_traceback": false,
              "has_assertion": true,
              "score": 10.0,
              "score_components": {"severity_weight": 10.0, "signal_density": 0.5, "duplicate_penalty": 0.0}
            },
            "failures": [
              {
                "type": "hash_mismatch",
                "classification": "root_cause",
                "severity": 2,
                "score": 10.0,
                "start_line": 1058,
                "end_line": 1062,
                "summary": "…",
                "log_excerpt": "…",
                "extracted_fields": {
                  "test_name": "TestRunSetPartial",
                  "warehouse_target": "postgres",
                  "job_name": "postgres-test (bundling)"
                }
              }
            ],
            "metadata": {"failures_returned": 1, "failures_total": 1, …}
          }

The agent now knows: it's a golden-file hash mismatch in TestRunSetPartial on the postgres warehouse target. It can run make update_ref_samples scoped to that one test. Total context consumed: <2K tokens instead of 50K.

CLI usage

For humans debugging CI in a terminal:

ci-log-intel analyze --url https://github.com/owner/repo/pull/123 --include-passed

Machine-readable JSON:

ci-log-intel analyze --url https://github.com/owner/repo/actions/runs/12345 --json

Analyzing a local log file

The same detectors work on any large log file — not just GitHub Actions runs. Point --file at a local path (or - for stdin):

ci-log-intel analyze --file ./build.log
kubectl logs my-pod | ci-log-intel analyze --file -

You get back the same ranked failure blocks and typed detected-failure records, useful for triaging Jenkins/Buildkite/local-CI logs or any long log stream where the actual failure is buried.

Python usage

from ci_log_intelligence import analyze_ci_url

report = analyze_ci_url(
    "https://github.com/owner/repo/pull/123",
    include_passed=True,
    max_passed_runs=3,
)

print(report.root_cause.summary)
for record in report.failures:
    print(record.type, record.classification, record.score, record.extracted_fields)

For raw log strings (no GitHub fetch):

from ci_log_intelligence import analyze_log

result = analyze_log("STEP: test\nERROR build failed\nException: boom")
for failure in result.detected_failures:
    print(failure.type, failure.anchor_lines, failure.extracted_fields)

How it works

The pipeline is deterministic and heuristic — no LLM in the loop. A set of Detector plugins scans each parsed line and emits typed DetectedFailure records; the framework clusters anchors, expands context (step-bounded), suppresses noise, scores, classifies, and ranks.

Detectors shipped in v1

Detector Severity What it catches
hash_mismatch 2 file hashes don't match paired with --- FAIL: in the same step (golden-file failures)
go_test_fail 2 Standalone --- FAIL: TestName from go test (not paired with hash mismatches)
pytest_fail 2 FAILED tests/x.py::test_y - … summary lines with traceback pairing
rust_test_fail 2 test foo::bar ... FAILED paired with thread '…' panicked at
junit_xml 2 <testcase>...<failure> / <error> fragments embedded in log streams
build_error_rust 3 error[E####]: + --> location, plus bare cargo summaries
build_error_go 3 ./pkg/file.go:line:col: message
build_error_npm 3 Multi-line npm ERR! / yarn error blocks
build_error_make 3 make: *** [target] Error N
build_error_gcc 3 file:line:col: error: … with note continuation (gcc/clang)
generic 1–3 Hardened keyword fallback (Traceback, Exception, ERROR, FAILED, etc.) with word boundaries, case-insensitive matching, and a benign-mention filter ("0 errors" won't anchor)

Build errors at severity 3 outrank test failures at severity 2, so when a build broke before any test ran the build error is correctly selected as root_cause and the cascading test failures show as symptoms.

Adding a detector

Each detector is a single file under ci_log_intelligence/reducer/detectors/. Implement the Detector Protocol (one scan() method that returns a list of DetectedFailure records) and add yourself to the registry. The framework handles clustering, expansion, scoring, classification, and the typed-record output.

See architecture.md for the full pipeline description, data contracts, and design rationale.

CI-aware comparison

When you give it a PR URL, the server fetches both failed and passed jobs in the same workflow run. Failed jobs go through the full reducer; passed jobs use targeted extraction (matching step IDs, test names, or assertion text from failed blocks). A cross-run analyzer then surfaces insights like:

  • "Failure occurs only in variant snowflake for job group test."
  • "Step build-stage is present in passed runs but missing in failing run for job group test."
  • "Test foo behaves differently between passed and failed runs."

These come back in cross_run_insights so the agent can quickly see whether a failure is environment-specific, a regression, or flaky.

HTTP API

If you'd rather not use MCP, there's a small FastAPI endpoint for raw-log analysis:

uvicorn ci_log_intelligence.api:app --reload
curl -X POST http://127.0.0.1:8000/analyze \
  -H "Content-Type: application/json" \
  -d '{"log":"STEP: test\nERROR build failed\nException: boom"}'

Testing

python -m unittest discover -s tests -v

250+ tests covering each detector, the cache, the MCP tool surface, and end-to-end scenarios across multiple detector types.

Known limitations

  • All specialized detectors are severity 2 or 3 and tiebreak on earliest anchor line. A specificity weighting on DetectedFailure is on the v1.1 roadmap.
  • Windows-style paths (C:\src\foo.cpp:5:1:) may not parse correctly in the GCC build-error detector. Linux CI only for now.
  • The JUnit XML detector caps at 50 records per scan; consumers should check extracted_fields.get("truncated", False).
  • Long-running Go tests with (1m30s) duration format report the seconds tail only.
  • Progress notifications only render in MCP clients that send a progressToken. The server emits MCP-spec-compliant progress events during slow log fetches; clients like Codex CLI display them, but Claude Code (as of this writing) does not opt into them so the progress bar never appears. The tools still work — only the live progress UI is missing. Use the CLI (ci-log-intel analyze --url ...) when you want visible progress in a terminal. See INSTALL.md for details and a diagnostic flag.

See architecture.md for the full list.

Contributing

If this tool saves you tokens on a debugging session, contributing back is warmly welcomed — even a single PR adding a detector for a CI tool that isn't covered yet makes the project meaningfully better for the next person. The codebase is small (~3K LOC + tests) and the detector framework is explicitly designed to make adding a new language / build tool a single-file change.

See CONTRIBUTING.md for setup, a worked "add a new detector" example, and the PR process. The TL;DR: open an issue if you're stuck, send a PR if you're not.

License

MIT. See LICENSE.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

ci_log_intelligence-0.2.1.tar.gz (69.3 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

ci_log_intelligence-0.2.1-py3-none-any.whl (87.4 kB view details)

Uploaded Python 3

File details

Details for the file ci_log_intelligence-0.2.1.tar.gz.

File metadata

  • Download URL: ci_log_intelligence-0.2.1.tar.gz
  • Upload date:
  • Size: 69.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for ci_log_intelligence-0.2.1.tar.gz
Algorithm Hash digest
SHA256 1022c947c1dd9bc8e3a02574163e2f69d3cf596ef28c1da0d0275ea645400d1e
MD5 66cc05dbe05e2c83fa78bd2c4551f568
BLAKE2b-256 a35b34b7901f447341ed1a84f65d19dbdef610e25a6fc94fc2378dba13a978a1

See more details on using hashes here.

Provenance

The following attestation bundles were made for ci_log_intelligence-0.2.1.tar.gz:

Publisher: publish.yml on kuldeep0020/ci-log-intelligence

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file ci_log_intelligence-0.2.1-py3-none-any.whl.

File metadata

File hashes

Hashes for ci_log_intelligence-0.2.1-py3-none-any.whl
Algorithm Hash digest
SHA256 2be72d35e650da2c0cd26245aa3b6ffedc2c47b6449717aaf0f055ea921d480f
MD5 09a8ca7cacfe6fa97ed36d43efe592ee
BLAKE2b-256 9cbedbceb5b5ecdf86b7350360a6c0a473ef8e11f857af4afae785cec154e59f

See more details on using hashes here.

Provenance

The following attestation bundles were made for ci_log_intelligence-0.2.1-py3-none-any.whl:

Publisher: publish.yml on kuldeep0020/ci-log-intelligence

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page