
MCP server that turns 50K-line CI logs into focused failure context for AI coding agents.


ci-log-intelligence

Stop dumping 50,000-line CI logs into your AI coding agent. This MCP server reads the logs for the agent and returns a few hundred tokens of focused, typed failure context — so the agent can debug your CI without flooding its context window.


The problem

You ask Claude / Codex / Copilot to fix a failing CI build. The agent runs gh run view --log, gets back 60,000 lines of pytest output, and pastes the whole thing into its context. Now:

  • The actual failure is buried somewhere on line 47,892.
  • Your context window is ~80% spent on log output before any work begins.
  • Every tool call after this costs more because the cached context is enormous.
  • The agent's reasoning quality drops because the relevant signal is diluted.

After a few of these, the conversation either exhausts the context window or becomes too expensive to be useful.

What this does

ci-log-intelligence is an MCP server (also usable as a CLI / Python library) that sits between the agent and the CI logs. You give it a GitHub URL — a PR, a workflow run, or a single job — and it does the heavy reading in its own process:

PR / run / job URL  →  fetch logs  →  parse  →  11 detector plugins  →  typed failure records
                                                                              │
                                                                              ▼
                                                                      a few hundred tokens
                                                                      of focused context
                                                                      back to your agent

You get back a structured response: a ranked list of typed FailureRecords (hash_mismatch, build_error_rust, pytest_fail, go_test_fail, …), each with the test name / file path / error code / log excerpt that's actually relevant — not 50K lines of npm install output.
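To make the shape of a typed record concrete, here is a minimal sketch of a FailureRecord as a Python dataclass. The field names mirror the JSON keys shown in the demo below (type, classification, severity, score, start_line, end_line, summary, log_excerpt, extracted_fields). The class itself and the `top_k` helper are hypothetical illustrations, and ranking purely by score with an earliest-line tiebreak is an assumption, not the package's documented behaviour.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class FailureRecord:
    """Hypothetical mirror of one entry in the server's "failures" list."""
    type: str                 # detector name, e.g. "hash_mismatch"
    classification: str       # "root_cause" or "symptom"
    severity: int             # 1-3; higher is more likely to be the root cause
    score: float
    start_line: int
    end_line: int
    summary: str = ""
    log_excerpt: str = ""
    extracted_fields: dict[str, Any] = field(default_factory=dict)


def top_k(records: list[FailureRecord], k: int = 3) -> list[FailureRecord]:
    """Return the k highest-scoring records; earlier start_line wins ties (assumed ordering)."""
    return sorted(records, key=lambda r: (-r.score, r.start_line))[:k]


records = [
    FailureRecord("generic", "symptom", 1, 2.5, 2100, 2104),
    FailureRecord("hash_mismatch", "root_cause", 2, 10.0, 1058, 1062,
                  extracted_fields={"test_name": "TestRunSetPartial"}),
]
best = top_k(records, k=1)[0]
print(best.type, best.extracted_fields["test_name"])  # hash_mismatch TestRunSetPartial
```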

Three MCP tools, designed to explore-then-drill

Rather than one omnibus call that returns a fixed payload, the server exposes three tools that map onto how an agent actually wants to work:

Tool | When to use | Approximate response size
list_failed_jobs(ci_url) | First call. A cheap map of failed jobs with their classifications and the failure types present in each; no per-block content. | ~200–500 tokens
analyze_ci_failure(ci_url, top_k=3, failure_types=None, …) | Get the top-K typed failure records with content. Filterable by detector (failure_types=["hash_mismatch"]). | ~1–4K tokens
get_block(ci_url, block_index, surround=5) | Drill into a specific block. Returns its full content with in_block / is_anchor flags. | Varies with block size
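The in_block / is_anchor flags returned by get_block can be pictured with the sketch below. It assumes a block is a contiguous, 1-indexed line range with a set of anchor lines, and that `surround` adds that many context lines on each side (clipped to the log). The function name `render_block` and the per-line dict shape are hypothetical, not the server's actual response format.

```python
def render_block(lines: list[str], start: int, end: int,
                 anchors: set[int], surround: int = 5) -> list[dict]:
    """Return log lines around a block, flagged in the style of get_block.

    lines: the full log (line 1 is lines[0])
    start, end: inclusive 1-indexed bounds of the block
    anchors: 1-indexed line numbers that triggered a detector
    """
    lo = max(1, start - surround)            # clip context at the top of the log
    hi = min(len(lines), end + surround)     # ...and at the bottom
    return [
        {
            "line": n,
            "text": lines[n - 1],
            "in_block": start <= n <= end,
            "is_anchor": n in anchors,
        }
        for n in range(lo, hi + 1)
    ]


log = [f"line {i}" for i in range(1, 21)]
rows = render_block(log, start=10, end=11, anchors={10}, surround=2)
for row in rows:
    print(row["line"], row["in_block"], row["is_anchor"])
```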

Results are cached per (repo, run_id, job_id). A second call against the same URL skips the GitHub fetch, the parse, and the reducer entirely.
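A minimal sketch of that caching scheme, assuming run and job URLs of the form https://github.com/OWNER/REPO/actions/runs/RUN_ID, optionally followed by /job/JOB_ID. The helpers `cache_key` and `AnalysisCache` are hypothetical. PR URLs are left out because they presumably have to be resolved to workflow runs through the GitHub API before a key exists.

```python
import re
from typing import Callable, Optional

# Hypothetical pattern: ".../actions/runs/<run_id>", optionally "/job/<id>" or "/jobs/<id>".
_RUN_URL = re.compile(
    r"^https://github\.com/(?P<repo>[^/]+/[^/]+)/actions/runs/(?P<run_id>\d+)"
    r"(?:/jobs?/(?P<job_id>\d+))?/?$"
)

CacheKey = tuple[str, int, Optional[int]]


def cache_key(url: str) -> CacheKey:
    """Map a run or job URL to (repo, run_id, job_id); job_id is None for a whole run."""
    m = _RUN_URL.match(url)
    if m is None:
        raise ValueError(f"not a workflow run or job URL: {url}")
    job_id = m.group("job_id")
    return m.group("repo"), int(m.group("run_id")), int(job_id) if job_id else None


class AnalysisCache:
    """Memoizes an expensive fetch + parse + reduce callable per (repo, run_id, job_id)."""

    def __init__(self, analyze: Callable[[str], dict]) -> None:
        self._analyze = analyze
        self._results: dict[CacheKey, dict] = {}

    def get(self, url: str) -> dict:
        key = cache_key(url)
        if key not in self._results:  # only the first call per key does the heavy work
            self._results[key] = self._analyze(url)
        return self._results[key]


fetches: list[str] = []


def slow_analyze(url: str) -> dict:
    fetches.append(url)  # stands in for the GitHub fetch, parse and reducer
    return {"url": url}


cache = AnalysisCache(slow_analyze)
cache.get("https://github.com/me/myrepo/actions/runs/12345")
cache.get("https://github.com/me/myrepo/actions/runs/12345/")  # same key: served from cache
print(len(fetches))  # 1
```

Normalizing the URL to a key before the lookup means cosmetic differences such as a trailing slash still hit the cache.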

Quick start

Install

pip install ci-log-intelligence

Or from source:

git clone https://github.com/kuldeep0020/ci-log-intelligence.git
cd ci-log-intelligence
pip install -e .

Authenticate with GitHub

The fetcher prefers the local gh CLI and falls back to the GITHUB_TOKEN environment variable.

gh auth login         # preferred
# or
export GITHUB_TOKEN=ghp_…
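The gh-first, token-second lookup can be sketched as follows. `gh auth token` is a real gh subcommand that prints the active token. The function `resolve_github_token` and its injectable `run_gh` parameter (there so the logic can be exercised without gh installed) are hypothetical; the package may obtain credentials differently.

```python
import os
import shutil
import subprocess
from typing import Callable, Mapping, Optional


def _gh_auth_token() -> Optional[str]:
    """Ask the gh CLI for its token; None if gh is missing or not logged in."""
    if shutil.which("gh") is None:
        return None
    proc = subprocess.run(["gh", "auth", "token"], capture_output=True, text=True)
    token = proc.stdout.strip()
    return token if proc.returncode == 0 and token else None


def resolve_github_token(
    env: Mapping[str, str] = os.environ,
    run_gh: Callable[[], Optional[str]] = _gh_auth_token,
) -> str:
    """Prefer the gh CLI's credentials, falling back to GITHUB_TOKEN."""
    token = run_gh() or env.get("GITHUB_TOKEN")
    if not token:
        raise RuntimeError("Run `gh auth login` or set GITHUB_TOKEN.")
    return token
```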

Wire up your MCP client

This repo ships shared MCP configuration for several clients (see INSTALL.md for the full setup guide):

  • Codex: .codex/config.toml (auto-discovered)
  • VS Code / GitHub Copilot: .vscode/mcp.json (workspace-scoped)
  • Claude Desktop: example at docs/claude_desktop_config.example.json

For any other MCP client, point it at the ci-log-intelligence-mcp command installed by the package.

A 30-second demo

In your AI agent, after wiring up the MCP server:

"The build at https://github.com/me/myrepo/actions/runs/12345 failed. Can you fix it?"

The agent now has three tools available. A reasonable trace:

agent  →  list_failed_jobs("https://github.com/me/myrepo/actions/runs/12345")

server →  {
            "jobs": [
              {
                "job_name": "postgres-test (bundling)",
                "block_count": 3,
                "failure_types_present": ["hash_mismatch", "generic"],
                "classifications": {"root_cause": 1, "symptom": 2},
                "job_url": "…/runs/12345/jobs/678"
              }
            ],
            "metadata": {"failed_jobs": 1, "total_runs_analyzed": 1}
          }

agent  →  analyze_ci_failure(
             ci_url="…/runs/12345",
             failure_types=["hash_mismatch"]
          )

server →  {
            "root_cause": {
              "summary": "Run 12345 job postgres-test (bundling) root_cause at lines 1058-1062: ...",
              "log_excerpt": "common.go:1058: file hashes don't match for ...\n--- FAIL: TestRunSetPartial (45.3s)\n…",
              "has_traceback": false,
              "has_assertion": true,
              "score": 10.0,
              "score_components": {"severity_weight": 10.0, "signal_density": 0.5, "duplicate_penalty": 0.0}
            },
            "failures": [
              {
                "type": "hash_mismatch",
                "classification": "root_cause",
                "severity": 2,
                "score": 10.0,
                "start_line": 1058,
                "end_line": 1062,
                "summary": "…",
                "log_excerpt": "…",
                "extracted_fields": {
                  "test_name": "TestRunSetPartial",
                  "warehouse_target": "postgres",
                  "job_name": "postgres-test (bundling)"
                }
              }
            ],
            "metadata": {"failures_returned": 1, "failures_total": 1, …}
          }

The agent now knows it's a golden-file hash mismatch in TestRunSetPartial on the postgres warehouse target, and it can run make update_ref_samples scoped to that one test. Total context consumed: under 2K tokens, instead of the full 50K-line log.
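The explore-then-drill step in this trace can also be scripted on the client side. A sketch, assuming the list_failed_jobs response has the shape shown above: it collects the specialized failure types (everything except generic) to pass as failure_types to analyze_ci_failure. The helper name `pick_failure_types` is hypothetical.

```python
def pick_failure_types(listing: dict) -> list[str]:
    """Collect specialized detector types across failed jobs, in first-seen order."""
    seen: list[str] = []
    for job in listing.get("jobs", []):
        for ftype in job.get("failure_types_present", []):
            if ftype != "generic" and ftype not in seen:
                seen.append(ftype)
    return seen


listing = {
    "jobs": [
        {
            "job_name": "postgres-test (bundling)",
            "block_count": 3,
            "failure_types_present": ["hash_mismatch", "generic"],
            "classifications": {"root_cause": 1, "symptom": 2},
        }
    ],
    "metadata": {"failed_jobs": 1, "total_runs_analyzed": 1},
}
print(pick_failure_types(listing))  # ['hash_mismatch']
```

If the result is empty, passing failure_types=None (the default in the tools table) presumably leaves the ranking to the server.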

CLI usage

For humans debugging CI in a terminal:

ci-log-intel analyze --url https://github.com/owner/repo/pull/123 --include-passed

Machine-readable JSON:

ci-log-intel analyze --url https://github.com/owner/repo/actions/runs/12345 --json

Python usage

from ci_log_intelligence import analyze_ci_url

report = analyze_ci_url(
    "https://github.com/owner/repo/pull/123",
    include_passed=True,
    max_passed_runs=3,
)

print(report.root_cause.summary)
for record in report.failures:
    print(record.type, record.classification, record.score, record.extracted_fields)

For raw log strings (no GitHub fetch):

from ci_log_intelligence import analyze_log

result = analyze_log("STEP: test\nERROR build failed\nException: boom")
for failure in result.detected_failures:
    print(failure.type, failure.anchor_lines, failure.extracted_fields)

How it works

The pipeline is deterministic and heuristic — no LLM in the loop. A set of Detector plugins scans each parsed line and emits typed DetectedFailure records; the framework clusters anchors, expands context (step-bounded), suppresses noise, scores, classifies, and ranks.
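To illustrate two of those stages, here is a sketch of anchor clustering and step-bounded context expansion. It assumes anchors within `max_gap` lines of each other form one cluster, and that expansion by `context` lines stops at the boundaries of the step containing the cluster's first line. `max_gap`, `context`, and both function names are illustrative choices, not the package's actual parameters.

```python
from bisect import bisect_right


def cluster_anchors(anchors: list[int], max_gap: int = 3) -> list[tuple[int, int]]:
    """Group anchor lines into (first, last) ranges whose internal gaps are <= max_gap."""
    clusters: list[list[int]] = []
    for line in sorted(set(anchors)):
        if clusters and line - clusters[-1][1] <= max_gap:
            clusters[-1][1] = line          # extend the current cluster
        else:
            clusters.append([line, line])   # start a new cluster
    return [(a, b) for a, b in clusters]


def expand_within_step(cluster: tuple[int, int], step_starts: list[int],
                       last_line: int, context: int = 5) -> tuple[int, int]:
    """Widen a cluster by `context` lines without crossing step boundaries.

    step_starts: sorted 1-indexed lines where each CI step begins (the first must be 1)
    last_line: number of lines in the log
    """
    first, last = cluster
    i = bisect_right(step_starts, first) - 1          # step that contains `first`
    step_lo = step_starts[i]
    step_hi = step_starts[i + 1] - 1 if i + 1 < len(step_starts) else last_line
    return max(step_lo, first - context), min(step_hi, last + context)


clusters = cluster_anchors([1058, 1060, 1062, 2100])
print(clusters)  # [(1058, 1062), (2100, 2100)]
# The next step starts on line 1064, so the lower context is kept but the upper one is clipped.
print(expand_within_step(clusters[0], step_starts=[1, 1000, 1064], last_line=3000))  # (1053, 1063)
```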

Detectors shipped in v1

Detector | Severity | What it catches
hash_mismatch | 2 | "file hashes don't match" paired with "--- FAIL:" in the same step (golden-file failures)
go_test_fail | 2 | Standalone "--- FAIL: TestName" from go test (not paired with hash mismatches)
pytest_fail | 2 | "FAILED tests/x.py::test_y - …" summary lines, with traceback pairing
rust_test_fail | 2 | "test foo::bar ... FAILED" paired with "thread '…' panicked at"
junit_xml | 2 | <testcase>...<failure> / <error> fragments embedded in log streams
build_error_rust | 3 | "error[E####]:" plus a "-->" location, plus bare cargo summaries
build_error_go | 3 | ./pkg/file.go:line:col: message
build_error_npm | 3 | Multi-line "npm ERR!" / "yarn error" blocks
build_error_make | 3 | make: *** [target] Error N
build_error_gcc | 3 | file:line:col: error: … with note continuation (gcc/clang)
generic | 1–3 | Hardened keyword fallback (Traceback, Exception, ERROR, FAILED, etc.) with word boundaries, case-insensitive matching, and a benign-mention filter ("0 errors" won't anchor)
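The line shapes in the table translate fairly directly into regular expressions. The patterns below are a sketch written from the formats listed above, plus a benign-mention filter in the spirit of the generic detector. They are assumptions for illustration, not the patterns the package actually ships.

```python
import re

# build_error_make: "make: *** [target] Error N" (also "make[1]: ..." from recursive make)
MAKE_ERROR = re.compile(r"^make(?:\[\d+\])?: \*\*\* \[(?P<target>[^\]]+)\] Error (?P<code>\d+)")

# build_error_go: "./pkg/file.go:line:col: message"
GO_BUILD_ERROR = re.compile(r"^(?P<file>\S+\.go):(?P<line>\d+):(?P<col>\d+): (?P<msg>.+)$")

# generic: whole-word, case-insensitive keywords...
GENERIC_KEYWORD = re.compile(r"\b(?:traceback|exception|errors?|failed)\b", re.IGNORECASE)
# ...after stripping benign zero-count mentions such as "0 errors" or "0 failed".
BENIGN_MENTION = re.compile(r"\b0 (?:errors?|failures?|failed)\b", re.IGNORECASE)


def generic_anchor(line: str) -> bool:
    """True if a failure keyword remains once benign mentions are removed."""
    return bool(GENERIC_KEYWORD.search(BENIGN_MENTION.sub("", line)))


print(generic_anchor("Build finished with 0 errors"))  # False
print(generic_anchor("0 errors, 3 failed"))            # True
```

Removing benign spans before searching, rather than rejecting any line that contains one, keeps a line like "0 errors, 3 failed" anchored.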

Build errors at severity 3 outrank test failures at severity 2, so when a build breaks before any test runs, the build error is selected as root_cause and the cascading test failures are reported as symptoms.
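A minimal sketch of that ordering, assuming the root cause is simply the record with the highest severity, that ties go to the earliest anchor line (the tiebreak mentioned under Known limitations), and that everything else is labelled a symptom. The real scorer also weighs components such as signal density, so this is a simplification; the `Detected` tuple and `classify` function are hypothetical.

```python
from typing import NamedTuple


class Detected(NamedTuple):
    type: str
    severity: int       # 1-3, as in the detector table
    first_anchor: int   # 1-indexed line of the earliest anchor


def classify(failures: list[Detected]) -> list[tuple[Detected, str]]:
    """Order by severity (desc) then anchor line (asc); the first entry is the root cause."""
    ranked = sorted(failures, key=lambda f: (-f.severity, f.first_anchor))
    return [(f, "root_cause" if i == 0 else "symptom") for i, f in enumerate(ranked)]


result = classify([
    Detected("go_test_fail", 2, 480),
    Detected("build_error_go", 3, 120),
    Detected("go_test_fail", 2, 455),
])
for failure, label in result:
    print(label, failure.type, failure.first_anchor)
# root_cause build_error_go 120
# symptom go_test_fail 455
# symptom go_test_fail 480
```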

Adding a detector

Each detector is a single file under ci_log_intelligence/reducer/detectors/. Implement the Detector Protocol (one scan() method that returns a list of DetectedFailure records) and add your detector to the registry. The framework handles clustering, expansion, scoring, classification, and the typed-record output.
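Roughly, a new detector could look like the sketch below. The source only says that the Protocol has one scan() method returning DetectedFailure records and that detectors go into a registry. The scan() signature, the DetectedFailure fields, REGISTRY, register(), and the mypy example detector (including its type name and severity) are all hypothetical stand-ins; check the existing files in ci_log_intelligence/reducer/detectors/ for the real contracts.

```python
import re
from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class DetectedFailure:
    """Hypothetical stand-in for the package's DetectedFailure record."""
    type: str
    severity: int
    anchor_lines: list[int]
    extracted_fields: dict[str, str] = field(default_factory=dict)


class Detector(Protocol):
    """Assumed shape of the Detector Protocol: a name plus one scan() method."""
    name: str

    def scan(self, lines: list[str]) -> list[DetectedFailure]: ...


REGISTRY: list[Detector] = []  # hypothetical registry


def register(detector: Detector) -> Detector:
    """Add a detector instance to the hypothetical registry and return it."""
    REGISTRY.append(detector)
    return detector


class MypyErrorDetector:
    """Example detector for mypy lines such as "src/app.py:12: error: ..."."""

    name = "type_error_mypy"
    _pattern = re.compile(r"^(?P<file>\S+\.py):(?P<line>\d+): error: (?P<message>.+)$")

    def scan(self, lines: list[str]) -> list[DetectedFailure]:
        found: list[DetectedFailure] = []
        for number, text in enumerate(lines, start=1):  # 1-indexed log lines
            m = self._pattern.match(text.strip())
            if m:
                # Severity 3 (the build-error tier) is an arbitrary choice for this example.
                found.append(DetectedFailure(self.name, 3, [number], m.groupdict()))
        return found


register(MypyErrorDetector())
```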

See architecture.md for the full pipeline description, data contracts, and design rationale.

CI-aware comparison

When you give it a PR URL, the server fetches both failed and passed jobs in the same workflow run. Failed jobs go through the full reducer; passed jobs use targeted extraction (matching step IDs, test names, or assertion text from failed blocks). A cross-run analyzer then surfaces insights like:

  • "Failure occurs only in variant snowflake for job group test."
  • "Step build-stage is present in passed runs but missing in failing run for job group test."
  • "Test foo behaves differently between passed and failed runs."

These come back in cross_run_insights so the agent can quickly see whether a failure is environment-specific, a regression, or flaky.
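The "step present in passed runs but missing in failing run" insight boils down to a set difference over step names. A sketch, assuming each run has already been reduced to a list of step names per job group; the function `missing_steps` is hypothetical and its message text is modelled on the example above.

```python
def missing_steps(job_group: str, failed_steps: list[str],
                  passed_runs: list[list[str]]) -> list[str]:
    """Report steps that every passed run executed but the failing run never reached."""
    if not passed_runs:
        return []
    in_all_passed = set(passed_runs[0]).intersection(*passed_runs[1:])
    failed = set(failed_steps)
    missing = [s for s in passed_runs[0] if s in in_all_passed and s not in failed]
    return [
        f"Step {step} is present in passed runs but missing in failing run for job group {job_group}."
        for step in missing
    ]


for insight in missing_steps(
    "test",
    failed_steps=["checkout", "setup"],
    passed_runs=[["checkout", "setup", "build-stage"], ["checkout", "setup", "build-stage", "lint"]],
):
    print(insight)
```

Requiring a step to appear in every passed run keeps conditionally skipped steps (such as lint here) from being reported.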

HTTP API

If you'd rather not use MCP, there's a small FastAPI endpoint for raw-log analysis:

uvicorn ci_log_intelligence.api:app --reload
curl -X POST http://127.0.0.1:8000/analyze \
  -H "Content-Type: application/json" \
  -d '{"log":"STEP: test\nERROR build failed\nException: boom"}'
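From Python, the same request can be made with only the standard library. A sketch, assuming the server started by the uvicorn command above is listening on 127.0.0.1:8000 and accepts a JSON body with a single "log" field, as in the curl example. The response schema is not documented here, so `analyze_raw_log` just returns the decoded JSON; both helper names are hypothetical.

```python
import json
import urllib.request


def build_analyze_request(log: str, base_url: str = "http://127.0.0.1:8000") -> urllib.request.Request:
    """Build the same POST /analyze request as the curl example."""
    body = json.dumps({"log": log}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def analyze_raw_log(log: str, base_url: str = "http://127.0.0.1:8000") -> dict:
    """Send the log to a running server and return the decoded JSON response."""
    with urllib.request.urlopen(build_analyze_request(log, base_url), timeout=30) as resp:
        return json.load(resp)


# Requires the uvicorn server to be running:
# result = analyze_raw_log("STEP: test\nERROR build failed\nException: boom")
```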

Testing

python -m unittest discover -s tests -v

250+ tests covering each detector, the cache, the MCP tool surface, and end-to-end scenarios across multiple detector types.

Known limitations

  • All specialized detectors are severity 2 or 3, and ties are broken by the earliest anchor line. A specificity weighting on DetectedFailure is on the v1.1 roadmap.
  • Windows-style paths (C:\src\foo.cpp:5:1:) may not parse correctly in the GCC build-error detector. Linux CI only for now.
  • The JUnit XML detector caps at 50 records per scan; consumers should check extracted_fields.get("truncated", False).
  • For long-running Go tests whose duration is printed as (1m30s), only the seconds part is reported.

See architecture.md for the full list.
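For reference, a duration parser that keeps the minutes (and hours) of the Go duration limitation above could look like this. It is a sketch assuming durations appear in parentheses as Go-style strings such as (45.3s) or (1m30s); `parse_go_duration` is a hypothetical helper, not part of the package.

```python
import re

# Optional hours, minutes and (possibly fractional) seconds, e.g. "1m30s", "45.3s", "2h5m".
_DURATION = re.compile(r"^(?:(?P<h>\d+)h)?(?:(?P<m>\d+)m)?(?:(?P<s>\d+(?:\.\d+)?)s)?$")


def parse_go_duration(text: str) -> float:
    """Convert "(1m30s)", "45.3s" or "2h5m" to seconds; raise ValueError otherwise."""
    m = _DURATION.match(text.strip("() "))
    if m is None or not any(m.groupdict().values()):
        raise ValueError(f"unrecognised duration: {text!r}")
    hours, minutes, seconds = (float(m.group(k) or 0) for k in ("h", "m", "s"))
    return hours * 3600 + minutes * 60 + seconds


print(parse_go_duration("(1m30s)"))  # 90.0
```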

Contributing

Issues and PRs welcome. The codebase is small (~2.5K LOC + tests) and the detector framework is designed to make adding a new language / tool a single-file change. Run the tests, follow the existing patterns in ci_log_intelligence/reducer/detectors/, and open a PR.

License

MIT. See LICENSE.



Download files


Source Distribution

ci_log_intelligence-0.1.1.tar.gz (62.5 kB)

Uploaded Source

Built Distribution


ci_log_intelligence-0.1.1-py3-none-any.whl (80.6 kB)

Uploaded Python 3

File details

Details for the file ci_log_intelligence-0.1.1.tar.gz.

File metadata

  • Download URL: ci_log_intelligence-0.1.1.tar.gz
  • Size: 62.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for ci_log_intelligence-0.1.1.tar.gz
Algorithm | Hash digest
SHA256 | 3dad28cbc80919bae8efc30738be1320e298d1deb33f16ec1ea2d5739e8c7190
MD5 | dabff073df06c2d94b353cc567237ba5
BLAKE2b-256 | 2f4942e81b1d903bf4fb1ca851cdea6e30f9b5acf2cfa32a41a2a48b5cc273bf


Provenance

The following attestation bundles were made for ci_log_intelligence-0.1.1.tar.gz:

Publisher: publish.yml on kuldeep0020/ci-log-intelligence

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file ci_log_intelligence-0.1.1-py3-none-any.whl.


File hashes

Hashes for ci_log_intelligence-0.1.1-py3-none-any.whl
Algorithm | Hash digest
SHA256 | 91eda60e105a6f8edf476816b8e22685624bff23898cc8eb65f4ab6a04c2d36e
MD5 | b154360f32d0df700c24f3e055cf09c5
BLAKE2b-256 | f0fada41e14e75366824161a78674fbf338c00b8826cdee947b6bd7d76e05361


Provenance

The following attestation bundles were made for ci_log_intelligence-0.1.1-py3-none-any.whl:

Publisher: publish.yml on kuldeep0020/ci-log-intelligence

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
