Deterministic log templating on top of Drain3, packaged as an artifact for AI agents.

AgenticAILogAnalyser

A Python port of codag-drain that uses the upstream Python Drain3 package as its grouping engine. It keeps the same CLI surface, the same output shape, and the same evidence-rich artifact, and it can be packaged as a single-file binary for each OS and architecture you target.

The intended consumer is an AI agent that needs to read a large log window under a fixed token budget. Instead of feeding the agent 1,400 raw lines, you feed it 8 templates with slot statistics and a few raw examples per group.

What it does

Takes a stream of log lines on stdin, groups near-duplicates with Drain3, and emits one templated line per group with:

  • the count of collapsed lines,
  • a derived <*> template,
  • per-slot stats (min / max / median for numeric slots, distinct values for enums, an auto-detected unit like ms or MB),
  • a few raw sample lines.
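
To make the template and slot-stat ideas concrete, here is a minimal sketch of how one group of near-duplicate lines could be turned into a <*> template with per-slot statistics. It is not the package's implementation: derive_template, slot_values, and summarize_slot are hypothetical names, the sketch only handles groups whose lines have the same whitespace token count, and the unit detection is a simple "number followed by letters" assumption.

```python
import re
from statistics import median

WILDCARD = "<*>"
# A number optionally followed by a short alphabetic unit such as ms or MB.
NUMERIC = re.compile(r"^(-?\d+(?:\.\d+)?)([A-Za-z%]*)$")


def derive_template(lines):
    """Replace every token position that varies across the group with <*>."""
    rows = [line.split() for line in lines]
    if len({len(row) for row in rows}) != 1:
        raise ValueError("sketch only handles groups with equal token counts")
    tokens = []
    for column in zip(*rows):
        tokens.append(column[0] if len(set(column)) == 1 else WILDCARD)
    return " ".join(tokens)


def slot_values(template, lines):
    """Collect the raw values seen at each wildcard position."""
    positions = [i for i, tok in enumerate(template.split()) if tok == WILDCARD]
    rows = [line.split() for line in lines]
    return [[row[i] for row in rows] for i in positions]


def summarize_slot(values):
    """Numeric slots get min / max / median; everything else is an enum."""
    matches = [NUMERIC.match(v) for v in values]
    if all(matches):
        numbers = [float(m.group(1)) for m in matches]
        units = {m.group(2) for m in matches}
        unit = units.pop() if len(units) == 1 else ""
        return {
            "kind": "numeric",
            "min": min(numbers),
            "max": max(numbers),
            "median": median(numbers),
            "unit": unit,
        }
    return {"kind": "enum", "distinct": sorted(set(values))}


lines = ["worker latency 20ms"] * 3 + ["worker latency 8400ms"]
template = derive_template(lines)
stats = [summarize_slot(values) for values in slot_values(template, lines)]
print(f"[x{len(lines)}] {template}", stats)
```

The single outlier (8400ms) survives in the max even though the median stays at 20, which is the property that keeps anomalies visible after compression.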

Real-world example

A 1,438-line Kiro IDE log compresses to 8 templates, a ratio of about 180x:

[x1] [WebviewProcessMonitor] Service starting
[x4] update#setState <*> [idle,downloading,downloaded,ready]
[x14] [WebviewProcessMonitor] Tracking webview renderer: pid=<*>, origin=<*>, windowId=<*> [13773..87619 p50=87288.5]
[x1] update#setState checking for updates
[x14] Extension host with pid <*> exited with code: 0, signal: unknown. [13697..89755 p50=73921]
[x1395] No ptyHost heartbeat after 6 seconds
[x8] [WebviewProcessMonitor] Webview renderer process gone: pid=<*>
[x1] Extracting content from 1 URIs
[codag-drain-py] 1438 lines -> 8 templates (179.8x)

The dominant signal (a single repeating warning that accounts for 97% of the file) is the first thing the model sees instead of being buried. Numeric ranges and enum values are preserved, so outliers and state distributions stay visible.
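
Because every rendered line starts with an [xN] count, the headline numbers can be recovered from the artifact with a few lines of parsing. The sketch below is illustrative only: parse_artifact and dominant are hypothetical helpers, not part of the package, and the regex assumes the "[xN] template" layout shown in the example output.

```python
import re

# Matches rendered group lines such as "[x14] some template text".
GROUP_LINE = re.compile(r"^\[x(\d+)\]\s+(.*)$")

ARTIFACT = """\
[x1] [WebviewProcessMonitor] Service starting
[x4] update#setState <*> [idle,downloading,downloaded,ready]
[x14] [WebviewProcessMonitor] Tracking webview renderer: pid=<*>, origin=<*>, windowId=<*> [13773..87619 p50=87288.5]
[x1] update#setState checking for updates
[x14] Extension host with pid <*> exited with code: 0, signal: unknown. [13697..89755 p50=73921]
[x1395] No ptyHost heartbeat after 6 seconds
[x8] [WebviewProcessMonitor] Webview renderer process gone: pid=<*>
[x1] Extracting content from 1 URIs
[codag-drain-py] 1438 lines -> 8 templates (179.8x)
"""


def parse_artifact(text):
    """Return (count, template) pairs; the trailing summary line is skipped."""
    groups = []
    for raw in text.splitlines():
        match = GROUP_LINE.match(raw.strip())
        if match:
            groups.append((int(match.group(1)), match.group(2)))
    return groups


def dominant(groups):
    """Return the most frequent template and its share of all lines."""
    total = sum(count for count, _ in groups)
    count, template = max(groups, key=lambda group: group[0])
    return template, count / total


groups = parse_artifact(ARTIFACT)
total_lines = sum(count for count, _ in groups)
template, share = dominant(groups)
print(f"{total_lines} lines, {len(groups)} templates, ratio {total_lines / len(groups)}")
print(f"dominant: {template!r} at {share:.1%}")
```

The counts sum back to the original 1,438 lines, so nothing is silently dropped by the grouping step in this example.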

Install

From source:

pip install -e .

From source with the build extra (PyInstaller):

pip install -e ".[build]"

Usage

echo 'worker latency 20ms
worker latency 20ms
worker latency 20ms
worker latency 8400ms' | codag-drain-py --stats
[x4] worker latency <*> [20..8400ms p50=20ms]
[codag-drain-py] 4 lines -> 1 templates (4.0x)

JSON output:

echo 'worker ready shard=1' | codag-drain-py --format json

Choose a grouper:

cat logs.txt | codag-drain-py --grouper drain-stock

NDJSON input:

cat events.ndjson | codag-drain-py --json

Available groupers:

name               description
drain (default)    Drain3 with codag's compact-line tokenizer fallback
drain-stock        Drain3 with vanilla whitespace tokenization
drain-delimited    Drain with extra punctuation delimiters folded into whitespace
drain-fullsearch   Drain similarity over all same-length clusters (no prefix-tree)
statistical        Non-Drain control: IDF-weighted anchor co-occurrence
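
The practical difference between the groupers comes down to how a line is tokenized and how tokens are compared. The sketch below contrasts plain whitespace tokenization with a delimiter-folding variant, and scores two lines with the position-wise similarity described in the Drain paper (matching positions divided by length, for sequences of equal length). The function names and the delimiter set are assumptions made for illustration; the package's actual tokenizers and thresholds may differ.

```python
import re

# Assumption: an illustrative delimiter set, not the one the package uses.
EXTRA_DELIMITERS = re.compile(r"[=,:;]")


def tokenize_whitespace(line):
    """Vanilla tokenization: split on runs of whitespace."""
    return line.split()


def tokenize_delimited(line):
    """Fold extra punctuation into whitespace before splitting."""
    return EXTRA_DELIMITERS.sub(" ", line).split()


def seq_similarity(a, b):
    """Fraction of positions with identical tokens; 0.0 if lengths differ."""
    if len(a) != len(b):
        return 0.0
    if not a:
        return 1.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


first, second = "worker ready shard=1", "worker ready shard=2"

plain = seq_similarity(tokenize_whitespace(first), tokenize_whitespace(second))
folded = seq_similarity(tokenize_delimited(first), tokenize_delimited(second))
print(plain, folded)
```

With whitespace tokens, shard=1 and shard=2 differ as whole tokens (2 of 3 positions match). After folding = into whitespace, the key shard matches and only the value differs (3 of 4 positions match), which makes key=value lines more likely to land in the same group.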

Build a single-file binary

./scripts/build_binary.sh
./dist/codag-drain-py --help

PyInstaller bundles the Python interpreter and drain3 into one file under dist/. PyInstaller does not cross-compile, so run the build on each OS and architecture you intend to ship for.

Programmatic API

from codag_drain_py import LogLine, TemplaterConfig, template_logs

result = template_logs(
    [LogLine(message="latency 20ms"), LogLine(message="latency 8400ms")]
)
print(result.render())
print(result.to_json(indent=2))

TemplateIndex exposes the streaming variant:

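from codag_drain_py import LogLine, TemplateIndex

# Any iterable of message strings works here (a list, a generator, ...).
messages = ["latency 20ms", "latency 8400ms"]

idx = TemplateIndex()
for msg in messages:
    idx.push(LogLine(message=msg))
print(idx.templates().render())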

Tests

pip install -e ".[dev]"
pytest

Credits

  • Drain3 — the underlying log template miner from logpai. We use the published PyPI package directly.
  • codag-drain — the Rust project this Python port is modeled on. The compact-line tokenizer fallback, multi-member template derivation, slot profiling, and CLI surface all follow that design.
  • Drain paper — He et al., "Drain: An Online Log Parsing Approach with Fixed Depth Tree", ICWS 2017.

License

MIT. See LICENSE.

Layout

src/codag_drain_py/
    __init__.py     public exports
    __main__.py     `python -m codag_drain_py`
    cli.py          argparse + stdin pipeline
    compress.py     templater entry point + rendering
    grouper.py      Drain / DrainStock / DrainDelimited / FullSearch / Statistical
    input.py        heuristic line + NDJSON parsers
    lex.py          character-class tokenizer + lex template derivation
    profile.py      slot capture, numeric stats, distinct-value summaries
    stream.py       TemplateIndex streaming wrapper
    template.py     whitespace template derivation + capture regex
tests/
    test_compress.py
    test_input.py
scripts/
    build_binary.sh    PyInstaller --onefile build

MCP server (use as a tool from Kiro / Claude / any MCP client)

The analyser ships with a built-in Model Context Protocol server. Once registered with Kiro or Claude Desktop, your assistant can call it as a tool to compress logs on demand without you piping anything through a shell.

What it exposes

Five tools, all served over stdio:

tool               description
analyse_logs       Compress an inline log body. Returns templated artifact + summary.
analyse_log_file   Same, but reads the body from a local file path.
stream_push        Append lines to a named streaming session.
stream_project     Render templates over the accumulated session.
stream_reset       Clear a session.

Each tool accepts the full set of analyser options: grouper, sample_cap, template_clip, body_format, output_format.

Build the MCP binary

./scripts/build_mcp_binary.sh

This produces a single self-contained binary at dist/agentic-log-analyser-mcp (~22 MB). It bundles the Python interpreter, the analyser, drain3, and the MCP SDK, so no Python install is required on the machine that runs it.

Register with Kiro

Open Kiro's MCP config: either run "Open MCP Config" from the Command Palette, or edit .kiro/settings/mcp.json in your workspace (or ~/.kiro/settings/mcp.json for a user-wide setup). Add:

{
  "mcpServers": {
    "agentic-log-analyser": {
      "command": "/absolute/path/to/dist/agentic-log-analyser-mcp",
      "args": [],
      "disabled": false,
      "autoApprove": ["analyse_logs", "analyse_log_file", "stream_project"]
    }
  }
}

There's a ready-to-paste example at examples/mcp_config_kiro.json. Reload the MCP config from the MCP Server view in the Kiro feature panel.

Register with Claude Desktop

Edit ~/Library/Application Support/Claude/claude_desktop_config.json (macOS) and merge in:

{
  "mcpServers": {
    "agentic-log-analyser": {
      "command": "/absolute/path/to/dist/agentic-log-analyser-mcp",
      "args": []
    }
  }
}

Restart Claude Desktop. The tools will appear in the tools menu.
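
If the config file already lists other MCP servers, "merge in" means adding one key under mcpServers without touching the rest. A minimal sketch of that merge, using only the standard library, is below. merge_server and update_config_file are hypothetical helpers written for this example; the entry they write mirrors the JSON shown above.

```python
import json
from pathlib import Path

SERVER_NAME = "agentic-log-analyser"


def merge_server(config, command, name=SERVER_NAME):
    """Return a copy of config with the server entry added or replaced."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers[name] = {"command": command, "args": []}
    merged["mcpServers"] = servers
    return merged


def update_config_file(path, command):
    """Read the JSON config (if any), merge the entry, and write it back."""
    path = Path(path).expanduser()
    text = path.read_text(encoding="utf-8") if path.exists() else ""
    config = json.loads(text) if text.strip() else {}
    merged = merge_server(config, command)
    path.write_text(json.dumps(merged, indent=2) + "\n", encoding="utf-8")
    return merged


existing = {
    "mcpServers": {"other-server": {"command": "/usr/local/bin/other", "args": []}},
    "theme": "dark",
}
merged = merge_server(existing, "/absolute/path/to/dist/agentic-log-analyser-mcp")
print(json.dumps(merged, indent=2))
```

merge_server leaves the input dictionary untouched, so it is safe to diff the result against the original before calling update_config_file on the real file.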

Use it from a chat

In Kiro or Claude, just ask:

"Compress this log file and tell me what stands out: /Users/me/Desktop/logs/cloudtrail_event.txt"

The assistant will pick up analyse_log_file, call it with the path, and diagnose against the templated artifact instead of the raw bytes.

Debug from the CLI

To run the server manually and tail its output:

./dist/agentic-log-analyser-mcp

It speaks JSON-RPC over stdio. The repo's scripts/smoke_mcp_binary.py shows a real client roundtrip you can use as a reference.
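
For orientation, this is roughly what the messages on the wire look like. The sketch only builds the JSON-RPC 2.0 request bodies, assuming the standard MCP method names tools/list and tools/call; it does not start the server. The "path" argument key is a guess made for illustration, so check the input schema that tools/list returns for the real parameter names. An MCP client also has to complete the initialize handshake before calling tools, which is omitted here.

```python
import json


def jsonrpc_request(request_id, method, params=None):
    """Serialize one JSON-RPC 2.0 request as a single line of JSON."""
    message = {"jsonrpc": "2.0", "id": request_id, "method": method}
    if params is not None:
        message["params"] = params
    # No indent: the stdio transport expects one message per line.
    return json.dumps(message)


def list_tools(request_id):
    """Ask the server which tools it exposes."""
    return jsonrpc_request(request_id, "tools/list")


def call_tool(request_id, name, arguments):
    """Invoke a named tool with a dictionary of arguments."""
    return jsonrpc_request(
        request_id, "tools/call", {"name": name, "arguments": arguments}
    )


listing = list_tools(1)
# Assumption: "path" is a hypothetical argument name for analyse_log_file.
call = call_tool(2, "analyse_log_file", {"path": "/tmp/example.log"})
print(listing)
print(call)
```

Each string would be written to the server's stdin followed by a newline, and the matching response (same id) read back from its stdout.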
