Skip to main content

An MCP server for HTTP traffic analysis with value provenance tracing — token-efficient HAR inspection for AI agents.

Project description

hartrace

An MCP server for analyzing HTTP traffic captures (HAR files) — built so an AI agent can answer questions about a capture without reading the raw JSON into its context window.

Its distinguishing feature is value provenance tracing: given any token, cookie, id, or payload field, hartrace reconstructs where the value was produced (which response set it) and where it was consumed (which later requests sent it), as a compact timeline. Every other tool — search, inspection, diffing — is built to return small, structured results with hard size caps, so analysis stays cheap regardless of how large the capture is.

load_har("session.har")
trace_value("session", "<csrf token>")
  → set_by:  response #4 body, JSON path data.csrf
  → used_in: request #9 header X-CSRF, request #9 body field token

Why this exists

HAR files are large, deeply nested, and repetitive. The two common ways an AI ends up analyzing them are both bad: writing throwaway extraction scripts every session, or pasting raw HAR JSON into the context window (slow, expensive, and it overflows on anything real). A 100-entry capture can be several megabytes; a single gzipped response can be hundreds of kilobytes.

hartrace moves the extraction and correlation to the server. Tools return only what was asked for, capped. The questions that normally require reading many entries by hand — where did this auth token come from? which request produced this cookie? where is this id reused? — are answered in one call.


Features

  • Provenance tracingtrace_value follows any value across the capture (responses → requests), reporting JSON paths for body fields. Works on tokens, cookies, ids, headers, and payload fields alike, not just cookies.
  • Search toolkit — full regex search across URLs, headers, and request/response bodies; header finder; URL/endpoint finder; query-parameter extraction.
  • Inspection — per-request and per-response retrieval with base64 + gzip/deflate decoding, nested-JSON unwrapping, binary detection, and size caps.
  • Lifecycle mapscookie_map and token_map summarize how cookies and high-entropy secrets flow through a session.
  • Diffing — compare two captures by (method, url, ordinal) so repeated calls to the same endpoint align.
  • Loading — from a local path or an http(s) URL (with SSRF protection and a size cap).
  • Safe by construction — every list/search tool paginates with server-clamped limits; secrets are redacted in inspection output; no tool raises to the transport (errors are returned as structured values).

Installation

Requires Python 3.10+.

# from PyPI (once published)
pipx install hartrace

# or zero-install with uv
uvx hartrace

# or from source
git clone https://github.com/rafsanbasunia/hartrace
cd hartrace
pip install -e .

The only runtime dependency is fastmcp.


Quick start with Claude Desktop

Add the server to claude_desktop_config.json:

{
  "mcpServers": {
    "hartrace": {
      "command": "uvx",
      "args": ["hartrace"]
    }
  }
}

Or, running from source:

{
  "mcpServers": {
    "hartrace": {
      "command": "python",
      "args": ["/absolute/path/to/hartrace/har_mcp.py"]
    }
  }
}

Restart Claude Desktop, then talk to it naturally:

"Load ~/captures/login.har and tell me where the CSRF token comes from." "Which requests reuse the session cookie?" "Diff before.har and after.har — what's new?"

Other MCP clients

hartrace is a standard stdio MCP server, so it works in any MCP-capable client — only the config file and wrapper key differ.

  • Cursor (.cursor/mcp.json) and Windsurf use the same "mcpServers" block shown above.

  • VS Code (.vscode/mcp.json) uses a "servers" key with an explicit type:

    {
      "servers": {
        "hartrace": { "type": "stdio", "command": "uvx", "args": ["hartrace"] }
      }
    }
    

In every case the command/args are identical to the Claude Desktop example.


Tools

hartrace exposes 19 tools. Refer to a loaded capture by the name returned from load_har.

Loading

Tool Purpose
list_har_files(dir) List .har files in a directory to choose from.
load_har(path) Load a capture from a local path. Returns the assigned name.
load_har_url(url) Load a capture from an http(s) URL (SSRF-guarded, size-capped).
list_hars() List currently loaded captures.
unload_har(name) Drop a capture to free memory.

Inspection

Tool Purpose
list_requests(name, filter, …) Overview rows: index, method, URL, status, response size.
get_request(name, index) One request, decoded; secrets redacted.
get_response(name, index, max_chars) One response body, decoded (base64/gzip), JSON unwrapped, capped.
get_headers(name, index) Request and response headers for one entry.
get_query_params(name, index) Parsed query string of one request.

Search

Tool Purpose
search_regex(name, pattern, scope) Regex over url | req_headers | resp_headers | req_body | resp_body | all.
find_header(name, header_name) Every entry carrying a header, with raw values.
find_urls(name, pattern) Requests whose URL matches a pattern.
list_endpoints(name, group_by) Unique endpoints with call counts.

Provenance

Tool Purpose
trace_value(name, value) Where a value was set vs. used — the full timeline.
trace_header(name, header_name) Resolve a header's value(s) and trace their origin.
cookie_map(name) Every cookie's set/used lifecycle and attributes.
token_map(name, all_tokens) High-entropy secrets and how they propagate.

Comparison

Tool Purpose
diff_hars(a, b) Requests unique to each capture; matched by (method, url, ordinal).

Every tool's full description — including argument semantics and a worked example — is available to the model through the MCP protocol.


How provenance tracing works

On first trace, hartrace builds a correlation index over the capture (cached for subsequent calls). For a queried value it separates hits into two sides:

  • set_by — responses that produced the value: a Set-Cookie header, or a response body (with the JSON path when the value sits inside parseable JSON).
  • used_in — later requests that sent the value: in a request header, a cookie, or a request body field (again with JSON path where applicable).

An empty set_by means the value was supplied by the client rather than produced by any captured response — for example a pre-existing OAuth token. origin is the earliest producer; timeline is the ordered list of entry indices touched.

Values shorter than four characters are refused, because short strings match everywhere and the result would be noise rather than signal.


Design and safety

  • stdio transport only. The server communicates over stdin/stdout per the MCP spec. All logging is routed to stderr so it cannot corrupt the protocol stream. There is no web UI, port, or background process.
  • Bounded output. List and search tools paginate with limit/offset, and limits are clamped server-side (a request for a million rows returns the cap, not a million rows). Response bodies are capped by max_chars. These bounds are what make token usage predictable.
  • Bounded memory. Captures are rejected above 500 MB or 50,000 entries; nested-JSON decoding is depth- and size-limited.
  • Redaction. Inspection tools mask sensitive header values and high-entropy secrets as <REDACTED len=N>, using a configurable header list plus a generic Shannon-entropy heuristic — not a vendor-specific token shape. Provenance tools correlate on the real value but display only a redacted preview. (find_header intentionally returns raw values, since its purpose is to extract a value to trace.)
  • SSRF protection. load_har_url refuses non-http(s) schemes and any host resolving to a private, loopback, link-local, reserved, or multicast address (including cloud metadata endpoints), and enforces the size cap on download.
  • No exceptions across the boundary. Every tool returns a structured {error: "..."} on failure rather than raising, so the agent always receives an actionable message.

The server contains no vendor-specific logic. Helpers such as nested-JSON unwrapping are generic and apply to any deeply nested response.


Development

pip install -e ".[dev]"
pytest

The suite covers parsing and decoding, redaction and entropy detection, provenance tracing (cookie and non-cookie values, JSON-path resolution), the search toolkit, URL loading and SSRF guards, and the tools driven through the actual FastMCP call path.

Layout:

har_mcp.py      MCP server: tool definitions and the stdio entry point
har_parser.py   Parsing, decoding, redaction, and the in-memory store
provenance.py   Correlation index and the trace / cookie / token tools
search.py       Regex, header, URL, and endpoint search
config.json     Redaction settings (sensitive headers, entropy thresholds)
tests/          pytest suite

Configuration

config.json adjusts redaction without code changes:

{
  "sensitive_headers": ["authorization", "cookie", "set-cookie", "x-csrf-token", "x-api-key"],
  "entropy_min_len": 24,
  "entropy_bits_min": 3.5
}

sensitive_headers are always redacted by name; any other value is redacted if it exceeds entropy_min_len characters and entropy_bits_min bits of Shannon entropy per character.


License

MIT. See LICENSE.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

hartrace-0.1.0.tar.gz (32.3 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

hartrace-0.1.0-py3-none-any.whl (25.7 kB view details)

Uploaded Python 3

File details

Details for the file hartrace-0.1.0.tar.gz.

File metadata

  • Download URL: hartrace-0.1.0.tar.gz
  • Upload date:
  • Size: 32.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for hartrace-0.1.0.tar.gz
Algorithm Hash digest
SHA256 a7ca5e61f9cb7d56f7325a6ff6da7fc9c81e1b4bc9b16641d8c53b83bc0c7f23
MD5 536181a59daa64b3577420ed9664e630
BLAKE2b-256 3bd26a93b695bafbf39472697aaa20787fb05c19f2e1750711a215ae8168e79f

See more details on using hashes here.

File details

Details for the file hartrace-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: hartrace-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 25.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for hartrace-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 3112f4c5bd68470124cf9be7b5b8939b517a070f4d7ab5ebd499f0c6521401d7
MD5 a65182a868dc144bd1239bef02ed023e
BLAKE2b-256 f7e6d5182002cd3059517f62bc4b3c00708411de58345f446ba6529ee3284388

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page