An MCP server for HTTP traffic analysis with value provenance tracing — token-efficient HAR inspection for AI agents.
hartrace
An MCP server for analyzing HTTP traffic captures (HAR files) — built so an AI agent can answer questions about a capture without reading the raw JSON into its context window.
Its distinguishing feature is value provenance tracing: given any token, cookie, id, or payload field, hartrace reconstructs where the value was produced (which response set it) and where it was consumed (which later requests sent it), as a compact timeline. Every other tool — search, inspection, diffing — is built to return small, structured results with hard size caps, so analysis stays cheap regardless of how large the capture is.
load_har("session.har")
trace_value("session", "<csrf token>")
→ set_by: response #4 body, JSON path data.csrf
→ used_in: request #9 header X-CSRF, request #9 body field token
Why this exists
HAR files are large, deeply nested, and repetitive. The two common ways an AI ends up analyzing them are both bad: writing throwaway extraction scripts every session, or pasting raw HAR JSON into the context window (slow, expensive, and it overflows on anything real). A 100-entry capture can be several megabytes; a single gzipped response can be hundreds of kilobytes.
hartrace moves the extraction and correlation to the server. Tools return only what was asked for, capped. The questions that normally require reading many entries by hand — where did this auth token come from? which request produced this cookie? where is this id reused? — are answered in one call.
Features
- Provenance tracing: `trace_value` follows any value across the capture (responses → requests), reporting JSON paths for body fields. Works on tokens, cookies, ids, headers, and payload fields alike, not just cookies.
- Search toolkit: full regex search across URLs, headers, and request/response bodies; header finder; URL/endpoint finder; query-parameter extraction.
- Inspection: per-request and per-response retrieval with base64 + gzip/deflate decoding, nested-JSON unwrapping, binary detection, and size caps.
- Lifecycle maps: `cookie_map` and `token_map` summarize how cookies and high-entropy secrets flow through a session.
- Diffing: compare two captures by `(method, url, ordinal)` so repeated calls to the same endpoint align.
- Loading: from a local path or an `http(s)` URL (with SSRF protection and a size cap).
- Safe by construction: every list/search tool paginates with server-clamped limits; secrets are redacted in inspection output; no tool raises to the transport (errors are returned as structured values).
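To make the inspection step concrete, here is a minimal sketch of decoding one response body. It assumes the HAR 1.2 `content` object (a `text` field plus an optional `encoding: "base64"`); the function name `decode_body` and the default cap are hypothetical and are not hartrace's actual API.

```python
import base64
import gzip
import zlib


def decode_body(content: dict, max_chars: int = 2000) -> dict:
    """Decode a HAR `content` object into capped text (illustrative only)."""
    raw = content.get("text", "")
    data = raw.encode("utf-8")
    if content.get("encoding") == "base64":
        data = base64.b64decode(raw)
    if data[:2] == b"\x1f\x8b":
        # gzip streams start with the magic bytes 1f 8b
        data = gzip.decompress(data)
    else:
        try:
            data = zlib.decompress(data)  # zlib-wrapped deflate
        except zlib.error:
            pass  # not compressed; keep the bytes as they are
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        # Report binary payloads by size instead of returning their bytes
        return {"binary": True, "size": len(data)}
    return {
        "binary": False,
        "text": text[:max_chars],
        "truncated": len(text) > max_chars,
    }
```

Returning a size for binary payloads rather than the payload itself is one way to keep the result small whatever the capture contains.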
Installation
Requires Python 3.10+.
# from PyPI (once published)
pipx install hartrace
# or zero-install with uv
uvx hartrace
# or from source
git clone https://github.com/rafsanbasunia/hartrace
cd hartrace
pip install -e .
The only runtime dependency is fastmcp.
Quick start with Claude Desktop
Add the server to claude_desktop_config.json:
{
"mcpServers": {
"hartrace": {
"command": "uvx",
"args": ["hartrace"]
}
}
}
Or, running from source:
{
"mcpServers": {
"hartrace": {
"command": "python",
"args": ["/absolute/path/to/hartrace/har_mcp.py"]
}
}
}
Restart Claude Desktop, then talk to it naturally:
- "Load `~/captures/login.har` and tell me where the CSRF token comes from."
- "Which requests reuse the session cookie?"
- "Diff `before.har` and `after.har`: what's new?"
Other MCP clients
hartrace is a standard stdio MCP server, so it works in any MCP-capable client — only the config file and wrapper key differ.
- Cursor (`.cursor/mcp.json`) and Windsurf use the same `"mcpServers"` block shown above.
- VS Code (`.vscode/mcp.json`) uses a `"servers"` key with an explicit type:

{
"servers": {
"hartrace": {
"type": "stdio",
"command": "uvx",
"args": ["hartrace"]
}
}
}
In every case the command/args are identical to the Claude Desktop example.
Tools
hartrace exposes 19 tools. Refer to a loaded capture by the name returned from load_har.
Loading
| Tool | Purpose |
|---|---|
| `list_har_files(dir)` | List `.har` files in a directory to choose from. |
| `load_har(path)` | Load a capture from a local path. Returns the assigned name. |
| `load_har_url(url)` | Load a capture from an `http(s)` URL (SSRF-guarded, size-capped). |
| `list_hars()` | List currently loaded captures. |
| `unload_har(name)` | Drop a capture to free memory. |
Inspection
| Tool | Purpose |
|---|---|
| `list_requests(name, filter, …)` | Overview rows: index, method, URL, status, response size. |
| `get_request(name, index)` | One request, decoded; secrets redacted. |
| `get_response(name, index, max_chars)` | One response body, decoded (base64/gzip), JSON unwrapped, capped. |
| `get_headers(name, index)` | Request and response headers for one entry. |
| `get_query_params(name, index)` | Parsed query string of one request. |
Search
| Tool | Purpose |
|---|---|
| `search_regex(name, pattern, scope)` | Regex over one scope: `url`, `req_headers`, `resp_headers`, `req_body`, `resp_body`, or `all`. |
| `find_header(name, header_name)` | Every entry carrying a header, with raw values. |
| `find_urls(name, pattern)` | Requests whose URL matches a pattern. |
| `list_endpoints(name, group_by)` | Unique endpoints with call counts. |
Provenance
| Tool | Purpose |
|---|---|
| `trace_value(name, value)` | Where a value was set vs. used: the full timeline. |
| `trace_header(name, header_name)` | Resolve a header's value(s) and trace their origin. |
| `cookie_map(name)` | Every cookie's set/used lifecycle and attributes. |
| `token_map(name, all_tokens)` | High-entropy secrets and how they propagate. |
Comparison
| Tool | Purpose |
|---|---|
| `diff_hars(a, b)` | Requests unique to each capture; matched by `(method, url, ordinal)`. |
Every tool's full description — including argument semantics and a worked example — is available to the model through the MCP protocol.
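The `(method, url, ordinal)` matching used by `diff_hars` can be illustrated with a short sketch. It assumes the ordinal is the zero-based count of earlier entries with the same method and URL, which this document does not spell out; `ordinal_keys` and `diff_captures` are hypothetical names, not hartrace functions.

```python
from collections import Counter


def ordinal_keys(entries):
    """Key each HAR entry by (method, url, ordinal)."""
    seen = Counter()
    keys = []
    for entry in entries:
        request = entry["request"]
        pair = (request["method"], request["url"])
        # The ordinal distinguishes the 1st, 2nd, ... call to the same endpoint
        keys.append((*pair, seen[pair]))
        seen[pair] += 1
    return keys


def diff_captures(a_entries, b_entries):
    """Return the keys that appear in only one of the two captures."""
    a_keys, b_keys = ordinal_keys(a_entries), ordinal_keys(b_entries)
    a_set, b_set = set(a_keys), set(b_keys)
    return {
        "only_in_a": [key for key in a_keys if key not in b_set],
        "only_in_b": [key for key in b_keys if key not in a_set],
    }
```

With this keying, a capture that calls an endpoint twice differs from one that calls it once only by the second call, so repeated calls do not collapse into a single match.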
How provenance tracing works
On first trace, hartrace builds a correlation index over the capture (cached for subsequent calls). For a queried value it separates hits into two sides:
- `set_by` lists responses that produced the value, through a `Set-Cookie` header or a response body (with the JSON path when the value sits inside parseable JSON).
- `used_in` lists later requests that sent the value, in a request header, a cookie, or a request body field (again with JSON path where applicable).
An empty set_by means the value was supplied by the client rather than produced by any captured response — for example a pre-existing OAuth token. origin is the earliest producer; timeline is the ordered list of entry indices touched.
Values shorter than four characters are refused, because short strings match everywhere and the result would be noise rather than signal.
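The set/used split described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not hartrace's implementation: it scans entries directly instead of building a cached index, matches by plain substring, does not check that a use comes after the producer, and the helper names (`json_paths`, `body_hits`, `trace`) are hypothetical.

```python
import json


def json_paths(node, value, prefix=""):
    """Yield dotted paths of string leaves in parsed JSON that contain `value`."""
    if isinstance(node, dict):
        for key, child in node.items():
            yield from json_paths(child, value, f"{prefix}.{key}" if prefix else key)
    elif isinstance(node, list):
        for i, child in enumerate(node):
            yield from json_paths(child, value, f"{prefix}[{i}]")
    elif isinstance(node, str) and value in node:
        yield prefix


def body_hits(text, value):
    """Return JSON paths holding `value`, or ['<raw>'] for a non-JSON match."""
    if not text or value not in text:
        return []
    try:
        return list(json_paths(json.loads(text), value)) or ["<raw>"]
    except ValueError:
        return ["<raw>"]


def trace(entries, value):
    if len(value) < 4:
        return {"error": "value too short to trace"}
    set_by, used_in = [], []
    for i, entry in enumerate(entries):
        req, resp = entry["request"], entry["response"]
        # Consumer side: request headers and request body
        for header in req.get("headers", []):
            if value in header["value"]:
                used_in.append({"index": i, "where": f"header {header['name']}"})
        for path in body_hits(req.get("postData", {}).get("text", ""), value):
            used_in.append({"index": i, "where": f"body {path}"})
        # Producer side: Set-Cookie headers and response body
        for header in resp.get("headers", []):
            if header["name"].lower() == "set-cookie" and value in header["value"]:
                set_by.append({"index": i, "where": "Set-Cookie"})
        for path in body_hits(resp.get("content", {}).get("text", ""), value):
            set_by.append({"index": i, "where": f"body {path}"})
    return {
        "origin": set_by[0] if set_by else None,
        "set_by": set_by,
        "used_in": used_in,
        "timeline": sorted({hit["index"] for hit in set_by + used_in}),
    }
```

In this sketch an empty `set_by` leaves `origin` as `None`, which corresponds to the client-supplied case described above.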
Design and safety
- stdio transport only. The server communicates over stdin/stdout per the MCP spec. All logging is routed to stderr so it cannot corrupt the protocol stream. There is no web UI, port, or background process.
- Bounded output. List and search tools paginate with `limit`/`offset`, and limits are clamped server-side (a request for a million rows returns the cap, not a million rows). Response bodies are capped by `max_chars`. These bounds are what make token usage predictable.
- Bounded memory. Captures are rejected above 500 MB or 50,000 entries; nested-JSON decoding is depth- and size-limited.
- Redaction. Inspection tools mask sensitive header values and high-entropy secrets as `<REDACTED len=N>`, using a configurable header list plus a generic Shannon-entropy heuristic, not a vendor-specific token shape. Provenance tools correlate on the real value but display only a redacted preview. (`find_header` intentionally returns raw values, since its purpose is to extract a value to trace.)
- SSRF protection. `load_har_url` refuses non-`http(s)` schemes and any host resolving to a private, loopback, link-local, reserved, or multicast address (including cloud metadata endpoints), and enforces the size cap on download.
- No exceptions across the boundary. Every tool returns a structured `{error: "..."}` on failure rather than raising, so the agent always receives an actionable message.
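As an illustration of the SSRF rule in the list above, the following sketch validates a URL before any download. It uses only the standard library; `is_public_address` and `check_url` are hypothetical names, and the injectable `resolve` parameter is an assumption made here so the check can be exercised without network access.

```python
import ipaddress
import socket
from urllib.parse import urlparse


def is_public_address(ip: str) -> bool:
    """True when the address is outside every blocked range."""
    addr = ipaddress.ip_address(ip)
    return not (
        addr.is_private
        or addr.is_loopback
        or addr.is_link_local  # covers 169.254.169.254 metadata endpoints
        or addr.is_reserved
        or addr.is_multicast
        or addr.is_unspecified
    )


def check_url(url: str, resolve=socket.getaddrinfo) -> dict:
    """Return {"ok": True, ...} or a structured error; never raises on bad input."""
    parts = urlparse(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return {"error": "only http(s) URLs with a host are allowed"}
    port = parts.port or (443 if parts.scheme == "https" else 80)
    try:
        infos = resolve(parts.hostname, port, type=socket.SOCK_STREAM)
    except OSError as exc:
        return {"error": f"cannot resolve host: {exc}"}
    addresses = {info[4][0] for info in infos}
    # Reject if ANY resolved address is non-public, not just the first one
    blocked = sorted(a for a in addresses if not is_public_address(a))
    if blocked:
        return {"error": f"host resolves to a non-public address: {blocked[0]}"}
    return {"ok": True, "addresses": sorted(addresses)}
```

A resolution-time check like this does not by itself prevent DNS rebinding; a real loader would also need to connect to the vetted address and re-check redirects.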
The server contains no vendor-specific logic. Helpers such as nested-JSON unwrapping are generic and apply to any deeply nested response.
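The bounded-output and no-exceptions rules from the list above combine naturally in one small pattern, sketched here. The cap of 200 is an assumed value (the real limits are not stated in this document), and `clamp_page`, `returns_errors`, and `list_rows` are hypothetical names used for illustration only.

```python
import functools

MAX_LIMIT = 200  # assumed cap for illustration


def clamp_page(limit, offset, max_limit=MAX_LIMIT):
    """Clamp caller-supplied pagination to server-side bounds."""
    return max(1, min(int(limit), max_limit)), max(0, int(offset))


def returns_errors(func):
    """Convert any exception into a structured {"error": ...} result."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception as exc:
            return {"error": f"{type(exc).__name__}: {exc}"}
    return wrapper


@returns_errors
def list_rows(rows, limit=50, offset=0):
    limit, offset = clamp_page(limit, offset)
    page = rows[offset:offset + limit]
    return {"total": len(rows), "offset": offset, "returned": len(page), "rows": page}
```

Calling `list_rows(rows, limit=1_000_000)` returns at most `MAX_LIMIT` rows, and a bad argument such as `rows=None` comes back as an `{"error": ...}` dictionary rather than a raised exception.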
Development
pip install -e ".[dev]"
pytest
The suite covers parsing and decoding, redaction and entropy detection, provenance tracing (cookie and non-cookie values, JSON-path resolution), the search toolkit, URL loading and SSRF guards, and the tools driven through the actual FastMCP call path.
Layout:
har_mcp.py MCP server: tool definitions and the stdio entry point
har_parser.py Parsing, decoding, redaction, and the in-memory store
provenance.py Correlation index and the trace / cookie / token tools
search.py Regex, header, URL, and endpoint search
config.json Redaction settings (sensitive headers, entropy thresholds)
tests/ pytest suite
Configuration
config.json adjusts redaction without code changes:
{
"sensitive_headers": ["authorization", "cookie", "set-cookie", "x-csrf-token", "x-api-key"],
"entropy_min_len": 24,
"entropy_bits_min": 3.5
}
sensitive_headers are always redacted by name; any other value is redacted if it exceeds entropy_min_len characters and entropy_bits_min bits of Shannon entropy per character.
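The entropy rule can be made concrete with a short sketch. It computes Shannon entropy in bits per character over the value's own character frequencies and applies the two thresholds as strict "exceeds" comparisons; whether hartrace treats the bounds as inclusive is not stated here, and `shannon_entropy` and `redact_header` are hypothetical names.

```python
import math
from collections import Counter

CONFIG = {
    "sensitive_headers": ["authorization", "cookie", "set-cookie", "x-csrf-token", "x-api-key"],
    "entropy_min_len": 24,
    "entropy_bits_min": 3.5,
}


def shannon_entropy(text: str) -> float:
    """Bits of entropy per character, from the character frequencies of `text`."""
    if not text:
        return 0.0
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in Counter(text).values())


def redact_header(name: str, value: str, config: dict = CONFIG) -> str:
    """Mask a header value by name, or by length plus entropy (assumed strict)."""
    by_name = name.lower() in config["sensitive_headers"]
    by_entropy = (
        len(value) > config["entropy_min_len"]
        and shannon_entropy(value) > config["entropy_bits_min"]
    )
    return f"<REDACTED len={len(value)}>" if by_name or by_entropy else value
```

A long but repetitive value has low entropy per character and passes through unmasked, while a random-looking token of the same length is masked even when its header name is not on the list.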
License
MIT. See LICENSE.