HAR capture and PII sanitization library for network traffic analysis

har-capture

Capture and sanitize HAR (HTTP Archive) files with deep PII removal. Perfect for support diagnostics, security reviews, and test fixtures.

Quick Start

Windows

  1. Install Python from the Microsoft Store or python.org
  2. Open PowerShell and run:
pip install har-capture[full]
python -m har_capture get https://example.com --patterns network-device

macOS / Linux

pip install "har-capture[full]"  # quotes keep zsh from treating [full] as a glob
har-capture get https://example.com --patterns network-device

Already have a HAR file?

pip install har-capture
har-capture sanitize myfile.har --patterns network-device

Why har-capture?

Chrome DevTools now sanitizes cookies and auth headers, but HAR files contain many other kinds of sensitive data: IP addresses, MAC addresses, emails, passwords in form bodies, serial numbers, device names, WiFi credentials, session tokens, and API keys.

How har-capture compares:

Feature har-capture DevTools Google/Cloudflare
Deep sanitization (IPs, MACs, emails)
Correlation-preserving hashes
Interactive review Varies
Custom patterns Limited
Local + CLI automation No CLI Varies

Key benefits:

  • Zero dependencies - Core sanitization uses only Python stdlib
  • Format-preserving hashes - Track the same device across requests without exposing real values
  • One-command workflow - Capture, sanitize, and compress in a single step
  • Interactive browser flows preserved - Handle browser auth, popups, and dialogs while still recording the resulting traffic
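
To make the idea of format-preserving, correlation-preserving hashes concrete, here is a minimal sketch using only the Python stdlib. It is an illustration of the general technique, not har-capture's actual implementation; the function name `pseudonymize_mac` and the choice of HMAC-SHA256 are assumptions made for this example.

```python
import hashlib
import hmac


def pseudonymize_mac(mac: str, salt: str) -> str:
    """Map a MAC address to a stable, fake MAC of the same shape (hypothetical helper)."""
    # Normalize so "AA-BB-..." and "aa:bb:..." are treated as the same device.
    normalized = mac.lower().replace("-", ":")
    digest = hmac.new(salt.encode(), normalized.encode(), hashlib.sha256).digest()
    # First octet 0x02 marks the result as a locally administered unicast address.
    octets = [0x02] + list(digest[:5])
    return ":".join(f"{octet:02x}" for octet in octets)


a = pseudonymize_mac("AA:BB:CC:DD:EE:FF", salt="my-secret-key")
b = pseudonymize_mac("aa-bb-cc-dd-ee-ff", salt="my-secret-key")
c = pseudonymize_mac("AA:BB:CC:DD:EE:FF", salt="other-salt")

print(a == b)  # same device and salt give the same pseudonym
print(a == c)  # a different salt gives an unrelated pseudonym
```

Because the output depends only on the input value and the salt, the same device can be followed across requests, while the real address never appears in the shared file.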

See detailed comparison with all tools →


See It In Action

1. Sanitization report — 84 values auto-redacted across 9 PII categories:

Sanitization Report

2. Flagged values for review — passwords, fields, WiFi SSIDs, and phone numbers detected automatically:

Flagged Values for Review

3. Interactive redaction picker — high-confidence items pre-selected, you choose the rest:

Redact Picker


Installation

# Core only (sanitization - zero dependencies)
pip install har-capture

# With browser capture support
pip install "har-capture[capture]"
playwright install chromium

# Full installation (recommended)
pip install "har-capture[full]"

Usage

Command Line

# Capture and sanitize a network device (cable modem, router, AP)
har-capture get https://192.168.100.1 --patterns network-device

# Sanitize an existing HAR with universal PII rules only (no device domain)
har-capture sanitize capture.har --patterns base

# Validate for PII leaks
har-capture validate capture.har --patterns network-device

--patterns is required as of 0.9.0 — pick network-device for cable modems/routers/APs, base for generic web/API captures, or a custom JSON path. Run har-capture patterns for the full list.

Full CLI reference →
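
For readers who want a feel for what a leak check involves, the sketch below walks a HAR document's JSON and flags strings that look like emails or IPv4 addresses. It is a simplified stand-in for illustration only, not the logic behind `har-capture validate`; the helper names `iter_strings` and `find_leaks` and both regexes are assumptions.

```python
import json
import re

# Deliberately simple patterns; a real validator would cover far more categories.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def iter_strings(node):
    """Yield every string value found anywhere in nested JSON data."""
    if isinstance(node, str):
        yield node
    elif isinstance(node, dict):
        for value in node.values():
            yield from iter_strings(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_strings(item)


def find_leaks(har: dict) -> set:
    """Return the set of email-like and IPv4-like substrings in a HAR dict."""
    leaks = set()
    for text in iter_strings(har):
        leaks.update(EMAIL_RE.findall(text))
        leaks.update(IPV4_RE.findall(text))
    return leaks


# A tiny hand-written HAR fragment used only for this demonstration.
sample_har = json.loads(
    '{"log": {"entries": [{"request": {"url": "http://192.168.100.1/login",'
    ' "postData": {"text": "email=user@example.com"}}}]}}'
)
leaks = find_leaks(sample_har)
print(sorted(leaks))
```

An empty result from a check like this does not prove a file is clean; it only means none of the listed patterns matched.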

Python API

from har_capture.sanitization import sanitize_html, sanitize_har_file
from har_capture.sanitization.report import HeuristicMode

# Sanitize HTML (correlation-preserving by default)
raw_html = "<p>Contact: user@example.com</p>"  # placeholder input; any HTML string works
clean_html = sanitize_html(raw_html)

# Sanitize with consistent salt (correlate across captures)
clean_html = sanitize_html(raw_html, salt="my-secret-key")

# Enable heuristic detection for WiFi, SSIDs, device names
clean_html = sanitize_html(raw_html, heuristics=HeuristicMode.REDACT)

# Sanitize HAR file
sanitize_har_file("capture.har")  # → capture.sanitized.har

# Custom patterns (e.g., modem serials, customer IDs)
custom = {"patterns": {"modem_sn": {"regex": r"SN[0-9]{10}", "replacement_prefix": "MODEM"}}}
sanitize_har_file("capture.har", custom_patterns=custom)

# Redact device-specific credential FIELD NAMES (not just value patterns).
# See docs/CUSTOM_PATTERNS.md#extending-sensitive-field-detection.
device_fields = {"fields": {"auto_redact_patterns": ["pws"]}}
sanitize_har_file("capture.har", custom_patterns=device_fields)
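
The custom pattern dictionary above pairs a regex with a replacement prefix. As a rough sketch of how a pattern of that shape could be applied to text, the following uses `re.sub` with a salted hash suffix. This is an assumption about the general approach, not har-capture's internals; `apply_patterns` is a hypothetical helper, and the 8-character SHA-256 suffix is chosen only for the example.

```python
import hashlib
import re

# Same structure as the custom pattern shown in the usage example.
custom = {"patterns": {"modem_sn": {"regex": r"SN[0-9]{10}", "replacement_prefix": "MODEM"}}}


def apply_patterns(text: str, config: dict, salt: str = "") -> str:
    """Replace every regex match with PREFIX_<short salted hash> (hypothetical helper)."""
    for spec in config["patterns"].values():
        prefix = spec["replacement_prefix"]

        def replace(match, prefix=prefix):
            # Hash the matched value with the salt so equal inputs map to equal outputs.
            digest = hashlib.sha256((salt + match.group(0)).encode()).hexdigest()
            return f"{prefix}_{digest[:8]}"

        text = re.sub(spec["regex"], replace, text)
    return text


line = "Serial: SN0123456789, backup unit SN0123456789, spare SN9876543210"
result = apply_patterns(line, custom, salt="demo")
print(result)
```

In the output, both occurrences of the first serial share one placeholder and the spare unit gets a different one, which is the correlation-preserving behavior the README describes.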

Documentation


Use Cases

  • Support diagnostics - Users submit sanitized HAR files without exposing credentials
  • Security review - Validate HAR files for PII leaks before sharing
  • Test fixtures - Generate reproducible traffic captures
  • Modem debugging - Capture router/modem traffic with sensitive data removed

What Gets Sanitized

Category | Examples | Output
Network | IPs, MACs | 192.168.1.1 → 10.255.42.17 (private), 8.8.8.8 → 192.0.2.42 (public)
Personal | Emails, phones | user@example.com → user_a1b2@redacted.invalid
Credentials | Passwords, tokens | password=secret → password=PASS_a1b2c3d4
Device | Serials, WiFi, SSIDs | SN123456 → SERIAL_a1b2c3d4
HTTP | Auth headers, cookies | Cookie: session=xyz → Cookie: session=TOKEN_a1b2

See complete PII categories list →
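
The Network examples show private addresses staying in a private range and public addresses moving into a documentation range. A minimal sketch of that kind of range-preserving mapping, under the assumption that a salted hash picks the replacement octets, might look like this. The function `pseudonymize_ipv4` is hypothetical and is not taken from har-capture.

```python
import hashlib
import ipaddress


def pseudonymize_ipv4(ip: str, salt: str) -> str:
    """Map an IPv4 address to a stable fake address in a safe range (hypothetical helper)."""
    addr = ipaddress.IPv4Address(ip)
    digest = hashlib.sha256((salt + str(addr)).encode()).digest()
    if addr.is_private:
        # Keep private addresses private by placing them in 10.0.0.0/8.
        return f"10.{digest[0]}.{digest[1]}.{digest[2]}"
    # Public addresses go to TEST-NET-1 (192.0.2.0/24), reserved for documentation.
    return f"192.0.2.{digest[0]}"


private_out = pseudonymize_ipv4("192.168.1.1", salt="demo")
public_out = pseudonymize_ipv4("8.8.8.8", salt="demo")
print(private_out, public_out)
```

Note that a /24 documentation range has only 256 addresses, so a mapping like this one can send two different public addresses to the same placeholder.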


Platform Support

Component Windows macOS Linux
Sanitization
Validation
CLI
Capture

Contributing

Contributions welcome! See CONTRIBUTING.md for guidelines.


License

MIT License - see LICENSE for details.


Download files

Source Distribution

har_capture-0.10.1.tar.gz (550.4 kB view details)

Uploaded Source

Built Distribution

har_capture-0.10.1-py3-none-any.whl (118.2 kB view details)

Uploaded Python 3

File details

Details for the file har_capture-0.10.1.tar.gz.

File metadata

  • Download URL: har_capture-0.10.1.tar.gz
  • Upload date:
  • Size: 550.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for har_capture-0.10.1.tar.gz
Algorithm Hash digest
SHA256 1e6bc1fab1995a2de58e1ab280eed523d58ca5c9752c044f8e6bb70563f8ea5a
MD5 07142f8b6ca3a4d5b40ab3d0271fc7e1
BLAKE2b-256 4f7d0fae91f5a5c63f0998c4582856ee340575f03831547b3074faaf34c74284

Provenance

The following attestation bundles were made for har_capture-0.10.1.tar.gz:

Publisher: publish.yml on solentlabs/har-capture

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file har_capture-0.10.1-py3-none-any.whl.

File metadata

  • Download URL: har_capture-0.10.1-py3-none-any.whl
  • Upload date:
  • Size: 118.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for har_capture-0.10.1-py3-none-any.whl
Algorithm Hash digest
SHA256 29f871490f83d02c618f59e24f84a2d317f0b1007557c25e4e591a242d4b823a
MD5 abfa74b4c6a3db68b6644cd733988e90
BLAKE2b-256 8f10f2c3c7a9b7d74333953b1e72a81f291e52f0abbc7c331c8c7899022268c3

Provenance

The following attestation bundles were made for har_capture-0.10.1-py3-none-any.whl:

Publisher: publish.yml on solentlabs/har-capture
