HAR capture and PII sanitization library for network traffic analysis

har-capture

Capture and sanitize HAR (HTTP Archive) files. HAR files record browser HTTP activity and are commonly used for debugging, diagnostics, and test fixtures.

Quick Start

Windows

  1. Install Python from the Microsoft Store or python.org
  2. Open PowerShell and run:

pip install "har-capture[full]"
python -m har_capture get https://example.com

macOS / Linux

# Quote the extras so shells such as zsh do not treat the brackets as a glob
pip install "har-capture[full]"
har-capture get https://example.com

Already have a HAR file?

pip install har-capture
har-capture sanitize myfile.har

Python API

import json

from har_capture.sanitization import sanitize_har

# Load the HAR file (HAR is JSON, encoded as UTF-8)
with open("input.har", encoding="utf-8") as f:
    har_data = json.load(f)

sanitized = sanitize_har(har_data)

Why har-capture?

Chrome DevTools v130+ now sanitizes cookies and auth headers by default when exporting HAR files. That's a good start, but HAR files contain much more sensitive data:

  • IP addresses, MAC addresses, email addresses
  • Passwords and credentials in form bodies
  • Serial numbers, device names, session tokens
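
To make the list above concrete, here is a minimal sketch of where such values tend to sit inside a HAR document. The structure follows the HAR 1.2 layout (`log.entries[].request` and `.response`); every value is made up, and `sensitive_locations` is a hypothetical helper written for this illustration, not part of har-capture.

```python
# A hand-written, minimal HAR 1.2 document with a single entry.
# All values are invented for illustration.
har = {
    "log": {
        "version": "1.2",
        "creator": {"name": "example", "version": "0.0"},
        "entries": [
            {
                "request": {
                    "method": "POST",
                    "url": "http://192.168.1.1/login",
                    "headers": [{"name": "Cookie", "value": "session=abc123"}],
                    "postData": {
                        "mimeType": "application/x-www-form-urlencoded",
                        "text": "user=admin&password=hunter2",
                    },
                },
                "response": {
                    "status": 200,
                    "content": {
                        "mimeType": "text/html",
                        "text": "<td>MAC: AA:BB:CC:DD:EE:FF</td>",
                    },
                },
            }
        ],
    }
}


def sensitive_locations(har_doc):
    """Yield (location, text) pairs for fields that commonly carry sensitive data."""
    for i, entry in enumerate(har_doc["log"]["entries"]):
        req, resp = entry["request"], entry["response"]
        yield f"entries[{i}].request.url", req["url"]
        for header in req.get("headers", []):
            yield f"entries[{i}].request.headers[{header['name']}]", header["value"]
        if "postData" in req:
            yield f"entries[{i}].request.postData.text", req["postData"].get("text", "")
        yield f"entries[{i}].response.content.text", resp.get("content", {}).get("text", "")


for location, text in sensitive_locations(har):
    print(location, "->", text)
```

Only the cookie header in this entry falls in the category that a cookie/auth-header filter targets; the IP in the URL, the password in the form body, and the MAC in the response body are the kind of data the list describes.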

har-capture provides deep sanitization and CLI automation:

har-capture get <TARGET>     # Capture + sanitize in one step

Comparison with Existing Tools

Feature                   har-capture  DevTools  Google  Cloudflare  Edgio
Sanitization
  Cookies/auth headers    Yes          Yes       Yes     Yes         Yes
  IPs, MACs, emails       Yes          No        No      No          No
  Passwords in forms      Yes          No        Yes     No          Yes
  JWT smart redaction     No           No        No      Yes         No
  Correlation-preserving  Yes          No        No      No          No
Usability
  No installation needed  No           Yes       No      Yes         Yes
  Data stays local        Yes          Yes       No      Yes         Yes
  CLI/scriptable          Yes          No        Yes     No          Yes
  Preview before redact   Yes          No        Yes     No          No
Extras
  Integrated capture      Yes          Yes       No      No          No
  Custom patterns         Yes          No        Yes     No          No
  Validation              Yes          No        No      No          No

Target Use Cases

  • Support diagnostics: Users submit sanitized HAR files without exposing credentials
  • Web development: Capture and analyze HTTP traffic for debugging
  • Test fixtures: Generate reproducible traffic captures for testing
  • Security review: Validate HAR files for PII leaks before sharing

Features

  • Zero-Dependency Core: Core sanitization uses only the Python standard library
  • HAR Capture: Browser-based capture using Playwright (optional)
  • PII Sanitization: Remove sensitive data from HTML and HAR files
  • Correlation-Preserving Redaction: Salted hashes maintain value relationships
  • Custom Patterns: External JSON files for easy pattern updates
  • Validation: Check HAR files for PII leaks before committing
  • CLI Interface: Easy-to-use command line tools

Installation

# Core only (zero dependencies)
pip install har-capture

# With browser capture (quote extras so zsh does not glob the brackets)
pip install "har-capture[capture]"
playwright install chromium  # Install browser

# With CLI
pip install "har-capture[cli]"

# Full installation
pip install "har-capture[full]"

Quick Start

Python API

from har_capture.sanitization import sanitize_har_file, sanitize_html

# Sample input; in practice this is HTML taken from a capture
raw_html = "<td>MAC: AA:BB:CC:DD:EE:FF</td>"

# Sanitize HTML (correlation-preserving by default)
clean_html = sanitize_html(raw_html)

# Sanitize with consistent salt (correlate across files)
clean_html = sanitize_html(raw_html, salt="my-secret-key")

# Use static placeholders (legacy mode)
clean_html = sanitize_html(raw_html, salt=None)

# Sanitize HAR file
sanitize_har_file("capture.har")  # Creates capture.sanitized.har

CLI

# Capture HTTP traffic
har-capture get <TARGET>

# Sanitize a HAR file (uses random salt by default)
har-capture sanitize capture.har

# Sanitize with consistent salt
har-capture sanitize capture.har --salt my-key

# Sanitize with static placeholders
har-capture sanitize capture.har --no-salt

# Use custom patterns
har-capture sanitize capture.har --patterns custom.json

# Validate for PII leaks
har-capture validate capture.har

Correlation-Preserving Redaction

By default, har-capture uses format-preserving salted hashes for redaction:

  • Same value → same hash (within a session)
  • Different values → different hashes
  • Output remains valid format (parseable by analysis tools)
  • Uses reserved/documentation ranges that won't collide with real data

Example:

Before:
  MAC: AA:BB:CC:DD:EE:FF (appears 3 times)
  MAC: 11:22:33:44:55:66 (appears 2 times)

With salted hash (default):
  MAC: 02:a1:b2:c3:d4:e5 (appears 3 times - same device, valid MAC format)
  MAC: 02:7f:8e:9d:2c:01 (appears 2 times - different device)

With static placeholders (--no-salt):
  MAC: XX:XX:XX:XX:XX:XX (appears 5 times - correlation lost)
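
The behavior in the example can be approximated with a few lines of standard-library Python. This is a minimal sketch of the general technique (salted hash, fixed `02` first octet), not har-capture's actual implementation; `redact_mac`, the choice of SHA-256, and the way the salt is combined with the value are all assumptions made for illustration.

```python
import hashlib


def redact_mac(mac: str, salt: str) -> str:
    """Map a MAC address to a stable, locally administered replacement (illustrative)."""
    normalized = mac.lower().replace("-", ":")
    digest = hashlib.sha256((salt + normalized).encode("utf-8")).digest()
    # First octet 0x02 sets the locally administered bit; the remaining
    # five octets are taken from the salted hash.
    return ":".join(["02"] + [f"{byte:02x}" for byte in digest[:5]])


salt = "example-salt"  # hypothetical salt value
first = redact_mac("AA:BB:CC:DD:EE:FF", salt)
again = redact_mac("aa:bb:cc:dd:ee:ff", salt)
other = redact_mac("11:22:33:44:55:66", salt)

print(first, again, other)
```

With the same salt, every occurrence of one device maps to the same replacement, while a different device maps to a different one, so relationships in the capture survive redaction.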

Format-preserving ranges used:

Type        Range                      Standard
MAC         02:xx:xx:xx:xx:xx          Locally administered bit
Private IP  10.255.x.x                 RFC 1918
Public IP   192.0.2.x                  RFC 5737 TEST-NET-1
IPv6        2001:db8::                 RFC 3849 documentation
Email       user_xxx@redacted.invalid  RFC 2606 .invalid TLD
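
As a rough illustration of how an address could be mapped into the ranges in the table, the sketch below uses the standard `ipaddress` module. It is an assumption-laden example rather than the library's code: `redact_ipv4` is a hypothetical name, and the rule "private input goes to 10.255.x.x, everything else to 192.0.2.x" is inferred from the table.

```python
import hashlib
import ipaddress


def redact_ipv4(address: str, salt: str) -> str:
    """Map an IPv4 address into a reserved range, keeping private/public apart (illustrative)."""
    ip = ipaddress.IPv4Address(address)
    digest = hashlib.sha256((salt + str(ip)).encode("utf-8")).digest()
    # Note: is_private is broader than RFC 1918 (it also covers loopback,
    # link-local, and other special-purpose ranges).
    if ip.is_private:
        return f"10.255.{digest[0]}.{digest[1]}"
    # TEST-NET-1 is a /24, so only one octet of hash output fits here.
    return f"192.0.2.{digest[0]}"


salt = "example-salt"  # hypothetical salt value
private_out = redact_ipv4("192.168.1.100", salt)
public_out = redact_ipv4("8.8.8.8", salt)
print(private_out, public_out)
```

Because 192.0.2.0/24 has only 256 addresses, a scheme like this one can map two different public IPs to the same replacement in a large capture; how har-capture handles that case is not covered here.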

Salt options:

  • --salt auto (default): Random salt per session
  • --salt my-key: Consistent hashing across runs
  • --no-salt: Static placeholders (legacy mode)
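
The three salt options can be pictured with the sketch below, shown here for email addresses. It is a hedged illustration only: `resolve_salt` and `redact_email` are hypothetical helpers, the 8-character hash tag mirrors the `user_xxx@redacted.invalid` shape from the table, and the static placeholder string is an assumption.

```python
import hashlib
import secrets
from typing import Optional


def resolve_salt(option: Optional[str]) -> Optional[str]:
    """Translate a salt option into the salt actually used (illustrative).

    "auto" -> fresh random salt, None -> no salt (static placeholders),
    anything else -> used verbatim as a consistent key.
    """
    if option is None:
        return None
    if option == "auto":
        return secrets.token_hex(16)
    return option


def redact_email(email: str, salt: Optional[str]) -> str:
    """Replace an email with a hashed or static placeholder under the .invalid TLD."""
    if salt is None:
        return "user@redacted.invalid"  # hypothetical static placeholder
    tag = hashlib.sha256((salt + email.lower()).encode("utf-8")).hexdigest()[:8]
    return f"user_{tag}@redacted.invalid"


keyed = redact_email("user@example.com", resolve_salt("my-key"))
session = redact_email("user@example.com", resolve_salt("auto"))
static = redact_email("user@example.com", resolve_salt(None))
print(keyed, session, static)
```

A fixed key gives the same output on every run, which is what allows correlation across files; a random per-session salt keeps correlation within one run but prevents linking values between runs.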

Custom Patterns

Patterns are stored in external JSON files for easy customization:

src/har_capture/patterns/
├── pii.json          # PII detection patterns
├── sensitive.json    # Sensitive headers/fields
└── allowlist.json    # Safe placeholder values

Add custom patterns via CLI:

har-capture sanitize capture.har --patterns my_patterns.json
har-capture validate capture.har --patterns my_patterns.json

Add custom patterns via Python:

from har_capture.sanitization import sanitize_html

clean = sanitize_html(html, custom_patterns="my_patterns.json")

Example custom patterns file:

{
  "patterns": {
    "my_custom_id": {
      "regex": "CUST-[A-Z0-9]{8}",
      "replacement_prefix": "CUSTID",
      "description": "Customer ID pattern"
    }
  }
}
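
To show how a patterns file of this shape might be consumed, here is a small standard-library sketch. It is not har-capture's loader: `apply_patterns` is hypothetical, and the replacement format `PREFIX_<8 hex chars>` is an assumption modeled on the `PASS_a1b2c3d4` style used elsewhere in this README.

```python
import hashlib
import json
import re

# Same structure as the example patterns file above.
patterns_json = """
{
  "patterns": {
    "my_custom_id": {
      "regex": "CUST-[A-Z0-9]{8}",
      "replacement_prefix": "CUSTID",
      "description": "Customer ID pattern"
    }
  }
}
"""


def apply_patterns(text: str, patterns: dict, salt: str) -> str:
    """Replace every regex match with '<prefix>_<salted hash tag>' (illustrative)."""
    for spec in patterns["patterns"].values():
        prefix = spec["replacement_prefix"]

        def replace(match, prefix=prefix):
            value = match.group(0)
            tag = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()[:8]
            return f"{prefix}_{tag}"

        text = re.sub(spec["regex"], replace, text)
    return text


patterns = json.loads(patterns_json)
sample = "Order for CUST-AB12CD34 and CUST-AB12CD34, then CUST-ZZ99YY88"
result = apply_patterns(sample, patterns, "example-salt")
print(result)
```

The two occurrences of the same customer ID receive the same replacement, and the third, different ID receives another, matching the correlation-preserving behavior described earlier.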

PII Categories Removed

Sanitization redacts the following types of PII (where given, the arrow points to an example of the redacted form):

  • MAC Addresses: AA:BB:CC:DD:EE:FF → 02:a1:b2:c3:d4:e5
  • Private IPs: 192.168.1.100 → 10.255.42.17
  • Public IPs: 8.8.8.8 → 192.0.2.42
  • IPv6 Addresses: fe80::1 → 2001:db8::a1b2:c3d4
  • Email Addresses: user@example.com → user_a1b2c3d4@redacted.invalid
  • Passwords/Credentials: In forms, headers, and JavaScript → PASS_a1b2c3d4
  • Session Tokens: In cookies and headers → TOKEN_a1b2c3d4
  • Serial Numbers: → SERIAL_a1b2c3d4
  • WiFi Credentials: In JavaScript variables
  • Device Names: In network device lists

CLI Commands

get

Capture HTTP traffic using a browser.

har-capture get <TARGET>
har-capture get <TARGET> --output capture.har
har-capture get <TARGET> --no-sanitize

sanitize

Remove PII from HAR files.

har-capture sanitize capture.har
har-capture sanitize capture.har --output clean.har --compress
har-capture sanitize capture.har --salt my-key      # Consistent hash
har-capture sanitize capture.har --no-salt          # Static placeholders
har-capture sanitize capture.har --patterns custom.json
har-capture sanitize capture.har --max-size 500     # Allow up to 500MB
har-capture sanitize capture.har --compression-level 6  # Faster compression

validate

Check for PII leaks.

har-capture validate capture.har
har-capture validate --dir ./captures --recursive
har-capture validate capture.har --strict
har-capture validate capture.har --patterns custom.json
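
Conceptually, a leak check walks the HAR JSON and flags strings that still look like real PII. The sketch below illustrates that idea with two simple regexes; it is not how `har-capture validate` is implemented. `iter_strings` and `find_leaks` are hypothetical, and treating `02:` MACs and `@redacted.invalid` emails as already sanitized is an assumption based on the format-preserving ranges described above.

```python
import re

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
MAC_RE = re.compile(r"\b(?:[0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}\b")


def iter_strings(node, path="$"):
    """Yield (path, string) for every string value in nested JSON data."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from iter_strings(value, f"{path}.{key}")
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from iter_strings(value, f"{path}[{index}]")
    elif isinstance(node, str):
        yield path, node


def find_leaks(har_doc):
    """Return (path, kind, value) for emails and MACs outside the redaction ranges."""
    leaks = []
    for path, text in iter_strings(har_doc):
        for email in EMAIL_RE.findall(text):
            if not email.lower().endswith("@redacted.invalid"):
                leaks.append((path, "email", email))
        for mac in MAC_RE.findall(text):
            if not mac.lower().startswith("02:"):
                leaks.append((path, "mac", mac))
    return leaks


dirty = {"log": {"entries": [{"response": {"content": {
    "text": "contact user@example.com, MAC AA:BB:CC:DD:EE:FF"}}}]}}
clean = {"log": {"entries": [{"response": {"content": {
    "text": "contact user_a1b2c3d4@redacted.invalid, MAC 02:a1:b2:c3:d4:e5"}}}]}}

print(find_leaks(dirty))
print(find_leaks(clean))
```

A real validator would need many more patterns and an allowlist for safe values; this sketch only shows the traversal-and-match shape of the check.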

Platform Support

Component     Windows  macOS  Linux
Sanitization  Yes      Yes    Yes
Validation    Yes      Yes    Yes
CLI           Yes      Yes    Yes
Capture       Yes      Yes    Yes

Development

# Install dev dependencies
pip install -e ".[dev]"

# Run tests
pytest

# Run linting
ruff check .

# Type checking
mypy src/har_capture

License

MIT License - see LICENSE for details.

Contributing

Contributions welcome! Please see CONTRIBUTING.md for guidelines.
