HAR capture and PII sanitization library for network traffic analysis
har-capture
Capture and sanitize HAR (HTTP Archive) files with deep PII removal. Perfect for support diagnostics, security reviews, and test fixtures.
Quick Start
Windows
- Install Python from the Microsoft Store or python.org
- Open PowerShell and run:
pip install har-capture[full]
python -m har_capture https://example.com
macOS / Linux
pip install "har-capture[full]"
har-capture https://example.com
Already have a HAR file?
pip install har-capture
har-capture sanitize myfile.har
Why har-capture?
Chrome DevTools now sanitizes cookies and auth headers, but HAR files also contain many other kinds of sensitive data: IP addresses, MAC addresses, emails, passwords in form bodies, serial numbers, device names, WiFi credentials, session tokens, and API keys.
How har-capture compares:
| Feature | har-capture | DevTools | Google/Cloudflare |
|---|---|---|---|
| Deep sanitization (IPs, MACs, emails) | ✅ | ❌ | ❌ |
| Correlation-preserving hashes | ✅ | ❌ | ❌ |
| Interactive review | ✅ | ❌ | Varies |
| Custom patterns | ✅ | ❌ | Limited |
| Local + CLI automation | ✅ | No CLI | Varies |
Key benefits:
- Zero dependencies - Core sanitization uses only Python stdlib
- Format-preserving hashes - Track the same device across requests without exposing real values
- One-command workflow - Capture, sanitize, and compress in a single step
See detailed comparison with all tools →
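The idea behind correlation-preserving, format-preserving hashes can be illustrated with a small standard-library sketch. This README does not show the library's actual hashing scheme, so the code below assumes an HMAC-SHA256 keyed by the salt; the function names `pseudonymize_ip` and `pseudonymize_mac` are hypothetical and are not part of har-capture.

```python
import hashlib
import hmac
import ipaddress


def _digest(value: str, salt: str) -> bytes:
    """Keyed hash: the same (value, salt) pair always yields the same bytes."""
    return hmac.new(salt.encode(), value.encode(), hashlib.sha256).digest()


def pseudonymize_ip(ip: str, salt: str) -> str:
    """Map an IPv4 address to a stable stand-in inside 10.0.0.0/8 (assumed scheme)."""
    ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    d = _digest(ip, salt)
    return f"10.{d[0]}.{d[1]}.{d[2]}"


def pseudonymize_mac(mac: str, salt: str) -> str:
    """Map a MAC address to a stable stand-in that still looks like a MAC."""
    d = _digest(mac.lower(), salt)
    # A first octet of 0x02 marks the address as locally administered.
    return ":".join(f"{octet:02X}" for octet in (0x02, *d[:5]))


salt = "my-secret-key"
a = pseudonymize_ip("192.168.1.1", salt)
b = pseudonymize_ip("192.168.1.1", salt)
print(a, b)  # the two values are identical because input and salt match
print(pseudonymize_mac("AA:BB:CC:DD:EE:FF", salt))
```

Because the replacement depends only on the original value and the salt, every occurrence of one device maps to the same stand-in, so requests can still be correlated. Reusing a salt keeps replacements stable across captures; changing it yields unrelated replacements.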
See It In Action
1. Sanitization report: 84 values auto-redacted across 9 PII categories.
2. Flagged values for review: passwords, fields, WiFi SSIDs, and phone numbers detected automatically.
3. Interactive redaction picker: high-confidence items pre-selected, you choose the rest.
Installation
# Core only (sanitization - zero dependencies)
pip install har-capture
# With browser capture support (quotes keep shells such as zsh from globbing the brackets)
pip install "har-capture[capture]"
playwright install chromium
# Full installation (recommended)
pip install "har-capture[full]"
Usage
Command Line
# Capture and sanitize (interactive review always enabled)
har-capture https://example.com
# Sanitize existing HAR
har-capture sanitize capture.har
# Validate for PII leaks
har-capture validate capture.har
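To give a rough sense of what a PII leak check involves, here is a minimal standard-library sketch that scans a HAR document for a few obvious patterns. It is not how `har-capture validate` is implemented; the `LEAK_PATTERNS` table and the `find_leaks` function are assumptions made for illustration.

```python
import json
import re

# Illustrative patterns only; a real checker would cover many more categories.
LEAK_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "mac": re.compile(r"\b(?:[0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}\b"),
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}


def find_leaks(har: dict) -> list[tuple[str, str]]:
    """Return (category, match) pairs found anywhere in the HAR document."""
    text = json.dumps(har)  # search the serialized form so nested fields are covered
    hits = []
    for category, pattern in LEAK_PATTERNS.items():
        hits.extend((category, m) for m in pattern.findall(text))
    return hits


sample = {
    "log": {
        "entries": [
            {"request": {"url": "http://192.168.100.1/status?user=jane@example.com"}}
        ]
    }
}
for category, value in find_leaks(sample):
    print(category, value)
```

This sketch would also flag values that have already been replaced with safe stand-ins, since a redacted email or IP still matches the same pattern. A real validator has to recognize its own placeholders.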
Python API
from har_capture.sanitization import sanitize_html, sanitize_har_file
from har_capture.sanitization.report import HeuristicMode
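# Example input; in practice this is the HTML you want to sanitize
raw_html = "<p>Admin contact: user@example.com</p>"
# Sanitize HTML (correlation-preserving by default)
clean_html = sanitize_html(raw_html)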
# Sanitize with consistent salt (correlate across captures)
clean_html = sanitize_html(raw_html, salt="my-secret-key")
# Enable heuristic detection for WiFi, SSIDs, device names
clean_html = sanitize_html(raw_html, heuristics=HeuristicMode.REDACT)
# Sanitize HAR file
sanitize_har_file("capture.har") # → capture.sanitized.har
# Custom patterns (e.g., modem serials, customer IDs)
custom = {"patterns": {"modem_sn": {"regex": r"SN[0-9]{10}", "replacement_prefix": "MODEM"}}}
sanitize_har_file("capture.har", custom_patterns=custom)
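The custom-pattern dictionary above pairs a regex with a replacement prefix. As a sketch of how such a rule could be applied, the code below replaces each match with the prefix plus a short salted hash. This is an assumption about the behavior, not the library's internals; `apply_custom_patterns` is a hypothetical helper, and the `PREFIX_<8 hex chars>` output shape is modeled on the examples in this README.

```python
import hashlib
import re

# Same shape as the custom-pattern dictionary in the example above.
custom = {"patterns": {"modem_sn": {"regex": r"SN[0-9]{10}", "replacement_prefix": "MODEM"}}}


def apply_custom_patterns(text: str, config: dict, salt: str = "") -> str:
    """Replace each match with PREFIX_<8 hex chars> derived from salt + match."""
    for rule in config["patterns"].values():
        prefix = rule["replacement_prefix"]

        # Bind prefix as a default argument so each rule keeps its own value.
        def redact(match: re.Match, prefix: str = prefix) -> str:
            token = hashlib.sha256((salt + match.group(0)).encode()).hexdigest()[:8]
            return f"{prefix}_{token}"

        text = re.sub(rule["regex"], redact, text)
    return text


line = "serial=SN0123456789 backup=SN0123456789"
print(apply_custom_patterns(line, custom, salt="my-secret-key"))
```

Both occurrences of the serial number come out as the same `MODEM_...` token, which is what lets a reader follow one device through the capture without seeing its real serial.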
Documentation
- Comparison with Other Tools - DevTools, Google, Cloudflare, Edgio
- Correlation-Preserving Redaction - How format-preserving hashing works
- PII Categories - What gets sanitized
- Custom Patterns - Add organization-specific patterns
- CLI Reference - Detailed command documentation
- Interactive Sanitization - Review edge cases manually
Use Cases
- Support diagnostics - Users submit sanitized HAR files without exposing credentials
- Security review - Validate HAR files for PII leaks before sharing
- Test fixtures - Generate reproducible traffic captures
- Modem debugging - Capture router/modem traffic with sensitive data removed
What Gets Sanitized
| Category | Examples | Output |
|---|---|---|
| Network | IPs, MACs | 192.168.1.1 → 10.255.42.17 |
| Personal | Emails, phones | user@example.com → user_a1b2@redacted.invalid |
| Credentials | Passwords, tokens | password=secret → password=PASS_a1b2c3d4 |
| Device | Serials, WiFi, SSIDs | SN123456 → SERIAL_a1b2c3d4 |
| HTTP | Auth headers, cookies | Cookie: session=xyz → Cookie: session=TOKEN_a1b2 |
See complete PII categories list →
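For the HTTP row in the table above, it helps to see where headers sit in a HAR file: each item in `log.entries` has a `request` and a `response`, and each of those carries a `headers` list of `name`/`value` objects. The sketch below walks that structure and blanks sensitive header values. It is deliberately simpler than the table's output, which keeps the cookie name and replaces only the token; `redact_headers` and `SENSITIVE_HEADERS` are hypothetical names, not har-capture API.

```python
import copy

# Header names compared in lowercase; an illustrative list, not an exhaustive one.
SENSITIVE_HEADERS = {"authorization", "cookie", "set-cookie", "proxy-authorization"}


def redact_headers(har: dict, placeholder: str = "REDACTED") -> dict:
    """Return a copy of the HAR with sensitive header values replaced."""
    cleaned = copy.deepcopy(har)  # leave the caller's data untouched
    for entry in cleaned.get("log", {}).get("entries", []):
        for side in ("request", "response"):
            for header in entry.get(side, {}).get("headers", []):
                if header.get("name", "").lower() in SENSITIVE_HEADERS:
                    header["value"] = placeholder
    return cleaned


har = {
    "log": {
        "entries": [
            {
                "request": {
                    "headers": [
                        {"name": "Cookie", "value": "session=xyz"},
                        {"name": "Accept", "value": "text/html"},
                    ]
                },
                "response": {"headers": [{"name": "Set-Cookie", "value": "session=abc"}]},
            }
        ]
    }
}
cleaned = redact_headers(har)
print(cleaned["log"]["entries"][0]["request"]["headers"])
```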
Platform Support
| Component | Windows | macOS | Linux |
|---|---|---|---|
| Sanitization | ✅ | ✅ | ✅ |
| Validation | ✅ | ✅ | ✅ |
| CLI | ✅ | ✅ | ✅ |
| Capture | ✅ | ✅ | ✅ |
Contributing
Contributions welcome! See CONTRIBUTING.md for guidelines.
License
MIT License - see LICENSE for details.
File details
Details for the file har_capture-0.5.1.tar.gz.
File metadata
- Download URL: har_capture-0.5.1.tar.gz
- Upload date:
- Size: 411.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 12019439a4acd8e29deeaae2f4659257c4ccbb91b54d2008d1422350743cb132 |
| MD5 | 10b5d51d5c83e3e90850003cc5288990 |
| BLAKE2b-256 | b2058b18157e355a8732b09676c160d87be691d398fbf467b0e60ff007efee2c |
Provenance
The following attestation bundles were made for har_capture-0.5.1.tar.gz:
Publisher: publish.yml on solentlabs/har-capture
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: har_capture-0.5.1.tar.gz
- Subject digest: 12019439a4acd8e29deeaae2f4659257c4ccbb91b54d2008d1422350743cb132
- Sigstore transparency entry: 1201567004
- Sigstore integration time:
- Permalink: solentlabs/har-capture@fc54218bf83d69b660bd05192a8f5eec37016c10
- Branch / Tag: refs/tags/v0.5.1
- Owner: https://github.com/solentlabs
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@fc54218bf83d69b660bd05192a8f5eec37016c10
- Trigger Event: push
File details
Details for the file har_capture-0.5.1-py3-none-any.whl.
File metadata
- Download URL: har_capture-0.5.1-py3-none-any.whl
- Upload date:
- Size: 94.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 05ee225ae95762b8560e2b7bcaf9aaa80c162c1819d422a7cc203f09515e30aa |
| MD5 | 93a72501f7ab620a88cc21f22b338f1e |
| BLAKE2b-256 | eea42ad17e50d45e1f68d06e8cc47b266280f2051de22edaedfa78a0871147a4 |
Provenance
The following attestation bundles were made for har_capture-0.5.1-py3-none-any.whl:
Publisher: publish.yml on solentlabs/har-capture
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: har_capture-0.5.1-py3-none-any.whl
- Subject digest: 05ee225ae95762b8560e2b7bcaf9aaa80c162c1819d422a7cc203f09515e30aa
- Sigstore transparency entry: 1201567018
- Sigstore integration time:
- Permalink: solentlabs/har-capture@fc54218bf83d69b660bd05192a8f5eec37016c10
- Branch / Tag: refs/tags/v0.5.1
- Owner: https://github.com/solentlabs
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@fc54218bf83d69b660bd05192a8f5eec37016c10
- Trigger Event: push