HTTP Response Fuzzy Hashing

HRFH - HTTP Response Fuzzy Hashing

A Python library for generating fuzzy hashes of HTTP responses, useful for identifying similar web content, detecting CDN configurations, and analyzing web infrastructure.

Features

Fast Processing: Efficient HTTP response parsing and hashing
Fuzzy Hashing: Generate consistent hashes for similar content
Content Masking: Intelligent masking of dynamic content (timestamps, IDs, etc.)
Multiple Formats: Support for raw HTTP responses and JSON data
Python 3.7+: Compatible with modern Python versions
Easy Integration: Simple API for embedding in your projects
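
To make the masking and hashing features concrete, here is a minimal, standard-library-only sketch of the general idea: mask header values that look dynamic, sort the headers, and hash the result. It is an illustration, not hrfh's actual algorithm; the names `mask_value` and `toy_fuzzy_hash`, the regular expression, and the choice of SHA-256 are all assumptions made for this example.

```python
import hashlib
import re
from typing import List, Tuple

# Assumption for this sketch: a value counts as "dynamic" if it is a long run
# of hex digits (for example an ETag). hrfh's real masking rules may differ.
_HEX_TOKEN = re.compile(r"^[0-9a-fA-F]{16,}$")


def mask_value(value: str) -> str:
    """Replace a dynamic-looking header value with a fixed placeholder."""
    return "[MASK]" if _HEX_TOKEN.match(value) else value


def toy_fuzzy_hash(status_line: str, headers: List[Tuple[str, str]]) -> str:
    """Hash the status line plus the masked, sorted header lines."""
    masked = sorted(f"{name}: {mask_value(value)}" for name, value in headers)
    canonical = "\n".join([status_line, *masked])
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Two responses that differ only in ETag value and header order.
a = toy_fuzzy_hash(
    "HTTP/1.0 200 OK",
    [("Server", "nginx"), ("ETag", "ea67ba7f802fb5c6cfa13a6b6d27adc6")],
)
b = toy_fuzzy_hash(
    "HTTP/1.0 200 OK",
    [("ETag", "0123456789abcdef0123456789abcdef"), ("Server", "nginx")],
)
print(a == b)  # True
```

Sorting the header lines before hashing makes the result independent of header order, and masking removes per-request values, so structurally identical responses collapse to the same digest.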

Installation

From PyPI (Recommended)

pip install hrfh

From Source

git clone https://github.com/yourusername/hrfh.git
cd hrfh
uv sync

Quick Start

Basic Usage

from hrfh.utils.parser import create_http_response_from_bytes

# Parse HTTP response from bytes
response = create_http_response_from_bytes(
    b"""HTTP/1.0 200 OK\r\nServer: nginx\r\nServer: apache\r\nETag: ea67ba7f802fb5c6cfa13a6b6d27adc6\r\n\r\n"""
)

# Get basic response info
print(response)
# Output: <HTTPResponse 1.1.1.1:80 200 OK>

# Get masked content (with dynamic parts masked)
print(response.masked)
# Output: HTTP/1.0 200 OK
#         ETag: [MASK]
#         Server: apache
#         Server: nginx

# Generate fuzzy hash for similarity detection
print(response.fuzzy_hash())
# Output: ba15cc1f9ad3ef632d0ce7798f7fa44718f1e7fcc2c0f94c1a702f647b79923b

Interactive Example

>>> from hrfh.utils.parser import create_http_response_from_bytes
>>> response = create_http_response_from_bytes(b"""HTTP/1.0 200 OK\r\nServer: nginx\r\nServer: apache\r\nETag: ea67ba7f802fb5c6cfa13a6b6d27adc6\r\n\r\n""")
>>> print(response)
<HTTPResponse 1.1.1.1:80 200 OK>
>>> print(response.masked)
HTTP/1.0 200 OK
ETag: [MASK]
Server: apache
Server: nginx
>>> print(response.fuzzy_hash())
ba15cc1f9ad3ef632d0ce7798f7fa44718f1e7fcc2c0f94c1a702f647b79923b

API Reference

Core Classes

HTTPResponse

Main class for representing HTTP responses with fuzzy hashing capabilities.

from hrfh.models import HTTPResponse

response = HTTPResponse(
    ip="1.2.3.4",
    port=80,
    version="HTTP/1.1",
    status_code=200,
    status_reason="OK",
    headers=[("Server", "nginx"), ("Content-Type", "text/html")],
    body=b"<html>Hello World</html>"
)

Key methods and attributes:

fuzzy_hash(): Generate a fuzzy hash for similarity detection
masked: Attribute holding the response text with dynamic parts masked
dump(): Return the formatted HTTP response string

HTTPRequest

Class for representing HTTP requests.

from hrfh.models import HTTPRequest

request = HTTPRequest(
    ip="1.2.3.4",
    port=80,
    method="GET",
    version="HTTP/1.1",
    headers=[("Host", "example.com")],
    body=b""
)

Utility Functions

Parsing Functions

from hrfh.utils.parser import (
    create_http_response_from_bytes,
    create_http_response_from_json,
    create_http_request_from_json
)

# Parse from raw HTTP response bytes
response = create_http_response_from_bytes(http_bytes)

# Parse from JSON data
response = create_http_response_from_json(json_data)
request = create_http_request_from_json(json_data)

Advanced Usage

Working with JSON Data

import json
from hrfh.utils.parser import create_http_response_from_json

# Load HTTP response data from JSON file
with open('response_data.json', 'r') as f:
    data = json.load(f)

response = create_http_response_from_json(data)
hash_value = response.fuzzy_hash()

Example JSON format:

{
  "ip": "104.103.147.116",
  "timestamp": 1717146116,
  "status_code": 400,
  "status_reason": "Bad Request",
  "headers": {
    "Server": "AkamaiGHost",
    "Content-Type": "text/html",
    "Content-Length": "312"
  },
  "body": "<HTML><HEAD><TITLE>Invalid URL</TITLE></HEAD><BODY>...</BODY></HTML>"
}
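
The example record stores headers as a JSON object, while the `HTTPResponse` constructor shown earlier takes a list of `(name, value)` tuples. The sketch below uses only the standard library to check a record for the fields that appear in the example and to convert its headers into tuple form. `normalize_record` and `REQUIRED_FIELDS` are hypothetical names for this illustration; whether hrfh itself requires these fields or performs this conversion is an assumption.

```python
import json
from typing import Any, Dict, List, Tuple

# Fields taken from the example record; treating them as required is an assumption.
REQUIRED_FIELDS = ("ip", "status_code", "status_reason", "headers", "body")


def normalize_record(text: str) -> Dict[str, Any]:
    """Parse a JSON record and return it with headers as sorted (name, value) tuples."""
    record = json.loads(text)
    missing = [field for field in REQUIRED_FIELDS if field not in record]
    if missing:
        raise ValueError(f"record is missing fields: {', '.join(missing)}")
    headers: List[Tuple[str, str]] = sorted(
        (str(name), str(value)) for name, value in record["headers"].items()
    )
    return {**record, "headers": headers}


sample = json.dumps({
    "ip": "104.103.147.116",
    "timestamp": 1717146116,
    "status_code": 400,
    "status_reason": "Bad Request",
    "headers": {"Server": "AkamaiGHost", "Content-Type": "text/html"},
    "body": "<HTML>...</HTML>",
})
print(normalize_record(sample)["headers"])
# [('Content-Type', 'text/html'), ('Server', 'AkamaiGHost')]
```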

Batch Processing

import json
import os
from hrfh.utils.parser import create_http_response_from_json

def process_responses(data_dir):
    results = {}

    for cdn_dir in os.listdir(data_dir):
        cdn_path = os.path.join(data_dir, cdn_dir)
        if os.path.isdir(cdn_path):
            for json_file in os.listdir(cdn_path):
                if json_file.endswith('.json'):
                    file_path = os.path.join(cdn_path, json_file)
                    with open(file_path, 'r') as f:
                        data = json.load(f)

                    response = create_http_response_from_json(data)
                    hash_value = response.fuzzy_hash()
                    results[hash_value] = response

    return results

# Usage
results = process_responses('data/')
for hash_val, response in results.items():
    print(f"{hash_val[:16]} {response}")
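
Because `results` is keyed by hash, responses that share a hash overwrite one another and only the last one is kept. When the goal is to see which responses cluster together, a grouping variant can help. The sketch below uses only the standard library and works on plain `(label, hash)` pairs so that it runs without hrfh installed; `group_by_hash` is a hypothetical helper, and the short hash strings are made-up placeholders.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_by_hash(pairs: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Collect every label that produced the same hash value."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for label, hash_value in pairs:
        groups[hash_value].append(label)
    return dict(groups)


# Placeholder data: in practice each pair would be (file path, response.fuzzy_hash()).
pairs = [
    ("akamai/a.json", "aaaa"),
    ("akamai/b.json", "aaaa"),
    ("cloudflare/c.json", "bbbb"),
]
for hash_value, labels in group_by_hash(pairs).items():
    print(hash_value, labels)
# aaaa ['akamai/a.json', 'akamai/b.json']
# bbbb ['cloudflare/c.json']
```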

Development

Setting Up Development Environment

Clone the repository

git clone https://github.com/yourusername/hrfh.git
cd hrfh

Install dependencies
```
uv sync
```
Run tests
```
uv run pytest
```
Type checking
```
uv run mypy hrfh/
```

Project Structure

hrfh/
├── hrfh/                    # Main package
│   ├── models/             # Data models (HTTPRequest, HTTPResponse)
│   ├── utils/              # Utility functions
│   │   ├── parser.py       # HTTP parsing utilities
│   │   ├── masker.py       # Content masking logic
│   │   ├── hasher.py       # Hashing algorithms
│   │   └── tokenizer.py    # HTML tokenization
│   └── __main__.py         # CLI entry point
├── tests/                   # Test suite
├── data/                    # Sample data for testing
├── pyproject.toml          # Project configuration
└── README.md               # This file

Running the CLI Tool

# Install the package in development mode
uv sync

# Run the CLI tool
uv run hrfh --help

# Process a specific file
uv run hrfh data/akamai/104.103.147.116.json

# Process from stdin
cat data/akamai/104.103.147.116.json | uv run hrfh -

Contributing

Fork the repository
Create a feature branch (git checkout -b feature/amazing-feature)
Commit your changes (git commit -m 'Add amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request

Testing

# Run all tests
uv run pytest

# Run with coverage
uv run pytest --cov=hrfh

# Run specific test file
uv run pytest tests/test_http_response.py

Examples

CDN Analysis

from hrfh.utils.parser import create_http_response_from_bytes

# Analyze responses from different CDNs.
# akamai_bytes and cloudflare_bytes are raw HTTP responses captured beforehand.
akamai_response = create_http_response_from_bytes(akamai_bytes)
cloudflare_response = create_http_response_from_bytes(cloudflare_bytes)

# Compare hashes to detect similar content
if akamai_response.fuzzy_hash() == cloudflare_response.fuzzy_hash():
    print("Same content served from different CDNs")

Content Change Detection

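from hrfh.utils.parser import create_http_response_from_bytes

# Monitor for content changes.
# `response` is a previously parsed HTTPResponse; `new_bytes` is the newly fetched raw response.
old_hash = response.fuzzy_hash()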

# After some time...
new_response = create_http_response_from_bytes(new_bytes)
new_hash = new_response.fuzzy_hash()

if old_hash != new_hash:
    print("Content has changed!")
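
For monitoring more than one target, the comparison above can be wrapped in a small tracker that remembers the last hash seen per target. This is a standard-library sketch; `HashTracker` and its `update` method are hypothetical names, and the hash strings passed in stand for values returned by `fuzzy_hash()`.

```python
from typing import Dict


class HashTracker:
    """Remember the most recent hash per target and report changes."""

    def __init__(self) -> None:
        self._last: Dict[str, str] = {}

    def update(self, target: str, hash_value: str) -> bool:
        """Store hash_value for target; return True if it differs from the previous one.

        The first observation of a target is not reported as a change.
        """
        previous = self._last.get(target)
        self._last[target] = hash_value
        return previous is not None and previous != hash_value


tracker = HashTracker()
print(tracker.update("example.com", "aaaa"))  # False (first observation)
print(tracker.update("example.com", "aaaa"))  # False (unchanged)
print(tracker.update("example.com", "bbbb"))  # True (changed)
```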

License

This project is licensed under the MIT License - see the LICENSE file for details.

Support

Issues: GitHub Issues
Documentation: GitHub Wiki
Discussions: GitHub Discussions

Acknowledgments

Built with BeautifulSoup for HTML parsing
Uses NLTK for natural language processing
Inspired by fuzzy hashing techniques for digital forensics
