Hyper-fast HTTP Scraping Tool

HTTPZ Web Scanner

A high-performance concurrent web scanner written in Python. HTTPZ scans domains for HTTP/HTTPS services and extracts status codes, page titles, TLS certificate details, IP addresses, CNAME records, favicon hashes, and more.

Requirements

Installation

Via pip (recommended)

# Install from PyPI
pip install httpz

# The 'httpz' command will now be available in your terminal
httpz --help

From source

# Clone the repository
git clone https://github.com/acidvegas/httpz
cd httpz
pip install -r requirements.txt

Usage

Command Line Interface

Basic usage:

python -m httpz domains.txt

Scan with all flags enabled and output to JSONL:

python -m httpz domains.txt -all -c 100 -o results.jsonl -j -p

Read from stdin:

cat domains.txt | python -m httpz - -all -c 100
echo "example.com" | python -m httpz - -all

Filter by status codes and follow redirects:

httpz domains.txt -mc 200,301-399 -ec 404,500 -fr -p
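The `-mc` and `-ec` values above are comma-separated lists that may contain ranges such as `301-399`. As a rough illustration of how such a spec can be expanded and applied, here is a minimal sketch. The helper names `parse_status_codes` and `is_shown` are hypothetical and not part of HTTPZ, and the precedence shown (exclude wins over match) is an assumption rather than documented behavior.

```python
from typing import Optional, Set


def parse_status_codes(spec: str) -> Set[int]:
    """Expand a spec like '200,301-399' into a set of integer status codes.

    Hypothetical helper for illustration; HTTPZ's own parser may differ.
    """
    codes: Set[int] = set()
    for part in spec.split(','):
        part = part.strip()
        if not part:
            continue
        if '-' in part:
            # Inclusive range, e.g. '301-399'
            start, end = (int(p) for p in part.split('-', 1))
            codes.update(range(start, end + 1))
        else:
            codes.add(int(part))
    return codes


def is_shown(status: int,
             match: Optional[Set[int]] = None,
             exclude: Optional[Set[int]] = None) -> bool:
    """Decide whether a result passes the filters.

    Assumption: an excluded code is always hidden, even if it also matches.
    """
    if exclude and status in exclude:
        return False
    if match and status not in match:
        return False
    return True


if __name__ == '__main__':
    match = parse_status_codes('200,301-399')
    exclude = parse_status_codes('404,500')
    for status in (200, 302, 404, 418):
        print(status, is_shown(status, match, exclude))
```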

Show specific fields with custom timeout and resolvers:

httpz domains.txt -sc -ti -i -tls -to 10 -r resolvers.txt

Full scan with all options:

httpz domains.txt -c 100 -o output.jsonl -j -all -to 10 -mc 200,301 -ec 404,500 -p -ax -r resolvers.txt

Python Library

import asyncio
from httpz import HTTPZScanner

async def scan_domains():
    # Initialize scanner with all possible options (showing defaults)
    scanner = HTTPZScanner(
        # Core settings
        concurrent_limit=100,    # Number of concurrent requests
        timeout=5,              # Request timeout in seconds
        follow_redirects=False,  # Follow redirects (max 10)
        check_axfr=False,       # Try AXFR transfer against nameservers
        resolver_file=None,     # Path to custom DNS resolvers file
        output_file=None,       # Path to JSONL output file
        show_progress=False,    # Show progress counter
        debug_mode=False,       # Show error states and debug info
        jsonl_output=False,     # Output in JSONL format
        
        # Control which fields to show (all False by default unless show_fields is None)
        show_fields={
            'status_code': True,       # Show status code
            'content_type': True,      # Show content type
            'content_length': True,    # Show content length
            'title': True,            # Show page title
            'body': True,             # Show body preview
            'ip': True,               # Show IP addresses
            'favicon': True,          # Show favicon hash
            'headers': True,          # Show response headers
            'follow_redirects': True,  # Show redirect chain
            'cname': True,            # Show CNAME records
            'tls': True               # Show TLS certificate info
        },
        
        # Filter results
        match_codes={200, 301, 302},    # Only show these status codes
        exclude_codes={404, 500, 503}   # Exclude these status codes
    )

    # Initialize resolvers (required before scanning)
    await scanner.init()

    # Scan domains from file
    await scanner.scan('domains.txt')
    
    # Or scan from stdin
    await scanner.scan('-')

if __name__ == '__main__':
    asyncio.run(scan_domains())
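For readers unfamiliar with what a setting like `concurrent_limit` typically means in an asyncio program, the following is a generic sketch of bounded concurrency using `asyncio.Semaphore`. It does not show HTTPZ internals: `bounded_gather` and `fake_check` are hypothetical names, and the worker only sleeps instead of making a network request.

```python
import asyncio


async def bounded_gather(items, worker, limit=100):
    """Run worker(item) for every item, with at most `limit` running at once."""
    semaphore = asyncio.Semaphore(limit)

    async def run_one(item):
        # Each task waits here until one of the `limit` slots is free
        async with semaphore:
            return await worker(item)

    # gather() returns results in the same order as the input items
    return await asyncio.gather(*(run_one(item) for item in items))


async def fake_check(domain):
    """Stand-in for a real HTTP check (assumption: no network access here)."""
    await asyncio.sleep(0.01)
    return {'domain': domain, 'status': 200}


if __name__ == '__main__':
    domains = ['example.com', 'example.org', 'example.net']
    for result in asyncio.run(bounded_gather(domains, fake_check, limit=2)):
        print(result)
```

A semaphore keeps the number of in-flight checks fixed regardless of how many domains are queued, which is the usual way to avoid exhausting sockets or overwhelming resolvers.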

The scanner will return results in this format:

{
    'domain': 'example.com',           # Base domain
    'url': 'https://example.com',      # Full URL
    'status': 200,                     # HTTP status code
    'port': 443,                       # Port number
    'title': 'Example Domain',         # Page title
    'body': 'Example body text...',    # Body preview
    'content_type': 'text/html',       # Content type
    'content_length': '12345',         # Content length
    'ips': ['93.184.216.34'],         # IP addresses
    'cname': 'cdn.example.com',        # CNAME record
    'nameservers': ['ns1.example.com'],# Nameservers
    'favicon_hash': '123456789',       # Favicon hash
    'headers': {                       # Response headers
        'Server': 'nginx',
        'Content-Type': 'text/html'
    },
    'redirect_chain': [               # Redirect history
        'http://example.com',
        'https://example.com'
    ],
    'tls': {                         # TLS certificate info
        'fingerprint': 'sha256...',
        'common_name': 'example.com',
        'issuer': 'Let\'s Encrypt',
        'alt_names': ['www.example.com'],
        'not_before': '2023-01-01T00:00:00',
        'not_after': '2024-01-01T00:00:00',
        'version': 3,
        'serial_number': 'abcdef1234'
    }
}
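Results written with `-o results.jsonl` can be post-processed with the standard library alone. The sketch below assumes one JSON object per line with the field names shown above, and assumes `not_after` is a timezone-naive ISO 8601 string as in the sample. The helpers `load_results` and `expiring_certs` are hypothetical and not part of HTTPZ.

```python
import json
from datetime import datetime, timedelta


def load_results(lines):
    """Parse an iterable of JSONL lines into dicts, skipping blank lines."""
    return [json.loads(line) for line in lines if line.strip()]


def expiring_certs(results, now, within_days=30):
    """Return domains whose certificate expires within `within_days` of `now`.

    Assumption: 'tls.not_after' is a naive ISO 8601 timestamp, so `now`
    must also be a naive datetime for the comparison to work.
    """
    cutoff = now + timedelta(days=within_days)
    flagged = []
    for result in results:
        not_after = (result.get('tls') or {}).get('not_after')
        if not not_after:
            continue  # No TLS info recorded for this result
        if datetime.fromisoformat(not_after) <= cutoff:
            flagged.append(result['domain'])
    return flagged


if __name__ == '__main__':
    sample = [
        '{"domain": "example.com", "status": 200, '
        '"tls": {"not_after": "2024-01-01T00:00:00"}}',
        '{"domain": "example.org", "status": 200}',
    ]
    print(expiring_certs(load_results(sample), now=datetime(2023, 12, 15)))
```

To run this against a real output file, pass an open file handle to `load_results`, since iterating a file yields its lines.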

Arguments

Argument   Long Form        Description
file       -                File containing domains (one per line), use - for stdin
-d         --debug          Show error states and debug information
-c N       --concurrent N   Number of concurrent checks (default: 100)
-o FILE    --output FILE    Output file path (JSONL format)
-j         --jsonl          Output JSON Lines format to console
-all       --all-flags      Enable all output flags

Output Field Flags

Flag   Long Form            Description
-sc    --status-code        Show status code
-ct    --content-type       Show content type
-ti    --title              Show page title
-b     --body               Show body preview
-i     --ip                 Show IP addresses
-f     --favicon            Show favicon hash
-hr    --headers            Show response headers
-cl    --content-length     Show content length
-fr    --follow-redirects   Follow redirects (max 10)
-cn    --cname              Show CNAME records
-tls   --tls-info           Show TLS certificate information

Other Options

Option      Long Form               Description
-to N       --timeout N             Request timeout in seconds (default: 5)
-mc CODES   --match-codes CODES     Only show specific status codes (comma-separated)
-ec CODES   --exclude-codes CODES   Exclude specific status codes (comma-separated)
-p          --progress              Show progress counter
-ax         --axfr                  Try AXFR transfer against nameservers
-r FILE     --resolvers FILE        File containing DNS resolvers (one per line)
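The `-r` option expects a file with one DNS resolver per line. Before handing such a file to the scanner, it can be useful to sanity-check it. The sketch below is one way to do that with the standard `ipaddress` module. `parse_resolvers` is a hypothetical helper, and the handling of `#` comments and invalid entries is an assumption, since HTTPZ's own parsing rules are not described here.

```python
import ipaddress


def parse_resolvers(lines):
    """Return the valid IPv4/IPv6 addresses found in an iterable of lines.

    Assumptions: text after '#' is a comment, blank lines are ignored,
    and entries that are not IP addresses are silently dropped.
    """
    resolvers = []
    for line in lines:
        entry = line.split('#', 1)[0].strip()
        if not entry:
            continue
        try:
            ipaddress.ip_address(entry)
        except ValueError:
            continue  # Not a valid IP address; skip it
        resolvers.append(entry)
    return resolvers


if __name__ == '__main__':
    sample = ['1.1.1.1\n', '# comment\n', 'not-an-ip\n', '8.8.8.8  # public\n']
    print(parse_resolvers(sample))
```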

Examples

Scan domains with all flags enabled and output to JSONL:

python -m httpz domains.txt -c 100 -o output.jsonl -j -all -to 10 -mc 200,301 -ec 404,500 -p

Scan domains from stdin:

cat domains.txt | python -m httpz - -c 100 -o output.jsonl -j -all -to 10 -mc 200,301 -ec 404,500 -p

Scan domains with custom resolvers and AXFR checks:

python -m httpz domains.txt -r resolvers.txt -ax -c 100 -o output.jsonl
