Hyper-fast HTTP Scraping Tool
HTTPZ Web Scanner
A high-performance concurrent web scanner written in Python. HTTPZ efficiently scans domains for HTTP/HTTPS services, extracting valuable information like status codes, titles, SSL certificates, and more.
Requirements
Installation
Via pip (recommended)
# Install from PyPI
pip install httpz-scanner
# The 'httpz' command will now be available in your terminal
httpz --help
From source
# Clone the repository
git clone https://github.com/acidvegas/httpz
cd httpz
pip install -r requirements.txt
Usage
Command Line Interface
Basic usage:
python -m httpz_scanner domains.txt
Scan with all flags enabled and output to JSONL:
python -m httpz_scanner domains.txt -all -c 100 -o results.jsonl -j -p
Read from stdin:
cat domains.txt | python -m httpz_scanner - -all -c 100
echo "example.com" | python -m httpz_scanner - -all
Filter by status codes and follow redirects:
python -m httpz_scanner domains.txt -mc 200,301-399 -ec 404,500 -fr -p
Show specific fields with custom timeout and resolvers:
python -m httpz_scanner domains.txt -sc -ti -i -tls -to 10 -r resolvers.txt
Full scan with all options:
python -m httpz_scanner domains.txt -c 100 -o output.jsonl -j -all -to 10 -mc 200,301 -ec 404,500 -p -ax -r resolvers.txt
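The -mc and -ec examples above pass comma-separated codes, including a range such as 301-399. As an illustration only, here is a minimal sketch of how such a spec could be expanded and applied. The helper names `parse_status_codes` and `is_shown` are hypothetical and are not part of HTTPZ, and the rule that exclusion wins over matching is an assumption rather than documented behavior.

```python
from typing import Optional, Set


def parse_status_codes(spec: str) -> Set[int]:
    """Expand a spec such as '200,301-399' into a set of integer status codes."""
    codes: Set[int] = set()
    for part in spec.split(','):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if '-' in part:
            start, end = (int(value) for value in part.split('-', 1))
            if start > end:
                raise ValueError(f'invalid range: {part}')
            codes.update(range(start, end + 1))  # ranges are inclusive
        else:
            codes.add(int(part))
    return codes


def is_shown(status: int, match: Optional[Set[int]], exclude: Optional[Set[int]]) -> bool:
    """Return True if a status passes both filters (assumption: exclude wins)."""
    if exclude and status in exclude:
        return False
    if match and status not in match:
        return False
    return True


match = parse_status_codes('200,301-399')
exclude = parse_status_codes('404,500')
print(is_shown(302, match, exclude))  # inside the 301-399 range
print(is_shown(404, match, exclude))  # explicitly excluded
```

A set keeps the membership test cheap regardless of how wide the ranges are.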
Python Library
import asyncio
from httpz_scanner import HTTPZScanner

async def scan_domains():
    # Initialize scanner with all possible options (showing defaults)
    scanner = HTTPZScanner(
        # Core settings
        concurrent_limit=100,    # Number of concurrent requests
        timeout=5,               # Request timeout in seconds
        follow_redirects=False,  # Follow redirects (max 10)
        check_axfr=False,        # Try AXFR transfer against nameservers
        resolver_file=None,      # Path to custom DNS resolvers file
        output_file=None,        # Path to JSONL output file
        show_progress=False,     # Show progress counter
        debug_mode=False,        # Show error states and debug info
        jsonl_output=False,      # Output in JSONL format

        # Control which fields to show (all False by default unless show_fields is None)
        show_fields={
            'status_code': True,       # Show status code
            'content_type': True,      # Show content type
            'content_length': True,    # Show content length
            'title': True,             # Show page title
            'body': True,              # Show body preview
            'ip': True,                # Show IP addresses
            'favicon': True,           # Show favicon hash
            'headers': True,           # Show response headers
            'follow_redirects': True,  # Show redirect chain
            'cname': True,             # Show CNAME records
            'tls': True                # Show TLS certificate info
        },

        # Filter results
        match_codes={200, 301, 302},   # Only show these status codes
        exclude_codes={404, 500, 503}  # Exclude these status codes
    )

    # Initialize resolvers (required before scanning)
    await scanner.init()

    # Scan domains from file
    await scanner.scan('domains.txt')

    # Or scan from stdin
    await scanner.scan('-')

if __name__ == '__main__':
    asyncio.run(scan_domains())
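To make the `concurrent_limit` setting more concrete, the following is a minimal sketch of one common way to cap in-flight work in asyncio, using a semaphore. It is not taken from the HTTPZ source; `bounded_gather` and `fake_check` are hypothetical names, and the fake worker sleeps instead of performing a real HTTP request.

```python
import asyncio


async def bounded_gather(items, worker, limit=100):
    """Run worker(item) for every item with at most `limit` running at once."""
    semaphore = asyncio.Semaphore(limit)

    async def run(item):
        # Each task waits here until one of the `limit` slots is free
        async with semaphore:
            return await worker(item)

    # gather returns results in the same order as the input items
    return await asyncio.gather(*(run(item) for item in items))


async def fake_check(domain):
    """Stand-in for a real check (assumption: real code would do network I/O)."""
    await asyncio.sleep(0.01)
    return {'domain': domain, 'status': 200}


results = asyncio.run(bounded_gather(['example.com', 'example.org'], fake_check, limit=1))
print(results)
```

A higher limit finishes large lists sooner but opens more sockets at once, so the right value depends on the host and the network being scanned.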
The scanner will return results in this format:
{
    'domain': 'example.com',            # Base domain
    'url': 'https://example.com',       # Full URL
    'status': 200,                      # HTTP status code
    'port': 443,                        # Port number
    'title': 'Example Domain',          # Page title
    'body': 'Example body text...',     # Body preview
    'content_type': 'text/html',        # Content type
    'content_length': '12345',          # Content length
    'ips': ['93.184.216.34'],           # IP addresses
    'cname': 'cdn.example.com',         # CNAME record
    'nameservers': ['ns1.example.com'], # Nameservers
    'favicon_hash': '123456789',        # Favicon hash
    'headers': {                        # Response headers
        'Server': 'nginx',
        'Content-Type': 'text/html'
    },
    'redirect_chain': [                 # Redirect history
        'http://example.com',
        'https://example.com'
    ],
    'tls': {                            # TLS certificate info
        'fingerprint': 'sha256...',
        'common_name': 'example.com',
        'issuer': 'Let\'s Encrypt',
        'alt_names': ['www.example.com'],
        'not_before': '2023-01-01T00:00:00',
        'not_after': '2024-01-01T00:00:00',
        'version': 3,
        'serial_number': 'abcdef1234'
    }
}
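Records in this shape are easy to post-process. The sketch below reads a JSONL file and reports how many days remain before a certificate's `not_after` date. It assumes each line of the JSONL output is one JSON object with the keys shown above and that the timestamps are naive ISO 8601 strings; `load_results` and `days_until_expiry` are hypothetical helpers, not part of HTTPZ.

```python
import json
from datetime import datetime


def load_results(path):
    """Read one JSON object per non-empty line from a JSONL file."""
    with open(path, encoding='utf-8') as handle:
        return [json.loads(line) for line in handle if line.strip()]


def days_until_expiry(record, now):
    """Return whole days until tls.not_after, or None if no TLS info is present."""
    tls = record.get('tls')
    if not tls or 'not_after' not in tls:
        return None
    # Assumption: not_after is a naive ISO 8601 timestamp, as in the sample above
    return (datetime.fromisoformat(tls['not_after']) - now).days


record = {
    'domain': 'example.com',
    'tls': {'not_after': '2024-01-01T00:00:00'},
}
print(days_until_expiry(record, datetime(2023, 12, 1)))  # 31
```

Passing `now` explicitly keeps the function deterministic; a real script would likely pass `datetime.now()`.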
Arguments
| Argument | Long Form | Description |
|---|---|---|
| file | - | File containing domains (one per line), use - for stdin |
| -d | --debug | Show error states and debug information |
| -c N | --concurrent N | Number of concurrent checks (default: 100) |
| -o FILE | --output FILE | Output file path (JSONL format) |
| -j | --jsonl | Output JSON Lines format to console |
| -all | --all-flags | Enable all output flags |
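The input file is expected to hold one domain per line. If the starting point is a mixed list of URLs and hostnames, a small preprocessing step can produce that file. The following is a sketch under assumptions: `normalize_domains` is a hypothetical helper, and skipping blank lines and lines starting with # is a choice made here, not documented HTTPZ behavior.

```python
from urllib.parse import urlsplit


def normalize_domains(lines):
    """Reduce URLs or host:port entries to unique lowercase hostnames, in order."""
    seen = set()
    domains = []
    for line in lines:
        entry = line.strip()
        if not entry or entry.startswith('#'):
            continue  # assumption: blank lines and comments are skipped
        # urlsplit only recognizes a hostname when a scheme or '//' is present
        if '//' not in entry:
            entry = '//' + entry
        host = urlsplit(entry).hostname  # lowercased, without port or path
        if host and host not in seen:
            seen.add(host)
            domains.append(host)
    return domains


raw = ['https://Example.com/path', 'example.com', '', 'example.org:8080']
print(normalize_domains(raw))  # ['example.com', 'example.org']
```

The result can be written to domains.txt with one entry per line, or piped to the scanner on stdin.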
Output Field Flags
| Flag | Long Form | Description |
|---|---|---|
| -sc | --status-code | Show status code |
| -ct | --content-type | Show content type |
| -ti | --title | Show page title |
| -b | --body | Show body preview |
| -i | --ip | Show IP addresses |
| -f | --favicon | Show favicon hash |
| -hr | --headers | Show response headers |
| -cl | --content-length | Show content length |
| -fr | --follow-redirects | Follow redirects (max 10) |
| -cn | --cname | Show CNAME records |
| -tls | --tls-info | Show TLS certificate information |
Other Options
| Option | Long Form | Description |
|---|---|---|
| -to N | --timeout N | Request timeout in seconds (default: 5) |
| -mc CODES | --match-codes CODES | Only show specific status codes (comma-separated) |
| -ec CODES | --exclude-codes CODES | Exclude specific status codes (comma-separated) |
| -p | --progress | Show progress counter |
| -ax | --axfr | Try AXFR transfer against nameservers |
| -r FILE | --resolvers FILE | File containing DNS resolvers (one per line) |
File details
Details for the file httpz_scanner-1.0.8.tar.gz.
File metadata
- Download URL: httpz_scanner-1.0.8.tar.gz
- Upload date:
- Size: 16.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.0.1 CPython/3.13.1
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6bd8919fd333c003fc68538f4c097466a0d502c329a4b7a5ef3ff1dcb5755192 |
| MD5 | 8f27730486218ee65d586ad283f36889 |
| BLAKE2b-256 | 17872b262a17d659e3ac3b40116754573e02ca1af7e3a1645c1a124273ef943c |
File details
Details for the file httpz_scanner-1.0.8-py3-none-any.whl.
File metadata
- Download URL: httpz_scanner-1.0.8-py3-none-any.whl
- Upload date:
- Size: 17.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.0.1 CPython/3.13.1
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e5d399b62eb8b33cbcf88572112a8d08fa0c80cc14a662dcfdbf8f00d290493a |
| MD5 | fa298e9515c1a4eaa2bc344d2a6484f6 |
| BLAKE2b-256 | bbfcdf88b29950472ae0c0d0a7cf8d67d2f0a1737341be5b9e2692501297783f |