# commonhuman-core

Shared HTTP engine and web crawler for CommonHuman-Lab tools — session management, injection helpers, BFS crawling, and passive recon primitives. One place. No duplication.

```bash
pip install commonhuman-core
```
## Why it exists
Every CommonHuman-Lab scanner needs to speak HTTP: proxy routing, cookie injection, rate-limit back-off, and injection helpers for query params, POST bodies, path segments, headers, and cookies. Every scanner also needs to crawl — BFS traversal, form discovery, same-origin enforcement.
commonhuman-core is the single source of truth for that layer. Tools that use it get:
- Battle-tested session handling — automatic retry on connection errors, 429 back-off with `Retry-After` support, configurable per-request delay.
- A complete injection toolkit — GET params, form POST, JSON POST, path segments (by index), cookies, and custom headers through one consistent interface.
- BFS crawling with exclude patterns — multi-threaded, depth and page limits, HTML form extraction, URL parameter discovery, regex-based URL filtering.
- A single place to improve — a new injection method or crawler feature lands in every tool at once.
## Quick start

```python
from commonhuman_core.http import HttpClient
from commonhuman_core.crawler import crawl, CrawlResult
from commonhuman_core.passive import fetch_seed
```
## What's in it

| Module | Purpose |
|---|---|
| `commonhuman_core.http.HttpClient` | HTTP session wrapper — proxy, cookies, SSL, retry, rate limiting, injection helpers |
| `commonhuman_core.http.parse_cookie_string` | Parse `name=value; ...` or JSON cookie strings |
| `commonhuman_core.http.parse_post_data` | Parse urlencoded or JSON POST bodies into a flat dict |
| `commonhuman_core.crawler` | BFS web crawler — link + form discovery, page source storage |
| `commonhuman_core.passive` | Passive recon helpers — `fetch_seed()` |
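The two parse helpers normalise raw cookie and body strings into flat dicts. A minimal standalone sketch of that contract, using only the stdlib — the bodies below are illustrative assumptions about the behaviour described above, not the library's actual code:

```python
import json
from urllib.parse import parse_qsl

def parse_cookie_string(raw: str) -> dict:
    """Parse 'name=value; name2=value2' or a JSON object string into a dict."""
    raw = raw.strip()
    if raw.startswith("{"):  # JSON cookie string
        return json.loads(raw)
    # Classic Cookie-header syntax
    return dict(
        pair.strip().split("=", 1)
        for pair in raw.split(";")
        if "=" in pair
    )

def parse_post_data(raw: str) -> dict:
    """Parse a urlencoded or JSON POST body into a flat dict."""
    raw = raw.strip()
    if raw.startswith("{"):
        return json.loads(raw)
    return dict(parse_qsl(raw))
```

So `parse_cookie_string("session=abc; token=xyz")` yields `{"session": "abc", "token": "xyz"}`, and `parse_post_data("user=admin&page=1")` yields `{"user": "admin", "page": "1"}`.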
## Modules

### `http.HttpClient`

Thin wrapper around `requests.Session` with everything a scanner needs built in.

```python
from commonhuman_core.http import HttpClient

client = HttpClient(
    timeout=15,
    proxy="http://127.0.0.1:8080",
    headers={"X-Custom": "value"},
    cookies="session=abc; token=xyz",
    verify_ssl=False,
    delay=0.5,  # seconds between requests
)
```
#### Core HTTP

```python
resp = client.get("https://target.com/search?q=test")
resp = client.post("https://target.com/login", data={"user": "admin"})
resp = client.head("https://target.com/")

print(client.request_count)  # total requests sent (including retries)
client.close()
```
#### Injection helpers

```python
# Replace or add a query parameter
client.inject_get("https://target.com/search?q=original", "q", "PAYLOAD")
# → GET /search?q=PAYLOAD

# Inject into a form POST body
client.inject_post("https://target.com/login", "user", "PAYLOAD", base_data={"csrf": "tok"})
# → POST body: user=PAYLOAD&csrf=tok

# Inject into a JSON POST body
client.inject_post_json("https://target.com/api/search", "query", "PAYLOAD", base_data={"page": 1})
# → POST body: {"query": "PAYLOAD", "page": 1}

# Replace a path segment by index (0-based after splitting on "/")
client.inject_path("https://target.com/api/user/123", 3, "PAYLOAD")
# → GET /api/user/PAYLOAD

# Pass -1 to append a new trailing segment
client.inject_path("https://target.com/page", -1, "PAYLOAD")
# → GET /page/PAYLOAD

# Inject a cookie for a single request
client.inject_cookie("https://target.com/", "session", "PAYLOAD")

# Inject a custom header for a single request
client.inject_header("https://target.com/", "X-Forwarded-For", "PAYLOAD")
```
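The path-segment index counts positions after splitting the URL path on `/`, where the leading slash produces an empty segment at index 0. A quick stdlib-only check of that counting convention (independent of the library, and assuming segments are counted over the full path including that empty leading segment):

```python
from urllib.parse import urlsplit

url = "https://target.com/api/user/123"
segments = urlsplit(url).path.split("/")
# segments == ['', 'api', 'user', '123'] — the leading "/" yields '' at
# index 0, so index 3 addresses "123", matching the inject_path example.
print(segments[3])  # → 123
```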
#### URL utilities

```python
HttpClient.get_params("https://target.com/?a=1&b=2")                   # → ["a", "b"]
HttpClient.get_base_url("https://target.com/path?q=1")                 # → "https://target.com"
HttpClient.same_origin("https://target.com/a", "https://other.com/b")  # → False
```
#### Rate limiting

Automatic 429 back-off with `Retry-After` header support. Up to 2 retries per request, 5-second default back-off.

```python
# Handled transparently — no extra code needed
resp = client.get("https://target.com/api/")
```
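Roughly, the back-off decision looks like the sketch below. This is an illustration of the behaviour described above, not the actual `HttpClient` internals; it assumes 429 is the retried status and that a numeric `Retry-After` overrides the 5-second default:

```python
def backoff_seconds(status_code: int, headers: dict, default: float = 5.0):
    """Return seconds to sleep before retrying, or None to not retry.

    Illustrative sketch only — the real retry logic lives inside HttpClient.
    """
    if status_code != 429:
        return None
    retry_after = headers.get("Retry-After")
    if retry_after is not None:
        try:
            return float(retry_after)  # delta-seconds form
        except ValueError:
            pass  # HTTP-date form of Retry-After; fall back to the default
    return default
```

Note that `Retry-After` may also carry an HTTP-date rather than a number; the sketch simply falls back to the default in that case.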
### `crawler`

Multi-threaded BFS crawler. Discovers pages, forms, and URL parameters within a target origin.

```python
from commonhuman_core.http import HttpClient
from commonhuman_core.crawler import crawl, CrawlResult, FormTarget

client = HttpClient(delay=0.2)

result: CrawlResult = crawl(
    "https://target.com/",
    client,
    max_pages=50,
    max_depth=3,
    threads=5,
    same_origin=True,
    exclude_patterns=[r"/logout", r"\.pdf$"],
)

result.visited_urls   # list of all crawled URLs
result.form_targets   # list of FormTarget — each a discovered HTML form
result.url_params     # list of (url, [param_names]) for URLs with query params
result.page_sources   # dict of {url: html} — raw page content
```
`FormTarget` carries everything needed to replay a form submission:

```python
for form in result.form_targets:
    print(form.method, form.action)
    print(form.params)     # {"username": "", "password": ""} — injectable fields
    print(form.base_data)  # {"csrf": "abc", "_submit": "Login"} — non-injectable
```
`exclude_patterns` accepts a list of regex strings. Any URL matching one is silently skipped before fetching.
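The matching semantics can be illustrated with plain `re` — assuming, as a sketch, that "matching" means a search anywhere in the full URL rather than an anchored match:

```python
import re

exclude_patterns = [r"/logout", r"\.pdf$"]
compiled = [re.compile(p) for p in exclude_patterns]

def is_excluded(url: str) -> bool:
    # Skip the URL if any pattern matches anywhere in it.
    return any(rx.search(url) for rx in compiled)

is_excluded("https://target.com/logout?next=/")    # → True  (contains /logout)
is_excluded("https://target.com/docs/manual.pdf")  # → True  (ends in .pdf)
is_excluded("https://target.com/products")         # → False
```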
### `passive`

```python
from commonhuman_core.passive import fetch_seed
from commonhuman_core.http import HttpClient

client = HttpClient()
resp = fetch_seed(client, "https://target.com/")
# Returns None on connection error or 4xx/5xx — safe to call without a try/except
if resp:
    print(resp.text)
```

Useful for a single passive check before starting an active scan — confirms the target is reachable and returns a response worth analysing.
## Subclassing for tool-specific methods

`HttpClient` is designed to be subclassed. stingxss adds XSS reflection probing on top:

```python
from commonhuman_core.http import HttpClient

class Injector(HttpClient):
    def probe_reflection(self, url, param, marker, method="GET"):
        ...

    def probe_header_reflection(self, url, header_name, marker):
        ...
```

breachsql uses `HttpClient` directly — no subclass needed.
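For illustration, a `probe_reflection` body might look like the sketch below. This is a hypothetical reconstruction from the method signature above, not stingxss's actual code: it sends the marker through one parameter via the base class's injection helpers and checks whether it comes back verbatim in the response body.

```python
def probe_reflection(client, url, param, marker, method="GET"):
    # Hypothetical sketch — inject `marker` into `param` and report
    # whether it is reflected verbatim in the response body.
    if method == "GET":
        resp = client.inject_get(url, param, marker)
    else:
        resp = client.inject_post(url, param, marker)
    return resp is not None and marker in resp.text
```

Callers would typically pass a random marker (e.g. `secrets.token_hex(8)`) so that accidental matches in the page are unlikely.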
## Design principles

- Transport only — `commonhuman-core` handles HTTP, crawling, and passive recon. Vulnerability detection, payload generation, and result analysis belong in the tools that use it.
- One consistent interface — every injection method follows the same call shape: `(url, target, payload)`. No special cases per method type.
- Threaded by default, deterministic when needed — `crawl()` uses a thread pool; pass `threads=1` for single-threaded sequential crawling.
- 100% branch coverage enforced — `pytest --cov` with `fail_under=100` in CI. Every branch in every module is tested.
## Tests

```bash
git clone https://github.com/commonhuman-lab/commonhuman-core.git
cd commonhuman-core
python -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"

pytest                    # full suite
pytest tests/unit/        # isolated unit tests only
pytest tests/regression/  # API surface contracts (requires stingxss + breachsql installed)
```
## License
Licensed under the AGPLv3. You are free to use, modify, and distribute this software. If you run it as a service or distribute it, the source must remain open.
For commercial licensing, contact the author.