Shared HTTP, crawling, and scanning infrastructure for CommonHuman-Lab tools

commonhuman-core

Shared HTTP engine and web crawler for CommonHuman-Lab tools — session management, injection helpers, BFS crawling, and passive recon primitives. One place. No duplication.

pip install commonhuman-core

Why it exists

Every CommonHuman-Lab scanner needs to speak HTTP: proxy routing, cookie injection, rate-limit back-off, and injection helpers for query params, POST bodies, path segments, headers, and cookies. Every scanner also needs to crawl — BFS traversal, form discovery, same-origin enforcement.

commonhuman-core is the single source of truth for that layer. Tools that use it get:

  • Battle-tested session handling — automatic retry on connection errors, 429 back-off with Retry-After support, configurable per-request delay.
  • A complete injection toolkit — GET params, form POST, JSON POST, path segments (by index), cookies, and custom headers through one consistent interface.
  • BFS crawling with exclude patterns — multi-threaded, depth and page limits, HTML form extraction, URL parameter discovery, regex-based URL filtering.
  • A single place to improve — a new injection method or crawler feature lands in every tool at once.

Quick start

from commonhuman_core.http import HttpClient
from commonhuman_core.crawler import crawl, CrawlResult
from commonhuman_core.passive import fetch_seed

What's in it

  • commonhuman_core.http.HttpClient — HTTP session wrapper: proxy, cookies, SSL, retry, rate limiting, injection helpers
  • commonhuman_core.http.parse_cookie_string — parse "name=value; ..." or JSON cookie strings
  • commonhuman_core.http.parse_post_data — parse urlencoded or JSON POST bodies into a flat dict
  • commonhuman_core.crawler — BFS web crawler: link and form discovery, page source storage
  • commonhuman_core.passive — passive recon helpers: fetch_seed()
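
The dual-format cookie parsing can be sketched with the stdlib — an illustrative reimplementation of the documented behavior, not the shipped code:

```python
import json

def parse_cookie_string_sketch(raw: str) -> dict:
    """Parse either a 'name=value; name2=value2' string or a JSON object string."""
    raw = raw.strip()
    if raw.startswith("{"):
        # JSON form: {"session": "abc", ...}
        return {str(k): str(v) for k, v in json.loads(raw).items()}
    # Header form: name=value pairs separated by ";"
    cookies = {}
    for part in raw.split(";"):
        if "=" in part:
            name, _, value = part.strip().partition("=")
            cookies[name] = value
    return cookies
```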

Modules

http.HttpClient

Thin wrapper around requests.Session with everything a scanner needs built in.

from commonhuman_core.http import HttpClient

client = HttpClient(
    timeout=15,
    proxy="http://127.0.0.1:8080",
    headers={"X-Custom": "value"},
    cookies="session=abc; token=xyz",
    verify_ssl=False,
    delay=0.5,          # seconds between requests
)

Core HTTP

resp = client.get("https://target.com/search?q=test")
resp = client.post("https://target.com/login", data={"user": "admin"})
resp = client.head("https://target.com/")

print(client.request_count)  # total requests sent (including retries)
client.close()

Injection helpers

# Replace or add a query parameter
client.inject_get("https://target.com/search?q=original", "q", "PAYLOAD")
# → GET /search?q=PAYLOAD

# Inject into a form POST body
client.inject_post("https://target.com/login", "user", "PAYLOAD", base_data={"csrf": "tok"})
# → POST body: user=PAYLOAD&csrf=tok

# Inject into a JSON POST body
client.inject_post_json("https://target.com/api/search", "query", "PAYLOAD", base_data={"page": 1})
# → POST body: {"query": "PAYLOAD", "page": 1}

# Replace a path segment by index (0-based after splitting on "/")
client.inject_path("https://target.com/api/user/123", 3, "PAYLOAD")
# → GET /api/user/PAYLOAD

# Pass -1 to append a new trailing segment
client.inject_path("https://target.com/page", -1, "PAYLOAD")
# → GET /page/PAYLOAD

# Inject a cookie for a single request
client.inject_cookie("https://target.com/", "session", "PAYLOAD")

# Inject a custom header for a single request
client.inject_header("https://target.com/", "X-Forwarded-For", "PAYLOAD")
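
Under the hood, query and path injection amount to URL surgery. A minimal urllib.parse sketch of the semantics shown above — the function names here are illustrative, not the library's internals:

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def inject_get_url(url: str, param: str, payload: str) -> str:
    """Return the URL with `param` replaced (or added) in the query string."""
    parts = urlsplit(url)
    query = dict(parse_qsl(parts.query, keep_blank_values=True))
    query[param] = payload
    return urlunsplit(parts._replace(query=urlencode(query)))

def inject_path_url(url: str, index: int, payload: str) -> str:
    """Replace the path segment at `index` (0-based over path.split("/")); -1 appends."""
    parts = urlsplit(url)
    segments = parts.path.split("/")
    if index == -1:
        segments.append(payload)
    else:
        segments[index] = payload
    return urlunsplit(parts._replace(path="/".join(segments)))
```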

URL utilities

HttpClient.get_params("https://target.com/?a=1&b=2")           # → ["a", "b"]
HttpClient.get_base_url("https://target.com/path?q=1")         # → "https://target.com"
HttpClient.same_origin("https://target.com/a", "https://other.com/b")  # → False
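
These utilities map directly onto urllib.parse. A plain-stdlib sketch of their behavior (illustrative reimplementations, not the shipped code):

```python
from urllib.parse import urlsplit, parse_qsl

def get_params(url):
    """Names of all query-string parameters, in order of appearance."""
    return [name for name, _ in parse_qsl(urlsplit(url).query, keep_blank_values=True)]

def get_base_url(url):
    """Scheme + host, with path and query stripped."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}"

def same_origin(a, b):
    """True when both URLs share scheme and host (port included in netloc)."""
    pa, pb = urlsplit(a), urlsplit(b)
    return (pa.scheme, pa.netloc) == (pb.scheme, pb.netloc)
```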

Rate limiting

Automatic 429 back-off with Retry-After header support. Up to 2 retries per request, 5-second default back-off.

# Handled transparently — no extra code needed
resp = client.get("https://target.com/api/")
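
The back-off decision described above can be sketched as a pure function — a hypothetical helper, not the library's actual code; the real client applies this logic inside its request loop:

```python
def backoff_delay(status_code, retry_after_header, default_backoff=5.0):
    """Seconds to sleep before retrying a 429 response, or None when no retry is needed.

    Honors a numeric Retry-After header; anything unparseable (e.g. an
    HTTP-date value) falls back to the default back-off.
    """
    if status_code != 429:
        return None  # not rate-limited; no back-off
    if retry_after_header is not None:
        try:
            return max(0.0, float(retry_after_header))
        except ValueError:
            pass  # non-numeric Retry-After: use the default
    return default_backoff
```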

crawler

Multi-threaded BFS crawler. Discovers pages, forms, and URL parameters within a target origin.

from commonhuman_core.http import HttpClient
from commonhuman_core.crawler import crawl, CrawlResult, FormTarget

client = HttpClient(delay=0.2)
result: CrawlResult = crawl(
    "https://target.com/",
    client,
    max_pages=50,
    max_depth=3,
    threads=5,
    same_origin=True,
    exclude_patterns=[r"/logout", r"\.pdf$"],
)

result.visited_urls   # list of all crawled URLs
result.form_targets   # list of FormTarget — each a discovered HTML form
result.url_params     # list of (url, [param_names]) for URLs with query params
result.page_sources   # dict of {url: html} — raw page content

FormTarget carries everything needed to replay a form submission:

for form in result.form_targets:
    print(form.method, form.action)
    print(form.params)     # {"username": "", "password": ""} — injectable fields
    print(form.base_data)  # {"csrf": "abc", "_submit": "Login"} — non-injectable

exclude_patterns accepts a list of regex strings. Any URL matching one is silently skipped before fetching.
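
The crawl loop can be sketched as a depth-limited BFS over a pre-built link map — a toy stand-in, since the real crawler fetches and parses live pages and runs threaded:

```python
import re
from collections import deque

def bfs_crawl(start, links, max_depth=3, max_pages=50, exclude_patterns=()):
    """Depth-limited BFS over a {url: [linked urls]} map, skipping excluded URLs.

    `links` stands in for live link extraction from fetched HTML.
    """
    excluded = [re.compile(p) for p in exclude_patterns]
    visited, queue, seen = [], deque([(start, 0)]), {start}
    while queue and len(visited) < max_pages:
        url, depth = queue.popleft()
        if any(p.search(url) for p in excluded):
            continue  # silently skip excluded URLs before fetching
        visited.append(url)
        if depth < max_depth:
            for nxt in links.get(url, []):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, depth + 1))
    return visited

graph = {
    "https://t.com/": ["https://t.com/a", "https://t.com/logout"],
    "https://t.com/a": ["https://t.com/b.pdf", "https://t.com/c"],
}
pages = bfs_crawl("https://t.com/", graph, exclude_patterns=[r"/logout", r"\.pdf$"])
```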


passive

from commonhuman_core.passive import fetch_seed
from commonhuman_core.http import HttpClient

client = HttpClient()
resp = fetch_seed(client, "https://target.com/")
# Returns None on connection error or 4xx/5xx — safe to call without a try/except
if resp:
    print(resp.text)

Useful for a single passive check before starting an active scan — confirms the target is reachable and returns a response worth analysing.


Subclassing for tool-specific methods

HttpClient is designed to be subclassed. stingxss adds XSS reflection probing on top:

from commonhuman_core.http import HttpClient

class Injector(HttpClient):
    def probe_reflection(self, url, param, marker, method="GET"):
        ...

    def probe_header_reflection(self, url, header_name, marker):
        ...

breachsql uses HttpClient directly — no subclass needed.


Design principles

  • Transport only — commonhuman-core handles HTTP, crawling, and passive recon. Vulnerability detection, payload generation, and result analysis belong in the tools that use it.
  • One consistent interface — every injection method follows the same call shape: (url, target, payload). No special cases per method type.
  • Threaded by default, deterministic when needed — crawl() uses a thread pool; pass threads=1 for single-threaded sequential crawling.
  • 100% branch coverage enforced — pytest --cov with fail_under=100 in CI. Every branch in every module is tested.

Tests

git clone https://github.com/commonhuman-lab/commonhuman-core.git
cd commonhuman-core
python -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"
pytest
pytest tests/unit/        # isolated unit tests only
pytest tests/regression/  # API surface contracts (requires stingxss + breachsql installed)

License

Licensed under the AGPLv3. You are free to use, modify, and distribute this software. If you distribute it, or run a modified version as a network service, you must make the corresponding source available under the same license.

For commercial licensing, contact the author.
