ansferatu

Multifunctional tool for HTTP reconnaissance, web crawling and web directory bruteforce.

These details have not been verified by PyPI

Project description

Multifunctional tool for HTTP reconnaissance, web crawling, and web directory bruteforce. Based on PSpider.

Killer features:

Fast multi-URL crawling
Fast multi-URL directory bruteforce
Finds new domains without DNS bruteforce (for example, https://mail.ru --> 105 domains under *.mail.ru)
To Do: dynamic dictionary creation for brute-force
To Do: deduplication based on Simhash
Headless browsing and form fill-up as an additional option
To Do: add proper output to jsonl + html reports
To Do: collect query parameters (for GET and POST)
To Do: better deduplication based on page hash
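
The "new domains without DNS bruteforce" feature can be pictured as collecting hostnames from the URLs the crawler discovers. The snippet below is only a sketch of that idea and not ansferatu's own code; the function name collect_domains and the sample links are hypothetical.

```python
from urllib.parse import urlsplit


def collect_domains(urls, scope):
    """Return the sorted hostnames from `urls` that equal `scope` or are its subdomains."""
    found = set()
    for url in urls:
        host = (urlsplit(url).hostname or "").lower()
        if host == scope or host.endswith("." + scope):
            found.add(host)
    return sorted(found)


# Hypothetical links, as they might be extracted from crawled pages.
links = [
    "https://mail.ru/",
    "https://news.mail.ru/politics/",
    "https://deti.mail.ru/forum",
    "https://example.com/",
]
print(collect_domains(links, "mail.ru"))
```

Every hostname found this way comes from a link that was actually seen during crawling, so no DNS queries against a wordlist are needed.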

Installation

Ansferatu is a regular Python package. It requires Python 3.8+.

From PyPI:

pip3 install ansferatu

From source / GitHub:

pip3 install git+https://github.com/frostbits-security/ansferatu.git
# or, from a local checkout:
pip3 install .

Headless / form-filling support (optional). The --headless and --fill-forms modes rely on Playwright. Install the optional extra and download the Chromium runtime:

pip3 install 'ansferatu[headless]'
playwright install chromium

Installing the package exposes an ansferatu console command (equivalent to python3 -m ansferatu).

How to run

After installation, run via the ansferatu command:

ansferatu crawl --url https://mail.ru -o ./results/ --limit 1

Use as a library

The package can be imported into other Python tools:

from ansferatu import common_crawler, common_brute_from_file

common_crawler(
    url_list=["https://example.com"],
    scope=["example.com"],
    exclude_codes_list=[403, 404, 401],
    visit_count_limit=10,
    max_deep=2,
    threads=10,
    output_file="results.jsonl",
)

For lower-level control, build the spider directly:

from ansferatu.spider import WebSpider, TaskFetch

Docker

Build docker image:

docker build -t ansferatu .

Run the container (the image's entrypoint is the ansferatu command):

docker run --rm -it -v /tmp/ansferatu_out:/ansferatu/results ansferatu \
  crawl --url https://mail.ru -o /ansferatu/results/ --limit 1

Modes

crawl - crawls web sites. The main parameter is visit_count_limit.

ansferatu crawl --url https://deti.mail.ru -o /home/sabotaged/BB/mail.ru/

crawl --headless - same crawl but with Playwright headless extraction for qualifying pages. Requires the headless extra: pip install 'ansferatu[headless]' && playwright install chromium.

ansferatu crawl --headless --url https://example.com -o ./results/

crawl --fill-forms - extends headless crawl with form detection and interaction. Detects <form> elements on pages, fills fields with smart defaults (email, password, search, etc.), submits forms and clicks buttons, then captures the resulting POST responses and new URLs. Implies --headless.

ansferatu crawl --fill-forms --url https://example.com -o ./results/

brute - classic web directory bruteforce. Requires a wordlist.

ansferatu brute --url https://news.mail.ru -w ./wordlists/fuzz_big.txt -o /home/sabotaged/BB/mail.ru/

Modes task flow (queues and owners)

crawl puts start tasks into QueueFetch, then the queues are filled and drained by the workers shown below:

flowchart LR
  start([Start Task]) -->|set_start_task| qf[QueueFetch<br/>priority keys deep url repeat]
  qf --> fetchers[Fetchers<br/>multi-threading]
  fetchers -->|TaskExtract| qe[QueueExtract<br/>priority keys deep url content]
  fetchers -->|TaskHTMLHandle| qh[QueueHTMLHandle<br/>priority keys deep url content]
  qe --> extractor[Extractor]
  extractor -->|TaskFetch| qf
  qh --> html[HTML Handler]
  html -->|TaskSave if item| qs[QueueSave<br/>priority keys deep url item]
  qs --> saver[Saver]

  proxieser[Proxieser] -.->|optional| qp[QueueProxies]
  qp -.->|optional| fetchers
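
The crawl flow above can be modeled with standard-library priority queues. This is a simplified sketch, not the tool's implementation: it runs in a single thread for determinism (the real workers are threads), replaces HTTP with a hypothetical in-memory SITE mapping, and the function name run_crawl is an assumption.

```python
import queue

# Hypothetical in-memory "site": URL -> links found on that page.
SITE = {
    "/": ["/a", "/b"],
    "/a": ["/b", "/c"],
    "/b": [],
    "/c": ["/"],
}


def run_crawl(start_url, max_deep):
    """Drain fetch -> extract/handle -> save queues until all are empty."""
    q_fetch = queue.PriorityQueue()    # (priority, deep, url)
    q_extract = queue.PriorityQueue()  # (priority, deep, url, content)
    q_handle = queue.PriorityQueue()   # (priority, deep, url, content)
    q_save = queue.PriorityQueue()     # (priority, deep, url, item)
    seen = {start_url}                 # URLs are unique, so tuples never tie
    saved = []

    q_fetch.put((0, 0, start_url))
    while not (q_fetch.empty() and q_extract.empty()
               and q_handle.empty() and q_save.empty()):
        if not q_fetch.empty():        # Fetcher: feeds both downstream queues
            prio, deep, url = q_fetch.get()
            content = SITE.get(url, [])
            q_extract.put((prio, deep, url, content))
            q_handle.put((prio, deep, url, content))
        if not q_extract.empty():      # Extractor: new URLs go back to fetch
            prio, deep, url, content = q_extract.get()
            if deep < max_deep:
                for link in content:
                    if link not in seen:
                        seen.add(link)
                        q_fetch.put((prio + 1, deep + 1, link))
        if not q_handle.empty():       # HTML Handler: builds an item to save
            prio, deep, url, content = q_handle.get()
            q_save.put((prio, deep, url, {"url": url, "links": len(content)}))
        if not q_save.empty():         # Saver
            saved.append(q_save.get()[3])
    return saved


for item in run_crawl("/", max_deep=2):
    print(item)
```

Dropping the Extractor branch from this loop gives the shape of the brute flow, where fetched pages are only handled and saved.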

crawl --headless extends the regular crawl with a Playwright-based headless browser pipeline. Qualifying pages (decided by HeadlessCandidate) are routed to a single-threaded headless engine instead of the normal Extractor + HTML Handler path. The headless engine intercepts CDP network events to discover URLs and captures the fully-rendered page for the HTML Handler.

flowchart LR
  start([Start Task]) -->|set_start_task| qf[QueueFetch<br/>priority keys deep url repeat]
  qf --> fetchers[Fetchers<br/>multi-thread]

  fetchers -->|HeadlessCandidate?| decision{is<br/>candidate?}

  decision -->|No| qe[QueueExtract]
  decision -->|No| qh[QueueHTMLHandle]
  decision -->|Yes| qhl[QueueHeadless<br/>dedup: VisitLimit]

  qhl --> headless[HeadlessThread<br/>single thread<br/>Playwright + CDP]

  headless -->|intercepted URLs<br/>TaskFetch| qf
  headless -->|normalized page<br/>TaskHTMLHandle| qh

  qe --> extractor[Extractor]
  extractor -->|TaskFetch| qf

  qh --> html[HTML Handler<br/>_normalize_content]
  html -->|TaskSave| qs[QueueSave]
  qs --> saver[Saver]

Key points:

HeadlessCandidate decides which fetched pages qualify. Currently: root/index-like URLs (is_absolute) and HTML responses with status 200/301/302.
HeadlessExtractor (Playwright) uses lazy browser init on the worker thread to avoid thread-affinity issues. It hooks page.on("request") to capture all network URLs, then returns both discovered TaskFetch items and a TaskHTMLHandle with a normalized dict (status_code, url, html_text, headers, title, etc.).
CommonHTMLHandler accepts both requests.Response objects (regular path) and the normalized dict (headless path) via _normalize_content().
Deduplication: VisitLimit.check_headless_visited() prevents the same URL from being sent to headless twice. UrlFilter continues to deduplicate the fetch queue as usual.
When a fetched URL qualifies for headless, it skips the regular Extractor and HTML Handler; only the headless pipeline processes it.
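
A minimal sketch of the candidate decision described in the key points above. It assumes that both conditions must hold (a root/index-like URL and an HTML response with status 200/301/302); the function name, the INDEX_NAMES set, and the treatment of any path ending in "/" as index-like are assumptions, not the behavior of the real HeadlessCandidate class.

```python
from urllib.parse import urlsplit

HEADLESS_STATUSES = {200, 301, 302}
# Assumption: last path segments treated as "index-like" ("" means the path ends in "/").
INDEX_NAMES = {"", "index.html", "index.htm", "index.php"}


def is_headless_candidate(url, status_code, content_type):
    """Return True if a fetched page should be routed to the headless queue."""
    path = urlsplit(url).path
    last_segment = path.rsplit("/", 1)[-1].lower()
    is_index_like = last_segment in INDEX_NAMES
    is_html = content_type.lower().startswith("text/html")
    return is_index_like and is_html and status_code in HEADLESS_STATUSES


print(is_headless_candidate("https://example.com/", 200, "text/html; charset=utf-8"))
print(is_headless_candidate("https://example.com/blog/post-1", 200, "text/html"))
```

Keeping the check cheap matters because it runs for every fetched page, while the headless engine itself is single-threaded.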

crawl --fill-forms extends the headless pipeline with a two-phase form interaction system. Phase 1 (cheap): HeadlessExtractor calls FormDetector.detect(page) on the already-loaded page to produce universal form descriptors. Phase 2 (expensive, deferred): HeadlessFormInteractor picks up form tasks from a dedicated queue, opens the page in a separate browser, fills fields via FormFiller, submits, and captures results.

flowchart LR
  start([Start Task]) -->|set_start_task| qf[QueueFetch<br/>priority keys deep url repeat]
  qf --> fetchers[Fetchers<br/>multi-thread]

  fetchers -->|HeadlessCandidate?| decision{is<br/>candidate?}

  decision -->|No| qe[QueueExtract]
  decision -->|No| qh[QueueHTMLHandle]
  decision -->|Yes| qhl[QueueHeadless<br/>dedup: VisitLimit]

  qhl --> headless[HeadlessThread<br/>single thread<br/>Playwright + CDP]

  headless -->|intercepted URLs<br/>TaskFetch| qf
  headless -->|normalized page<br/>TaskHTMLHandle| qh
  headless -->|form descriptors<br/>TaskFormInteract| qfi[QueueFormInteract]

  qfi --> forminteract[FormInteractThread<br/>single thread<br/>separate Playwright browser]
  forminteract -->|POST response URLs<br/>TaskFetch| qf
  forminteract -->|POST response page<br/>TaskHTMLHandle| qh

  qe --> extractor[Extractor]
  extractor -->|TaskFetch| qf

  qh --> html[HTML Handler<br/>_normalize_content]
  html -->|TaskSave| qs[QueueSave]
  qs --> saver[Saver]

Key points for form interaction:

FormDetector scans the already-loaded page DOM for <form> elements. Pure detection, no extra navigation (~50ms overhead). Returns universal form descriptors.
Form descriptor schema: {form_selector, action, method, fields[], buttons[], page_url}. Designed to be self-contained so HeadlessFormInteractor needs no extra DOM inspection.
FormFiller maps input types/names to smart defaults (email, password, search, etc.). Supports custom value overrides via dict.
HeadlessFormInteractor runs in a dedicated thread with its own Playwright browser. It navigates to the page, fills fields, submits/clicks, and captures network traffic + the resulting page data. Results flow back through the normal URL_FETCH and HTM_HANDLE queues.
Budget cap: FormDetector.max_forms_per_page (default 5) and HeadlessFormInteractor.max_interactions_per_page prevent runaway on form-heavy pages.
The form interaction pipeline is fully independent from the headless extraction pipeline — separate queue, separate thread, separate browser instance.
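
To make the descriptor schema and the "smart defaults" idea concrete, here is a hedged sketch. The top-level keys follow the schema listed above; the per-field keys (name, type), the default values, and the functions choose_value and fill_plan are hypothetical and do not reproduce the real FormFiller.

```python
# Hypothetical default values, keyed by input type or a fragment of the field name.
SMART_DEFAULTS = {
    "email": "test@example.com",
    "password": "Passw0rd!123",
    "search": "test",
    "tel": "+10000000000",
    "url": "https://example.com/",
}


def choose_value(field, overrides=None):
    """Pick a value: explicit override, then input type, then name fragment."""
    overrides = overrides or {}
    name = (field.get("name") or "").lower()
    ftype = (field.get("type") or "text").lower()
    if name in overrides:
        return overrides[name]
    if ftype in SMART_DEFAULTS:
        return SMART_DEFAULTS[ftype]
    for key, value in SMART_DEFAULTS.items():
        if key in name:
            return value
    return "test"  # generic fallback for unknown fields


def fill_plan(descriptor, overrides=None):
    """Map each named field of a form descriptor to the value to type into it."""
    return {
        field["name"]: choose_value(field, overrides)
        for field in descriptor["fields"]
        if field.get("name")
    }


descriptor = {
    "form_selector": "form#login",
    "action": "/login",
    "method": "POST",
    "fields": [
        {"name": "user_email", "type": "text"},
        {"name": "pwd", "type": "password"},
        {"name": "q", "type": "search"},
    ],
    "buttons": [{"selector": "button[type=submit]"}],
    "page_url": "https://example.com/login",
}
print(fill_plan(descriptor, overrides={"q": "admin"}))
```

Because the descriptor carries the selector, action, and fields together, a plan like this can be computed without touching the DOM again.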

brute skips extraction and only handles and saves the results of fetches:

flowchart LR
  start([Start Task]) -->|set_start_task| qf[QueueFetch<br/>priority keys deep url repeat]
  qf --> fetchers[Fetchers<br/>multi-threading]
  fetchers -->|TaskHTMLHandle| qh[QueueHTMLHandle<br/>priority keys deep url content]
  qh --> html[HTML Handler]
  html -->|TaskSave if item| qs[QueueSave<br/>priority keys deep url item]
  qs --> saver[Saver]

  proxieser[Proxieser] -.->|optional| qp[QueueProxies]
  qp -.->|optional| fetchers

How to change settings

Besides parsing the console arguments, ansferatu has a settings file for:

blacklist of extensions for requests
blacklist of extensions for parsing
number of HTTP request workers
number of CPU-bound workers
HTTP error_limit
limit of requests to one host
HTTP request headers
content types ignored in the report
deduplication mode

The default file is stored in modules\settings\default_config.yaml

If you want to change settings, it is best to copy modules\settings\default_config.yaml to modules\settings\config.yaml and then edit config.yaml.
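
The copy-then-edit step can be scripted. The sketch below assumes that the tool prefers config.yaml over default_config.yaml when both exist, which the text implies but does not state; ensure_user_config and pick_config are hypothetical helper names, and the settings directory is passed in rather than hard-coded.

```python
import shutil
from pathlib import Path


def ensure_user_config(settings_dir):
    """Copy default_config.yaml to config.yaml once and return the copy's path."""
    settings_dir = Path(settings_dir)
    default_config = settings_dir / "default_config.yaml"
    user_config = settings_dir / "config.yaml"
    if not user_config.exists():
        shutil.copyfile(default_config, user_config)
    return user_config


def pick_config(settings_dir):
    """Return config.yaml if present, otherwise default_config.yaml (assumed order)."""
    user_config = Path(settings_dir) / "config.yaml"
    if user_config.is_file():
        return user_config
    return Path(settings_dir) / "default_config.yaml"
```

Leaving default_config.yaml untouched keeps a known-good reference to compare against after editing.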

How we avoid loops

checkRecursion() - checks whether something has gone wrong and requests have started repeating the same path again and again, for example /blog/article/blog/article/... This sometimes happens because the URL extraction process is imperfect.
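
One way to detect such a loop is to look for a run of path segments that repeats back to back. The function below is a sketch of that technique, not the actual checkRecursion(); the name has_repeating_path and the min_repeats parameter are assumptions.

```python
from urllib.parse import urlsplit


def has_repeating_path(url, min_repeats=2):
    """Return True if some run of path segments repeats consecutively."""
    segments = [s for s in urlsplit(url).path.split("/") if s]
    n = len(segments)
    for size in range(1, n // min_repeats + 1):          # length of the run
        for start in range(0, n - size * min_repeats + 1):
            block = segments[start:start + size]
            if all(
                segments[start + k * size:start + (k + 1) * size] == block
                for k in range(1, min_repeats)
            ):
                return True
    return False


print(has_repeating_path("https://example.com/blog/article/blog/article/x"))
print(has_repeating_path("https://example.com/blog/article/my_post"))
```

With min_repeats=2 a legitimate path such as /a/a is also flagged, so a higher value trades a few extra requests for fewer false positives.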

check_limits() - checks how many times we have accessed a parent directory.
How it works, using http://www.example.com/blog/articles/my_article_1.php as an example:

1. We count how many times we have visited http://www.example.com/blog/articles/
2. If the count crosses crawl_limit, we mark this path as over the limit (over_limit_pages).
3. We add 1 to the counter of the parent path (http://www.example.com/blog/).
4. Go to step 1 for the parent path (if it also contains a large number of URLs, it is cut off in the same way).

If every limit up to the site root is crossed, we eventually stop visiting the website altogether.
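
The steps above can be sketched as a per-directory counter that escalates to the parent when a directory crosses the limit. This is one possible reading of the description, not the real check_limits(): the class name DirLimiter, the choice to refuse the visit that crosses the limit, and the rule that a blocked directory stops counting are all assumptions.

```python
import posixpath
from collections import Counter
from urllib.parse import urlsplit


class DirLimiter:
    def __init__(self, crawl_limit):
        self.crawl_limit = crawl_limit
        self.visits = Counter()   # (host, directory) -> visit count
        self.over_limit = set()   # (host, directory) pairs that crossed the limit

    @staticmethod
    def _directory(path):
        # "/blog/articles/a.php" -> "/blog/articles/"; "" or "/x" -> "/"
        return posixpath.dirname(path).rstrip("/") + "/"

    @staticmethod
    def _parent(directory):
        # "/blog/articles/" -> "/blog/"; "/" has no parent
        if directory == "/":
            return None
        return posixpath.dirname(directory.rstrip("/")).rstrip("/") + "/"

    def allow(self, url):
        """Return True if the URL may be visited, updating the counters."""
        parts = urlsplit(url)
        host, start = parts.netloc, self._directory(parts.path)
        d = start
        while d is not None:                  # blocked by itself or an ancestor?
            if (host, d) in self.over_limit:
                return False
            d = self._parent(d)
        d = start
        while d is not None:                  # count, and escalate on overflow
            self.visits[(host, d)] += 1
            if self.visits[(host, d)] <= self.crawl_limit:
                return d == start             # True only if nothing overflowed
            self.over_limit.add((host, d))
            d = self._parent(d)
        return False                          # even the site root is over the limit


limiter = DirLimiter(crawl_limit=2)
base = "http://www.example.com/blog/articles/"
print([limiter.allow(base + "a%d.php" % i) for i in range(4)])
```

Once the root directory itself lands in over_limit, every later URL on that host is refused, which matches the final "stop visiting the website" outcome.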

How retries work

We have two kinds of limits:

Retry limit: how many times a failing URL is retried
Error limit: how many times the same failing URL may be added to the queue again

The retry limit should be lower than the error limit.

When a URL gives a connection error, we retry it until the retry limit is reached and then leave it alone for a while. After that we keep adding it to the queue (it may start answering after a while), and if it is still unavailable when the error limit is reached, we ban it. If the URL answers, we reset the count.
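
A minimal sketch of the two counters, assuming one error count per URL and three possible outcomes after a failure. The class name RetryTracker, the outcome strings, and the default limit values are hypothetical.

```python
class RetryTracker:
    def __init__(self, retry_limit=3, error_limit=10):
        if retry_limit >= error_limit:
            raise ValueError("retry_limit should be lower than error_limit")
        self.retry_limit = retry_limit
        self.error_limit = error_limit
        self.errors = {}     # url -> consecutive error count
        self.banned = set()

    def on_error(self, url):
        """Record a connection error and return 'retry', 'defer', or 'ban'."""
        count = self.errors.get(url, 0) + 1
        self.errors[url] = count
        if count >= self.error_limit:
            self.banned.add(url)
            return "ban"
        if count <= self.retry_limit:
            return "retry"   # retry right away
        return "defer"       # leave it for a while, re-queue later

    def on_success(self, url):
        """The URL answered, so its error count starts over."""
        self.errors.pop(url, None)


tracker = RetryTracker(retry_limit=2, error_limit=4)
print([tracker.on_error("https://example.com/slow") for _ in range(4)])
```

The gap between the two limits is what gives a temporarily unavailable URL time to recover before it is banned.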

Wappalyzer role

Wappalyzer works with the app.json file, which contains a database of regular expressions for searching anything in a server response (cookies, headers, scripts, text in HTML, etc.).

The idea is to use Wappalyzer's regex engine to search for "bad places":

All inputs

<input type="email">
<input type="password">
<input type="search">
<input type="submit">

SSRF

formcontrolname="url"

Submit buttons

<button class="aa" type="submit">Search</button>

File uploads

<input type="file">

Wappalyzer could also be used as a simple vulnerability scanner:

Send a specific request
Run a regexp search on the server's answer
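
The "bad place" search can be illustrated with plain regular expressions over a response body. This sketch uses Python's re module directly rather than Wappalyzer's engine or the app.json format; the pattern names and the function find_bad_places are hypothetical.

```python
import re

# Hypothetical patterns modeled on the examples listed in this section.
PATTERNS = {
    "email input": re.compile(r'<input[^>]*\btype="email"', re.I),
    "password input": re.compile(r'<input[^>]*\btype="password"', re.I),
    "search input": re.compile(r'<input[^>]*\btype="search"', re.I),
    "file upload": re.compile(r'<input[^>]*\btype="file"', re.I),
    "submit button": re.compile(r'<button[^>]*\btype="submit"', re.I),
    "possible SSRF field": re.compile(r'formcontrolname="url"', re.I),
}


def find_bad_places(html):
    """Return the sorted names of all patterns that match the HTML text."""
    return sorted(name for name, rx in PATTERNS.items() if rx.search(html))


page = (
    '<form><input type="password"><input name="f" type="file">'
    '<button class="aa" type="submit">Search</button></form>'
)
print(find_bad_places(page))
```

Regular expressions over raw HTML miss attributes written with single quotes or without quotes, so a real rule set would need looser patterns.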

Deduplication

Content length + word_count
Content length prediction (not fully tested)
To Do: Similarity check
- Check changes in HTML (search for new functions)
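
The "content length + word_count" mode can be sketched as a two-number fingerprint per response. The names dedup_key and Deduplicator are hypothetical, and the real implementation may include more fields (for example the host) in the key.

```python
def dedup_key(body):
    """Fingerprint a response body by its length and its word count."""
    return (len(body), len(body.split()))


class Deduplicator:
    def __init__(self):
        self._seen = set()

    def is_duplicate(self, body):
        """Return True if a body with the same fingerprint was seen before."""
        key = dedup_key(body)
        if key in self._seen:
            return True
        self._seen.add(key)
        return False


dedup = Deduplicator()
print(dedup.is_duplicate("hello world"))        # first time this fingerprint is seen
print(dedup.is_duplicate("hello world"))        # same fingerprint again
print(dedup.is_duplicate("hello there world"))  # different length and word count
```

The fingerprint is cheap but coarse: two different pages with the same length and word count collide, which is one reason the similarity-based modes are listed as To Do.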

Development

Editable install (changes to the source are picked up immediately):

pip3 install -e '.[headless,dev]'

Run the test suite:

pytest

Building & publishing to PyPI

The project is configured with pyproject.toml (PEP 621). To build the distribution artifacts (source distribution + wheel):

pip3 install build
python3 -m build          # writes dist/ansferatu-<version>.tar.gz and .whl

Validate and upload with Twine:

pip3 install twine
twine check dist/*

# Test upload first (recommended): https://test.pypi.org
twine upload --repository testpypi dist/*

# Real upload
twine upload dist/*

Notes:

Bump version in pyproject.toml (and __version__ in ansferatu/__init__.py) before each release; PyPI rejects re-uploads of an existing version.
Uploading requires a PyPI account and an API token (configure it via ~/.pypirc or the TWINE_USERNAME=__token__ / TWINE_PASSWORD=<token> environment variables).
The package name ansferatu must be available on PyPI for the first upload.

Project details

Release history

0.1.1

Jun 23, 2026

This version

0.1.0

Jun 23, 2026

Download files

Source Distribution

ansferatu-0.1.0.tar.gz (170.0 kB)

Uploaded Jun 23, 2026 Source

Built Distribution

ansferatu-0.1.0-py3-none-any.whl (175.0 kB)

Uploaded Jun 23, 2026 Python 3

File details

Details for the file ansferatu-0.1.0.tar.gz.

File metadata

Download URL: ansferatu-0.1.0.tar.gz
Upload date: Jun 23, 2026
Size: 170.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.12

File hashes

Hashes for ansferatu-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`3f47ae4b105a959c392abf88eeb8f4f957358f735c4d24bf1f49117988d7184f`
MD5	`d80d2f5a98805fb0bddc633774719de4`
BLAKE2b-256	`043a28cc84d1c6a8c33c6b34a10e1b0bb39c4ce918b9bcb560bbd1a81e85e874`

File details

Details for the file ansferatu-0.1.0-py3-none-any.whl.

File metadata

Download URL: ansferatu-0.1.0-py3-none-any.whl
Upload date: Jun 23, 2026
Size: 175.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.12

File hashes

Hashes for ansferatu-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`0bf42960b1a1edefb013e1d083168866d39a5dc0bc074a37b39a748ae0af6b44`
MD5	`c2c8a7ba8c1ce58b7eeadc970f76bd83`
BLAKE2b-256	`596efd173e1053b149c0b018391db7aa5bb8ef859ee741ebd7bc8741866d8714`
