CrawlerX - The Ultimate Web Crawler
CrawlerX - Advanced Web Reconnaissance Crawler
CrawlerX is an advanced, multi-threaded web reconnaissance crawler built for security researchers, bug bounty hunters, and penetration testers. It focuses on deep endpoint discovery, GET/POST parameter extraction, API detection, resource categorization, and intelligent URL validation, with optional fuzzing and common path probing.
Developed by @IMApurbo
Use only on systems you own or have explicit permission to test.
Core Capabilities
Intelligent Crawling
- Discovers HTML, JavaScript, CSS, JSON, and XML endpoints
- Extracts URLs from:
  - HTML attributes
  - Inline JavaScript
  - Event handlers
  - CSS files
  - JSON / API responses
- Applies strong filtering to eliminate JavaScript noise and false positives
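As an illustration of extracting URLs from HTML attributes, here is a minimal sketch using only the Python standard library. It is not CrawlerX's implementation: the `URL_ATTRS` set, the `LinkExtractor` class, and the skipped pseudo-URL prefixes are assumptions chosen for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Attributes that commonly carry URLs (an illustrative subset, not CrawlerX's list).
URL_ATTRS = {"href", "src", "action", "data-url"}


class LinkExtractor(HTMLParser):
    """Collect absolute URLs found in URL-bearing attributes."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in URL_ATTRS and value:
                # Skip pseudo-URLs, a common source of false positives.
                if value.startswith(("javascript:", "mailto:", "#", "data:")):
                    continue
                # Resolve relative references against the page URL.
                self.urls.add(urljoin(self.base_url, value))


def extract_urls(html, base_url):
    """Return the sorted, de-duplicated absolute URLs found in html."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return sorted(parser.urls)
```

Calling `extract_urls(page_source, "https://example.com/")` returns absolute URLs, so later scope checks can compare hostnames directly.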
Endpoint Discovery
- All endpoints
- Parameterized URLs
- Non-parameterized URLs
- Automatic domain + subdomain validation
Saved under:
endpoints/
├── all_endpoints.txt
├── all_endpoints.json
├── parameterized.txt
├── parameterized.json
└── non_parameterized.txt
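The split between parameterized and non-parameterized URLs, and the domain/subdomain check, could look roughly like the sketch below. The function names `in_scope` and `split_by_params` are hypothetical, and the real tool may apply stricter rules.

```python
from urllib.parse import urlsplit


def in_scope(url, root_domain, include_subdomains=False):
    """Return True if the URL's host is the root domain (or a subdomain, if allowed)."""
    host = (urlsplit(url).hostname or "").lower()
    root = root_domain.lower()
    if host == root:
        return True
    # The leading dot prevents "notexample.com" from matching "example.com".
    return include_subdomains and host.endswith("." + root)


def split_by_params(urls):
    """Partition URLs into (parameterized, non_parameterized) lists."""
    parameterized, plain = [], []
    for url in urls:
        (parameterized if urlsplit(url).query else plain).append(url)
    return parameterized, plain
```

Here `include_subdomains=True` plays the role that the `--sub` flag plays on the command line.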
GET & POST Parameter Extraction
- Extracts parameters from HTML forms
- Generates realistic default values
- Saves:
  - Parsed parameters (JSON)
  - Raw .req files (Burp-ready)
get/
├── get_urls.txt
├── get_params.json
└── *.req
post/
├── post_urls.txt
├── post_params.json
└── *.req
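A raw request file is essentially the text of an HTTP request. The sketch below shows how a form submission might be rendered that way; the exact header set and layout of CrawlerX's .req files are not documented here, so `build_raw_post` and its default User-Agent are assumptions.

```python
from urllib.parse import urlencode, urlsplit


def build_raw_post(url, params, user_agent="Mozilla/5.0"):
    """Render a form submission as a raw HTTP/1.1 request string."""
    parts = urlsplit(url)
    body = urlencode(params)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    lines = [
        f"POST {path} HTTP/1.1",
        f"Host: {parts.netloc}",
        f"User-Agent: {user_agent}",
        "Content-Type: application/x-www-form-urlencoded",
        f"Content-Length: {len(body.encode())}",
        "",  # blank line separates headers from the body
        body,
    ]
    return "\r\n".join(lines)
```

For example, `build_raw_post("https://example.com/login", {"user": "admin", "pass": "test123"})` produces text that can be saved to a file and loaded into an intercepting proxy for replay.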
API Endpoint Detection
Detects common API patterns:
/api/, /v1/, /v2/, /rest/, /graphql, .json, .xml
api/
├── api_endpoints.txt
└── api_endpoints.json
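Pattern-based API detection of this kind can be approximated with two regular expressions, as in the sketch below. The patterns mirror the list above, but the matching rules (including generalizing /v1/ and /v2/ to any `/v<number>/`) are assumptions, not the tool's actual logic.

```python
import re
from urllib.parse import urlsplit

# Path segments that suggest an API; "v\d+" generalizes /v1/ and /v2/ (an assumption).
API_PATH_RE = re.compile(r"/(api|rest|graphql|v\d+)(/|$)", re.IGNORECASE)
# File extensions that suggest a machine-readable response.
API_EXT_RE = re.compile(r"\.(json|xml)$", re.IGNORECASE)


def looks_like_api(url):
    """Return True if the URL path matches a common API pattern."""
    path = urlsplit(url).path
    return bool(API_PATH_RE.search(path) or API_EXT_RE.search(path))
```

Matching on whole path segments keeps a path such as /apiary from being flagged just because it starts with "api".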
Resource Categorization
Automatically classifies discovered resources:
- Images
- JavaScript
- Stylesheets
- Fonts
- Media
- Documents
- Other
resources/
├── images.txt / images.json
├── scripts.txt / scripts.json
├── stylesheets.txt / stylesheets.json
├── fonts.txt / fonts.json
├── media.txt / media.json
├── documents.txt / documents.json
└── other.txt / other.json
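Classification like this is typically driven by file extension. The following sketch assumes an extension-to-category mapping; the `CATEGORIES` table is illustrative and does not claim to match the extensions CrawlerX recognizes.

```python
import os
from urllib.parse import urlsplit

# Illustrative extension table; category names follow the output files above.
CATEGORIES = {
    "images": {".png", ".jpg", ".jpeg", ".gif", ".svg", ".webp", ".ico"},
    "scripts": {".js", ".mjs"},
    "stylesheets": {".css"},
    "fonts": {".woff", ".woff2", ".ttf", ".otf", ".eot"},
    "media": {".mp4", ".webm", ".mp3", ".wav", ".ogg"},
    "documents": {".pdf", ".doc", ".docx", ".xls", ".xlsx", ".txt"},
}


def categorize(url):
    """Return the category name for a URL based on its path extension."""
    # Use only the path so query strings such as ?v=3 do not hide the extension.
    ext = os.path.splitext(urlsplit(url).path)[1].lower()
    for category, extensions in CATEGORIES.items():
        if ext in extensions:
            return category
    return "other"
```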
Site Structure Mapping
Optional ASCII tree view of the entire site:
example.com
├── login
├── dashboard
│   ├── profile
│   └── settings
└── api
    └── v1
Saved to:
structure/structure.txt
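A tree view like the one above can be produced by nesting path segments into dictionaries and rendering them recursively. This is a sketch of the general technique; `build_tree` and `render` are hypothetical names, and children are sorted alphabetically here, which may differ from the tool's ordering.

```python
from urllib.parse import urlsplit


def build_tree(urls):
    """Nest path segments into dictionaries keyed by segment name."""
    tree = {}
    for url in urls:
        node = tree
        # filter(None, ...) drops empty segments from leading or doubled slashes.
        for segment in filter(None, urlsplit(url).path.split("/")):
            node = node.setdefault(segment, {})
    return tree


def render(tree, prefix=""):
    """Return the tree as a list of lines drawn with box characters."""
    lines = []
    names = sorted(tree)
    for i, name in enumerate(names):
        last = i == len(names) - 1
        lines.append(prefix + ("└── " if last else "├── ") + name)
        # Children of the last entry are indented without a vertical bar.
        lines.extend(render(tree[name], prefix + ("    " if last else "│   ")))
    return lines
```

Joining the result with newlines, as in `"\n".join(render(build_tree(urls)))`, gives text suitable for writing to a file.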
Smart Features
- Robots.txt support (optional)
- Resume interrupted crawls using pickle state
- Incremental auto-save
- Graceful Ctrl+C handling
- Parameter fuzzing (numeric params)
- Common path probing (admin, api, backup, .env, etc.)
- Retry logic with backoff
- Verbose mode with interesting parameter highlighting
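To make the numeric parameter fuzzing idea concrete, the sketch below generates neighbouring values for each numeric query parameter. How CrawlerX chooses its fuzz values is not described above, so the `offsets` default and the function name `fuzz_numeric_params` are assumptions.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def fuzz_numeric_params(url, offsets=(-1, 1)):
    """Yield variants of url where one numeric query value is shifted by an offset."""
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    for index, (name, value) in enumerate(pairs):
        if not value.isdecimal():
            continue  # only plain non-negative integers are fuzzed
        for offset in offsets:
            candidate = int(value) + offset
            if candidate < 0:
                continue  # skip negative values in this sketch
            mutated = list(pairs)
            mutated[index] = (name, str(candidate))
            yield urlunsplit(parts._replace(query=urlencode(mutated)))
```

Each yielded URL changes exactly one parameter, which makes it easier to attribute a differing response to a specific value.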
Installation
pip install crawlerx
Usage
crawlerx -u <url> [options]
Command-Line Options
| Flag | Description | Default |
|---|---|---|
| -u, --url | Target URL (required) | - |
| -o, --output | Output directory | None |
| --threads | Concurrent threads (1-20) | 5 |
| --depth | Crawl depth | 2 |
| --delay | Delay between requests | 0.1s |
| --timeout | Request timeout | 10s |
| --ua | Custom User-Agent | Browser UA |
| -H, --headers | Custom headers (Key:Value;) | None |
| --proxy | HTTP/HTTPS proxy | None |
| --exclude | Excluded extensions | None |
| --sub | Include subdomains | False |
| --structure | Generate site structure | False |
| --respect-robots | Respect robots.txt | False |
| --fuzz-params | Fuzz numeric parameters | False |
| --common-paths | Probe common paths | False |
| --cont | Resume from crawl state | None |
| --verbose | Verbose logging | False |
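The `Key:Value;` notation for custom headers suggests semicolon-separated pairs. One way such a string could be parsed is sketched below; this is an assumption about the format, not the tool's parser, and it would mis-split header values that themselves contain semicolons (for example, some Cookie headers).

```python
def parse_headers(raw):
    """Parse 'Key:Value;Key2:Value2;' into a dict, splitting each pair on the first colon."""
    headers = {}
    for pair in raw.split(";"):
        if ":" not in pair:
            continue  # skips the empty trailing segment and malformed pairs
        key, value = pair.split(":", 1)
        if key.strip():
            headers[key.strip()] = value.strip()
    return headers
```

Splitting on the first colon only keeps values such as URLs intact.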
Examples
Basic Crawl
crawlerx -u https://example.com
Save Output
crawlerx -u https://example.com -o results
Deep Crawl with Threads
crawlerx -u https://example.com --depth 4 --threads 10
Enable Fuzzing & Common Paths
crawlerx -u https://example.com --fuzz-params --common-paths
Resume Interrupted Crawl
crawlerx -u https://example.com --cont results/crawlerx_example.com/crawl_state.pkl
Generate Site Structure
crawlerx -u https://example.com --structure
Output Directory Layout
crawlerx_<domain>/
├── endpoints/
├── get/
├── post/
├── api/
├── resources/
├── structure/
└── crawl_state.pkl
Legal Notice
Authorized use only. This tool is intended for legal security testing. Unauthorized scanning may violate laws and ethical guidelines.
Author
- IMApurbo
License
Licensed under the MIT License. See the LICENSE file for details.
File details
Details for the file crawlerx-1.1.1-py3-none-any.whl.
File metadata
- Download URL: crawlerx-1.1.1-py3-none-any.whl
- Upload date:
- Size: 15.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7c35350935ed3a35bf5a53d86d9686fdbe92b5cb18c87903c755ec6b93f47f39 |
| MD5 | 5226542afa04a0ffa5d9037e306eba4a |
| BLAKE2b-256 | 0571f7373d60aa6f43cb92d8a58c9c01d0f30e0c3975e98f79b9307abce28cad |