CrawlerX - The Ultimate Web Crawler

CrawlerX – Advanced Web Reconnaissance Crawler

CrawlerX is an advanced, multi-threaded web reconnaissance crawler built for security researchers, bug bounty hunters, and penetration testers. It focuses on deep endpoint discovery, GET/POST parameter extraction, API detection, resource categorization, and intelligent URL validation, with optional fuzzing and common path probing.

✨ Developed by @IMApurbo

🛡️ Use only on systems you own or have explicit permission to test.

🚀 Core Capabilities

🔍 Intelligent Crawling

Discovers HTML, JavaScript, CSS, JSON, and XML endpoints
Extracts URLs from:
- HTML attributes
- Inline JavaScript
- Event handlers
- CSS files
- JSON / API responses
Strong filtering to eliminate JavaScript noise & false positives
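
The attribute-based part of this extraction can be illustrated with the standard library alone. This is a minimal sketch, not CrawlerX's actual code: the `URL_ATTRS` set, the `LinkExtractor` class, and the `extract_urls` helper are hypothetical names, and the noise filter only skips a few obvious pseudo-URL schemes.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Illustrative set of URL-bearing attributes (an assumption, not CrawlerX's list).
URL_ATTRS = {"href", "src", "action", "data-url"}

# Prefixes that never lead to a crawlable endpoint.
NOISE_PREFIXES = ("javascript:", "mailto:", "data:", "#")


class LinkExtractor(HTMLParser):
    """Collect absolute URLs found in URL-bearing HTML attributes."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name not in URL_ATTRS or not value:
                continue
            if value.strip().lower().startswith(NOISE_PREFIXES):
                continue  # skip pseudo-URLs and bare fragments
            # Resolve relative references against the page URL.
            self.urls.add(urljoin(self.base_url, value.strip()))


def extract_urls(html, base_url):
    """Return the set of absolute URLs referenced by the given HTML."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return parser.urls
```

Inline JavaScript, CSS, and JSON sources would need their own extractors; this sketch covers HTML attributes only.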

🔗 Endpoint Discovery

All endpoints
Parameterized URLs
Non-parameterized URLs
Automatic domain + subdomain validation

Saved under:

endpoints/
├── all_endpoints.txt
├── all_endpoints.json
├── parameterized.txt
├── parameterized.json
└── non_parameterized.txt
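
As a rough illustration of how URLs could be scoped to the target and split into the two groups above, here is a sketch using `urllib.parse`. The function names `in_scope` and `split_by_parameters` are hypothetical, and the rule "a URL is parameterized if it has a query string" is an assumption about what the tool means by the term.

```python
from urllib.parse import urlsplit


def in_scope(url, root_domain, include_subdomains=False):
    """Return True if the URL's host is the root domain (or a subdomain, if allowed)."""
    host = (urlsplit(url).hostname or "").lower()
    root = root_domain.lower()
    if host == root:
        return True
    # The leading dot prevents "evilexample.com" from matching "example.com".
    return include_subdomains and host.endswith("." + root)


def split_by_parameters(urls):
    """Split URLs into (parameterized, non_parameterized) by presence of a query string."""
    parameterized, plain = [], []
    for url in urls:
        (parameterized if urlsplit(url).query else plain).append(url)
    return parameterized, plain
```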

🧪 GET & POST Parameter Extraction

Extracts parameters from HTML forms
Generates realistic default values
Saves:
- Parsed parameters (JSON)
- Raw .req files (Burp-ready)

get/
├── get_urls.txt
├── get_params.json
└── *.req

post/
├── post_urls.txt
├── post_params.json
└── *.req
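
A raw request file is a plain-text HTTP request that a proxy tool can load and replay. The sketch below shows one way such text could be assembled from a form's method, action URL, and parameters. The exact layout CrawlerX writes to its `.req` files is not documented here, so the header set and the `build_raw_request` name are assumptions.

```python
from urllib.parse import urlencode, urlsplit


def build_raw_request(method, url, params):
    """Build a minimal raw HTTP/1.1 request for a form submission.

    Any query string already present in `url` is ignored in this sketch.
    """
    parts = urlsplit(url)
    path = parts.path or "/"
    encoded = urlencode(params)
    if method.upper() == "GET":
        target = f"{path}?{encoded}" if encoded else path
        # Two trailing empty strings yield the blank line that ends the headers.
        return "\r\n".join([f"GET {target} HTTP/1.1", f"Host: {parts.netloc}", "", ""])
    return "\r\n".join([
        f"POST {path} HTTP/1.1",
        f"Host: {parts.netloc}",
        "Content-Type: application/x-www-form-urlencoded",
        f"Content-Length: {len(encoded.encode())}",
        "",
        encoded,
    ])
```

For example, `build_raw_request("POST", "https://example.com/login", {"user": "admin"})` produces a request whose body is `user=admin`.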

⚙️ API Endpoint Detection

Detects common API patterns:

/api/
/v1/, /v2/
/rest/
/graphql
.json, .xml

api/
├── api_endpoints.txt
└── api_endpoints.json
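
The patterns listed above lend themselves to a single regular expression applied to the URL path. The following is a sketch under assumptions: it generalizes `/v1/` and `/v2/` to any `/v<number>` segment, and the names `API_PATTERN` and `looks_like_api` are hypothetical.

```python
import re
from urllib.parse import urlsplit

# Path-segment patterns plus the .json/.xml suffixes. "(/|$)" keeps
# "/apiary" or "/restaurant" from matching "/api" or "/rest".
API_PATTERN = re.compile(
    r"/api(/|$)|/v\d+(/|$)|/rest(/|$)|/graphql(/|$)|\.(json|xml)$",
    re.IGNORECASE,
)


def looks_like_api(url):
    """Return True if the URL path matches a common API pattern."""
    return bool(API_PATTERN.search(urlsplit(url).path))
```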

📁 Resource Categorization

Automatically classifies discovered resources:

Images
JavaScript
Stylesheets
Fonts
Media
Documents
Other

resources/
├── images.txt / images.json
├── scripts.txt / scripts.json
├── stylesheets.txt / stylesheets.json
├── fonts.txt / fonts.json
├── media.txt / media.json
├── documents.txt / documents.json
└── other.txt / other.json
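
One simple way to perform this kind of classification is a lookup on the file extension of the URL path. The sketch below is illustrative only: the extension sets in `CATEGORIES` are assumptions, and the `categorize` helper is a hypothetical name.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit

# Illustrative extension map; keys mirror the output file names above.
CATEGORIES = {
    "images": {".png", ".jpg", ".jpeg", ".gif", ".svg", ".webp", ".ico"},
    "scripts": {".js", ".mjs"},
    "stylesheets": {".css"},
    "fonts": {".woff", ".woff2", ".ttf", ".otf", ".eot"},
    "media": {".mp4", ".webm", ".mp3", ".ogg", ".wav"},
    "documents": {".pdf", ".doc", ".docx", ".xls", ".xlsx", ".txt"},
}


def categorize(url):
    """Return the resource category for a URL, or "other" if none matches."""
    # Only the path is inspected, so query strings such as "?v=3" are ignored.
    suffix = PurePosixPath(urlsplit(url).path).suffix.lower()
    for category, extensions in CATEGORIES.items():
        if suffix in extensions:
            return category
    return "other"
```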

🌳 Site Structure Mapping

Optional ASCII tree view of the entire site:

example.com
├── login
├── dashboard
│   ├── profile
│   └── settings
└── api
    └── v1

Saved to:

structure/structure.txt
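
A tree like the one above can be produced by folding URL paths into nested dictionaries and rendering them recursively. This is a sketch of that idea rather than the tool's implementation; `build_tree`, `render`, and `site_structure` are hypothetical names, and entries are sorted alphabetically, which may differ from CrawlerX's ordering.

```python
from urllib.parse import urlsplit


def build_tree(urls):
    """Fold URL paths into a nested dict keyed by path segment."""
    tree = {}
    for url in urls:
        node = tree
        for segment in filter(None, urlsplit(url).path.split("/")):
            node = node.setdefault(segment, {})
    return tree


def render(tree, prefix=""):
    """Render the nested dict as a list of ASCII tree lines."""
    lines = []
    items = sorted(tree.items())
    for index, (name, children) in enumerate(items):
        last = index == len(items) - 1
        lines.append(prefix + ("└── " if last else "├── ") + name)
        # Children of the last entry get blank padding instead of a rail.
        lines.extend(render(children, prefix + ("    " if last else "│   ")))
    return lines


def site_structure(root, urls):
    """Return the full tree as text, headed by the root domain."""
    return "\n".join([root] + render(build_tree(urls)))
```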

🧠 Smart Features

Robots.txt support (optional)
Resume interrupted crawls using pickle state
Incremental auto-save
Graceful Ctrl+C handling
Parameter fuzzing (numeric params)
Common path probing (admin, api, backup, .env, etc.)
Retry logic with backoff
Verbose mode with interesting parameter highlighting
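
To make the numeric parameter fuzzing item concrete, here is a sketch of one plausible strategy: take each all-digit query value and generate neighbouring values. The offsets, the skipping of negative results, and the `fuzz_numeric_params` name are assumptions; the source does not say which values CrawlerX actually tries.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def fuzz_numeric_params(url, offsets=(-1, 1)):
    """Return URL variants where one numeric query value is shifted by each offset."""
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    variants = []
    for index, (name, value) in enumerate(pairs):
        # Only plain ASCII digit strings are treated as numeric.
        if not (value.isascii() and value.isdigit()):
            continue
        for offset in offsets:
            candidate = int(value) + offset
            if candidate < 0:
                continue
            mutated = list(pairs)
            mutated[index] = (name, str(candidate))
            variants.append(urlunsplit(parts._replace(query=urlencode(mutated))))
    return variants
```

Only one parameter is changed per variant, so a response difference can be attributed to that parameter.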

📦 Installation

pip install crawlerx

🧑‍💻 Usage

crawlerx -u <url> [options]

🧾 Command-Line Options

| Flag | Description | Default |
| --- | --- | --- |
| `-u, --url` | Target URL (required) | - |
| `-o, --output` | Output directory | None |
| `--threads` | Concurrent threads (1–20) | 5 |
| `--depth` | Crawl depth | 2 |
| `--delay` | Delay between requests | 0.1s |
| `--timeout` | Request timeout | 10s |
| `--ua` | Custom User-Agent | Browser UA |
| `-H, --headers` | Custom headers (`Key:Value;`) | None |
| `--proxy` | HTTP/HTTPS proxy | None |
| `--exclude` | Excluded extensions | None |
| `--sub` | Include subdomains | False |
| `--structure` | Generate site structure | False |
| `--respect-robots` | Respect robots.txt | False |
| `--fuzz-params` | Fuzz numeric parameters | False |
| `--common-paths` | Probe common paths | False |
| `--cont` | Resume from crawl state | None |
| `--verbose` | Verbose logging | False |
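
The `Key:Value;` format shown for `-H, --headers` suggests a semicolon-separated list of headers. The sketch below shows how such a string could be parsed into a dictionary. It is an assumption about the format, not CrawlerX's parser, and `parse_headers` is a hypothetical name. Note that a naive split on `;` would break header values that themselves contain semicolons, such as multi-cookie `Cookie` headers.

```python
def parse_headers(raw):
    """Parse a "Key:Value;Key2:Value2;" string into a dict of headers."""
    headers = {}
    for chunk in raw.split(";"):
        if not chunk.strip():
            continue  # tolerate a trailing semicolon and empty segments
        # Split on the first colon only, so values like URLs stay intact.
        name, sep, value = chunk.partition(":")
        if not sep or not name.strip():
            raise ValueError(f"malformed header: {chunk!r}")
        headers[name.strip()] = value.strip()
    return headers
```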

🧪 Examples

Basic Crawl

crawlerx -u https://example.com

Save Output

crawlerx -u https://example.com -o results

Deep Crawl with Threads

crawlerx -u https://example.com --depth 4 --threads 10

Enable Fuzzing & Common Paths

crawlerx -u https://example.com --fuzz-params --common-paths

Resume Interrupted Crawl

crawlerx -u https://example.com --cont results/crawlerx_example.com/crawl_state.pkl

Generate Site Structure

crawlerx -u https://example.com --structure

📂 Output Directory Layout

crawlerx_<domain>/
├── endpoints/
├── get/
├── post/
├── api/
├── resources/
├── structure/
└── crawl_state.pkl
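
Once a crawl has finished, the text files in this layout can be post-processed with a few lines of Python. The sketch below counts entries per `.txt` file. It assumes each `.txt` file holds one URL per line, which the source does not state explicitly, and `load_url_list` and `summarize` are hypothetical helpers.

```python
from pathlib import Path


def load_url_list(path):
    """Read a text file and return its non-empty lines, stripped."""
    text = Path(path).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]


def summarize(output_dir):
    """Map each .txt file (relative POSIX path) to its number of entries."""
    root = Path(output_dir)
    return {
        path.relative_to(root).as_posix(): len(load_url_list(path))
        for path in sorted(root.rglob("*.txt"))
    }
```

Calling `summarize("results/crawlerx_example.com")` would return a dictionary keyed by paths such as `endpoints/all_endpoints.txt`.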

⚠️ Legal Notice

🚨 Authorized use only. This tool is intended for legal security testing. Unauthorized scanning may violate laws and ethical guidelines.

👨‍💻 Author

IMApurbo

📜 License

Licensed under the MIT License. See the LICENSE file for details.

Release history

1.1.1 (this version): Dec 13, 2025
1.1.0: Jun 18, 2025

Download files

Built Distribution

crawlerx-1.1.1-py3-none-any.whl (15.0 kB), uploaded Dec 13, 2025, Python 3

No source distribution files are available for this release.

File details

Details for the file crawlerx-1.1.1-py3-none-any.whl.

File metadata

Download URL: crawlerx-1.1.1-py3-none-any.whl
Upload date: Dec 13, 2025
Size: 15.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for crawlerx-1.1.1-py3-none-any.whl
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | `7c35350935ed3a35bf5a53d86d9686fdbe92b5cb18c87903c755ec6b93f47f39` |
| MD5 | `5226542afa04a0ffa5d9037e306eba4a` |
| BLAKE2b-256 | `0571f7373d60aa6f43cb92d8a58c9c01d0f30e0c3975e98f79b9307abce28cad` |
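
A downloaded wheel can be checked against the SHA256 digest above before installing it. The following sketch uses only `hashlib` and `hmac` from the standard library; the helper names `sha256_of` and `verify` are hypothetical, and the local file path passed in is whatever location the wheel was saved to.

```python
import hashlib
import hmac

# SHA256 digest published for crawlerx-1.1.1-py3-none-any.whl (from the table above).
EXPECTED_SHA256 = "7c35350935ed3a35bf5a53d86d9686fdbe92b5cb18c87903c755ec6b93f47f39"


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 digest matches the expected value."""
    return hmac.compare_digest(sha256_of(path), expected.lower())
```

`pip` can enforce the same check at install time through its hash-checking mode with a requirements file.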
