CrawlerX - The Ultimate Web Crawler

CrawlerX – Advanced Web Reconnaissance Crawler

CrawlerX is an advanced, multi-threaded web reconnaissance crawler built for security researchers, bug bounty hunters, and penetration testers. It focuses on deep endpoint discovery, GET/POST parameter extraction, API detection, resource categorization, and intelligent URL validation, with optional fuzzing and common path probing.

✨ Developed by @IMApurbo

🛡️ Use only on systems you own or have explicit permission to test.


🚀 Core Capabilities

🔍 Intelligent Crawling

  • Discovers HTML, JavaScript, CSS, JSON, and XML endpoints

  • Extracts URLs from:

    • HTML attributes
    • Inline JavaScript
    • Event handlers
    • CSS files
    • JSON / API responses
  • Strong filtering to eliminate JavaScript noise & false positives
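
As a rough illustration of the HTML-attribute part of the extraction described above, here is a minimal sketch using only the Python standard library. The names `URL_ATTRS`, `LinkCollector`, and `extract_urls` are hypothetical and are not CrawlerX's API; the attribute list and the noise filter are assumptions chosen for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Attributes that commonly carry URLs (an illustrative subset, not CrawlerX's list).
URL_ATTRS = {"href", "src", "action", "data-url"}

# Schemes and prefixes treated as noise rather than crawlable endpoints (assumed).
NOISE_PREFIXES = ("javascript:", "mailto:", "data:", "#")


class LinkCollector(HTMLParser):
    """Collect absolute URLs found in URL-bearing attributes."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name not in URL_ATTRS or not value:
                continue
            if value.strip().lower().startswith(NOISE_PREFIXES):
                continue
            # Resolve relative references against the page URL.
            self.urls.add(urljoin(self.base_url, value.strip()))


def extract_urls(html, base_url):
    """Return the set of absolute URLs referenced by the given HTML."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    return parser.urls
```

Inline JavaScript, CSS, and JSON bodies would need their own extractors (typically regular expressions), which this sketch leaves out.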


🔗 Endpoint Discovery

  • All endpoints
  • Parameterized URLs
  • Non-parameterized URLs
  • Automatic domain + subdomain validation
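
The scope check and the parameterized/non-parameterized split can be pictured with a short sketch like the one below. It is an assumption about how such logic might look, not the tool's actual code; `in_scope` and `split_by_query` are hypothetical names.

```python
from urllib.parse import urlsplit


def in_scope(url, root_domain, include_subdomains=False):
    """Return True if the URL's host is root_domain (or a subdomain, when allowed)."""
    host = (urlsplit(url).hostname or "").lower()
    root = root_domain.lower()
    if host == root:
        return True
    # The leading dot prevents "notexample.com" from matching "example.com".
    return include_subdomains and host.endswith("." + root)


def split_by_query(urls):
    """Partition URLs into (parameterized, non_parameterized) lists."""
    parameterized, plain = [], []
    for url in urls:
        (parameterized if urlsplit(url).query else plain).append(url)
    return parameterized, plain
```

The `include_subdomains` argument corresponds loosely to the `--sub` flag listed in the command-line options.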

Saved under:

endpoints/
├── all_endpoints.txt
├── all_endpoints.json
├── parameterized.txt
├── parameterized.json
├── non_parameterized.txt

🧪 GET & POST Parameter Extraction

  • Extracts parameters from HTML forms

  • Generates realistic default values

  • Saves:

    • Parsed parameters (JSON)
    • Raw .req files (Burp-ready)

get/
├── get_urls.txt
├── get_params.json
├── *.req

post/
├── post_urls.txt
├── post_params.json
├── *.req

⚙️ API Endpoint Detection

Detects common API patterns:

  • /api/
  • /v1/, /v2/
  • /rest/
  • /graphql
  • .json, .xml

api/
├── api_endpoints.txt
├── api_endpoints.json
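
Pattern-based detection of this kind can be approximated with a single regular expression over the URL path. The sketch below mirrors the patterns listed above, generalizing `/v1/` and `/v2/` to any `/v<number>/` segment as an assumption; `API_PATTERN` and `looks_like_api` are hypothetical names, and the rules CrawlerX applies may differ.

```python
import re
from urllib.parse import urlsplit

# Path markers taken from the list above; version segments are generalized to /v<digits>.
API_PATTERN = re.compile(
    r"(/api(/|$)|/v\d+(/|$)|/rest(/|$)|/graphql(/|$)|\.(json|xml)$)",
    re.IGNORECASE,
)


def looks_like_api(url):
    """Match the URL path (query string ignored) against common API markers."""
    return bool(API_PATTERN.search(urlsplit(url).path))
```

Anchoring each marker on a following `/` or end of path avoids false positives such as `/apiary`.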

📁 Resource Categorization

Automatically classifies discovered resources:

  • Images
  • JavaScript
  • Stylesheets
  • Fonts
  • Media
  • Documents
  • Other

resources/
├── images.txt / images.json
├── scripts.txt / scripts.json
├── stylesheets.txt / stylesheets.json
├── fonts.txt / fonts.json
├── media.txt / media.json
├── documents.txt / documents.json
├── other.txt / other.json
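
A simple way to picture this classification is a lookup on the file extension of the URL path. The extension sets below are illustrative assumptions, not CrawlerX's actual lists, and `CATEGORIES` and `categorize` are hypothetical names; the category keys follow the output file names above.

```python
import posixpath
from urllib.parse import urlsplit

# Illustrative extension map keyed by the output file names shown above.
CATEGORIES = {
    "images": {".png", ".jpg", ".jpeg", ".gif", ".svg", ".webp", ".ico"},
    "scripts": {".js", ".mjs"},
    "stylesheets": {".css"},
    "fonts": {".woff", ".woff2", ".ttf", ".otf", ".eot"},
    "media": {".mp4", ".webm", ".mp3", ".ogg", ".wav"},
    "documents": {".pdf", ".doc", ".docx", ".xls", ".xlsx", ".txt"},
}


def categorize(url):
    """Return the category for a URL based on its path extension."""
    ext = posixpath.splitext(urlsplit(url).path)[1].lower()
    for category, extensions in CATEGORIES.items():
        if ext in extensions:
            return category
    return "other"
```

Because the query string is stripped before the lookup, cache-busting suffixes such as `?v=2` do not affect the result.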

🌳 Site Structure Mapping

Optional ASCII tree view of the entire site:

example.com
├── login
├── dashboard
│   ├── profile
│   └── settings
└── api
    └── v1

Saved to:

structure/structure.txt
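
A tree like the one above can be derived from discovered URLs by nesting path segments and then walking the result. This is a minimal sketch under that assumption; `build_tree` and `render` are hypothetical names, and this version sorts siblings alphabetically, which may not match the ordering CrawlerX writes.

```python
from urllib.parse import urlsplit


def build_tree(urls):
    """Nest the path segments of each URL into a dict of dicts."""
    tree = {}
    for url in urls:
        node = tree
        for segment in filter(None, urlsplit(url).path.split("/")):
            node = node.setdefault(segment, {})
    return tree


def render(tree, prefix=""):
    """Yield one line per node using box-drawing connectors."""
    names = sorted(tree)
    for index, name in enumerate(names):
        last = index == len(names) - 1
        yield prefix + ("└── " if last else "├── ") + name
        # Children of the last sibling are indented without a vertical bar.
        yield from render(tree[name], prefix + ("    " if last else "│   "))
```

Printing the host name followed by `"\n".join(render(build_tree(urls)))` gives a complete listing.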

🧠 Smart Features

  • Robots.txt support (optional)
  • Resume interrupted crawls using pickle state
  • Incremental auto-save
  • Graceful Ctrl+C handling
  • Parameter fuzzing (numeric params)
  • Common path probing (admin, api, backup, .env, etc.)
  • Retry logic with backoff
  • Verbose mode with interesting parameter highlighting
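
As one example from the list above, numeric parameter fuzzing can be sketched as generating neighbouring values for every all-digit query parameter. The function name `fuzz_numeric_params` and the choice of offsets (-1 and +1) are assumptions for illustration; the values CrawlerX actually tries are not documented here.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def fuzz_numeric_params(url, deltas=(-1, 1)):
    """Return URL variants with each numeric query value shifted by each delta."""
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    variants = []
    for index, (name, value) in enumerate(pairs):
        if not (value.isascii() and value.isdigit()):
            continue
        for delta in deltas:
            candidate = int(value) + delta
            if candidate < 0:
                continue  # skip negative IDs in this sketch
            mutated = list(pairs)
            mutated[index] = (name, str(candidate))
            variants.append(urlunsplit(parts._replace(query=urlencode(mutated))))
    return variants
```

Only one parameter is changed per variant, so a response difference can be attributed to that parameter.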

📦 Installation

pip install crawlerx

🧑‍💻 Usage

crawlerx -u <url> [options]

🧾 Command-Line Options

| Flag | Description | Default |
| --- | --- | --- |
| -u, --url | Target URL (required) | - |
| -o, --output | Output directory | None |
| --threads | Concurrent threads (1–20) | 5 |
| --depth | Crawl depth | 2 |
| --delay | Delay between requests | 0.1s |
| --timeout | Request timeout | 10s |
| --ua | Custom User-Agent | Browser UA |
| -H, --headers | Custom headers (Key:Value;) | None |
| --proxy | HTTP/HTTPS proxy | None |
| --exclude | Excluded extensions | None |
| --sub | Include subdomains | False |
| --structure | Generate site structure | False |
| --respect-robots | Respect robots.txt | False |
| --fuzz-params | Fuzz numeric parameters | False |
| --common-paths | Probe common paths | False |
| --cont | Resume from crawl state | None |
| --verbose | Verbose logging | False |
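
The `Key:Value;` notation for `-H, --headers` suggests a semicolon-separated list of header pairs. The sketch below shows one plausible way to parse such a string; `parse_headers` is a hypothetical helper, and the separator handling is an assumption, since the exact parsing rules are not documented here.

```python
def parse_headers(raw):
    """Parse a 'Key:Value;Key2:Value2;' string into a dict of headers."""
    headers = {}
    for item in raw.split(";"):
        if ":" not in item:
            continue  # skip empty trailing pieces and malformed items
        # Split only once so values such as URLs may contain ':'.
        key, value = item.split(":", 1)
        if key.strip():
            headers[key.strip()] = value.strip()
    return headers
```

One consequence of a semicolon separator is that header values which themselves contain `;` (for example multi-cookie `Cookie` headers) cannot be expressed unambiguously in this format.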

🧪 Examples

Basic Crawl

crawlerx -u https://example.com

Save Output

crawlerx -u https://example.com -o results

Deep Crawl with Threads

crawlerx -u https://example.com --depth 4 --threads 10

Enable Fuzzing & Common Paths

crawlerx -u https://example.com --fuzz-params --common-paths

Resume Interrupted Crawl

crawlerx -u https://example.com --cont results/crawlerx_example.com/crawl_state.pkl

Generate Site Structure

crawlerx -u https://example.com --structure
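
When the crawler is driven from a script rather than typed by hand, the command lines above can be assembled programmatically. The helper below is a hypothetical wrapper, not part of CrawlerX; it only uses flags that appear in the options table and enforces the documented 1–20 thread range.

```python
def build_command(url, output=None, depth=None, threads=None, flags=()):
    """Assemble a crawlerx argument list from the documented options."""
    argv = ["crawlerx", "-u", url]
    if output:
        argv += ["-o", output]
    if depth is not None:
        argv += ["--depth", str(depth)]
    if threads is not None:
        if not 1 <= threads <= 20:
            raise ValueError("threads must be between 1 and 20")
        argv += ["--threads", str(threads)]
    # Boolean switches such as "structure" or "fuzz-params".
    argv += [f"--{flag}" for flag in flags]
    return argv
```

The resulting list can be passed to `subprocess.run(argv, check=True)`, which avoids shell quoting issues with URLs that contain `&` or `?`.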

📂 Output Directory Layout

crawlerx_<domain>/
├── endpoints/
├── get/
├── post/
├── api/
├── resources/
├── structure/
└── crawl_state.pkl
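
Results in this layout can be post-processed with a few lines of Python. The sketch below assumes that `endpoints/all_endpoints.txt` holds one URL per line, which is a guess based on the file name rather than documented behaviour; `load_endpoints` is a hypothetical helper.

```python
from pathlib import Path


def load_endpoints(run_dir):
    """Read endpoints/all_endpoints.txt from a crawl output directory."""
    path = Path(run_dir) / "endpoints" / "all_endpoints.txt"
    with path.open(encoding="utf-8") as handle:
        # Assumes one URL per line; blank lines are ignored.
        return [line.strip() for line in handle if line.strip()]
```

For example, `load_endpoints("results/crawlerx_example.com")` would return the list of URLs found in a crawl saved with `-o results`, if the file format matches the assumption.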

⚠️ Legal Notice

🚨 Authorized use only. This tool is intended for legal security testing. Unauthorized scanning may violate laws and ethical guidelines.


👨‍💻 Author

  • IMApurbo

📜 License

Licensed under the MIT License. See the LICENSE file for details.

Download files

Source Distributions

No source distribution files are available for this release.

Built Distribution

crawlerx-1.1.1-py3-none-any.whl (15.0 kB)

Uploaded Python 3

File details

Details for the file crawlerx-1.1.1-py3-none-any.whl.

File metadata

  • Download URL: crawlerx-1.1.1-py3-none-any.whl
  • Upload date:
  • Size: 15.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for crawlerx-1.1.1-py3-none-any.whl
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 7c35350935ed3a35bf5a53d86d9686fdbe92b5cb18c87903c755ec6b93f47f39 |
| MD5 | 5226542afa04a0ffa5d9037e306eba4a |
| BLAKE2b-256 | 0571f7373d60aa6f43cb92d8a58c9c01d0f30e0c3975e98f79b9307abce28cad |
