
JavaScript extractor and regex scanner


JavaScript Extractor

A fast and flexible JavaScript extraction and regex scanning tool for security research, bug bounty hunting, and web application analysis.

jsxtractor collects the JavaScript files referenced by a target website, scans them with configurable YAML-based regex groups, and exports the matches as structured JSON.


Features

  • ✅ Extract JavaScript files from webpages
  • ✅ Supports Playwright browser mode
  • ✅ Supports authenticated crawling with persistent sessions
  • ✅ YAML-based regex pattern groups
  • ✅ Match API endpoints, secrets, tokens, keys, URLs, and custom patterns
  • ✅ JSON export support
  • ✅ Handles both relative and absolute JavaScript URLs
  • ✅ Headless and browser automation modes
  • ✅ Verbose/debug logging
  • ✅ Custom extraction context (--before, --after)
  • ✅ Custom Playwright storage state support
  • ✅ Reusable Python API
  • ✅ Installable via pip
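Collecting a page's JavaScript starts with finding the scripts it references. The sketch below is not jsxtractor's code. It is a minimal, standard-library illustration of collecting <script src> values and resolving relative, protocol-relative, and absolute forms against the page URL with urllib.parse.urljoin. ScriptSrcParser and find_script_urls are hypothetical names, and the sketch ignores <base href> as well as scripts added at runtime, which static HTML parsing cannot see.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ScriptSrcParser(HTMLParser):
    """Collects the src attribute of every <script> tag (hypothetical helper)."""

    def __init__(self) -> None:
        super().__init__()
        self.sources: list[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag == "script":
            for name, value in attrs:
                if name == "src" and value:  # skip inline scripts and empty src
                    self.sources.append(value)


def find_script_urls(html: str, page_url: str) -> list[str]:
    """Return absolute, de-duplicated JavaScript URLs in document order."""
    parser = ScriptSrcParser()
    parser.feed(html)
    seen: dict[str, None] = {}  # dict used as an ordered set
    for src in parser.sources:
        # urljoin keeps absolute URLs as-is and resolves relative ones
        # (including protocol-relative "//host/..." forms) against page_url.
        seen.setdefault(urljoin(page_url, src), None)
    return list(seen)


if __name__ == "__main__":
    page = '<script src="/assets/app.js"></script><script src="//cdn.example.net/lib.js"></script>'
    print(find_script_urls(page, "https://example.com/login"))
```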

Installation

Install from PyPI

pip install jsxtractor

Install the Playwright Chromium browser

playwright install chromium

Usage

Basic Scan

jsxtractor https://example.com

Using Named Arguments

jsxtractor -u https://example.com

Using Custom Regex Group Directory

jsxtractor -u https://example.com -g ./groups

Enable Verbose Logging

jsxtractor -u https://example.com -v

Browser Mode

jsxtractor -u https://example.com --browser

Authentication / Login Mode

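Interactive login mode lets you log in through a Playwright browser window so that JavaScript served only to authenticated users can be extracted. The session is kept as a persistent Playwright session so it can be reused on later runs.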

Login Example

jsxtractor -u https://example.com \
    --browser \
    --login \
    --login-url https://example.com/login \
    --login-success-indicator Logout

Force Re-login

jsxtractor -u https://example.com \
    --browser \
    --login \
    --force-relogin

Custom Storage State File

jsxtractor -u https://example.com \
    --browser \
    --login \
    --storage-state ./states/admin.json

Match Extraction Context

Show a configurable number of characters before (--before) and after (--after) each match.

Example

jsxtractor -u https://example.com \
    --before 50 \
    --after 50
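One plausible way to implement this kind of context is to slice the scanned text around each match's span. This is a hedged sketch, not jsxtractor's implementation. find_with_context and ContextMatch are hypothetical names, and reporting the first capture group (rather than the whole match) is an assumption based on the example patterns wrapping the endpoint in a group.

```python
import re
from dataclasses import dataclass


@dataclass
class ContextMatch:
    value: str   # matched text (first capture group if the pattern has one)
    before: str  # up to `before` characters preceding the value
    after: str   # up to `after` characters following the value


def find_with_context(pattern: str, text: str, before: int = 0, after: int = 0) -> list[ContextMatch]:
    """Return each match of `pattern` in `text` with surrounding context."""
    results = []
    for m in re.finditer(pattern, text):
        # Prefer group 1 so surrounding quotes or backticks are not part of the value.
        group = 1 if m.re.groups else 0
        if m.start(group) == -1:  # optional group did not participate: use whole match
            group = 0
        start, end = m.span(group)
        results.append(
            ContextMatch(
                value=m.group(group),
                before=text[max(0, start - before):start],  # clamp at start of text
                after=text[end:end + after],  # slicing past the end is safe
            )
        )
    return results


if __name__ == "__main__":
    js = 'const api = fetch("/v3/users/get", opts);'
    for hit in find_with_context(r'"(/v3/[^"]+)"', js, before=6, after=3):
        print(repr(hit.before), hit.value, repr(hit.after))
```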

Regex Group Configuration

Regex groups are defined using YAML files.

Example Structure

js-extractor:
  name: Group Name
  patterns:
    - regex: "pattern1"
      description: "What this pattern matches"

    - regex: "pattern2"
      description: "Another pattern"
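A group file with this structure can be validated before scanning. This sketch assumes the YAML has already been parsed (for example with PyYAML's yaml.safe_load) into a plain dict. compile_group is a hypothetical helper that checks the js-extractor, name, and patterns keys and compiles each regex with Python's re module; jsxtractor may use different validation rules.

```python
from __future__ import annotations

import re
from typing import Any


def compile_group(doc: dict[str, Any]) -> tuple[str, list[tuple[re.Pattern[str], str]]]:
    """Validate one parsed group document and compile its patterns.

    `doc` is assumed to be what yaml.safe_load() returns for a file shaped like
    the example structure: {"js-extractor": {"name": ..., "patterns": [...]}}.
    """
    try:
        group = doc["js-extractor"]
        name = group["name"]
        entries = group["patterns"]
    except (KeyError, TypeError) as exc:
        raise ValueError(f"malformed group document: {exc!r}") from exc

    compiled = []
    for index, entry in enumerate(entries):
        try:
            regex = entry["regex"]
        except (KeyError, TypeError) as exc:
            raise ValueError(f"{name}: pattern #{index} has no 'regex' key") from exc
        try:
            pattern = re.compile(regex)
        except re.error as exc:
            raise ValueError(f"{name}: pattern #{index} is not a valid regex: {exc}") from exc
        # description is optional in this sketch
        compiled.append((pattern, entry.get("description", "")))
    return name, compiled
```

Failing fast on a bad pattern, with the group name and index in the message, makes a broken entry in a large group directory easier to locate than a failure in the middle of a scan.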

Predefined Regex Groups

A collection of community-maintained regex groups is available in the js-extractor-groups repository (https://github.com/rdzsp/js-extractor-groups).

The repository contains predefined regex groups for:

  • API endpoints and URLs
  • Environment variables and frontend configuration
  • Secrets, tokens, and sensitive text
  • postMessage communication patterns
  • Module and library names relevant to dependency confusion
  • Additional JavaScript reconnaissance patterns
  • And more

Example Usage

jsxtractor -u https://example.com \
    -g ./js-extractor-groups

Clone Regex Groups Repository

git clone https://github.com/rdzsp/js-extractor-groups.git

Example Endpoint Patterns

js-extractor:
  name: Endpoints / URLs
  patterns:
    - regex: >-
        `((?:\/(?:v3|v4|v5|ads)(?:\/[^`\/\s]+)+(?:\/)?))`
      description: Template literal API endpoints

    - regex: >-
        "((?:\/(?:v3|v4|v5|ads)(?:\/[^"\/\s]+)+(?:\/)?))"
      description: Double-quoted API endpoints

    - regex: >-
        '((?:\/(?:v3|v4|v5|ads)(?:\/[^'\/\s]+)+(?:\/)?))'
      description: Single-quoted API endpoints
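To see what these three patterns capture, they can be run with Python's re module against a small JavaScript sample. The sample text and the scan helper are illustrative only, and jsxtractor's own regex engine may differ in edge cases. Each pattern needs at least one path segment after the prefix, so a bare "/v3" or a prefix such as "/v30/..." is not matched.

```python
import re

# The three endpoint patterns above, transcribed as Python raw strings.
ENDPOINT_PATTERNS = [
    (r"`((?:\/(?:v3|v4|v5|ads)(?:\/[^`\/\s]+)+(?:\/)?))`", "Template literal API endpoints"),
    (r'"((?:\/(?:v3|v4|v5|ads)(?:\/[^"\/\s]+)+(?:\/)?))"', "Double-quoted API endpoints"),
    (r"'((?:\/(?:v3|v4|v5|ads)(?:\/[^'\/\s]+)+(?:\/)?))'", "Single-quoted API endpoints"),
]

SAMPLE_JS = """
fetch(`/v3/users/get`);
axios.get("/v4/orders/list/");
const ads = '/ads/campaigns';
const other = "/v2/legacy/thing";
"""


def scan(js: str) -> list[tuple[str, str]]:
    """Return (captured endpoint, pattern description) pairs, pattern by pattern."""
    hits = []
    for regex, description in ENDPOINT_PATTERNS:
        for m in re.finditer(regex, js):
            hits.append((m.group(1), description))  # group 1 excludes the quotes
    return hits


if __name__ == "__main__":
    for value, description in scan(SAMPLE_JS):
        print(f"{value}  <- {description}")
```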

Output Format

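Results are exported as a JSON array. Each record holds the matched value, the name of the regex group, the description of the pattern that matched, and the URL of the JavaScript file in which the match was found.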

Example

[
  {
    "value": "/v3/users/get",
    "group_name": "Endpoints / URLs",
    "description": "Template literal API endpoints",
    "url": "https://example.com/assets/app.js"
  }
]

Default output file (override with -o/--output):

extraction_results.json
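A minimal sketch of writing and reading records in this format, assuming each finding carries exactly the four fields shown in the example. Finding, write_results, and load_results are hypothetical names, not part of jsxtractor's API.

```python
import json
from dataclasses import asdict, dataclass
from pathlib import Path

DEFAULT_OUTPUT = "extraction_results.json"  # documented default output file


@dataclass
class Finding:
    value: str        # the matched text
    group_name: str   # `name` of the regex group that matched
    description: str  # `description` of the pattern that matched
    url: str          # JavaScript file the match came from


def write_results(findings: list[Finding], path: str = DEFAULT_OUTPUT) -> None:
    """Serialize findings as a JSON array with one object per match."""
    records = [asdict(f) for f in findings]
    Path(path).write_text(json.dumps(records, indent=2) + "\n", encoding="utf-8")


def load_results(path: str = DEFAULT_OUTPUT) -> list[Finding]:
    """Read a results file back into Finding objects."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    return [Finding(**record) for record in data]
```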

Command-Line Arguments

Argument                           Description
-u, --url                          Target URL
-g, --group                        Directory containing regex group YAML files
-b, --browser                      Enable Playwright browser mode
-l, --login                        Enable interactive login (requires --browser)
-lu, --login-url                   Login page URL
-lsi, --login-success-indicator    Text that indicates a successful login (e.g. Logout)
-fr, --force-relogin               Force a new login session
-ss, --storage-state               Custom Playwright storage state file
-af, --after                       Number of characters to show after each match
-be, --before                      Number of characters to show before each match
-o, --output                       Output JSON file (default: extraction_results.json)
-v, --verbose                      Enable verbose logging
-t, --timeout                      Request timeout
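For readers scripting around the CLI, the documented flags can be mirrored with argparse. This is a hypothetical reconstruction, not jsxtractor's cli.py. The defaults for --before/--after (0), the numeric type of --timeout, treating the positional URL and -u as interchangeable, and rejecting --login without --browser (based on the Troubleshooting note) are all assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical mirror of the documented flags; NOT jsxtractor's actual parser.
    p = argparse.ArgumentParser(prog="jsxtractor")
    p.add_argument("target", nargs="?", help="Target URL (positional form)")
    p.add_argument("-u", "--url", help="Target URL")
    p.add_argument("-g", "--group", help="Regex group directory")
    p.add_argument("-b", "--browser", action="store_true", help="Enable Playwright browser mode")
    p.add_argument("-l", "--login", action="store_true", help="Enable interactive login")
    p.add_argument("-lu", "--login-url", help="Login page URL")
    p.add_argument("-lsi", "--login-success-indicator", help="Successful login indicator text")
    p.add_argument("-fr", "--force-relogin", action="store_true", help="Force new login session")
    p.add_argument("-ss", "--storage-state", help="Custom Playwright storage state file")
    p.add_argument("-af", "--after", type=int, default=0, help="Characters after match")
    p.add_argument("-be", "--before", type=int, default=0, help="Characters before match")
    p.add_argument("-o", "--output", default="extraction_results.json", help="Output JSON file")
    p.add_argument("-v", "--verbose", action="store_true", help="Enable verbose logging")
    p.add_argument("-t", "--timeout", type=float, help="Request timeout (unit assumed)")
    return p


def parse(argv=None) -> argparse.Namespace:
    parser = build_parser()
    args = parser.parse_args(argv)
    args.url = args.url or args.target  # accept either URL form
    if args.url is None:
        parser.error("a target URL is required (positional or -u/--url)")
    if args.login and not args.browser:
        parser.error("--login requires --browser")  # exits with status 2
    return args
```

argparse matches a multi-character single-dash option such as -lsi or -be exactly before trying anything else, so it coexists with the single-letter -l and -b flags.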

Python API Usage

from jsxtractor.extractor import extract

results = extract(
    target_url="https://example.com",
    browser=True,
    verbose=True
)

print(results)
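The return type of extract() is not documented here. Assuming it returns records shaped like the JSON output (dicts with value, group_name, description, and url keys), a hypothetical summarize helper could collapse them into unique values per regex group, for example by calling print(summarize(results)) after the extract() call above.

```python
from collections import defaultdict
from typing import Iterable


def summarize(results: Iterable[dict]) -> dict[str, list[str]]:
    """Map each regex group name to its unique matched values, in first-seen order."""
    by_group: dict[str, dict[str, None]] = defaultdict(dict)
    for record in results:
        # A dict used as an ordered set drops repeats (e.g. one endpoint in two files).
        by_group[record["group_name"]].setdefault(record["value"], None)
    return {group: list(values) for group, values in by_group.items()}
```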

Project Structure

js-extractor/
├── pyproject.toml
├── README.md
├── src/
│   └── jsxtractor/
│       ├── __init__.py
│       ├── __main__.py
│       ├── cli.py
│       ├── extractor.py
│       ├── auth.py
│       └── utils.py

Troubleshooting

Install Playwright Browsers

playwright install chromium

Enable Browser Mode for Login

Authentication requires browser mode:

jsxtractor -u https://example.com \
    --browser \
    --login

No Display Detected

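Interactive login opens a visible browser window, so it requires a graphical display.

On Linux, this typically means one of:

  • An X11 or Wayland session
  • A desktop environment
  • X forwarding (for example, ssh -X) when working on a remote machine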
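A quick environment check can surface this problem before an interactive login is attempted. X11 clients, including those using X forwarding over ssh -X, locate the display through the DISPLAY environment variable, and native Wayland clients use WAYLAND_DISPLAY. The has_graphical_display helper below is a hypothetical sketch, not part of jsxtractor, and it assumes Windows and macOS desktop sessions always have a display.

```python
from __future__ import annotations

import os
import sys
from collections.abc import Mapping


def has_graphical_display(env: Mapping[str, str] | None = None, platform: str | None = None) -> bool:
    """Best-effort check for a display that a headed browser window could use."""
    env = os.environ if env is None else env
    platform = sys.platform if platform is None else platform
    if platform in ("win32", "darwin"):
        # Assumption: desktop sessions on Windows and macOS always have a display.
        return True
    # X11 clients (including X forwarding via `ssh -X`) find the server through DISPLAY;
    # native Wayland clients use WAYLAND_DISPLAY. Empty values count as unset.
    return bool(env.get("DISPLAY") or env.get("WAYLAND_DISPLAY"))


if __name__ == "__main__":
    if not has_graphical_display():
        print("No display detected: use a desktop session or enable X forwarding (ssh -X).")
```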

Security Notice

Use responsibly and only on systems you own or are authorized to test.

This project is intended for:

  • Security research
  • Bug bounty hunting
  • Web application analysis
  • Authorized penetration testing

License

MIT License



Download files

Download the file for your platform.

Source Distribution

jsxtractor-1.0.1.tar.gz (13.6 kB)

Uploaded Source

Built Distribution


jsxtractor-1.0.1-py3-none-any.whl (12.5 kB)

Uploaded Python 3

File details

Details for the file jsxtractor-1.0.1.tar.gz.

File metadata

  • Download URL: jsxtractor-1.0.1.tar.gz
  • Upload date:
  • Size: 13.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.13

File hashes

Hashes for jsxtractor-1.0.1.tar.gz
Algorithm    Hash digest
SHA256       80817133cdd41c45fac06d6d10d204032595a21eacd9e05d12502abfdf31b314
MD5          9514a7231617adcac6ca9c95d3c5dfc5
BLAKE2b-256  6c0903477d99a6dd8e47973a7eb33ce2ac8a09f7f37815892cb7970c68013be9


File details

Details for the file jsxtractor-1.0.1-py3-none-any.whl.

File metadata

  • Download URL: jsxtractor-1.0.1-py3-none-any.whl
  • Upload date:
  • Size: 12.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.13

File hashes

Hashes for jsxtractor-1.0.1-py3-none-any.whl
Algorithm    Hash digest
SHA256       51ef814963b436f9f04b61ba2790422e279273e905ca4cbff453b946525b0847
MD5          0ce29d72e6807cf5bdfc694853c16ce3
BLAKE2b-256  798022ccacf9dace8ae56ddeffdd598c72279341e890a351f586ba23fc9995fd

