JavaScript extractor and regex scanner

JavaScript Extractor

A fast and flexible JavaScript extraction and regex scanning tool for security research, bug bounty hunting, and web application analysis.

jsxtractor crawls JavaScript files from a target website, applies configurable YAML-based regex groups, and exports structured extraction results.

Features

✅ Extract JavaScript files from webpages
✅ Supports Playwright browser mode
✅ Supports authenticated crawling with persistent sessions
✅ YAML-based regex pattern groups
✅ Match API endpoints, secrets, tokens, keys, URLs, and custom patterns
✅ JSON export support
✅ Relative + absolute JavaScript URL handling
✅ Headless and browser automation modes
✅ Verbose/debug logging
✅ Custom extraction context (--before, --after)
✅ Custom Playwright storage state support
✅ Reusable Python API
✅ Installable via pip

Installation

Install from PyPI

pip install jsxtractor

Install Playwright browser

playwright install chromium

Usage

Basic Scan

jsxtractor https://example.com

Using Named Arguments

jsxtractor -u https://example.com

Using Custom Regex Group Directory

jsxtractor -u https://example.com -g ./groups

Enable Verbose Logging

jsxtractor -u https://example.com -v

Browser Mode

jsxtractor -u https://example.com --browser

Authentication / Login Mode

Interactive login mode allows authenticated JavaScript extraction using Playwright persistent sessions.

Login Example

jsxtractor -u https://example.com \
    --browser \
    --login \
    --login-url https://example.com/login \
    --login-success-indicator Logout

Force Re-login

jsxtractor -u https://example.com \
    --browser \
    --login \
    --force-relogin

Custom Storage State File

jsxtractor -u https://example.com \
    --browser \
    --login \
    --storage-state ./states/admin.json
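
A Playwright storage state file is a JSON document with a `cookies` list and an `origins` list. The sketch below is not part of jsxtractor: `expired_cookies` is a hypothetical helper, and the sample data is made up. It shows one way to check whether a saved session has stale cookies before deciding to run with `--force-relogin`.

```python
import json
import time
from typing import Optional

def expired_cookies(state: dict, now: Optional[float] = None) -> list:
    """Return the names of cookies whose expiry time has passed.

    Playwright stores `expires` as a Unix timestamp in seconds and uses -1
    for session cookies, which are skipped here.
    """
    now = time.time() if now is None else now
    names = []
    for cookie in state.get("cookies", []):
        expires = cookie.get("expires", -1)
        if expires != -1 and expires < now:
            names.append(cookie["name"])
    return names

# Made-up sample in the shape of a Playwright storage state file.
sample_state = json.loads("""
{
  "cookies": [
    {"name": "session", "value": "abc", "domain": "example.com", "path": "/", "expires": 1000},
    {"name": "csrf", "value": "xyz", "domain": "example.com", "path": "/", "expires": -1},
    {"name": "remember", "value": "1", "domain": "example.com", "path": "/", "expires": 4102444800}
  ],
  "origins": []
}
""")

# A fixed `now` keeps the example deterministic.
stale = expired_cookies(sample_state, now=2000)
print(stale)
```

To inspect a real file, load it with `json.load` (for example `./states/admin.json` from the command above) and call the helper without the `now` argument.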

Match Extraction Context

Show the content surrounding each match. `--before` and `--after` set how many characters to include on each side.

Example

jsxtractor -u https://example.com \
    --before 50 \
    --after 50
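
The idea behind a context window can be sketched in a few lines of Python. This is an illustration only, not jsxtractor's implementation: `match_with_context` and the sample source string are hypothetical, and how the tool actually reports context is not documented here.

```python
import re

def match_with_context(text, pattern, before=0, after=0):
    """Return (match, context) pairs, where context is the match plus up to
    `before` characters preceding it and `after` characters following it."""
    results = []
    for m in re.finditer(pattern, text):
        # Clamp the window so it never runs past either end of the text.
        start = max(0, m.start() - before)
        end = min(len(text), m.end() + after)
        results.append((m.group(0), text[start:end]))
    return results

source = 'const api = "/v3/users/get"; fetch(api);'
hits = match_with_context(source, r"/v3/[a-z/]+", before=5, after=3)
for value, context in hits:
    print(repr(value), repr(context))
```

Clamping matters for matches near the start or end of a file, where fewer characters than requested are available.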

Regex Group Configuration

Regex groups are defined using YAML files.

Example Structure

js-extractor:
  name: Group Name
  patterns:
    - regex: "pattern1"
      description: "What this pattern matches"

    - regex: "pattern2"
      description: "Another pattern"
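
To make the structure concrete, here is a rough sketch of how such a group could be applied to a piece of JavaScript. The group is written as the Python dict a YAML parser would produce from the file above (parsing is left out to keep the example standard-library only). The patterns, the `scan` helper, and the choice to report the first capture group are assumptions for illustration, not jsxtractor's confirmed behavior.

```python
import re

# Same shape as the YAML structure, with two illustrative patterns.
group_config = {
    "js-extractor": {
        "name": "Group Name",
        "patterns": [
            {"regex": r'apiKey\s*=\s*"([^"]+)"', "description": "Hard-coded API key"},
            {"regex": r"debug\s*=\s*true", "description": "Debug flag enabled"},
        ],
    }
}

def scan(text, config, url):
    """Apply every pattern in the group and return one record per match."""
    group = config["js-extractor"]
    records = []
    for entry in group["patterns"]:
        for m in re.finditer(entry["regex"], text):
            # Assumption: report the first capture group when the pattern
            # defines one, otherwise the whole match.
            value = m.group(1) if m.re.groups else m.group(0)
            records.append({
                "value": value,
                "group_name": group["name"],
                "description": entry["description"],
                "url": url,
            })
    return records

js_source = 'var apiKey = "abc123"; var debug = true;'
records = scan(js_source, group_config, "https://example.com/assets/app.js")
for record in records:
    print(record)
```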

Predefined Regex Groups

A collection of community-maintained regex groups is available in the js-extractor-groups repository:

https://github.com/rdzsp/js-extractor-groups

The repository contains predefined regex groups for:

API endpoints and URLs
Environment variables and frontend configuration
Secrets, tokens, and sensitive text
postMessage communication patterns
Modules and libraries related to dependency confusion
Additional JavaScript reconnaissance patterns
And more

Example Usage

jsxtractor -u https://example.com \
    -g ./js-extractor-groups

Clone Regex Groups Repository

git clone https://github.com/rdzsp/js-extractor-groups.git

Example Endpoint Patterns

js-extractor:
  name: Endpoints / URLs
  patterns:
    - regex: >-
        `((?:\/(?:v3|v4|v5|ads)(?:\/[^`\/\s]+)+(?:\/)?))`
      description: Template literal API endpoints

    - regex: >-
        "((?:\/(?:v3|v4|v5|ads)(?:\/[^"\/\s]+)+(?:\/)?))"
      description: Double-quoted API endpoints

    - regex: >-
        '((?:\/(?:v3|v4|v5|ads)(?:\/[^'\/\s]+)+(?:\/)?))'
      description: Single-quoted API endpoints
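
The patterns above can be tried directly with Python's `re` module before adding them to a group file. The snippet below copies the double-quoted and template-literal patterns verbatim; the JavaScript sample string is made up for the test.

```python
import re

# Patterns copied from the group file above.
DOUBLE_QUOTED = r'"((?:\/(?:v3|v4|v5|ads)(?:\/[^"\/\s]+)+(?:\/)?))"'
TEMPLATE_LITERAL = r'`((?:\/(?:v3|v4|v5|ads)(?:\/[^`\/\s]+)+(?:\/)?))`'

# Made-up JavaScript with one match per pattern and one non-matching path.
js = 'fetch("/v3/users/get"); load(`/ads/banner/top/`); open("/v2/old/path");'

# With a single capture group, findall returns the captured text only,
# so the surrounding quotes or backticks are dropped.
double_quoted_hits = re.findall(DOUBLE_QUOTED, js)
template_hits = re.findall(TEMPLATE_LITERAL, js)

print(double_quoted_hits)
print(template_hits)
```

The `/v2/old/path` string is skipped because the patterns only accept paths that begin with `/v3`, `/v4`, `/v5`, or `/ads`, and they require at least one further path segment after that prefix.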

Output Format

Results are exported as JSON.

Example

[
  {
    "value": "/v3/users/get",
    "group_name": "Endpoints / URLs",
    "description": "Template literal API endpoints",
    "url": "https://example.com/assets/app.js"
  }
]

Default output file:

extraction_results.json
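
Because the output is a flat JSON list, it is easy to post-process. The sketch below summarizes results by group and by source file. The first record is the example above; the other two are invented so the summary has something to count.

```python
import json
from collections import Counter, defaultdict

# In practice: records = json.load(open("extraction_results.json"))
sample = """
[
  {"value": "/v3/users/get", "group_name": "Endpoints / URLs",
   "description": "Template literal API endpoints",
   "url": "https://example.com/assets/app.js"},
  {"value": "/v4/orders/list", "group_name": "Endpoints / URLs",
   "description": "Double-quoted API endpoints",
   "url": "https://example.com/assets/app.js"},
  {"value": "API_BASE_URL", "group_name": "Environment",
   "description": "Frontend configuration key",
   "url": "https://example.com/assets/vendor.js"}
]
"""
records = json.loads(sample)

# Number of matches per regex group.
per_group = Counter(r["group_name"] for r in records)

# Matched values collected per JavaScript file.
per_file = defaultdict(list)
for r in records:
    per_file[r["url"]].append(r["value"])

for name, count in per_group.most_common():
    print(f"{name}: {count}")
for url, values in per_file.items():
    print(url, values)
```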

Command-Line Arguments

Argument	Description
`-u`, `--url`	Target URL
`-g`, `--group`	Regex group directory
`-b`, `--browser`	Enable Playwright browser mode
`-l`, `--login`	Enable interactive login
`-lu`, `--login-url`	Login page URL
`-lsi`, `--login-success-indicator`	Successful login indicator text
`-fr`, `--force-relogin`	Force new login session
`-ss`, `--storage-state`	Custom Playwright storage state file
`-af`, `--after`	Characters after match
`-be`, `--before`	Characters before match
`-o`, `--output`	Output JSON file
`-v`, `--verbose`	Enable verbose logging
`-t`, `--timeout`	Request timeout
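
When scanning several targets from a script, it can help to assemble the command line programmatically. The helper below is a hypothetical convenience wrapper, not part of jsxtractor. It uses only the long flags listed in the table and builds an argument list suitable for `subprocess.run`.

```python
import shlex

def build_command(url, groups=None, browser=False, before=None,
                  after=None, output=None, verbose=False):
    """Build a jsxtractor argument list from the documented long flags."""
    cmd = ["jsxtractor", "--url", url]
    if groups:
        cmd += ["--group", groups]
    if browser:
        cmd.append("--browser")
    if before is not None:
        cmd += ["--before", str(before)]
    if after is not None:
        cmd += ["--after", str(after)]
    if output:
        cmd += ["--output", output]
    if verbose:
        cmd.append("--verbose")
    return cmd

cmd = build_command(
    "https://example.com",
    groups="./groups",
    before=50,
    after=50,
    output="example.json",
)
# shlex.join produces a copy-pasteable shell command for logging.
print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` would run the scan without involving a shell, which avoids quoting problems in URLs.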

Python API Usage

from jsxtractor.extractor import extract

results = extract(
    target_url="https://example.com",
    browser=True,
    verbose=True
)

print(results)
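
Building on the call above, a small wrapper can run the extractor over several targets. This is a sketch under assumptions: `scan_many` is a hypothetical helper, and the return type of `extract` is assumed to be a list of records shaped like the JSON output. A stand-in function is used here so the example runs without network access or the package installed.

```python
def scan_many(urls, extract_fn, **options):
    """Call extract_fn once per URL and collect the results by URL.

    extract_fn is expected to accept target_url plus keyword options,
    as in the extract(...) call shown above.
    """
    results = {}
    for url in urls:
        results[url] = extract_fn(target_url=url, **options)
    return results

def fake_extract(target_url, browser=False, verbose=False):
    # Stand-in for jsxtractor.extractor.extract, used only for this demo.
    return [{"value": "/v3/ping", "url": target_url + "/app.js"}]

targets = ["https://a.example", "https://b.example"]
all_results = scan_many(targets, fake_extract, browser=True)
for url, records in all_results.items():
    print(url, len(records))
```

In real use, pass `extract` imported from `jsxtractor.extractor` in place of `fake_extract`.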

Project Structure

js-extractor/
├── pyproject.toml
├── README.md
├── src/
│   └── jsxtractor/
│       ├── __init__.py
│       ├── __main__.py
│       ├── cli.py
│       ├── extractor.py
│       ├── auth.py
│       └── utils.py

Troubleshooting

Install Playwright Browsers

playwright install chromium

Enable Browser Mode for Login

Authentication requires browser mode:

jsxtractor -u https://example.com \
    --browser \
    --login

No Display Detected

Interactive login requires a graphical display.

Linux users may need one of the following:

X11
Wayland
Desktop environment
X forwarding
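
A quick way to check whether a graphical session is visible to the current process is to look at the standard environment variables: X11 sessions set `DISPLAY`, and Wayland sessions set `WAYLAND_DISPLAY`. The helper below is a hypothetical pre-flight check, and how jsxtractor itself detects a display is not documented here.

```python
import os

def has_display(env=None):
    """Return True if an X11 or Wayland display is advertised in env."""
    env = os.environ if env is None else env
    return bool(env.get("DISPLAY") or env.get("WAYLAND_DISPLAY"))

if has_display():
    print("Graphical display detected; interactive login should be possible.")
else:
    print("No display detected; interactive login will likely fail.")
```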

Security Notice

Use responsibly and only on systems you own or are authorized to test.

This project is intended for:

Security research
Bug bounty hunting
Web application analysis
Authorized penetration testing

License

MIT License

This version

1.0.1

May 12, 2026

Download files

Download the file for your platform.

Source Distribution

jsxtractor-1.0.1.tar.gz (13.6 kB view details)

Uploaded May 12, 2026 Source

Built Distribution

jsxtractor-1.0.1-py3-none-any.whl (12.5 kB view details)

Uploaded May 12, 2026 Python 3

File details

Details for the file jsxtractor-1.0.1.tar.gz.

File metadata

Download URL: jsxtractor-1.0.1.tar.gz
Upload date: May 12, 2026
Size: 13.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.13

File hashes

Hashes for jsxtractor-1.0.1.tar.gz
Algorithm	Hash digest
SHA256	`80817133cdd41c45fac06d6d10d204032595a21eacd9e05d12502abfdf31b314`
MD5	`9514a7231617adcac6ca9c95d3c5dfc5`
BLAKE2b-256	`6c0903477d99a6dd8e47973a7eb33ce2ac8a09f7f37815892cb7970c68013be9`

File details

Details for the file jsxtractor-1.0.1-py3-none-any.whl.

File metadata

Download URL: jsxtractor-1.0.1-py3-none-any.whl
Upload date: May 12, 2026
Size: 12.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.13

File hashes

Hashes for jsxtractor-1.0.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`51ef814963b436f9f04b61ba2790422e279273e905ca4cbff453b946525b0847`
MD5	`0ce29d72e6807cf5bdfc694853c16ce3`
BLAKE2b-256	`798022ccacf9dace8ae56ddeffdd598c72279341e890a351f586ba23fc9995fd`
