JavaScript extractor and regex scanner
JavaScript Extractor
A fast and flexible JavaScript extraction and regex scanning tool for security research, bug bounty hunting, and web application analysis.
jsxtractor crawls JavaScript files from a target website, applies configurable YAML-based regex groups, and exports structured extraction results.
Features
- ✅ Extract JavaScript files from webpages
- ✅ Supports Playwright browser mode
- ✅ Supports authenticated crawling with persistent sessions
- ✅ YAML-based regex pattern groups
- ✅ Match API endpoints, secrets, tokens, keys, URLs, and custom patterns
- ✅ JSON export support
- ✅ Relative + absolute JavaScript URL handling
- ✅ Headless and browser automation modes
- ✅ Verbose/debug logging
- ✅ Custom extraction context (--before, --after)
- ✅ Custom Playwright storage state support
- ✅ Reusable Python API
- ✅ Installable via pip
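The relative and absolute JavaScript URL handling listed above can be illustrated with the standard library. This is a minimal sketch, not the code jsxtractor itself runs: the helper name `resolve_script_urls` and the sample URLs are assumptions made for illustration.

```python
from urllib.parse import urljoin


def resolve_script_urls(page_url, script_srcs):
    """Resolve each script src against the page URL, dropping duplicates.

    Hypothetical helper for illustration; not part of the jsxtractor API.
    """
    resolved = []
    for src in script_srcs:
        absolute = urljoin(page_url, src)  # handles relative, root-relative, and absolute srcs
        if absolute not in resolved:
            resolved.append(absolute)
    return resolved


page = "https://example.com/app/index.html"
srcs = [
    "/assets/app.js",                  # root-relative
    "vendor.js",                       # relative to the page directory
    "https://cdn.example.net/lib.js",  # already absolute
    "//cdn.example.net/x.js",          # protocol-relative
]
resolved = resolve_script_urls(page, srcs)
for url in resolved:
    print(url)
```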
Installation
Install from PyPI
pip install jsxtractor
Install Playwright browser
playwright install chromium
Usage
Basic Scan
jsxtractor https://example.com
Using Named Arguments
jsxtractor -u https://example.com
Using Custom Regex Group Directory
jsxtractor -u https://example.com -g ./groups
Enable Verbose Logging
jsxtractor -u https://example.com -v
Browser Mode
jsxtractor -u https://example.com --browser
Authentication / Login Mode
Interactive login mode allows authenticated JavaScript extraction using Playwright persistent sessions.
Login Example
jsxtractor -u https://example.com \
  --browser \
  --login \
  --login-url https://example.com/login \
  --login-success-indicator Logout
Force Re-login
jsxtractor -u https://example.com \
  --browser \
  --login \
  --force-relogin
Custom Storage State File
jsxtractor -u https://example.com \
  --browser \
  --login \
  --storage-state ./states/admin.json
Match Extraction Context
Include a fixed number of characters before and after each match in the results.
Example
jsxtractor -u https://example.com \
  --before 50 \
  --after 50
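To make the idea of a context window concrete, here is a minimal sketch of how characters before and after a match could be collected with Python's `re` module. The function `match_with_context` and the `context` key are assumptions for illustration; the tool's actual implementation and output fields may differ.

```python
import re


def match_with_context(text, pattern, before=0, after=0):
    """Return each match together with surrounding characters.

    Hypothetical helper; it only mirrors the idea behind --before/--after.
    """
    results = []
    for m in re.finditer(pattern, text):
        start = max(0, m.start() - before)      # clamp at the start of the text
        end = min(len(text), m.end() + after)   # clamp at the end of the text
        results.append({"value": m.group(0), "context": text[start:end]})
    return results


sample = 'const cfg = {apiKey: "abc123", debug: true};'
hits = match_with_context(sample, r"apiKey", before=6, after=10)
for hit in hits:
    print(hit["value"], "->", hit["context"])
```

The clamping keeps the window inside the file when a match sits near the beginning or end.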
Regex Group Configuration
Regex groups are defined using YAML files.
Example Structure
js-extractor:
  name: Group Name
  patterns:
    - regex: "pattern1"
      description: "What this pattern matches"
    - regex: "pattern2"
      description: "Another pattern"
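Before pointing the scanner at a custom group, it can help to check that the file has the expected shape and that every regex compiles. The sketch below works on an already-parsed group (a plain dict mirroring the YAML structure above, so no YAML library is needed here). The function `validate_group` and the specific rules it enforces are assumptions, not checks that jsxtractor is known to perform.

```python
import re


def validate_group(data):
    """Return a list of problems found in a parsed regex group (empty if none).

    Hypothetical helper; the rules below are illustrative assumptions.
    """
    group = data.get("js-extractor")
    if not isinstance(group, dict):
        return ["missing top-level 'js-extractor' mapping"]
    problems = []
    if not group.get("name"):
        problems.append("missing 'name'")
    patterns = group.get("patterns")
    if not isinstance(patterns, list) or not patterns:
        problems.append("'patterns' must be a non-empty list")
        return problems
    for i, entry in enumerate(patterns):
        regex = entry.get("regex") if isinstance(entry, dict) else None
        if not regex:
            problems.append(f"pattern {i}: missing 'regex'")
            continue
        try:
            re.compile(regex)  # surfaces syntax errors early
        except re.error as exc:
            problems.append(f"pattern {i}: invalid regex ({exc})")
    return problems


good = {"js-extractor": {"name": "Group Name", "patterns": [
    {"regex": "pattern1", "description": "What this pattern matches"},
]}}
bad = {"js-extractor": {"name": "Broken", "patterns": [
    {"regex": "ok", "description": "fine"},
    {"regex": "(", "description": "unbalanced parenthesis"},
]}}
print(validate_group(good))
print(validate_group(bad))
```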
Predefined Regex Groups
A collection of community-maintained regex groups is available in the js-extractor-groups repository on GitHub (clone URL below).
The repository contains predefined regex groups for:
- API endpoints and URLs
- Environment variables and frontend configuration
- Secrets, tokens, and sensitive text
- postMessage communication patterns
- Dependency confusion related modules/libraries
- Additional JavaScript reconnaissance patterns
- And more
Example Usage
jsxtractor -u https://example.com \
  -g ./js-extractor-groups
Clone Regex Groups Repository
git clone https://github.com/rdzsp/js-extractor-groups.git
Example Endpoint Patterns
js-extractor:
  name: Endpoints / URLs
  patterns:
    - regex: >-
        `((?:\/(?:v3|v4|v5|ads)(?:\/[^`\/\s]+)+(?:\/)?))`
      description: Template literal API endpoints
    - regex: >-
        "((?:\/(?:v3|v4|v5|ads)(?:\/[^"\/\s]+)+(?:\/)?))"
      description: Double-quoted API endpoints
    - regex: >-
        '((?:\/(?:v3|v4|v5|ads)(?:\/[^'\/\s]+)+(?:\/)?))'
      description: Single-quoted API endpoints
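To see what these three patterns catch, they can be tried against a small JavaScript snippet using Python's `re` module. This is only a sketch: the sample JavaScript is invented, and taking the first capture group as the reported value is an assumption based on the example output, not confirmed behavior of the tool.

```python
import re

# The three endpoint regexes from the group above, paired with their descriptions.
PATTERNS = [
    (r'`((?:\/(?:v3|v4|v5|ads)(?:\/[^`\/\s]+)+(?:\/)?))`', "Template literal API endpoints"),
    (r'"((?:\/(?:v3|v4|v5|ads)(?:\/[^"\/\s]+)+(?:\/)?))"', "Double-quoted API endpoints"),
    (r"'((?:\/(?:v3|v4|v5|ads)(?:\/[^'\/\s]+)+(?:\/)?))'", "Single-quoted API endpoints"),
]

# Invented sample input; '/v2/...' is not in the allowed prefix list and is skipped.
sample_js = """
fetch(`/v3/users/get`);
axios.post("/v4/orders/create/");
const legacy = '/v2/ignored/path';
const ads = '/ads/campaigns/list';
"""

found = []
for regex, description in PATTERNS:
    for m in re.finditer(regex, sample_js):
        # Group 1 is the path without the surrounding quote characters.
        found.append((m.group(1), description))

for value, description in found:
    print(f"{value}  ({description})")
```

Each pattern requires a version or `ads` prefix followed by at least one further path segment, so a bare `"/v3"` would not match.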
Output Format
Results are exported as JSON.
Example
[
  {
    "value": "/v3/users/get",
    "group_name": "Endpoints / URLs",
    "description": "Template literal API endpoints",
    "url": "https://example.com/assets/app.js"
  }
]
Default output file:
extraction_results.json
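Because the output is a flat JSON list, it is easy to post-process. The sketch below counts matches per group and collects the distinct values; it assumes every record carries the four fields shown in the example, and the `summarize` helper is hypothetical. The inline sample stands in for the contents of the output file.

```python
import json
from collections import Counter

# Sample data copied from the example record above. In practice the text
# would be read from the output file, e.g. extraction_results.json.
raw = """
[
  {
    "value": "/v3/users/get",
    "group_name": "Endpoints / URLs",
    "description": "Template literal API endpoints",
    "url": "https://example.com/assets/app.js"
  }
]
"""


def summarize(records):
    """Count matches per group and collect the distinct matched values."""
    per_group = Counter(record["group_name"] for record in records)
    unique_values = sorted({record["value"] for record in records})
    return per_group, unique_values


records = json.loads(raw)
per_group, unique_values = summarize(records)
print(dict(per_group))
print(unique_values)
```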
Command-Line Arguments
| Argument | Description |
|---|---|
| -u, --url | Target URL |
| -g, --group | Regex group directory |
| -b, --browser | Enable Playwright browser mode |
| -l, --login | Enable interactive login |
| -lu, --login-url | Login page URL |
| -lsi, --login-success-indicator | Successful login indicator text |
| -fr, --force-relogin | Force new login session |
| -ss, --storage-state | Custom Playwright storage state file |
| -af, --after | Characters after match |
| -be, --before | Characters before match |
| -o, --output | Output JSON file |
| -v, --verbose | Enable verbose logging |
| -t, --timeout | Request timeout |
Python API Usage
from jsxtractor.extractor import extract

results = extract(
    target_url="https://example.com",
    browser=True,
    verbose=True
)
print(results)
Project Structure
js-extractor/
├── pyproject.toml
├── README.md
├── src/
│   └── jsxtractor/
│       ├── __init__.py
│       ├── __main__.py
│       ├── cli.py
│       ├── extractor.py
│       ├── auth.py
│       └── utils.py
Troubleshooting
Install Playwright Browsers
playwright install chromium
Enable Browser Mode for Login
Authentication requires browser mode:
jsxtractor -u https://example.com \
  --browser \
  --login
No Display Detected
Interactive login opens a visible browser window, so it requires a graphical display.
Linux users may need one of the following:
- X11
- Wayland
- Desktop environment
- X forwarding
Security Notice
Use responsibly and only on systems you own or are authorized to test.
This project is intended for:
- Security research
- Bug bounty hunting
- Web application analysis
- Authorized penetration testing
License
MIT License