Pure Python JavaScript deobfuscator
PyJSClear
Pure Python JavaScript deobfuscator. Combines the functionality of obfuscator-io-deobfuscator (13 AST transforms for obfuscator.io output) and javascript-deobfuscator (hex escape decoding, static array unpacking, property access cleanup) into a single Python library with no Node.js dependency.
Installation
pip install pyjsclear
For development:
git clone https://github.com/intezer/PyJSClear.git
cd PyJSClear
pip install -e .
pip install pytest
Usage
Python API
from pyjsclear import deobfuscate, deobfuscate_file
# From a string
cleaned = deobfuscate(obfuscated_code)
# From a file
deobfuscate_file("input.js", "output.js")
# Or get the result as a string
cleaned = deobfuscate_file("input.js")
Command line
# File to stdout
pyjsclear input.js
# File to file
pyjsclear input.js -o output.js
# Stdin to stdout
cat input.js | pyjsclear -
# With custom iteration limit
pyjsclear input.js --max-iterations 20
What it does
PyJSClear applies 16 transforms in a multi-pass loop until no further changes are made (up to 50 iterations by default):
| # | Transform | Description |
|---|---|---|
| 1 | StringRevealer | Decode obfuscator.io string arrays (basic, base64, RC4), including rotation IIFEs, wrapper functions, multiple decoders per file, and SequenceExpression-wrapped rotation patterns |
| 2 | HexEscapes | Normalize \xHH/\uHHHH escape sequences in string literal AST nodes |
| 3 | UnusedVariableRemover | Remove variables with zero references |
| 4 | ConstantProp | Propagate constant literals to all reference sites |
| 5 | ReassignmentRemover | Eliminate redundant x = y reassignment chains |
| 6 | DeadBranchRemover | Remove unreachable if(true)/if(false) and ternary branches |
| 7 | ObjectPacker | Consolidate sequential obj.x = ... assignments into object literals |
| 8 | ProxyFunctionInliner | Inline single-return proxy functions at all call sites |
| 9 | SequenceSplitter | Split comma expressions (a(), b(), c()) into separate statements; extract (0, fn)(args) indirect call prefixes; normalize loop/if bodies to block statements |
| 10 | ExpressionSimplifier | Evaluate static expressions: 3 + 5 -> 8, ![] -> false, typeof undefined -> "undefined", test ? false : true -> !test |
| 11 | LogicalToIf | Convert a && b() / a || b() in statement position to if-statements |
| 12 | ControlFlowRecoverer | Recover linear code from "1|0|3".split("|") + while/switch dispatch patterns |
| 13 | PropertySimplifier | Convert obj["prop"] to obj.prop where valid |
| 14 | AntiTamperRemover | Remove self-defending and anti-debug IIFEs |
| 15 | ObjectSimplifier | Inline proxy object property accesses |
| 16 | StringRevealer | Second pass to catch strings exposed by earlier transforms |
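The multi-pass loop can be pictured with a minimal sketch. The two transforms below are toy, string-level stand-ins (PyJSClear itself works on an AST), and the names decode_hex_escapes, bracket_to_dot, and run_until_stable are hypothetical illustrations, not part of the library's API.

```python
import re
from typing import Callable, List, Tuple

Transform = Callable[[str], str]


def decode_hex_escapes(source: str) -> str:
    """Toy stand-in: replace \\xHH escapes of printable characters with the character."""
    def repl(match: "re.Match[str]") -> str:
        char = chr(int(match.group(1), 16))
        # Keep quotes, backslashes, and non-printables escaped so literals stay valid.
        if char in "\"'\\" or not char.isprintable():
            return match.group(0)
        return char

    return re.sub(r"\\x([0-9a-fA-F]{2})", repl, source)


def bracket_to_dot(source: str) -> str:
    """Toy stand-in: rewrite obj["name"] as obj.name when name looks like an identifier."""
    return re.sub(r'\["([A-Za-z_$][A-Za-z0-9_$]*)"\]', r".\1", source)


def run_until_stable(
    source: str, transforms: List[Transform], max_iterations: int = 50
) -> Tuple[str, int]:
    """Apply transforms in a fixed order until a full pass changes nothing."""
    iterations = 0
    for _ in range(max_iterations):
        before = source
        for transform in transforms:
            source = transform(source)
        iterations += 1
        if source == before:  # converged: this pass made no changes
            break
    return source, iterations


if __name__ == "__main__":
    obfuscated = r'console["\x6c\x6f\x67"]("\x68\x69");'
    # With this ordering the bracket rewrite only becomes possible on the
    # second pass, after the hex escapes have been decoded.
    cleaned, passes = run_until_stable(obfuscated, [bracket_to_dot, decode_hex_escapes])
    print(cleaned)  # console.log("hi");
    print(passes)   # 3 (two passes that change the code, one that confirms stability)
```

The example shows why a single pass is not enough: one transform can expose work for a transform that has already run, so the loop repeats until the output stops changing or the iteration limit is hit.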
Safety guarantees
- Never expands output: if the deobfuscated result is larger than the input, the original code is returned unchanged.
- Never crashes on valid JS: parse errors fall back to returning the original source. Transform exceptions are caught per-transform and skipped.
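The two guarantees above could be enforced by a wrapper along these lines. This is a sketch under stated assumptions, not PyJSClear's actual implementation: safe_deobfuscate and its parse, transforms, and generate parameters are hypothetical, and the demo uses a whitespace tokenizer in place of a real JavaScript parser.

```python
from typing import Any, Callable, Iterable


def safe_deobfuscate(
    source: str,
    parse: Callable[[str], Any],
    transforms: Iterable[Callable[[Any], None]],
    generate: Callable[[Any], str],
) -> str:
    """Return cleaned code, or the original source whenever cleaning is unsafe."""
    try:
        tree = parse(source)
    except Exception:
        return source  # parse error: fall back to the input unchanged

    for transform in transforms:
        try:
            transform(tree)  # assumed to mutate the tree in place
        except Exception:
            # Skip the failing transform and carry on with the rest.
            # (A real implementation must also cope with partial mutations.)
            continue

    result = generate(tree)
    if len(result) > len(source):
        return source  # never hand back something larger than the input
    return result


if __name__ == "__main__":
    def drop_debugger(tokens: list) -> None:
        tokens[:] = [tok for tok in tokens if tok != "debugger;"]

    def broken(tokens: list) -> None:
        raise ValueError("simulated transform bug")

    cleaned = safe_deobfuscate(
        "a(); debugger; b();", str.split, [broken, drop_debugger], " ".join
    )
    print(cleaned)  # a(); b();
```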
Testing
pytest tests/ # all tests
pytest tests/test_regression.py # regression suite (35 tests across 25 samples)
pytest tests/ -n auto # parallel execution (requires pytest-xdist)
Validated against six datasets totalling 47,836 files (full datasets, no sampling):
| Dataset | Files | Crashes | Expanded | Reduced | Source |
|---|---|---|---|---|---|
| E1 technique samples | 20 | 0 | 0 | 13 | JSIMPLIFIER |
| Kaggle Obfuscated | 1,477 | 0 | 0 | 1,199 | Kaggle |
| Kaggle NotObfuscated | 1,898 | 0 | 0 | 217 | Kaggle |
| MalJS (malware) | 23,212 | 0 | 0 | 3,193 | JSIMPLIFIER |
| BenignJS | 21,209 | 0 | 0 | 4,354 | JSIMPLIFIER |
| E1 original (clean) | 20 | 0 | 0 | 15 | JSIMPLIFIER |
Files >200KB or exceeding a 15-second wall-clock timeout are skipped and counted as unchanged (14,529 of MalJS, 940 of BenignJS). BenignJS reductions are genuine deobfuscation of obfuscated JS scraped from benign websites. A handful of Kaggle NotObfuscated files are mislabeled (genuinely obfuscated Angular test specs). E1 original reductions come from minor whitespace/formatting cleanup by the code generator.
Head-to-head vs Node.js tools (obfuscator-io-deobfuscator + javascript-deobfuscator pipeline):
On the Kaggle Obfuscated dataset (1,477 files), PyJSClear reduces 1,199 files while the Node.js pipeline changes none: the dataset's lightweight obfuscation (hex escapes, basic string arrays without parseInt checksums) falls outside obfuscator-io-deobfuscator's detection heuristics. On the E1 and MalJS datasets (heavily obfuscated), PyJSClear produces smaller output on 93.8% of files where at least one tool changed output, driven by dead-code removal, proxy-function inlining, bracket-to-dot conversion, and control-flow recovery.
Parse coverage: PyJSClear uses esprima2, which supports ES2024 syntax, including arrow functions, optional chaining, and nullish coalescing.
Architecture
Built on esprima2 (ESTree-compatible JS parser with ES2024 support) with a custom code generator, AST traverser (enter/exit/replace/remove), and scope analysis. Transforms run in a fixed order within a convergence loop; StringRevealer runs both first and last to handle string arrays before and after other transforms modify wrapper function structure.
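To make the traverser idea concrete, here is a minimal sketch of an enter/exit walker over ESTree-style nodes represented as plain dicts. It is an illustration only: traverse, REMOVE, and fold_and_prune are hypothetical names, PyJSClear's real traverser and node classes may look quite different, and the constant folding covers only numeric addition.

```python
from typing import Any, Callable, Optional

REMOVE = object()  # sentinel a visitor returns to delete the current node
Visitor = Optional[Callable[[dict], Any]]


def is_node(value: Any) -> bool:
    return isinstance(value, dict) and "type" in value


def traverse(node: dict, on_enter: Visitor = None, on_exit: Visitor = None) -> Any:
    """Walk the tree depth-first; visitors may return a replacement node or REMOVE."""
    if on_enter:
        result = on_enter(node)
        if result is REMOVE:
            return REMOVE
        if result is not None:
            node = result
    for key, value in list(node.items()):
        if is_node(value):
            child = traverse(value, on_enter, on_exit)
            node[key] = None if child is REMOVE else child
        elif isinstance(value, list):
            kept = []
            for item in value:
                if is_node(item):
                    item = traverse(item, on_enter, on_exit)
                    if item is REMOVE:
                        continue  # drop removed nodes from statement lists
                kept.append(item)
            node[key] = kept
    if on_exit:
        result = on_exit(node)
        if result is REMOVE:
            return REMOVE
        if result is not None:
            node = result
    return node


def is_number(node: Any) -> bool:
    return is_node(node) and node["type"] == "Literal" and type(node.get("value")) in (int, float)


def fold_and_prune(node: dict) -> Any:
    """Exit visitor: fold numeric `a + b` and remove empty statements."""
    if node["type"] == "EmptyStatement":
        return REMOVE
    if node["type"] == "BinaryExpression" and node["operator"] == "+":
        left, right = node["left"], node["right"]
        if is_number(left) and is_number(right):
            return {"type": "Literal", "value": left["value"] + right["value"]}
    return None  # keep the node as it is


if __name__ == "__main__":
    program = {  # hand-built AST for `; 3 + 5;`
        "type": "Program",
        "body": [
            {"type": "EmptyStatement"},
            {"type": "ExpressionStatement", "expression": {
                "type": "BinaryExpression", "operator": "+",
                "left": {"type": "Literal", "value": 3},
                "right": {"type": "Literal", "value": 5}}},
        ],
    }
    print(traverse(program, on_exit=fold_and_prune)["body"])
    # [{'type': 'ExpressionStatement', 'expression': {'type': 'Literal', 'value': 8}}]
```

Doing the folding in the exit visitor means children are simplified before their parent is inspected, so nested expressions such as 3 + (5 + 1) collapse in a single walk.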
Limitations
- Large files (>100KB) with deep obfuscation can be slow due to the multi-pass architecture. Consider using max_iterations to limit passes.
- Not all obfuscator.io configurations are handled; some advanced string encoding patterns may not be fully decoded. Supported encodings: basic (index lookup), base64, RC4, and multi-decoder (multiple encoding types sharing one string array).
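For readers unfamiliar with the three encodings named above, the sketch below shows what decoding a string-array entry involves in principle. It is a simplified illustration, not PyJSClear's decoder: decode_entry and its parameters are hypothetical, it uses the standard base64 alphabet from Python's standard library, and real obfuscated output may use a different alphabet, index arithmetic, or extra steps.

```python
import base64


def rc4(key: bytes, data: bytes) -> bytes:
    """Textbook RC4; the same call encrypts and decrypts."""
    state = list(range(256))
    j = 0
    for i in range(256):  # key scheduling
        j = (j + state[i] + key[i % len(key)]) % 256
        state[i], state[j] = state[j], state[i]
    i = j = 0
    out = bytearray()
    for byte in data:  # keystream generation, XORed with the data
        i = (i + 1) % 256
        j = (j + state[i]) % 256
        state[i], state[j] = state[j], state[i]
        out.append(byte ^ state[(state[i] + state[j]) % 256])
    return bytes(out)


def decode_entry(strings, index, offset=0, encoding="basic", key=""):
    """Resolve one string-array reference (hypothetical helper).

    strings  -- the extracted string array
    index    -- the index the obfuscated code passes to its decoder
    offset   -- constant the decoder subtracts from index (assumed)
    encoding -- "basic", "base64", or "rc4"
    key      -- per-call key, used only for "rc4"
    """
    raw = strings[index - offset]
    if encoding == "basic":
        return raw
    data = base64.b64decode(raw)
    if encoding == "rc4":
        data = rc4(key.encode("utf-8"), data)
    elif encoding != "base64":
        raise ValueError(f"unsupported encoding: {encoding}")
    return data.decode("utf-8")


if __name__ == "__main__":
    table = [
        "log",
        base64.b64encode(b"hello").decode("ascii"),
        base64.b64encode(rc4(b"k3y", b"world")).decode("ascii"),
    ]
    print(decode_entry(table, 0x1A0, offset=0x1A0))                       # log
    print(decode_entry(table, 0x1A1, offset=0x1A0, encoding="base64"))    # hello
    print(decode_entry(table, 0x1A2, offset=0x1A0, encoding="rc4", key="k3y"))  # world
```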
License
Apache License 2.0 — see LICENSE.
This project is a derivative work based on obfuscator-io-deobfuscator (Apache 2.0) and javascript-deobfuscator (Apache 2.0). See THIRD_PARTY_LICENSES.md and NOTICE for full attribution.