Pure Python JavaScript deobfuscator

PyJSClear

Pure Python JavaScript deobfuscator. Combines the functionality of obfuscator-io-deobfuscator (13 AST transforms for obfuscator.io output) and javascript-deobfuscator (hex escape decoding, static array unpacking, property access cleanup) into a single Python library with no Node.js dependency.

Installation

pip install pyjsclear

For development:

git clone https://github.com/intezer/PyJSClear.git
cd PyJSClear
pip install -e .
pip install pytest

Usage

Python API

from pyjsclear import deobfuscate, deobfuscate_file

# From a string
cleaned = deobfuscate(obfuscated_code)

# From a file
deobfuscate_file("input.js", "output.js")

# Or get the result as a string
cleaned = deobfuscate_file("input.js")

Command line

# File to stdout
pyjsclear input.js

# File to file
pyjsclear input.js -o output.js

# Stdin to stdout
cat input.js | pyjsclear -

# With custom iteration limit
pyjsclear input.js --max-iterations 20

What it does

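PyJSClear applies the transforms below in a multi-pass loop until a full pass makes no further changes (up to 50 iterations by default). The list has 16 entries but 15 distinct transforms, because StringRevealer runs both first and last: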

# Transform Description
1 StringRevealer Decode obfuscator.io string arrays (basic, base64, RC4), including rotation IIFEs, wrapper functions, multiple decoders per file, and SequenceExpression-wrapped rotation patterns
2 HexEscapes Normalize \xHH/\uHHHH escape sequences in string literal AST nodes
3 UnusedVariableRemover Remove variables with zero references
4 ConstantProp Propagate constant literals to all reference sites
5 ReassignmentRemover Eliminate redundant x = y reassignment chains
6 DeadBranchRemover Remove unreachable if(true)/if(false) and ternary branches
7 ObjectPacker Consolidate sequential obj.x = ... assignments into object literals
8 ProxyFunctionInliner Inline single-return proxy functions at all call sites
9 SequenceSplitter Split comma expressions (a(), b(), c()) into separate statements; extract (0, fn)(args) indirect call prefixes; normalize loop/if bodies to block statements
10 ExpressionSimplifier Evaluate static expressions: 3 + 5 -> 8, ![] -> false, typeof undefined -> "undefined", test ? false : true -> !test
11 LogicalToIf Convert a && b() / a || b() in statement position to if-statements
12 ControlFlowRecoverer Recover linear code from "1|0|3".split("|") + while/switch dispatch patterns
13 PropertySimplifier Convert obj["prop"] to obj.prop where valid
14 AntiTamperRemover Remove self-defending and anti-debug IIFEs
15 ObjectSimplifier Inline proxy object property accesses
16 StringRevealer Second pass to catch strings exposed by earlier transforms
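To make the ControlFlowRecoverer entry concrete, the sketch below models the dispatch pattern as plain Python data instead of an AST: the order string passed to `.split("|")` and a mapping from `case` labels to the statements in each case body. `recover_linear_order` is a hypothetical helper, not part of PyJSClear's API; a real transform works on esprima2 AST nodes and also has to confirm that the `while`/`switch` loop has exactly the expected shape before rewriting it.

```python
def recover_linear_order(order: str, cases: dict[str, list[str]], sep: str = "|") -> list[str]:
    """Flatten a while/switch dispatcher into straight-line statements.

    Toy model (assumption): `order` is the literal string the obfuscated code
    splits, and `cases` maps each case label to the statements in that case
    body, with the trailing `continue` already dropped.
    """
    statements: list[str] = []
    for label in order.split(sep):
        if label not in cases:
            # An unknown label means the pattern does not match; refuse to
            # rewrite rather than guess, as a conservative transform would.
            raise ValueError(f"no case for dispatch label {label!r}")
        statements.extend(cases[label])
    return statements


# Roughly models:
#   var order = "1|0|2".split("|"), i = 0;
#   while (true) { switch (order[i++]) {
#     case "0": b(); continue;
#     case "1": a(); continue;
#     case "2": c(); continue;
#   } break; }
cases = {"0": ["b();"], "1": ["a();"], "2": ["c();"]}
print("\n".join(recover_linear_order("1|0|2", cases)))
# a();
# b();
# c();
```

The order string alone determines the execution sequence, which is why the dispatcher can be replaced by the case bodies laid out in that order.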

Safety guarantees

  • Never expands output: if the deobfuscated result is larger than the input, the original code is returned unchanged.
  • Never crashes on valid JS: if parsing fails, the original source is returned unchanged. Exceptions raised inside an individual transform are caught, and that transform is skipped.
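Taken together, the two guarantees describe a conservative wrapper around the transform pipeline. The sketch below is a minimal illustration under stated assumptions: `parse` is a hypothetical stand-in for the parser, transforms are modeled as string-to-string functions, and "larger" is measured in characters. PyJSClear's real transforms operate on an AST, and its internals may differ.

```python
from typing import Callable

Transform = Callable[[str], str]


def safe_deobfuscate(source: str, parse: Callable[[str], object],
                     transforms: list[Transform]) -> str:
    """Apply transforms conservatively: never raise, never grow the output."""
    try:
        parse(source)  # hypothetical parser; used here only to validate input
    except Exception:
        return source  # unparseable input: return the original untouched

    code = source
    for transform in transforms:
        try:
            code = transform(code)
        except Exception:
            continue  # a failing transform is skipped; the others still run

    # Size guard: if the "cleaned" result is larger, keep the original.
    return source if len(code) > len(source) else code


# Toy transforms for illustration only.
def strip_spaces(code: str) -> str:
    return code.replace(" ", "")


def broken(code: str) -> str:
    raise RuntimeError("simulated transform bug")


def bloat(code: str) -> str:
    return code * 2


print(safe_deobfuscate("a = 1 ;", parse=str, transforms=[broken, strip_spaces]))  # a=1;
print(safe_deobfuscate("a=1;", parse=str, transforms=[bloat]))  # a=1;
```

Checking size once at the end, rather than after each transform, lets intermediate passes temporarily grow the code as long as the final result is no larger.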

Testing

pytest tests/                           # all tests
pytest tests/test_regression.py         # regression suite (35 tests across 25 samples)
pytest tests/ -n auto                   # parallel execution (requires pytest-xdist)

Validated against six datasets totalling 47,836 files (full datasets, no sampling):

| Dataset | Files | Crashes | Expanded | Reduced | Source |
|---|---|---|---|---|---|
| E1 technique samples | 20 | 0 | 0 | 13 | JSIMPLIFIER |
| Kaggle Obfuscated | 1,477 | 0 | 0 | 1,199 | Kaggle |
| Kaggle NotObfuscated | 1,898 | 0 | 0 | 217 | Kaggle |
| MalJS (malware) | 23,212 | 0 | 0 | 3,193 | JSIMPLIFIER |
| BenignJS | 21,209 | 0 | 0 | 4,354 | JSIMPLIFIER |
| E1 original (clean) | 20 | 0 | 0 | 15 | JSIMPLIFIER |

Files >200KB or exceeding a 15-second wall-clock timeout are skipped and counted as unchanged (14,529 of MalJS, 940 of BenignJS). BenignJS reductions are genuine deobfuscation of obfuscated JS scraped from benign websites. A handful of Kaggle NotObfuscated files are mislabeled (genuinely obfuscated Angular test specs). E1 original reductions come from minor whitespace/formatting cleanup by the code generator.
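The table columns can be read as a per-file classification. The sketch below is one plausible way to compute them and is an assumption, not the project's actual benchmark script: `deobfuscate` stands in for any string-to-string callable, "200KB" is read as 200 KiB of UTF-8, a change that leaves the length identical is counted as unchanged, and the 15-second timeout is omitted because enforcing it cleanly requires running each file in a separate process.

```python
from collections import Counter
from typing import Callable, Iterable

MAX_BYTES = 200 * 1024  # assumption: "200KB" interpreted as 200 KiB


def classify(source: str, deobfuscate: Callable[[str], str]) -> str:
    """Return 'crash', 'expanded', 'reduced' or 'unchanged' for one file."""
    if len(source.encode("utf-8")) > MAX_BYTES:
        return "unchanged"  # skipped files are counted as unchanged
    try:
        result = deobfuscate(source)
    except Exception:
        return "crash"
    if len(result) > len(source):
        return "expanded"
    if len(result) < len(source):
        return "reduced"
    return "unchanged"


def tally(sources: Iterable[str], deobfuscate: Callable[[str], str]) -> Counter[str]:
    """Count outcomes across a dataset, one category per file."""
    return Counter(classify(src, deobfuscate) for src in sources)


samples = ["  var a = 1;  ", "x", "y" * (MAX_BYTES + 1)]
print(tally(samples, lambda s: s.strip()))
```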

Head-to-head vs Node.js tools (obfuscator-io-deobfuscator + javascript-deobfuscator pipeline):

On the Kaggle Obfuscated dataset (1,477 files), PyJSClear reduces 1,199 files while the Node.js pipeline changes none of them: the dataset's lightweight obfuscation (hex escapes, basic string arrays without parseInt checksums) falls outside obfuscator-io-deobfuscator's detection heuristics. On the heavily obfuscated E1 and MalJS datasets, PyJSClear produces smaller output than the Node.js pipeline on 93.8% of the files that at least one of the two tools changed. The difference is driven by dead-code removal, proxy-function inlining, bracket-to-dot conversion, and control-flow recovery.
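The 93.8% figure is a ratio over a filtered subset of files. Assuming it compares output lengths in characters, excludes files that neither tool changed, and counts ties as not smaller (the text does not specify these details), the calculation looks roughly like this. The `Result` record and `pyjsclear_win_rate` function are hypothetical names for illustration.

```python
from typing import NamedTuple


class Result(NamedTuple):
    original: str
    pyjsclear_out: str
    node_out: str


def pyjsclear_win_rate(results: list[Result]) -> float:
    """Share of files, among those changed by at least one tool, where
    PyJSClear's output is strictly smaller than the Node.js pipeline's."""
    considered = [
        r for r in results
        if r.pyjsclear_out != r.original or r.node_out != r.original
    ]
    if not considered:
        return 0.0  # no file was changed by either tool
    wins = sum(len(r.pyjsclear_out) < len(r.node_out) for r in considered)
    return wins / len(considered)


results = [
    Result("aaaa", "aa", "aaa"),     # both changed, PyJSClear smaller: counts as a win
    Result("bbbb", "bbbb", "bbbb"),  # neither tool changed it: excluded
    Result("cccc", "ccc", "cc"),     # Node.js output smaller: counts as a loss
]
print(f"{pyjsclear_win_rate(results):.1%}")  # 50.0%
```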

Parse coverage: PyJSClear uses esprima2, which supports ES2024 syntax, including arrow functions, optional chaining, nullish coalescing, and more.

Architecture

Built on esprima2 (ESTree-compatible JS parser with ES2024 support) with a custom code generator, AST traverser (enter/exit/replace/remove), and scope analysis. Transforms run in a fixed order within a convergence loop; StringRevealer runs both first and last to handle string arrays before and after other transforms modify wrapper function structure.
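The scheduling described above (a fixed transform order, repeated until a full pass changes nothing, with StringRevealer at both ends) can be sketched as a fixed-point loop. The `(code) -> (code, changed)` transform signature and the toy string-replacement transforms are assumptions for illustration; the real transforms mutate an esprima2 AST. The default of 50 iterations comes from the text, but whether PyJSClear counts iterations exactly this way is not stated.

```python
from typing import Callable

# Assumption: each transform returns the new code and whether it changed anything.
Transform = Callable[[str], tuple[str, bool]]


def run_to_fixed_point(code: str, transforms: list[Transform],
                       max_iterations: int = 50) -> str:
    """Repeat the whole transform list until a pass changes nothing."""
    for _ in range(max_iterations):
        any_changed = False
        for transform in transforms:
            code, changed = transform(code)
            any_changed = any_changed or changed
        if not any_changed:
            break  # converged: a full pass made no changes
    return code


def make_pipeline(string_revealer: Transform, middle: list[Transform]) -> list[Transform]:
    # StringRevealer runs first and again last, around the other transforms.
    return [string_revealer, *middle, string_revealer]


def replace_all(old: str, new: str) -> Transform:
    """Toy transform built from str.replace, for illustration only."""
    def transform(code: str) -> tuple[str, bool]:
        updated = code.replace(old, new)
        return updated, updated != code
    return transform


# Toy scenario: the decoder call `_0x(0)` is hidden behind a proxy, so the
# first StringRevealer run finds nothing; the final run catches it.
reveal = replace_all("_0x(0)", '"hello"')
inline_proxy = replace_all("proxy(0)", "_0x(0)")
pipeline = make_pipeline(reveal, [inline_proxy])
print(run_to_fixed_point("log(proxy(0));", pipeline))  # log("hello");
```

In this toy run the trailing StringRevealer decodes the string in the same pass that exposed it, and a second pass confirms convergence.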

Limitations

  • Large files (>100KB) with deep obfuscation can be slow because of the multi-pass architecture. Consider lowering the iteration limit (max_iterations, or --max-iterations on the command line) to cap the number of passes.
  • Not all obfuscator.io configurations are handled — some advanced string encoding patterns may not be fully decoded. Supported encodings: basic (index lookup), base64, RC4, and multi-decoder (multiple encoding types sharing one string array).
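For the basic (index lookup) encoding, obfuscator.io output typically keeps its strings in a single array, rotates that array at startup until a numeric checksum matches a target, and reads strings through a decoder that subtracts a fixed offset from the index. The sketch below models that idea in plain Python. The checksum function, offset, and sample data are made up; in real samples the checksum is parseInt arithmetic over decoded strings, which a deobfuscator has to evaluate statically.

```python
from collections import deque
from typing import Callable


def rotate_until(strings: list[str], checksum: Callable[[list[str]], int],
                 target: int) -> list[str]:
    """Emulate the rotation IIFE: push(shift()) until checksum(arr) == target."""
    arr = deque(strings)
    for _ in range(len(arr)):
        if checksum(list(arr)) == target:
            return list(arr)
        arr.rotate(-1)  # same effect as JS arr.push(arr.shift())
    raise ValueError("no rotation matches the checksum")


def make_decoder(strings: list[str], offset: int) -> Callable[[int], str]:
    """Basic index-lookup decoder: decode(i) returns strings[i - offset]."""
    def decode(index: int) -> str:
        position = index - offset
        if not 0 <= position < len(strings):
            raise IndexError(f"index {index:#x} is outside the string array")
        return strings[position]
    return decode


def toy_checksum(arr: list[str]) -> int:
    # Hypothetical stand-in for the parseInt(...) arithmetic in real samples.
    return int(arr[0]) if arr[0].isdigit() else -1


rotated = rotate_until(["log", "hello", "42", "console"], toy_checksum, 42)
decode = make_decoder(rotated, 0x1a0)  # hypothetical offset
print(f"{decode(0x1a1)}.{decode(0x1a2)}('{decode(0x1a3)}')")  # console.log('hello')
```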

License

Apache License 2.0 — see LICENSE.

This project is a derivative work based on obfuscator-io-deobfuscator (Apache 2.0) and javascript-deobfuscator (Apache 2.0). See THIRD_PARTY_LICENSES.md and NOTICE for full attribution.

Download files

Source Distribution

pyjsclear-0.1.0.tar.gz (67.9 kB)

Uploaded Source

Built Distribution

pyjsclear-0.1.0-py3-none-any.whl (68.7 kB)

Uploaded Python 3

File details

Details for the file pyjsclear-0.1.0.tar.gz.

File metadata

  • Download URL: pyjsclear-0.1.0.tar.gz
  • Upload date:
  • Size: 67.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.12

File hashes

Hashes for pyjsclear-0.1.0.tar.gz
Algorithm Hash digest
SHA256 db503b0ab190a70442179a316d4408c53323c5b617ba74965c3323010e85a81d
MD5 829b6ac5c23a41477dd783a6f23860de
BLAKE2b-256 9226df893bff79077f750e13bacb12336e508b2b1a0445692b98d806e508f5f5

File details

Details for the file pyjsclear-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: pyjsclear-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 68.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.12

File hashes

Hashes for pyjsclear-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 2105456c9a5e73b3a52dd9589dc5a1f128aea7bc8120333c410ab84864f1d66a
MD5 78aa48fd1dd3207e37a5039186335694
BLAKE2b-256 d2076d6b65e27b8caf370ce832d161e3f0b2eb92a3251b153720865e302791d7
