A Graph-Based ROP Gadget Finder for every architecture

LCSAJdump

Universal Graph-Based Framework for Automated Gadget Discovery


LCSAJdump is a static analysis framework for discovering Return-Oriented Programming (ROP) and Jump-Oriented Programming (JOP) gadgets. Unlike traditional scanners, LCSAJdump is architecture-agnostic and uses a graph-based approach to uncover gadgets that common linear tools cannot see.


Why LCSAJdump?

Common ROP scanners use a linear "sliding-window" approach over the binary's executable bytes. This method systematically fails to identify Shadow Gadgets: execution chains that traverse non-contiguous memory blocks connected by unconditional jumps or conditional branches.

LCSAJdump overcomes this limitation by reconstructing the Control-Flow Graph (CFG) through LCSAJ (Linear Code Sequence and Jump) analysis. By modeling the binary as a directed graph of basic blocks, the tool identifies:

  1. Contiguous Gadgets: Standard linear sequences terminating in a control-flow transfer.
  2. Shadow Gadgets (Non-Contiguous): Complex chains that bypass "bad bytes" (e.g., null bytes) by utilizing instructions that would otherwise be unreachable via linear scanning.
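
To make the distinction concrete, here is a minimal sketch on a toy instruction listing. The addresses, mnemonics, and helper names are invented for illustration and are not LCSAJdump's internals. The sketch cuts the listing into blocks at control-flow instructions, then follows an unconditional jump to assemble a chain whose instructions are not adjacent in memory, which a contiguous window ending at a `ret` would not contain.

```python
# Toy listing: (address, mnemonic, jump target or None). All values are invented.
LISTING = [
    (0x1000, "pop rdi", None),
    (0x1001, "jmp", 0x2000),   # unconditional jump to a distant block
    (0x1003, "nop", None),
    (0x1004, "ret", None),
    (0x2000, "pop rsi", None),
    (0x2001, "ret", None),
]
TERMINATORS = {"jmp", "ret"}


def split_blocks(listing):
    """Cut the listing into blocks, each ending at a control-flow instruction."""
    blocks, current = [], []
    for insn in listing:
        current.append(insn)
        if insn[1] in TERMINATORS:
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    return blocks


def jump_edges(blocks):
    """Map a block's start address to the start address of its jump target."""
    starts = {b[0][0] for b in blocks}
    edges = {}
    for b in blocks:
        _, mnemonic, target = b[-1]
        if mnemonic == "jmp" and target in starts:
            edges[b[0][0]] = target
    return edges


def follow(blocks, edges, start):
    """Follow jump edges from `start` until a ret; return the executed mnemonics."""
    by_start = {b[0][0]: b for b in blocks}
    chain, seen = [], set()
    while start in by_start and start not in seen:
        seen.add(start)
        block = by_start[start]
        chain += [m for _, m, _ in block if m != "jmp"]
        if block[-1][1] == "ret":
            return chain
        start = edges.get(start)
    return None  # the walk left the graph or looped without reaching a ret


blocks = split_blocks(LISTING)
edges = jump_edges(blocks)
shadow = follow(blocks, edges, 0x1000)
print(shadow)
```

In this toy case the chain joins `pop rdi` at 0x1000 with `pop rsi; ret` at 0x2000, two regions that are not contiguous in memory.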

Key Features

  • Multi-Architecture Support: Native support for RISC-V (64GC), x86-64, and ARM64, easily extendable to other architectures via modular profiles.
  • Graph-Based Analysis: Segments the .text section into LCSAJ basic blocks and reconstructs flow relationships using NetworkX.
  • Rainbow BFS Algorithm: Custom backward Breadth-First Search starting from control-flow sinks. It includes an O(1) Early-Drop Uniqueness Filter and Hard-Cap Instruction Limits to prevent state explosion and keep analysis fast even on dense CISC binaries.
  • Lazy Graph Build: Graph construction retains only nodes reachable from gadget tails within --depth hops, drastically reducing memory and build time on large binaries (e.g., libc) while producing identical results.
  • Two-Stage Ranking Engine: Combines a fast heuristic baseline (Bayesian-optimized via Optuna) with a gradient-boosted LightGBM ranking model that refines gadget quality using structural and semantic features.
  • Zero-Overhead Inference: The ML model is integrated natively and runs by default, processing tens of thousands of nodes in seconds. It acts as a filter that rejects noisy jumps and returns clean, highly controllable gadget chains. The model is hosted on Hugging Face.
  • Pruning Parameters: Configurable "Darkness" factor to balance analysis depth and performance, preventing infinite loops in cyclic graphs.
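
The backward search with pruning can be sketched as follows. This is an illustrative sketch only, not the Rainbow BFS implementation. The graph, the function name `backward_chains`, and the exact meaning given to `depth` (maximum blocks per chain) and `darkness` (maximum times one node may be expanded) are assumptions based on the descriptions in this README.

```python
from collections import defaultdict, deque


def backward_chains(preds, sinks, depth, darkness):
    """Enumerate block chains ending at a sink by walking predecessor edges.

    preds    : dict mapping a block to the blocks that can flow into it
    sinks    : blocks that end in a control-flow transfer (e.g. ret)
    depth    : assumed meaning: maximum number of blocks in one chain
    darkness : assumed meaning: maximum number of expansions per block
    """
    visits = defaultdict(int)
    seen = set()   # set membership gives an O(1) duplicate check
    found = []
    queue = deque((s,) for s in sinks)
    while queue:
        chain = queue.popleft()
        if chain in seen:
            continue   # drop duplicates early (e.g. a sink listed twice)
        seen.add(chain)
        found.append(chain)
        head = chain[0]
        if len(chain) >= depth or visits[head] >= darkness:
            continue   # pruned: chain too long, or node expanded too often
        visits[head] += 1
        for p in preds.get(head, ()):
            if p not in chain:   # never revisit a block inside one chain
                queue.append((p,) + chain)
    return found


# Hypothetical graph: A -> B -> R and C -> R, plus a cycle B -> A.
PREDS = {"R": ["B", "C"], "B": ["A"], "A": ["B"]}
chains = backward_chains(PREDS, ["R"], depth=3, darkness=2)
print(chains)
```

The per-node visit cap and the in-chain membership check are what keep the walk finite on the cyclic edge between A and B.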

Supported Architectures

LCSAJdump is designed to be universal (see Benchmarks). Currently supported:

  • RISC-V 64-bit (RV64GC): Full support for compressed 16-bit instructions.
  • x86-64: Handles variable-length overlapping instructions. Safely navigates dense graphs without memory explosion.
  • ARM64: Handles fixed-width 32-bit instructions and filters out bloated gadgets via strict heuristic penalties.
  • Other Architectures: Can be easily implemented by defining new profiles in config.py.

Installation

Via Pip (Recommended)

pip install lcsajdump

From Source (Development)

git clone https://github.com/Chris1sFlaggin/LCSAJdump.git
cd LCSAJdump
pip install -r requirements.txt

Usage

LCSAJdump offers a powerful CLI for precise binary analysis:

Standard Analysis (architecture auto-detected from the ELF header):

python LCSAJdump.py <path_to_binary>

Advanced Analysis (Specifying Architecture and Output File):

lcsajdump -a riscv64 -d 15 -k 10 -l 20 -o gadgets.txt <path_to_binary>

Export as JSON with bad-char filter:

lcsajdump -a x86_64 -d 20 -k 5 -b "000a0d" --json -o gadgets.json <path_to_binary>

Note: Use -o after --json to save JSON to file. Without --json, -o saves plain text.
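
As a rough illustration of what a bad-character filter on gadget addresses does, the sketch below parses a hex string such as "000a0d" and checks an address against it. The function name `has_bad_bytes` and the choice to test only the significant little-endian bytes of the address are assumptions; the tool's actual filtering logic may differ.

```python
def has_bad_bytes(address, bad_chars_hex):
    """Return True if the encoded address contains any byte from bad_chars_hex."""
    bad = set(bytes.fromhex(bad_chars_hex))
    # Assumption: only the significant bytes are checked, so the zero
    # padding of a 64-bit pointer does not count as a null byte.
    length = max(1, (address.bit_length() + 7) // 8)
    encoded = address.to_bytes(length, "little")
    return any(b in bad for b in encoded)


rejected = has_bad_bytes(0x40110A, "000a0d")   # low byte is 0x0a (newline)
accepted = has_bad_bytes(0x401136, "000a0d")   # bytes 0x36, 0x11, 0x40
print(rejected, accepted)
```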

Save plain text output:

lcsajdump -a riscv64 -d 15 -k 10 -l 20 -o gadgets.txt <path_to_binary>

Analyze all executable sections:

lcsajdump --all-exec -d 25 -k 10 -l 30 <path_to_binary>

Force strictly algorithmic ranking (bypass ML):

lcsajdump --algo <path_to_binary>

CLI Options

| Flag | Type | Default | Description |
|---|---|---|---|
| -a, --arch | TEXT | auto | Target architecture (auto, riscv64, x86_64, arm64). Auto-detected from ELF header. |
| -d, --depth | INTEGER | 20 | Max search depth in LCSAJ blocks. Controls chain length. |
| -k, --darkness | INTEGER | 5 | Pruning threshold: max visits per node. Higher = more gadgets, slower scan. |
| -l, --limit | INTEGER | 10 | Max number of gadgets to display in the output. |
| -s, --min-score | INTEGER | 0 | Minimum heuristic score for a gadget to appear in results. |
| -i, --instructions | INTEGER | 15 | Max number of instructions contained in a single LCSAJ node. |
| -v, --verbose | FLAG | | Enable verbose output for detailed per-gadget results. |
| -o, --output | PATH | | Write output to file. Plain text by default; use with --json for JSON output. |
| -b, --bad-chars | TEXT | | Hex bytes to filter from gadget addresses (e.g. "000a0d"). |
| --json | FLAG | | Output gadgets as structured JSON. Combine with -o to save to file. |
| --all-exec | FLAG | | Analyze all executable sections, not just .text. |
| -al, --algo | FLAG | | Use strictly the algorithmic ranking (bypass ML). |
| --version | FLAG | | Show the installed version and exit. |
| --help | FLAG | | Show help message and exit. |


📊 Accuracy & Benchmarks

LCSAJdump is backed by a rigorous, incrementally validated test suite located in the benchmarkTests/ directory.

Through 14 major iterations of semantic feature engineering, the hybrid model has learned to discriminate gadgets based on actual memory side-effects (extracted via angr symbolic execution) rather than purely syntactic heuristics.

When evaluated on monolithic, real-world executables like libc.so.6, the engine achieves NDCG@1 = 0.9833 and NDCG@10 = 0.9656, where 1.0 would mean the ranking matches the ideal ordering exactly. The Two-Stage engine prioritizes clean stack-popping sequences and ret2csu-like calls, while heavily penalizing crash-prone fixed-offset jumps that deceive traditional static scanners.
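
For readers unfamiliar with the metric, the sketch below shows how NDCG@k is computed for a ranked list of relevance labels. It uses the linear-gain formulation; which gain function and relevance labels the project's benchmarks use is not stated here, so treat both as assumptions.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain of the first k items (linear gain)."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k):
    """DCG of the given ranking divided by the DCG of the ideal ranking."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Hypothetical relevance labels, listed in the order the ranker returned them.
perfect = ndcg_at_k([3, 2, 1, 0], k=4)   # already in ideal order
swapped = ndcg_at_k([2, 3, 1, 0], k=4)   # best gadget ranked second
print(perfect, swapped)
```

A ranking in ideal order scores exactly 1.0, and any misplacement near the top lowers the score more than the same misplacement further down.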


🧠 Developer & ML Guide

The repository is structured to support both end-users and ML researchers.

  • Production Engine: The core CLI seamlessly integrates the inference engine using models hosted on Hugging Face, requiring no manual model loading.
  • ML Pipeline: The lcsajdump/ml_study/ directory contains the complete pipeline used to train the models:
    • build_dataset.py: Extracts structural and semantic features from a corpus of CTF binaries.
    • train_model.py: Trains the LightGBM LambdaRank model and outputs the .pkl models.
    • kfold_cv.py: Validates the dataset using K-Fold Cross Validation.

Contributing (Open for Forks!)

The framework is open to new implementations. To add a new architecture:

  1. Fork the repository.
  2. Open lcsajdump/core/config.py.
  3. Add a new profile to the ARCH_PROFILES dictionary, defining jump mnemonics, return mnemonics, and registers for the desired architecture (e.g., x86_64).
  4. Submit a Pull Request.
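
As a rough idea of what step 3 might look like, here is a hypothetical profile for 32-bit MIPS. Every key name (`jump_mnemonics`, `ret_mnemonics`, `registers`) and the `validate_profile` helper are assumptions made for illustration; check the existing entries in lcsajdump/core/config.py for the real schema before writing a profile.

```python
# Hypothetical stand-in for the dictionary defined in lcsajdump/core/config.py.
ARCH_PROFILES = {}

# All key names below are assumed, not taken from the project's source.
ARCH_PROFILES["mips32"] = {
    "jump_mnemonics": {"j", "jr", "jal", "jalr", "b", "beq", "bne"},
    "ret_mnemonics": {"jr"},   # a MIPS return is conventionally `jr $ra`
    "registers": ["$a0", "$a1", "$a2", "$a3", "$v0", "$sp", "$ra"],
}

REQUIRED_KEYS = {"jump_mnemonics", "ret_mnemonics", "registers"}


def validate_profile(profile):
    """Check that a profile defines every assumed key with a non-empty value."""
    missing = REQUIRED_KEYS - profile.keys()
    empty = {k for k in REQUIRED_KEYS & profile.keys() if not profile[k]}
    return not missing and not empty


profile_ok = validate_profile(ARCH_PROFILES["mips32"])
print(profile_ok)
```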

License

This project is released under the MIT license. See the LICENSE file for details.


Project Link

Visit the project web page: LCSAJdump web page


Made by Chris1sFlaggin as a research project for Automated Gadget Discovery.
