
A Graph-Based ROP Gadget Finder for every architecture

Reason this release was yanked:

hella worst model

LCSAJdump

Universal Graph-Based Framework for Automated Gadget Discovery


LCSAJdump is a static analysis framework designed to discover Return-Oriented Programming (ROP) and Jump-Oriented Programming (JOP) gadgets. Unlike traditional scanners, LCSAJdump is architecture-agnostic and employs a graph-based approach to uncover gadgets that common linear tools cannot see.


Why LCSAJdump?

Common ROP scanners use a linear "sliding-window" approach over the binary's executable bytes. This method systematically fails to identify Shadow Gadgets: execution chains that traverse non-contiguous memory blocks connected by unconditional jumps or conditional branches.

LCSAJdump overcomes this limitation by reconstructing the Control-Flow Graph (CFG) through LCSAJ (Linear Code Sequence and Jump) analysis. By modeling the binary as a directed graph of basic blocks, the tool identifies:

  1. Contiguous Gadgets: Standard linear sequences terminating in a control-flow transfer.
  2. Shadow Gadgets (Non-Contiguous): Complex chains that bypass "bad bytes" (e.g., null bytes) by utilizing instructions that would otherwise be unreachable via linear scanning.
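The difference between the two gadget classes can be illustrated with a toy model. The sketch below is not LCSAJdump's implementation: the `Insn` type, the addresses, and the mnemonics are all hypothetical, and only unconditional jumps are modeled. It splits a linear instruction stream into blocks at jumps, returns, and jump targets, then follows the jump edge to recover a chain that a contiguous byte window starting at `0x1000` would not yield.

```python
from typing import NamedTuple, Optional

class Insn(NamedTuple):
    addr: int
    text: str
    jump_to: Optional[int] = None  # target of an unconditional jump, if any
    is_ret: bool = False

# Toy program: hypothetical addresses and mnemonics, for illustration only.
PROGRAM = [
    Insn(0x1000, "pop rdi"),
    Insn(0x1001, "jmp 0x2000", jump_to=0x2000),
    Insn(0x1003, "int3"),            # bytes a linear scan would run into
    Insn(0x2000, "pop rsi"),
    Insn(0x2001, "ret", is_ret=True),
]

def split_blocks(program):
    """Cut the stream after every jump/return and before every jump target."""
    targets = {i.jump_to for i in program if i.jump_to is not None}
    blocks, current = [], []
    for insn in program:
        if insn.addr in targets and current:
            blocks.append(current)
            current = []
        current.append(insn)
        if insn.jump_to is not None or insn.is_ret:
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    return blocks

def follow(program, start):
    """Follow execution from `start` across jumps until a return is reached."""
    by_addr = {insn.addr: idx for idx, insn in enumerate(program)}
    idx, trace, seen = by_addr[start], [], set()
    while idx is not None and idx < len(program) and idx not in seen:
        seen.add(idx)
        insn = program[idx]
        trace.append(insn.text)
        if insn.is_ret:
            return trace
        idx = by_addr.get(insn.jump_to) if insn.jump_to is not None else idx + 1
    return None  # no return reached (loop, unknown target, or end of stream)

blocks = split_blocks(PROGRAM)
chain = follow(PROGRAM, 0x1000)
print(len(blocks), chain)
```

Here the chain `pop rdi; jmp; pop rsi; ret` spans two non-adjacent blocks, which is the situation the Shadow Gadget definition describes.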

Key Features

  • Multi-Architecture Support: Native support for RISC-V (64GC), x86-64, and ARM64, easily extendable to other architectures via modular profiles.
  • Graph-Based Analysis: Segments the .text section into LCSAJ basic blocks and reconstructs flow relationships using NetworkX.
  • Rainbow BFS Algorithm: Custom backward Breadth-First Search starting from control-flow sinks. It now features a qualitative feedback loop (penalty_threshold darkness) and hard-cap limits that prevent state explosion and keep analysis fast even on dense CISC binaries.
  • Lazy Graph Build: Graph construction retains only nodes reachable from gadget tails within --depth hops, drastically reducing memory and build time on large binaries (e.g., libc) while producing identical results.
  • Two-Stage Ranking Engine: Combines a fast heuristic baseline (Bayesian-optimized via Optuna) with a LightGBM gradient-boosted model that refines gadget quality using structural and semantic features.
  • Zero-Overhead Inference: The ML model, hosted on Hugging Face, is integrated natively and runs by default, processing tens of thousands of nodes in seconds. It acts as a filter, rejecting noisy jumps and returning clean, highly controllable gadget chains.
  • Pruning Parameters: Configurable "Darkness" factor to balance analysis depth and performance, preventing infinite loops in cyclic graphs.
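As a rough illustration of how a backward search bounded by depth and a per-node visit cap behaves, here is a minimal sketch. It is not the Rainbow BFS itself: the function name, the graph encoding as a plain predecessor dictionary, and the exact pruning rule are assumptions made for the example, and the real tool builds its graph with NetworkX.

```python
from collections import deque

def backward_bfs(preds, sinks, max_depth, darkness):
    """Enumerate block paths ending in a sink by walking edges backwards.

    preds:     dict mapping a block id to the blocks that can flow into it
    sinks:     block ids ending in a control-flow transfer (gadget tails)
    max_depth: maximum number of blocks in one path
    darkness:  maximum number of times a block may be expanded as path head
    """
    visits = {}
    paths = []
    queue = deque((s,) for s in sinks)
    while queue:
        path = queue.popleft()
        head = path[0]
        if visits.get(head, 0) >= darkness:
            continue                      # prune: this block is "dark" enough
        visits[head] = visits.get(head, 0) + 1
        paths.append(path)
        if len(path) >= max_depth:
            continue                      # depth cap reached
        for pred in preds.get(head, ()):
            if pred not in path:          # never revisit a block within a path
                queue.append((pred,) + path)
    return paths

# Toy graph: A -> B -> D, A -> C -> D, plus a back edge D -> A (a cycle).
PREDS = {"D": ["B", "C"], "B": ["A"], "C": ["A"], "A": ["D"]}

loose = backward_bfs(PREDS, ["D"], max_depth=3, darkness=2)
tight = backward_bfs(PREDS, ["D"], max_depth=3, darkness=1)
reachable = {block for path in loose for block in path}
print(loose, tight, reachable)
```

Raising `darkness` lets block `A` head two different paths instead of one, which mirrors the "more gadgets, slower scan" trade-off. The `reachable` set is the kind of node subset a lazy graph build would keep: only blocks within the depth bound of a gadget tail.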

Supported Architectures

LCSAJdump is designed to be universal (see Benchmarks). Currently supported:

  • RISC-V 64-bit (RV64GC): Full support for compressed 16-bit instructions.
  • x86-64: Handles variable-length overlapping instructions. Safely navigates dense graphs without memory explosion.
  • ARM64: Handles 32-bit instructions and deeply filters out bloated gadgets via strict heuristic penalties.
  • Other Architectures: Can be easily implemented by defining new profiles in config.py.
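Supporting RV64GC means walking a stream that mixes 16-bit compressed and 32-bit instructions. The sketch below shows the length rule from the RISC-V base encoding (the two lowest bits of the first halfword distinguish the two sizes); the function names are hypothetical and this is not taken from LCSAJdump's source.

```python
import struct

def rv_insn_length(code: bytes, offset: int = 0) -> int:
    """Return the byte length of the RISC-V instruction at `offset`."""
    (half,) = struct.unpack_from("<H", code, offset)  # little-endian halfword
    if (half & 0b11) != 0b11:
        return 2                      # compressed (C extension) instruction
    if (half & 0b11100) != 0b11100:
        return 4                      # standard 32-bit instruction
    raise ValueError("encoding longer than 32 bits, not used by RV64GC")

def walk(code: bytes):
    """Yield (offset, length) for each instruction in a linear sweep."""
    offset = 0
    while offset + 2 <= len(code):
        length = rv_insn_length(code, offset)
        yield offset, length
        offset += length

# c.nop (0x0001), ret = jalr x0, 0(ra) (0x00008067), c.jr ra (0x8082)
CODE = bytes.fromhex("0100" "67800000" "8280")
layout = list(walk(CODE))
print(layout)
```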

Installation

Via Pip (Recommended)

pip install lcsajdump

From Source (Development)

git clone https://github.com/Chris1sFlaggin/LCSAJdump.git
cd LCSAJdump
pip install -r requirements.txt

Usage

LCSAJdump offers a powerful CLI for precise binary analysis:

Standard Analysis (architecture auto-detected from the ELF header):

python LCSAJdump.py <path_to_binary>

Advanced Analysis (Specifying Architecture and Output File):

lcsajdump -a riscv64 -d 15 -k 10 -l 20 -o gadgets.txt <path_to_binary>

Export as JSON with bad-char filter:

lcsajdump -a x86_64 -d 20 -k 5 -b "000a0d" --json -o gadgets.json <path_to_binary>

Note: Use -o after --json to save JSON to file. Without --json, -o saves plain text.
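To make the bad-char filter concrete, here is a small sketch of how a hex string such as "000a0d" could be turned into a byte set and checked against a gadget address. The function names are hypothetical, and the choice to inspect only the significant bytes of the little-endian address is an assumption of this example, not documented behavior of the tool.

```python
def parse_bad_chars(hex_string: str) -> set:
    """Turn a hex string like "000a0d" into the set {0x00, 0x0a, 0x0d}."""
    return set(bytes.fromhex(hex_string))

def address_is_clean(address: int, bad: set) -> bool:
    """True if none of the address's significant bytes is a bad char.

    Assumption: only the significant bytes are checked, so the zero padding
    of a 64-bit pointer is ignored.
    """
    significant = max(1, (address.bit_length() + 7) // 8)
    raw = address.to_bytes(significant, "little")
    return not any(byte in bad for byte in raw)

bad = parse_bad_chars("000a0d")
candidates = [0x401123, 0x40110A, 0x400000]
clean = [hex(a) for a in candidates if address_is_clean(a, bad)]
print(clean)
```

In this sketch 0x40110a is rejected because its low byte is 0x0a (newline), and 0x400000 is rejected because it contains null bytes within its significant part.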

Analyze all executable sections:

lcsajdump --all-exec -d 25 -k 10 -l 30 <path_to_binary>

Force strictly algorithmic ranking (bypass ML):

lcsajdump --algo <path_to_binary>

CLI Options

Flag | Type | Default | Description
--- | --- | --- | ---
-a, --arch | TEXT | auto | Target architecture (auto, riscv64, x86_64, arm64). Auto-detected from ELF header.
-d, --depth | INTEGER | 20 | Max search depth in LCSAJ blocks. Controls chain length.
-k, --darkness | INTEGER | 5 | Pruning threshold: max visits per node. Higher = more gadgets, slower scan.
-l, --limit | INTEGER | 10 | Max number of gadgets to display in the output.
-s, --min-score | INTEGER | 0 | Minimum heuristic score for a gadget to appear in results.
-i, --instructions | INTEGER | 15 | Max number of instructions contained in a single LCSAJ node.
-v, --verbose | FLAG | | Enable verbose output for detailed per-gadget results.
-o, --output | PATH | | Write output to file. Plain text by default; use with --json for JSON output.
-b, --bad-chars | TEXT | | Hex bytes to filter from gadget addresses (e.g. "000a0d").
--json | FLAG | | Output gadgets as structured JSON. Combine with -o to save to file.
--all-exec | FLAG | | Analyze all executable sections, not just .text.
-al, --algo | FLAG | | Use strictly the algorithmic ranking (bypass ML).
--version | FLAG | | Show the installed version and exit.
--help | FLAG | | Show help message and exit.
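When driving the CLI from a script, the options above can be assembled into an argument list. The helper below is a hypothetical convenience wrapper, not part of LCSAJdump; it uses only flags listed in the table and their documented defaults.

```python
def build_command(binary, arch="auto", depth=20, darkness=5, limit=10,
                  bad_chars=None, as_json=False, output=None):
    """Build an lcsajdump argument list from documented CLI options."""
    cmd = ["lcsajdump", "-a", arch,
           "-d", str(depth), "-k", str(darkness), "-l", str(limit)]
    if bad_chars:
        cmd += ["-b", bad_chars]      # hex string such as "000a0d"
    if as_json:
        cmd.append("--json")
    if output:
        cmd += ["-o", output]         # placed after --json, as the usage note advises
    cmd.append(binary)
    return cmd

cmd = build_command("./vuln", arch="x86_64", bad_chars="000a0d",
                    as_json=True, output="gadgets.json")
print(" ".join(cmd))

# To actually run it (requires lcsajdump to be installed):
#   import subprocess
#   subprocess.run(cmd, check=True)
```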


Accuracy & Benchmarks

LCSAJdump is backed by a rigorous, incrementally validated test suite located in the benchmarkTests/ directory.

Through 15 major iterations of semantic feature engineering, the hybrid model has learned to discriminate gadgets based on actual memory side-effects (extracted via angr symbolic execution) rather than purely syntactic heuristics.

When evaluated on monolithic, real-world executables like libc.so.6, the engine achieves NDCG@1 = 0.8549 and NDCG@5 = 0.8374 (1.0 would be a perfect ranking). The Two-Stage engine successfully prioritizes clean stack-popping sequences and ret2csu-like calls, while heavily penalizing crash-prone fixed-offset jumps that deceive traditional static scanners.
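For readers unfamiliar with the metric, NDCG@k compares the discounted gain of the top k ranked gadgets against the best possible ordering of the same items. The sketch below assumes the exponential gain 2^rel - 1 and a log2 position discount, a common formulation; the exact gain function and relevance labels used in the benchmark suite are not stated here, and the sample labels are made up.

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the first k items of a ranking."""
    return sum((2 ** rel - 1) / math.log2(rank + 2)
               for rank, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k):
    """DCG of the given order divided by DCG of the ideal (sorted) order."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

# Relevance labels listed in the order the ranker returned the gadgets.
ranked = [0, 1]          # the useful gadget was placed second
print(ndcg_at_k(ranked, 1), ndcg_at_k(ranked, 2))
```

With this input, NDCG@1 is 0 because the top slot holds an irrelevant gadget, while NDCG@2 is 1/log2(3), roughly 0.63, because the relevant gadget still appears within the first two results.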


Developer & ML Guide

The repository is structured to support both end-users and ML researchers.

  • Production Engine: The core CLI seamlessly integrates the inference engine using models hosted on Hugging Face, requiring no manual model loading.
  • ML Pipeline: The lcsajdump/ml_study/ directory contains the complete pipeline used to train the models:
    • build_dataset.py: Extracts structural and semantic features from a corpus of CTF binaries.
    • train_model.py: Trains the LightGBM LambdaRank model and outputs the .pkl models.
    • kfold_cv.py: Validates the dataset using K-Fold Cross Validation.
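A ranking dataset is usually split so that all gadgets from one binary land in the same fold; otherwise near-identical samples leak between training and validation. Whether kfold_cv.py does this is not stated, so the following is only a sketch under that assumption, with hypothetical names and standard-library code in place of any ML framework.

```python
from collections import defaultdict

def grouped_kfold(groups, n_splits):
    """Yield (train_idx, test_idx) pairs with each group kept in one fold.

    groups: one label per sample (here: the binary a gadget came from)
    """
    by_group = defaultdict(list)
    for idx, group in enumerate(groups):
        by_group[group].append(idx)
    folds = [[] for _ in range(n_splits)]
    # Greedy balancing: place the largest groups first into the smallest fold.
    for group in sorted(by_group, key=lambda g: (-len(by_group[g]), str(g))):
        min(folds, key=len).extend(by_group[group])
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield sorted(train), sorted(test)

# Six gadgets extracted from three hypothetical binaries.
GROUPS = ["ctf_a", "ctf_a", "ctf_b", "ctf_b", "ctf_c", "ctf_c"]
splits = list(grouped_kfold(GROUPS, n_splits=3))
for train, test in splits:
    print(train, test)
```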

Contributing (Open for Forks!)

The framework is open to new implementations. To add a new architecture:

  1. Fork the repository.
  2. Open lcsajdump/core/config.py.
  3. Add a new profile to the ARCH_PROFILES dictionary, defining jump mnemonics, return mnemonics, and registers for the desired architecture (e.g., x86_64).
  4. Submit a Pull Request.
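As an illustration of step 3, a profile for a new architecture might look like the sketch below. The key names (`jump_mnemonics`, `return_mnemonics`, `registers`), the `ppc64` entry, and the `classify` helper are all hypothetical; check the existing entries in lcsajdump/core/config.py for the real schema before copying anything.

```python
# Stand-in for the ARCH_PROFILES dictionary in lcsajdump/core/config.py.
# The structure shown here is an assumption for illustration only.
ARCH_PROFILES = {}

ARCH_PROFILES["ppc64"] = {
    # Mnemonics that end a block by transferring control elsewhere.
    "jump_mnemonics": {"b", "bl", "bctr", "bctrl", "beq", "bne", "blt", "bgt"},
    # Mnemonics that return to the caller (branch to link register).
    "return_mnemonics": {"blr"},
    # General-purpose registers r0..r31.
    "registers": {f"r{n}" for n in range(32)},
}

def classify(mnemonic, profile):
    """Label a mnemonic as 'return', 'jump', or 'other' under a profile."""
    if mnemonic in profile["return_mnemonics"]:
        return "return"
    if mnemonic in profile["jump_mnemonics"]:
        return "jump"
    return "other"

profile = ARCH_PROFILES["ppc64"]
print(classify("blr", profile), classify("bctr", profile), classify("addi", profile))
```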

License

This project is released under the MIT license. See the LICENSE file for details.


Project Link

Visit the project web page: LCSAJdump web page


Made by Chris1sflaggin as a research project for a Bachelor's thesis.
