A fast and accurate disassembler

Datalog Disassembly

Ddisasm is a fast disassembler that is accurate enough for the resulting assembly code to be reassembled. Ddisasm is implemented using the Datalog (Soufflé) declarative logic programming language to compile disassembly rules and heuristics. The disassembler first parses ELF/PE file information and decodes a superset of possible instructions to create an initial set of Datalog facts. These facts are analyzed to identify code locations, perform symbolization, and determine function boundaries. The results of this analysis, a refined set of Datalog facts, are then translated to the GTIRB intermediate representation for binary analysis and reverse engineering. The GTIRB pretty printer may then be used to print the GTIRB as reassemblable assembly code.
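
To make the idea of decoding "a superset of possible instructions" concrete, here is a toy sketch in Python. It is not Ddisasm's decoder: the three-entry opcode table and the names `TOY_OPCODES`, `decode_superset`, and `overlapping` are hypothetical and exist only for illustration. The sketch attempts a decode at every byte offset and keeps every candidate, including candidates that overlap one another. Deciding which candidates are real code is the job of the later analysis.

```python
# Toy illustration only: a tiny, hypothetical opcode table, not Ddisasm's decoder.
# Each entry maps a first byte to (mnemonic, instruction size in bytes).
TOY_OPCODES = {
    0x90: ("nop", 1),
    0xC3: ("ret", 1),
    0xB8: ("mov_eax_imm32", 5),
}


def decode_superset(code: bytes):
    """Try to decode at every byte offset and keep all candidates as facts."""
    facts = []
    for offset in range(len(code)):
        entry = TOY_OPCODES.get(code[offset])
        if entry is None:
            continue  # no known instruction starts at this offset
        mnemonic, size = entry
        if offset + size <= len(code):
            facts.append((offset, mnemonic, size))
    return facts


def overlapping(facts):
    """Return (offset_a, offset_b) pairs whose byte ranges overlap."""
    pairs = []
    for i, (a_off, _, a_size) in enumerate(facts):
        for b_off, _, _ in facts[i + 1:]:
            if b_off < a_off + a_size:
                pairs.append((a_off, b_off))
    return pairs


# A 5-byte mov whose immediate bytes also decode as ret/nop, followed by ret.
CODE = bytes([0xB8, 0xC3, 0x90, 0x90, 0x90, 0xC3])
FACTS = decode_superset(CODE)

if __name__ == "__main__":
    for fact in FACTS:
        print(fact)
    print("overlaps:", overlapping(FACTS))
```

In this toy input, the candidates found inside the immediate of the `mov` overlap with the `mov` itself, which is the kind of conflict that later rules and heuristics have to resolve.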

Binary Support

Binary formats:

  • ELF (Linux)
  • PE (Windows)

Instruction Set Architectures (ISAs):

  • x86_32
  • x86_64
  • ARM32
  • ARM64
  • MIPS32

Getting Started

You can run a prebuilt version of Ddisasm using Docker:

docker pull grammatech/ddisasm:latest

Ddisasm can be used to disassemble a binary into the GTIRB representation. We can try it with one of the examples included in the repository.

First, start the Ddisasm docker container:

docker run -v $PWD/examples:/examples -it grammatech/ddisasm:latest

Within the Docker container, let us build one of the examples:

apt update && apt install gcc -y
cd /examples/ex1
gcc ex.c -o ex

Now we can proceed to disassemble the binary:

ddisasm ex --ir ex.gtirb

Once you have the GTIRB representation, you can make programmatic changes to the binary using GTIRB or gtirb-rewriting.
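
As a starting point for programmatic work, the sketch below counts code blocks and symbols per module in a GTIRB file. It assumes the `gtirb` Python package is installed (for example with `pip install gtirb`) and that `ex.gtirb` was produced by the previous step. The helper names `load_ir` and `summarize` are hypothetical, not part of GTIRB.

```python
import sys


def load_ir(path):
    """Load a GTIRB file from disk; assumes the `gtirb` package is installed."""
    import gtirb  # imported lazily so the rest of this sketch runs without it

    return gtirb.IR.load_protobuf(path)


def summarize(ir):
    """Count code blocks and symbols in each module of the IR."""
    return {
        module.name: {
            "code_blocks": sum(1 for _ in module.code_blocks),
            "symbols": sum(1 for _ in module.symbols),
        }
        for module in ir.modules
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (assumed file name): python summarize.py ex.gtirb
    for name, counts in summarize(load_ir(sys.argv[1])).items():
        print(name, counts)
```

Read-only inspection like this is a low-risk way to get familiar with the IR before attempting modifications.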

Then, you can use gtirb-pprinter (included in the Docker image) to produce a new version of the binary:

gtirb-pprinter ex.gtirb -b ex_rewritten

Internally, gtirb-pprinter generates an assembly file and invokes the compiler/assembler (e.g. gcc) to produce a new binary. gtirb-pprinter takes care of generating all the command-line options needed to build the new binary, including compilation options, library dependencies, and version linker scripts.

You can also use gtirb-pprinter to generate an assembly listing for manual modification:

gtirb-pprinter ex.gtirb --asm ex.s

This assembly listing can then be manually recompiled:

gcc -nostartfiles ex.s -o ex_rewritten
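
The steps above can be scripted. The following Python sketch only assembles the command lines already shown in this section and optionally runs them. It assumes `ddisasm`, `gtirb-pprinter`, and `gcc` are on `PATH` (as in the Docker image), and the function names `pipeline_commands` and `run_pipeline` are hypothetical.

```python
import shlex
import subprocess


def pipeline_commands(binary, rewritten, via_asm=False):
    """Build the command lines for disassembling and rebuilding a binary.

    With via_asm=False, gtirb-pprinter produces the new binary directly.
    With via_asm=True, it emits an assembly listing that gcc then builds.
    """
    ir = f"{binary}.gtirb"
    commands = [["ddisasm", binary, "--ir", ir]]
    if via_asm:
        asm = f"{binary}.s"
        commands.append(["gtirb-pprinter", ir, "--asm", asm])
        commands.append(["gcc", "-nostartfiles", asm, "-o", rewritten])
    else:
        commands.append(["gtirb-pprinter", ir, "-b", rewritten])
    return commands


def run_pipeline(binary, rewritten, via_asm=False):
    """Run each step in order, stopping at the first failure."""
    for command in pipeline_commands(binary, rewritten, via_asm):
        print("+", shlex.join(command))
        subprocess.run(command, check=True)


if __name__ == "__main__":
    # Dry run: print the commands without executing them.
    for command in pipeline_commands("ex", "ex_rewritten", via_asm=True):
        print(shlex.join(command))
```

Calling `run_pipeline("ex", "ex_rewritten")` from `/examples/ex1` inside the container would execute the direct route. With `via_asm=True`, the assembly listing must be edited outside this script, between the gtirb-pprinter and gcc steps.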

Please take a look at our documentation for additional information.

Documentation

Contributing

See CONTRIBUTING.md

External Contributors

  • Programming Language Group, The University of Sydney: Initial support for ARM64.
  • GitHub user gogo2464: Documentation refactoring.

Cite

  1. Datalog Disassembly
@inproceedings {flores-montoya2020,
    author = {Antonio Flores-Montoya and Eric Schulte},
    title = {Datalog Disassembly},
    booktitle = {29th USENIX Security Symposium (USENIX Security 20)},
    year = {2020},
    isbn = {978-1-939133-17-5},
    pages = {1075--1092},
    url = {https://www.usenix.org/conference/usenixsecurity20/presentation/flores-montoya},
    publisher = {USENIX Association},
    month = aug,
}
  2. GTIRB
@misc{schulte2020gtirb,
    title={GTIRB: Intermediate Representation for Binaries},
    author={Eric Schulte and Jonathan Dorn and Antonio Flores-Montoya and Aaron Ballman and Tom Johnson},
    year={2020},
    eprint={1907.02859},
    archivePrefix={arXiv},
    primaryClass={cs.PL}
}
  3. Ddisasm WIS
@INPROCEEDINGS{11023516,
  author={Flores-Montoya, Antonio and Lim, Junghee and Seitz, Adam and Sood, Akshay and Raff, Edward and Holt, James},
  booktitle={2025 IEEE Symposium on Security and Privacy (SP)},
  title={Disassembly as Weighted Interval Scheduling with Learned Weights},
  year={2025},
  volume={},
  number={},
  pages={3033-3050},
  keywords={Measurement;Privacy;Accuracy;Heuristic algorithms;Reverse engineering;Binary codes;Benchmark testing;Scheduling;Inference algorithms;Security;disassembly;reverse engineering;learning;binary analysis},
  doi={10.1109/SP61157.2025.00192}}
