Python API to synthesize Triton ASTs

Qsynthesis

QSynthesis is a Python3 API to perform I/O-based program synthesis of bitvector expressions. It aims at facilitating code deobfuscation. The algorithm is a greybox approach that combines a blackbox I/O-based synthesis with a whitebox AST search, which synthesizes sub-expressions when the root node cannot be synthesized.
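The greybox idea can be illustrated with a minimal, self-contained sketch. It does not use the QSynthesis API: the tuple-based expressions, the `ORACLE` dictionary, and the `synthesize` helper are hypothetical names chosen for illustration, and the bit width and input vectors are arbitrary. The blackbox step evaluates an expression on fixed inputs and looks the output vector up in a table; the whitebox step falls back to the children when the lookup fails.

```python
MASK = 0xFF  # 8-bit bitvectors keep the toy example small

# Fixed input vectors on which every expression is evaluated (arbitrary values)
INPUTS = [
    {"a": 1, "b": 2},
    {"a": 0xF0, "b": 0x0F},
    {"a": 0x55, "b": 0x33},
    {"a": 7, "b": 200},
]

OPS = {
    "add": lambda x, y: (x + y) & MASK,
    "and": lambda x, y: x & y,
    "or": lambda x, y: x | y,
    "xor": lambda x, y: x ^ y,
}

# Expressions are nested tuples: ("var", name), ("const", value) or (op, left, right)
A, B = ("var", "a"), ("var", "b")


def evaluate(expr, env):
    """Evaluate an expression tree under one assignment of the variables."""
    if expr[0] == "var":
        return env[expr[1]] & MASK
    if expr[0] == "const":
        return expr[1] & MASK
    return OPS[expr[0]](evaluate(expr[1], env), evaluate(expr[2], env))


def signature(expr):
    """Output vector of the expression over all fixed inputs."""
    return tuple(evaluate(expr, env) for env in INPUTS)


def size(expr):
    """Number of nodes in the expression tree."""
    return 1 if expr[0] in ("var", "const") else 1 + size(expr[1]) + size(expr[2])


# Toy oracle: output vector -> small expression producing it
ORACLE = {signature(e): e for e in (A, B, ("add", A, B), ("xor", A, B), ("and", A, B))}


def synthesize(expr):
    """Blackbox lookup on the root, whitebox recursion on the children otherwise."""
    hit = ORACLE.get(signature(expr))
    if hit is not None and size(hit) < size(expr):
        return hit  # same I/O behaviour on the samples, fewer nodes
    if expr[0] in ("var", "const"):
        return expr
    return (expr[0], synthesize(expr[1]), synthesize(expr[2]))


# ((a | b) + (a & b)) & 0x0F: the root is not in the oracle, but its left child is
obfuscated = ("and", ("add", ("or", A, B), ("and", A, B)), ("const", 0x0F))
simplified = synthesize(obfuscated)
```

A lookup hit only shows agreement on the sampled inputs, which is why a separate equivalence check is still useful.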

This algorithm was originally described at the BAR academic workshop:

The code was released as part of the following Black Hat talk:

Disclaimer: This framework is experimental, and shall only be used for experimentation purposes. It mainly aims at stimulating research in this area.

Documentation

Installation instructions, examples, and the API documentation are available in the dedicated documentation.

Functionalities

The core synthesis is built on the Triton symbolic engine, on which the whole framework relies. It provides the following functionalities:

  • synthesis of bitvector expressions
  • ability to check through SMT the semantic equivalence of synthesized expressions
  • ability to synthesize constants (if the expression encodes a constant)
  • ability to improve oracles (pre-computed tables) over time through a learning mechanism
  • ability to reassemble synthesized expression back to assembly
  • ability to serve oracles through a REST API to facilitate the synthesis usage
  • an IDA plugin providing an integration of the synthesis
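To illustrate why an equivalence check complements I/O-based synthesis, here is a small sketch. It is not the SMT-based check of the framework: it assumes 8-bit operands so that every input can be enumerated exhaustively, and the function names are hypothetical. An SMT solver answers the same question for 64-bit expressions, where enumeration is infeasible.

```python
from itertools import product

WIDTH = 8  # small width chosen so that exhaustive enumeration is cheap
MASK = (1 << WIDTH) - 1


def obfuscated(a, b):
    """Obfuscated form of an addition: (a | b) + (a & b)."""
    return ((a | b) + (a & b)) & MASK


def synthesized(a, b):
    """Candidate expression returned by the synthesis."""
    return (a + b) & MASK


def find_counterexample(f, g, arity):
    """Return the first input on which f and g differ, or None if they agree everywhere."""
    for args in product(range(1 << WIDTH), repeat=arity):
        if f(*args) != g(*args):
            return args
    return None


# None means the two expressions are equivalent on all 8-bit inputs
result = find_counterexample(obfuscated, synthesized, 2)
```

A wrong candidate such as `a ^ b` yields a concrete counterexample, which mirrors the model an SMT solver would return.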

Quick start

Installation

Triton must be installed first: install documentation. Triton does not automatically install itself in a virtualenv, so either copy it into your venv or use --system-site-packages when configuring your venv.
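As a sketch of the second option, the standard library `venv` module can create an environment that sees system-wide packages (and thus a system-wide Triton install). The helper name `create_env` and the target directory are assumptions made for this example; the equivalent command-line form is `python3 -m venv --system-site-packages <dir>`.

```python
import pathlib
import venv


def create_env(env_dir, with_pip=True):
    """Create a virtualenv that can import packages installed system-wide."""
    env_dir = pathlib.Path(env_dir)
    venv.create(env_dir, system_site_packages=True, with_pip=with_pip)
    # pyvenv.cfg records whether system site-packages are visible
    return env_dir / "pyvenv.cfg"


# Example usage (hypothetical directory name):
# create_env("qsynthesis-venv")
```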

Then:

$ git clone https://github.com/quarkslab/qsynthesis.git
$ cd qsynthesis
$ pip3 install '.[all]'

The [all] extra installs all dependencies (see the documentation for a light install).

Table generation

The synthesis algorithm requires generating oracle tables derived from a grammar (a set of variables and operators). Installing Qsynthesis provides the qsynthesis-table-manager utility for manipulating tables. The following command generates a table with 3 variables of 64 bits and 5 operators, using a vector of 16 inputs. The generation is limited to 5 million entries.

$ qsynthesis-table-manager generate -bs 64 --var-num 3 --input-num 16 --random-level 5 --ops AND,NEG,MUL,XOR,NOT --watchdog 80 --limit 5000000 my_oracle_table
Generate Table
Watchdog value: 80.0
Depth 2 (size:3) (Time:0m0.23120s)
Depth 3 (size:21) (Time:0m0.23198s)
Depth 4 (size:574) (Time:0m0.26068s)
Depth 5 (size:400858) (Time:0m21.23231s)
Threshold reached, generation interrupted
Stop required
Depth 5 (size:5000002) (Time:4m52.56009s) [RAM:9.52Gb]

Note: The generation process consumes a lot of RAM. The --watchdog option sets a percentage of RAM usage above which the generation is interrupted.
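The following sketch gives an intuition of what generating a table from a grammar involves. It is not the implementation behind qsynthesis-table-manager: the function `generate_table`, its parameters, the string representation of expressions, and the way depth is counted are assumptions made for illustration. Expressions are enumerated bottom-up, evaluated on random 64-bit inputs, and the first expression found for each output vector is kept until the entry limit is reached.

```python
import random
from itertools import product

MASK = (1 << 64) - 1  # 64-bit bitvectors
VARIABLES = ["a", "b", "c"]
UNARY_OPS = {
    "NEG": lambda x: (-x) & MASK,
    "NOT": lambda x: ~x & MASK,
}
BINARY_OPS = {
    "AND": lambda x, y: x & y,
    "MUL": lambda x, y: (x * y) & MASK,
    "XOR": lambda x, y: x ^ y,
}


def generate_table(input_num=16, max_depth=3, limit=5000, seed=0):
    """Enumerate expressions by increasing depth, keyed by their output vector."""
    rng = random.Random(seed)
    inputs = [{v: rng.getrandbits(64) for v in VARIABLES} for _ in range(input_num)]
    # Depth 1: the variables themselves
    table = {tuple(env[v] for env in inputs): v for v in VARIABLES}
    for _depth in range(2, max_depth + 1):
        known = list(table.items())  # snapshot of the previous depths
        for name, fn in UNARY_OPS.items():
            for outs, expr in known:
                # setdefault keeps the first (shallowest) expression per output vector
                table.setdefault(tuple(fn(o) for o in outs), f"{name}({expr})")
                if len(table) >= limit:
                    return inputs, table
        for name, fn in BINARY_OPS.items():
            for (o1, e1), (o2, e2) in product(known, repeat=2):
                outs = tuple(fn(x, y) for x, y in zip(o1, o2))
                table.setdefault(outs, f"({e1} {name} {e2})")
                if len(table) >= limit:
                    return inputs, table
    return inputs, table


inputs, table = generate_table()
```

The number of candidates grows quickly with depth, which is consistent with the jump in table size visible in the output above and explains why an entry limit and a memory watchdog are useful.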

Synthesizing a bitvector expression

We can then try simplifying a seemingly obfuscated expression with:

from qsynthesis import SimpleSymExec, TopDownSynthesizer, InputOutputOracleLevelDB

blob = b'UH\x89\xe5H\x89}\xf8H\x89u\xf0H\x89U\xe8H\x89M\xe0L\x89E\xd8H\x8bE' \
       b'\xe0H\xf7\xd0H\x0bE\xf8H\x89\xc2H\x8bE\xe0H\x01\xd0H\x8dH\x01H\x8b' \
       b'E\xf8H+E\xe8H\x8bU\xe8H\xf7\xd2H\x0bU\xf8H\x01\xd2H)\xd0H\x83\xe8' \
       b'\x02H!\xc1H\x8bE\xe0H\xf7\xd0H\x0bE\xf8H\x89\xc2H\x8bE\xe0H\x01\xd0' \
       b'H\x8dp\x01H\x8bE\xf8H+E\xe8H\x8bU\xe8H\xf7\xd2H\x0bU\xf8H\x01\xd2' \
       b'H)\xd0H\x83\xe8\x02H\t\xf0H)\xc1H\x89\xc8H\x83\xe8\x01]\xc3'

# Perform symbolic execution of the instructions
symexec = SimpleSymExec("x86_64")
symexec.initialize_register('rip', 0x40B160)  # arbitrary address
symexec.initialize_register('rsp', 0x800000)  # arbitrary stack
symexec.execute_blob(blob, 0x40B160)
rax = symexec.get_register_ast("rax")  # retrieve rax register expressions

# Load lookup tables
ltm = InputOutputOracleLevelDB.load("my_oracle_table")

# Perform Synthesis of the expression
synthesizer = TopDownSynthesizer(ltm)
synt_rax, simp = synthesizer.synthesize(rax)

print(f"expression: {rax.pp_str}")
print(f"synthesized expression: {synt_rax.pp_str} [{simp}]")

Limitations

  • synthesis accuracy is limited by the exhaustiveness of the pre-computed tables
  • table generation limited by RAM consumption
  • reassembly cannot involve memory variables; the destination is necessarily a register, and the supported architectures depend on llvmlite (thus mostly x86_64)
  • the code references trace-based synthesis which is disabled (as the underlying framework is not yet open-source)

Authors

  • Robin David (@RobinDavid), Quarkslab

Contributors

Huge thanks to contributors to this research:

  • Luigi Coniglio
  • Jonathan Salwan
