
QBindiff binary diffing tool based on a Network Alignment problem


QBinDiff

QBinDiff is an experimental binary diffing tool that addresses diffing as a Network Alignment Quadratic Problem.

But why develop yet another differ when BinDiff works well?

BinDiff is great, no doubt about it, but it gives us no control over the diffing process. It also works well on standard binaries but lacks flexibility in some corner cases (embedded firmware, diffing two portions of the same binary, etc.).

A key idea of QBinDiff is to enable tuning the diffing programmatically by:

  • writing your own features
  • enforcing some matches
  • emphasizing either the content of functions (similarity) or the links between them (call graph)

In essence, the idea is to be able to diff by defining your own criteria, which are sometimes not the control flow and instructions but could, for instance, be data-oriented.

Last, QBinDiff has primarily been designed with the binary-diffing use case in mind, but it can be applied to various other use cases such as social networks. Indeed, diffing two programs boils down to determining the best alignment of their call graphs following some similarity criterion.

Solving this problem is APX-hard, which is why QBinDiff uses a machine learning approach (more precisely, optimization) to approximate the best match.
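To make the alignment idea more concrete, here is a toy sketch of the kind of score such a formulation tries to maximize: a mix of per-function similarity and the number of call edges preserved by the mapping. The function name, the data layout, and the use of `tradeoff` as a simple linear weight are assumptions made for illustration; this is not QBinDiff's actual objective or implementation.

```python
def alignment_score(similarity, edges1, edges2, mapping, tradeoff=0.75):
    """Score a candidate mapping {function_in_primary: function_in_secondary}.

    similarity: dict mapping (primary_fn, secondary_fn) to a similarity value.
    edges1, edges2: sets of directed (caller, callee) pairs for each call graph.
    tradeoff: weight given to function content versus call-graph structure
              (illustrative assumption, loosely mirroring the -t option).
    """
    # Node term: total similarity of the matched function pairs
    node_term = sum(similarity.get((u, v), 0.0) for u, v in mapping.items())
    # Edge term: number of call edges of the primary preserved in the secondary
    edge_term = sum(
        1
        for (a, b) in edges1
        if a in mapping and b in mapping and (mapping[a], mapping[b]) in edges2
    )
    return tradeoff * node_term + (1.0 - tradeoff) * edge_term


# Two tiny call graphs: f calls g in the primary, F calls G in the secondary
similarity = {("f", "F"): 0.9, ("g", "G"): 0.8, ("f", "G"): 0.5, ("g", "F"): 0.5}
edges1 = {("f", "g")}
edges2 = {("F", "G")}

good = alignment_score(similarity, edges1, edges2, {"f": "F", "g": "G"})
bad = alignment_score(similarity, edges1, edges2, {"f": "G", "g": "F"})
```

In this toy case the mapping that respects both the similarities and the call edge scores higher than the swapped one. Finding the best mapping over all candidates is the hard part, which is where the approximation comes in.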

Like BinDiff, QBinDiff works on an exported disassembly of the program obtained from IDA. Originally based on BinExport, it now also supports Quokka as a backend, whose extracted files are more exhaustive and more compact on disk (good for large binary datasets).

[!NOTE] QBinDiff is an experimental tool for power users where many parameters, features, thresholds or weights can be adjusted. Obtaining good results usually requires tuning these parameters.

(Please note that QBinDiff does not intend to be faster than other differs, but rather to be more flexible.)

[!WARNING] QBinDiff graph alignment is very memory intensive (it computes large matrices) and can fill the RAM if you are not cautious. Avoid diffing binaries with more than about 10k functions. For large programs, use a very high sparsity ratio (0.99).
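As a rough back-of-the-envelope illustration of why the sparsity ratio matters, the sketch below counts the entries of a function-by-function similarity matrix and how many would remain if the ratio is read as the fraction of entries dropped. The helper name, the 4-byte entry size, and that reading of the ratio are assumptions for illustration; QBinDiff's real memory usage depends on its internal data structures.

```python
def estimate_similarity_entries(n_primary, n_secondary, sparsity_ratio, bytes_per_entry=4):
    """Return (dense_entries, kept_entries, dense_bytes) for an n_primary x n_secondary matrix.

    bytes_per_entry=4 assumes 32-bit floats, which is an illustrative assumption.
    """
    if not 0.0 <= sparsity_ratio <= 1.0:
        raise ValueError("sparsity_ratio must be between 0.0 and 1.0")
    # One similarity value per (primary function, secondary function) pair
    dense_entries = n_primary * n_secondary
    # Entries left once the least probable matches are ignored
    kept_entries = round(dense_entries * (1.0 - sparsity_ratio))
    return dense_entries, kept_entries, dense_entries * bytes_per_entry


# Two binaries of 10k functions each, with the default and a very high ratio
default_ratio = estimate_similarity_entries(10_000, 10_000, 0.75)
high_ratio = estimate_similarity_entries(10_000, 10_000, 0.99)
```

Under these assumptions, two 10k-function binaries give 100 million candidate pairs; a ratio of 0.99 keeps about 1 million of them, against 25 million with the default of 0.75.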

Documentation

The documentation can be found on the diffing portal or can be manually built with:

pip install .[doc]
cd doc
make html

Below you will find some sections extracted from the documentation. Please refer to the full documentation in case of issues.

Installation

QBinDiff can be installed through pip with:

pip install qbindiff

As some parts of the algorithm are very CPU intensive, the installation will compile some components written in native C/C++.

QBinDiff relies on some projects (also developed at Quarkslab):

  • python-binexport, a wrapper on the BinExport protobuf format.
  • python-bindiff, a wrapper around BinDiff (used to write results as BinDiff databases)
  • Quokka, another binary exporter based on IDA. Faster than BinExport and more exhaustive (thus making the diffing more relevant)

Usage (command line)

After installation, the qbindiff binary is available in the path. It takes two exported files as input and starts the diffing analysis. The result can then be exported in the BinDiff file format. The default format for input files is BinExport; for a complete list of backend loaders, look at the -l1, --loader1 option in the help. The complete command line options are:

Usage: qbindiff [OPTIONS] <primary file> <secondary file>

  QBinDiff is an experimental binary diffing tool based on machine learning technics, namely Belief propagation.

Options:
  -l1, --loader1 <loader>       Loader type to be used for the primary. Must be one of these ['binexport', 'quokka',
                                'ida']  [default: binexport]
  -l2, --loader2 <loader>       Loader type to be used for the secondary. Must be one of these ['binexport', 'quokka',
                                'ida']  [default: binexport]
  -f, --feature <feature>       Features to use for the binary analysis, it can be specified multiple times.
                                Features may be weighted by a positive value (default 1.0) and/or compared with a
                                specific distance (by default the option -d is used) like this <feature>:<weight>:<distance>.
                                For a list of all the features available see --list-features.
  -n, --normalize               Normalize the Call Graph (can potentially lead to a partial matching). [default
                                disabled]
  -d, --distance <function>     The following distances are available ['canberra', 'euclidean', 'cosine',
                                'jaccard_strong']  [default: canberra]
  -s, --sparsity-ratio FLOAT    Ratio of least probable matches to ignore. Between 0.0 (nothing is ignored) to 1.0
                                (only perfect matches are considered)  [default: 0.75]
  -sr, --sparse-row             Whether to build the sparse similarity matrix considering its entirety or processing
                                it row per row
  -t, --tradeoff FLOAT          Tradeoff between function content (near 1.0) and call-graph information (near 0.0)
                                [default: 0.75]
  -e, --epsilon FLOAT           Relaxation parameter to enforce convergence  [default: 0.5]
  -i, --maxiter INTEGER         Maximum number of iteration for belief propagation  [default: 1000]
  -e1, --executable1 PATH       Path to the primary raw executable. Must be provided if using quokka loader
  -e2, --executable2 PATH       Path to the secondary raw executable. Must be provided if using quokka loader
  -o, --output PATH             Write output to PATH
  -ff, --file-format [bindiff|csv]
                                The file format of the output file  [default: csv]
  -v, --verbose                 Activate debugging messages. Can be supplied multiple times to increase verbosity
  --version                     Show the version and exit.
  --arch-primary TEXT           Force the architecture when disassembling for the primary. Format is
                                'CS_ARCH_X:CS_MODE_Ya,CS_MODE_Yb,...'
  --arch-secondary TEXT         Force the architecture when disassembling for the secondary. Format is
                                'CS_ARCH_X:CS_MODE_Ya,CS_MODE_Yb,...'
  --list-features               List all the available features
  -h, --help                    Show this message and exit.
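The options above can also be assembled from a script. The sketch below builds an argument list using only flags shown in the help text; the helper names `feature_spec` and `build_command` are hypothetical, and `FEATURE_NAME` is a placeholder to be replaced by a name reported by `--list-features`.

```python
def feature_spec(name, weight=1.0, distance="canberra"):
    """Format a -f value following the documented <feature>:<weight>:<distance> pattern."""
    return f"{name}:{weight}:{distance}"


def build_command(primary, secondary, features=(), loader="binexport",
                  sparsity_ratio=0.75, tradeoff=0.75, output=None, file_format="csv"):
    """Build the qbindiff argument list (hypothetical helper, not part of qbindiff)."""
    cmd = ["qbindiff", "-l1", loader, "-l2", loader,
           "-s", str(sparsity_ratio), "-t", str(tradeoff)]
    # -f may be given several times, once per feature
    for spec in features:
        cmd += ["-f", spec]
    # Only request an output file when a path is provided
    if output is not None:
        cmd += ["-o", str(output), "-ff", file_format]
    # Positional arguments: <primary file> <secondary file>
    cmd += [str(primary), str(secondary)]
    return cmd


cmd = build_command(
    "primary.BinExport",
    "secondary.BinExport",
    features=[feature_spec("FEATURE_NAME", 2.0, "cosine")],  # placeholder feature name
    sparsity_ratio=0.99,
    output="matches.csv",
)
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once qbindiff is installed. Note that the quokka loader additionally requires `-e1` and `-e2`, which this sketch leaves out.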

Library usage

The strength of QBinDiff is that it can be used as a Python library. The following snippet shows an example of loading two BinExport files and comparing them using the WeisfeilerLehman feature.

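from qbindiff import QBinDiff, Program
from qbindiff.features import WeisfeilerLehman
from pathlib import Path

# Load the two exported BinExport files
p1 = Program(Path("primary.BinExport"))
p2 = Program(Path("secondary.BinExport"))

# Create the differ and register a feature with weight 1.0 and the cosine distance
differ = QBinDiff(p1, p2)
differ.register_feature_extractor(WeisfeilerLehman, 1.0, distance='cosine')

# Process the two programs with the registered features
differ.process()

# Compute the matching and collect the pairs of matched addresses
mapping = differ.compute_matching()
output = {(match.primary.addr, match.secondary.addr) for match in mapping}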
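Once the set of address pairs is available, it can be post-processed with plain Python. As a minimal sketch, the helper below writes such pairs to CSV with the standard library; the function name and the column names are assumptions and do not necessarily match the layout produced by the `-ff csv` command line option.

```python
import csv
import io


def write_matches_csv(pairs, stream):
    """Write (primary_addr, secondary_addr) integer pairs as hexadecimal CSV rows."""
    writer = csv.writer(stream)
    # Column names chosen for this example only
    writer.writerow(["primary_addr", "secondary_addr"])
    # Sort so that the output is deterministic regardless of set ordering
    for primary_addr, secondary_addr in sorted(pairs):
        writer.writerow([hex(primary_addr), hex(secondary_addr)])


# Stand-in for the `output` set built from the matching
example_pairs = {(0x401000, 0x402000), (0x400000, 0x400010)}
buffer = io.StringIO()
write_matches_csv(example_pairs, buffer)
lines = buffer.getvalue().splitlines()
```

To write to disk instead, pass a file opened with `open(path, "w", newline="")` as the stream.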

Contributing & Contributors

Any help or feedback is greatly appreciated via GitHub issues and pull requests.

Current:

  • Robin David
  • Riccardo Mori
  • Roxane Cohen

Past:

  • Alexis Challande
  • Elie Mengin

All contributions


