qBinDiff: binary diffing tool based on a Network Alignment problem
Reason this release was yanked:
Test release
qBinDiff
qBinDiff is an experimental binary diffing tool that addresses diffing as a Network Alignment Quadratic Problem. But why develop yet another differ when BinDiff works well? We love BinDiff, but it gives us no control at all over the diffing process. Also, it works great on standard binaries, but it is harder to put into practice on some corner cases (embedded firmware, diffing two portions of the same binary, etc.).
The key idea is to make the diffing programmable by:
- writing your own features
- enforcing some matches
- putting the emphasis on either the content of functions (similarity) or the links between them (call graph)
In essence, the idea is to diff according to your own criteria, which are sometimes not the control-flow graph (CFG) and instructions but, for instance, something more data-oriented.
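As an illustration of a data-oriented criterion, the following sketch represents each function by the data it references, turns that into a bag-of-references vector, and compares two functions with a cosine distance. It is hypothetical and does not use qBinDiff's feature API: the dictionary-based function representation and the helper names are assumptions made for the example.

```python
import math
from collections import Counter

# Hypothetical function representation: a dict listing the data each function
# references (e.g. string literals). Real qBinDiff programs are richer objects.
primary_func = {"name": "parse_config", "data_refs": ["config.ini", "r", "%s=%s", "r"]}
secondary_func = {"name": "sub_401000", "data_refs": ["config.ini", "r", "%s=%s"]}


def data_feature(func: dict) -> Counter:
    """Bag-of-data-references feature: how often each datum is referenced."""
    return Counter(func["data_refs"])


def cosine_distance(a: Counter, b: Counter) -> float:
    """1 - cosine similarity between two sparse count vectors."""
    keys = set(a) | set(b)
    dot = sum(a[k] * b[k] for k in keys)  # missing keys count as 0
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # no data referenced: treat as maximally distant
    return 1.0 - dot / (norm_a * norm_b)


d = cosine_distance(data_feature(primary_func), data_feature(secondary_func))
print(f"{d:.3f}")
```

A small distance suggests the two functions handle the same data even if their control flow differs, which is the kind of criterion the CFG alone would miss.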
Lastly, qBinDiff has primarily been designed with the binary-diffing use case in mind, but it can be applied to various other use cases such as social networks. Indeed, diffing two programs boils down to determining the best alignment of their call graphs according to some similarity criteria.
Solving this problem exactly is APX-hard, which is why we use a machine learning approach (more precisely, optimization) to approximate the best match.
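To make the optimization problem concrete, here is a toy formulation in Python. It is an illustration under stated assumptions, not qBinDiff's actual objective or solver: each candidate one-to-one mapping is scored by a weighted sum of node similarities and of call-graph edges it preserves, with a `tradeoff` weight in the spirit of the `-t` option (near 1.0 favors function content, near 0.0 favors the call graph). The graphs and similarity scores are made up. The exhaustive search is exact but grows factorially with the number of functions, which is why an approximate method is needed at real scale.

```python
from itertools import permutations

# Toy call graphs as sets of directed (caller, callee) edges (made-up data).
primary_nodes = ["main", "init", "run", "log"]
secondary_nodes = ["f0", "f1", "f2", "f3"]
primary_edges = {("main", "init"), ("main", "run"), ("run", "log")}
secondary_edges = {("f0", "f1"), ("f0", "f2"), ("f2", "f3")}

# Hypothetical pairwise similarity scores in [0, 1]; unlisted pairs score 0.
similarity = {
    ("main", "f0"): 0.9, ("init", "f1"): 0.6, ("init", "f2"): 0.5,
    ("run", "f1"): 0.5, ("run", "f2"): 0.6, ("log", "f3"): 0.8,
}


def objective(mapping, tradeoff):
    """tradeoff * total node similarity + (1 - tradeoff) * preserved edges."""
    sim = sum(similarity.get(pair, 0.0) for pair in mapping.items())
    preserved_edges = sum(
        1 for (u, v) in primary_edges if (mapping[u], mapping[v]) in secondary_edges
    )
    return tradeoff * sim + (1 - tradeoff) * preserved_edges


def best_alignment(tradeoff=0.75):
    """Exhaustive search over all one-to-one mappings: O(n!), toy sizes only."""
    best, best_score = None, float("-inf")
    for perm in permutations(secondary_nodes):
        mapping = dict(zip(primary_nodes, perm))
        score = objective(mapping, tradeoff)
        if score > best_score:
            best, best_score = mapping, score
    return best, best_score


mapping, score = best_alignment(tradeoff=0.75)
print(mapping, round(score, 3))
```

The quadratic part of the problem comes from the edge term: whether an edge is preserved depends on the matches of both of its endpoints at once.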
Like BinDiff, qBinDiff works on an exported disassembly of the program obtained from IDA. It originally used BinExport, but it now also supports Quokka as a backend, whose exported files are more exhaustive and also more compact on disk (good for large binary datasets).
Note: qBinDiff is an experimental tool for power users, where many parameters, thresholds, or weights can be adjusted. Use it at your own risk.
(Please note that qBinDiff does not aim to be faster than BinDiff or other differs.)
Installation
qBinDiff can be installed through pip with:
pip install qbindiff
As some parts of the algorithm are very CPU-intensive, the installation compiles some components written in native C.
As mentioned above, qBinDiff relies on some projects (also developed at Quarkslab):
- python-binexport, a wrapper around the BinExport protobuf format
- python-bindiff, a wrapper around BinDiff (used to write results as BinDiff databases)
- Quokka, another binary exporter based on IDA. It is faster than BinExport and more exhaustive (thus making the diffing more relevant)
Usage (command line)
After installation, the qbindiff binary is available in the path. It takes two exported files as input and starts the diffing analysis. The result can then be exported in the BinDiff file format.
The default format for input files is BinExport; for the complete list of backend loaders, look at the -l, --loader option in the help.
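For scripting, the command line can be assembled programmatically. This minimal Python sketch only uses options that appear in the help below (`-l`, `-e1`, `-e2`, `-o` and the two positional files); the helper name and the output file name are hypothetical. It builds the argument list without running it; passing the list to `subprocess.run(cmd, check=True)` would launch qbindiff, assuming it is on the PATH.

```python
import shlex


def build_qbindiff_cmd(primary, secondary, loader="binexport", output=None,
                       exe1=None, exe2=None):
    """Build the argument list for a qbindiff run (hypothetical helper)."""
    if loader not in ("binexport", "qbinexport"):
        raise ValueError(f"unknown loader: {loader!r}")
    cmd = ["qbindiff", "-l", loader]
    if loader == "qbinexport":
        # The help states both raw executables are required with this loader
        if exe1 is None or exe2 is None:
            raise ValueError("qbinexport loader requires exe1 and exe2 (-e1/-e2)")
        cmd += ["-e1", exe1, "-e2", exe2]
    if output is not None:
        cmd += ["-o", output]
    # Positional arguments come last: <primary file> <secondary file>
    cmd += [primary, secondary]
    return cmd


cmd = build_qbindiff_cmd("primary.BinExport", "secondary.BinExport", output="result.BinDiff")
print(shlex.join(cmd))
# prints: qbindiff -l binexport -o result.BinDiff primary.BinExport secondary.BinExport
```

Validating the loader-specific requirement up front gives a clearer error than letting the tool fail later.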
The complete command line options are:
Usage: qbindiff [OPTIONS] <primary file> <secondary file>
qBinDiff is an experimental binary diffing tool based on machine learning technics, namely Belief propagation.
Options:
-l, --loader <loader> Loader type to be used. Must be one of these ['binexport', 'qbinexport']. [default: binexport]
-f, --features <feature> The following features are available:
- bnb: Number of basic blocks in the function
- meanins: Mean number of instructions per basic blocks in the function
- Gmd: Mean degree of the function
- Gd: Density of the function flow graph
- Gnc: Number of components in the function (non-connected flow graphs)
- Gdi: Diamater of the function flow graph
- Gt: Transitivity of the function flow graph
- Gcom: Number of graph communities (Louvain modularity)
- cnb: Number of children of the function
- pnb: Number of parents of the function
- rnb: Number of relatives of the function
- lib: Call to library functions (local function)
- dat: References to data in the instruction
- wlgk: Weisfeiler-Lehman Graph Kernel
- fname: Match the function names
- M: Mnemonic of instructions feature
- Mt: Mnemonic and type of operand feature
- Gp: Group of the instruction (FPU, SSE, stack..)
- addr: Address of the function as a feature
- dat: References to data in the instruction
- cst: Numeric constant (32/64bits) in the instruction (not addresses)
Features may be weighted by a positive value (default 1.0) and compared with a specific distance (by default the option -d is used) like this <feature>:<weight>:<distance>
[default: ('bnb', 'meanins', 'Gmd', 'Gd', 'Gnc', 'Gdi', 'Gt', 'cnb', 'pnb', 'rnb', 'lib', 'dat', 'M', 'Mt', 'Gp', 'addr', 'dat', 'cst')]
-n, --normalize Normalize the Call Graph (can potentially lead to a partial matching). [default disabled]
-d, --distance <function> The following distances are available ('canberra', 'correlation', 'cosine', 'euclidean')
[default: canberra]
-s, --sparsity-ratio FLOAT Ratio of least probable matches to ignore. Between 0.0 to 1.0 [default: 0.75]
-t, --tradeoff FLOAT Tradeoff between function content (near 1.0) and call-graph information (near 0.0) [default: 0.75]
-e, --epsilon FLOAT Relaxation parameter to enforce convergence [default: 0.50]
-i, --maxiter INTEGER Maximum number of iteration for belief propagation [default: 1000]
-e1, --executable1 PATH Path to the primary raw executable. Must be provided if using qbinexport loader
-e2, --executable2 PATH Path to the secondary raw executable. Must be provided if using qbinexport loader
-o, --output PATH Write output to PATH
-ff, --file-format [bindiff] The file format of the output file. Supported formats are [bindiff]. [default: bindiff]
--enable-cortexm Enable the usage of the cortex-m extension when disassembling
-v, --verbose Activate debugging messages
-h, --help Show this message and exit.
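The `<feature>:<weight>:<distance>` syntax of `-f` can be illustrated with a small parser, together with the Canberra distance used by default. This is a sketch assuming that the weight and distance parts are optional (as the stated defaults suggest); it is not qBinDiff's own parsing code, and how qBinDiff combines the weighted per-feature distances internally is not shown here. The Canberra formula follows the usual definition, the sum of |u_i - v_i| / (|u_i| + |v_i|), with 0/0 terms counted as 0 (the SciPy convention).

```python
# Distances listed for the -d option
DISTANCES = ("canberra", "correlation", "cosine", "euclidean")


def parse_feature_spec(spec, default_distance="canberra"):
    """Split '<feature>[:<weight>[:<distance>]]' into (name, weight, distance)."""
    parts = spec.split(":")
    if not 1 <= len(parts) <= 3 or not parts[0]:
        raise ValueError(f"invalid feature spec: {spec!r}")
    name = parts[0]
    # Weight defaults to 1.0 and must be positive
    weight = float(parts[1]) if len(parts) > 1 and parts[1] else 1.0
    if weight <= 0:
        raise ValueError(f"weight must be positive, got {weight}")
    # Distance defaults to the global one (the -d option)
    distance = parts[2] if len(parts) > 2 and parts[2] else default_distance
    if distance not in DISTANCES:
        raise ValueError(f"unknown distance: {distance!r}")
    return name, weight, distance


def canberra(u, v):
    """Canberra distance; terms where both coordinates are 0 contribute 0."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    total = 0.0
    for a, b in zip(u, v):
        denom = abs(a) + abs(b)
        if denom:
            total += abs(a - b) / denom
    return total


print(parse_feature_spec("bnb:2:cosine"))
print(canberra([3, 12], [4, 12]))
```

The Canberra distance normalizes each coordinate by its magnitude, so a difference of one basic block weighs more for small functions than for large ones.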
Library usage
A key strength of qBinDiff is that it can be used as a Python library. The following snippet shows an example of loading two BinExport files and comparing them using the Weisfeiler-Lehman graph kernel feature.
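from qbindiff import QBinDiff, Program
from qbindiff.features import WeisfeilerLehman
from pathlib import Path
# Load the two exported programs
p1 = Program(Path("primary.BinExport"))
p2 = Program(Path("secondary.BinExport"))
differ = QBinDiff(p1, p2)
# Register the Weisfeiler-Lehman feature with weight 1.0, compared with the cosine distance
differ.register_feature_extractor(WeisfeilerLehman, 1.0, distance='cosine')
# Process both programs, then retrieve the computed matching
differ.process()
mapping = differ.compute_matching()
# Collect the matched (primary, secondary) function address pairs
output = {(match.primary.addr, match.secondary.addr) for match in mapping}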
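Once `output` holds the matched (primary address, secondary address) pairs, it can be post-processed with plain Python. The sketch below assumes you already have the lists of all function addresses of each program (how to obtain them from `Program` objects is not shown, to avoid guessing the API); the helper name and the addresses in the example are made up.

```python
def summarize_mapping(pairs, primary_addrs, secondary_addrs):
    """Report matched/unmatched functions from (primary, secondary) address pairs."""
    lookup = dict(pairs)  # primary address -> secondary address
    matched_secondary = set(lookup.values())
    return {
        "matched": len(lookup),
        "unmatched_primary": sorted(set(primary_addrs) - set(lookup)),
        "unmatched_secondary": sorted(set(secondary_addrs) - matched_secondary),
        "lookup": lookup,
    }


# Hypothetical data standing in for `output` and the programs' function addresses
pairs = {(0x1000, 0x2000), (0x1010, 0x2040)}
summary = summarize_mapping(pairs, [0x1000, 0x1010, 0x1020], [0x2000, 0x2040, 0x2080])
print(summary["matched"], [hex(a) for a in summary["unmatched_primary"]])
# prints: 2 ['0x1020']
```

Unmatched functions on either side are often the most interesting ones to review, as they may correspond to added or removed code.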
Documentation
The documentation is available on the diffing portal.
Custom diffing
TODO: Example diffing something unrelated to diffing.
Papers and conferences
TODO:
Cite qBinDiff
TODO: ASE
Contributing & Contributors
Any help or feedback is greatly appreciated via GitHub issues and pull requests.
Current:
- Robin David
- Riccardo Mori
- Roxane Cohen
Past:
- Alexis Challande
- Elie Mengin