elf_diff - A Tool to Compare Elf Binaries
An Initial Example
Before going into detail about what elf_diff does, let's start with an example of a multi-page HTML report that was generated by one of this project's regression tests.
The test compiles, links and compares the ELF files of two similar versions of a simple C++ program. The self-contained HTML report, which is generated in a subdirectory, lets you explore the prominent similarities and differences between the symbols defined in the two ELF files.
HTML reports come in two variants, multi page and single page.
[Screenshots of the multi-page and the single-page report]
Alternatively, pdf reports can be generated.
Please see the examples section at the end of this document for more usage examples.
Purpose
- resource/performance optimization
- debugging
- learning/teaching
The main purpose of elf_diff is to determine how specific changes to a piece of software affect resource consumption and performance. The tool may also serve to compare two independent change sets or just to have fun and learn how changes reflect in the generated assembly code.
The following information is part of elf_diff's report pages:
- differences in the amount of program storage and static RAM usage
- symbols that are only present in one of the two versions
- symbols whose naming or call signature is similar in both versions, e.g. as a result of symbol renaming or subtly changing call signatures
- assembly code discrepancies of functions with identical names and call signatures
As elf_diff operates on ELF files, it is fairly language and platform agnostic. All it requires is a suitable set of GNU Binutils for the target platform.
Introduction
This tool compares pairs of ELF binary files and provides information about differences in the contained symbols with respect to the space that they occupy in program memory (functions and global data) and in RAM (global data). Binary pairs that are passed to elf_diff are typically two versions of the same program/library/firmware. elf_diff can help you to find out about the impact of your changes on your code's resource consumption.
The differences between the binaries are summarized in tables that contain information about persisting, disappeared and new symbols. elf_diff also attempts to find pairs of matching symbols that might have been subject to renaming or signature changes (modified function arguments). Be aware that the means of determining such symbol relations are very limited when working with binaries, so expect false positives.
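The three table categories can be pictured as plain set operations on symbol names. The following is a minimal sketch of that idea, not elf_diff's actual implementation; the function names `classify_symbols` and `size_deltas` and the sample symbol tables are hypothetical.

```python
def classify_symbols(old_symbols, new_symbols):
    """Split symbols into persisting, disappeared and new ones.

    Both arguments map a symbol name to its size in bytes.
    """
    old_names = set(old_symbols)
    new_names = set(new_symbols)
    return {
        "persisting": sorted(old_names & new_names),
        "disappeared": sorted(old_names - new_names),
        "new": sorted(new_names - old_names),
    }


def size_deltas(old_symbols, new_symbols):
    """Return the size change in bytes of every persisting symbol."""
    return {
        name: new_symbols[name] - old_symbols[name]
        for name in sorted(set(old_symbols) & set(new_symbols))
    }


# Hypothetical symbol tables (name -> size in bytes)
old = {"main": 120, "init()": 40, "helper(int)": 64}
new = {"main": 132, "init()": 40, "helper(long)": 72}

result = classify_symbols(old, new)
deltas = size_deltas(old, new)
print(result)
print(deltas)
```

In this sample, helper(int) and helper(long) land in different categories although they are probably related. That is the kind of pair the similarity matching described above tries to detect.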
For all those symbols that have been subject to changes and also for the new and disappeared symbols, the tool provides diff-like comparisons of the disassembly.
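To give an impression of what a diff-like comparison of disassembly means, here is a small sketch that uses Python's standard difflib module. It only illustrates the idea; the instruction listings are made up, and the way elf_diff actually renders its comparisons may differ.

```python
import difflib

# Made-up disassembly listings of the same function in two versions
old_disassembly = [
    "push rbp",
    "mov rbp, rsp",
    "mov eax, 1",
    "pop rbp",
    "ret",
]
new_disassembly = [
    "push rbp",
    "mov rbp, rsp",
    "mov eax, 2",
    "pop rbp",
    "ret",
]

# Produce a unified diff; lineterm="" avoids trailing newlines on header lines
diff_lines = list(
    difflib.unified_diff(
        old_disassembly,
        new_disassembly,
        fromfile="old",
        tofile="new",
        lineterm="",
    )
)
print("\n".join(diff_lines))
```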
All results are presented in either HTML or pdf files. HTML documents are cross-linked to conveniently allow jumping back and forth between bits of information, e.g. tabular information and symbol disassemblies. Due to the potentially large amount of information, some parts of the HTML reports are omitted in the pdf files.
elf_diff has two modes of operation, pair-reports and mass-reports. While the former compares two binaries, the latter generates an overview-report for a set of binary pairs. Such overview-reports list only the changes in symbol sizes and in the number of symbols. No disassembly is provided, which keeps the document size manageable.
Requirements
elf_diff is a Python script. It mostly uses standard libraries but also some non-standard packages (see the file requirements.txt for more information).
elf_diff works and is automatically tested with Python 3.
Setup
The following procedure is required to install the elf_diff Python package.
- Install Python version >= 3.0
- Install the elf_diff package via one of the following commands
  - python3 -m pip install elf_diff (Linux)
  - py -m pip install elf_diff (Windows)
Alternatively when developing elf_diff, the following steps are required:
- Install Python version >= 3.0
- Clone the elf_diff repo from GitHub.
- Install required packages via one of the following commands
  - python3 -m pip install -r requirements.txt (Linux)
  - py -m pip install -r requirements.txt (Windows)
- Add the bin directory of the elf_diff repo to your platform search path
To run elf_diff from the local git sandbox, please use the script bin/elf_diff, e.g. as bin/elf_diff -h to display the help string.
Usage
Running Python differs slightly between Linux and Windows. To display elf_diff's help page, type the following in a Linux console
python3 -m elf_diff -h
or
py -m elf_diff -h
in a Windows console.
In the examples provided below, we go with the Linux syntax. Please replace the keyword python3 with py when executing the respective examples in a Windows environment.
Generating Pair-Reports
To generate a pair report, two binary files need to be passed to elf_diff via the command line. Let's assume those files are named my_old_binary.elf and my_new_binary.elf.
The following command will generate a multi-page HTML report in a subdirectory of your current working directory.
python3 -m elf_diff my_old_binary.elf my_new_binary.elf
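When elf_diff is driven from a Python script, e.g. in a CI job, the same call can be made through the standard subprocess module. The sketch below assumes that elf_diff is installed for the interpreter that runs the script (sys.executable); the helper names `pair_report_command` and `run_pair_report` are hypothetical and not part of elf_diff.

```python
import subprocess
import sys


def pair_report_command(old_binary, new_binary, extra_args=()):
    """Build the argument list for a pair-report run of elf_diff."""
    return [sys.executable, "-m", "elf_diff", *extra_args, old_binary, new_binary]


def run_pair_report(old_binary, new_binary, extra_args=()):
    """Run elf_diff; raises CalledProcessError if it exits with an error."""
    subprocess.run(
        pair_report_command(old_binary, new_binary, extra_args), check=True
    )


# Only build the command here; call run_pair_report(...) to execute it
command = pair_report_command("my_old_binary.elf", "my_new_binary.elf")
print(command)
```

Using sys.executable instead of a hard-coded python3 or py sidesteps the Linux/Windows difference mentioned above.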
Generating Mass-Reports
Mass reports require a driver file (yaml syntax) that specifies a list of binaries to compare pair-wise.
Let's assume you have two pairs of binaries that reside in a directory /home/my_user.
binary_a_old.elf <-> binary_a_new.elf
binary_b_old.elf <-> binary_b_new.elf
A driver file (named my_elf_diff_driver.yaml) would then contain the following information:
binary_pairs:
    - old_binary: "/home/my_user/binary_a_old.elf"
      new_binary: "/home/my_user/binary_a_new.elf"
      short_name: "A short name"
    - old_binary: "/home/my_user/binary_b_old.elf"
      new_binary: "/home/my_user/binary_b_new.elf"
      short_name: "B short name"
The short_name parameters are used in the result tables to reference the respective binary pairs.
By using the driver file, we can now run a mass-report as
python3 -m elf_diff --mass_report --driver_file my_elf_diff_driver.yaml
This will generate an HTML file elf_diff_mass_report.html in your current working directory.
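For a larger number of binary pairs, it can be handy to generate the driver file instead of writing it by hand. The following sketch renders the binary_pairs structure shown above with plain string formatting. The function name `driver_file_text` is hypothetical, and the sketch assumes that paths and short names contain no double quotes or backslashes, which would need escaping in YAML.

```python
def driver_file_text(binary_pairs):
    """Render a mass-report driver file from (old, new, short_name) tuples."""
    lines = ["binary_pairs:"]
    for old_binary, new_binary, short_name in binary_pairs:
        lines.append(f'    - old_binary: "{old_binary}"')
        lines.append(f'      new_binary: "{new_binary}"')
        lines.append(f'      short_name: "{short_name}"')
    return "\n".join(lines) + "\n"


pairs = [
    (
        "/home/my_user/binary_a_old.elf",
        "/home/my_user/binary_a_new.elf",
        "A short name",
    ),
    (
        "/home/my_user/binary_b_old.elf",
        "/home/my_user/binary_b_new.elf",
        "B short name",
    ),
]

text = driver_file_text(pairs)
print(text)
```

The resulting text can be written to a file such as my_elf_diff_driver.yaml and passed to elf_diff via --driver_file.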
Generating pdf-Files
pdf files are generated by supplying the output file name via the parameter pdf_file, either at the command line
python3 -m elf_diff --pdf_file my_pair_report.pdf my_old_binary.elf my_new_binary.elf
or from within a driver file, e.g.
pdf_file: "my_pair_report.pdf"
Specifying an Alternative HTML File Location
Similar to specifying an explicit filename for pdf files, the same can be done for HTML output files, either via the command line
python3 -m elf_diff --html_file my_pair_report.html my_old_binary.elf my_new_binary.elf
or from within a driver file, e.g.
html_file: "my_pair_report.html"
This will create a single-file HTML report (with exactly the same content as the generated pdf files).
Specifying an Alternative HTML Directory
To generate the files of a multi-page HTML report in a specific directory, e.g. my_target_dir, use the command line flag --html_dir.
python3 -m elf_diff --html_dir my_target_dir my_old_binary.elf my_new_binary.elf
Using Driver Files
The driver files that we already met when generating mass-reports can also be used to run elf_diff in general. Any parameter that can be passed to elf_diff as a command line argument can also occur in a driver file. For example, the command
python3 -m elf_diff --mass_report --pdf_file my_file.pdf ...
corresponds to the following entries in my_elf_diff_driver.yaml:
mass_report: True
pdf_file: my_file.pdf
...
Supplying a Project Title
A project title could, e.g., be a short name that summarizes the changes that you applied between the old and the new version of the compared binaries. Supply a title via the parameter project_title.
Adding Background Information
Additional information about the compared binaries can be added to pair-reports. Use the parameters old_info_file and new_info_file to supply the names of text files whose content is to be added to the report.
It is also possible to add general information to reports, e.g. about the programming language, the compiler version or the build system. This is supported through the build_info parameter, which takes a string that is added to the report. For longer strings, this is most conveniently done via the driver file.
Everything that follows build_info: > in the example will be added to the report.
build_info: >
  This build
  info is added to the report.
  The whitespaces in front of these lines are removed, the line breaks are
  preserved.
Using Alias Strings
If you want to obtain anonymized reports, it is not desirable to reveal details about your user name (home directory) or the directory structure. In such a case, the binary filenames can be replaced by aliases wherever they would appear in the reports.
Supply alias names using the old_alias and new_alias parameters for the old and the new version of the binaries, respectively.
Working with Cross-Build Binaries
When working on firmware projects for embedded devices, you typically will be using a cross-build environment. If based on GNU gcc, such an environment usually not only ships with the necessary compilers but also with a set of additional tools called GNU Binutils.
elf_diff uses some of these tools to inspect binaries, namely nm, objdump and size. Although some information about binaries can be determined even with the host versions of these tools, it is, e.g., not possible to retrieve disassemblies.
In a cross-build environment, Binutils executables are usually bundled in a specific directory. They also often have a platform-specific prefix to make them distinguishable from their host-platform siblings. For the AVR version of Binutils that is shipped with the Arduino development suite, e.g., the prefix avr- is used. The respective commands are thus named avr-nm, avr-objdump and avr-size.
To make those dedicated binaries known to elf_diff, add the Binutils directory to the PATH environment variable, use the parameters bin_dir and bin_prefix, or explicitly define the commands, e.g. objdump_command (see command help).
A pair-report generation command for the AVR platform would, e.g., read
python3 -m elf_diff --bin_dir <path_to_avr_binaries> --bin_prefix "avr-" my_old_binary.elf my_new_binary.elf
The string <path_to_avr_binaries> in the above example would of course be replaced by the actual directory path where the binaries live.
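The naming convention behind a directory plus prefix can be sketched in a few lines of Python. This only illustrates how a directory and a prefix combine into tool names such as avr-objdump; it is not taken from elf_diff's sources. The function name `binutils_command` and the directory /opt/avr/bin are hypothetical.

```python
import posixpath


def binutils_command(tool, bin_dir="", bin_prefix=""):
    """Return the command for a Binutils tool such as nm, objdump or size."""
    command = bin_prefix + tool
    # Without a directory, the command is expected to be found via PATH
    return posixpath.join(bin_dir, command) if bin_dir else command


# /opt/avr/bin is a hypothetical installation directory
commands = {
    tool: binutils_command(tool, bin_dir="/opt/avr/bin", bin_prefix="avr-")
    for tool in ("nm", "objdump", "size")
}
print(commands)
```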
Generating a Template Driver File
To generate a template driver file that can serve as a basis for your own driver files, just run elf_diff with the driver_template_file parameter, e.g. as
python3 -m elf_diff --driver_template_file my_template.yaml
Template files contain the default values of all available parameters. If the template file is generated in the same session in which a report is created, it contains the actual settings used for the report generation instead.
Selecting and Excluding Symbols
By means of the command line arguments symbol_selection_regex and symbol_exclusion_regex, symbols can be explicitly selected and excluded.
The specified regular expressions are applied to both the old and the new binary. For a more fine-grained selection, please use the *_old and *_new versions of the respective command line arguments.
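The combined effect of a selection and an exclusion regex can be tried out with Python's re module before running a report. The following is a sketch under assumptions: it matches each regex at the start of the symbol name (re.match), which may differ from the way elf_diff applies the expressions, and the function name `filter_symbols` and the sample symbol names are hypothetical.

```python
import re


def filter_symbols(names, selection_regex=None, exclusion_regex=None):
    """Keep names that match the selection regex but not the exclusion regex."""
    selected = []
    for name in names:
        if selection_regex and not re.match(selection_regex, name):
            continue
        if exclusion_regex and re.match(exclusion_regex, name):
            continue
        selected.append(name)
    return selected


# Hypothetical demangled symbol names
names = [
    "std::string::append(char const*)",
    "std::string::_M_mutate(unsigned long, unsigned long, unsigned long)",
    "std::vector<int>::push_back(int const&)",
]

kept = filter_symbols(
    names,
    selection_regex=r"^std::string::.*",
    exclusion_regex=r".*_M_.*",
)
print(kept)
```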
Assembly Code
For most developers who are used to programming in high-level languages, assembly code is a mystery. Still, there is some information that an assembly novice can gather from looking at assembly code, starting with the number of assembly statements. Normally, fewer is better: the more assembly statements there are for a high-level language statement, the more time the processor needs to process them. Conversely, a suspiciously low number of assembly statements might indicate that the compiler has optimized away something that it shouldn't have.
All this, of course, relies on knowing which assembly code is associated with which line of source code.
This information is not included in compiled binaries by default. The compiler must be told explicitly to emit additional debugging information. For gcc, e.g., the flag -g causes this information to be emitted. But be careful: when building debug versions, some build systems replace optimization flags like -O3 with the debug flag -g. This is not what you want when looking at the performance of your code. Instead, add the -g flag and keep the optimization flag(s) in place. CMake, e.g., has a configuration variable CMAKE_BUILD_TYPE that can be set to the value RelWithDebInfo to enable a release build (with optimization enabled) that also comes with debug symbols.
For binaries with debug symbols included, elf_diff annotates the assembly code with the high-level language statements that it was generated from.
Examples
Simple Example
An example taken from the regression test bench that compares two binaries of two very simple C++ programs.
libstdc++
A comparison of the two versions of libstdc++ that ship with gcc 4.8 and gcc 5. There are vast differences between those two library versions, which results in a great number of symbols being reported. The following command demonstrates how report generation can be restricted to a subset of symbols by using regular expressions.
In the example we select only those symbols related to class std::string.
# Generated on Ubuntu 20.04 LTS
# Arguments, in order: select any symbol name starting with std::string::,
# generate a pdf file, path to old binary, path to new binary
python3 -m elf_diff \
    --symbol_selection_regex "^std::string::.*" \
    --pdf_file libstdc++_std_string_diff.pdf \
    /usr/lib/gcc/x86_64-linux-gnu/4.8/libstdc++.a \
    /usr/lib/gcc/x86_64-linux-gnu/5/libstdc++.a
TODO
The following is a list of features that would be nice to have and will or will not be added in the future:
- read debug symbols from separate debug libraries and annotate assembly code