Skip to main content

A tool to compare elf binaries

Project description

Python application

elf_diff - A Tool to Compare Elf Binaries

An Initial Example

Before going into detail about what elf_diff does, let's start with an example of a multi page html report that was generated as part of one of the regression tests of this project.

The test compiles, links and compares the elf files of two similar versions of a simple C++ program. The self contained HTML report that is generated in a subdirectory allows to explore the prominent similarities and differences between the symbols defined in the two elf files.

HTML reports look as follows. Please click on the table headers to proceed to the generated HTML pages.

Multi Page Single Page

Pdf reports may be generated alternatively.

Please see the examples section at the end of this document for more usage examples.

Purpose

  • resource/performance optimization
  • debugging
  • learning/teaching

The main purpose of elf_diff is to determine how specific changes to a piece of software affect resource consumption and performance. The tool may also serve to compare two independent change sets or just to have fun and learn how changes reflect in the generated assembly code.

The following information is part of elf_diff's report pages:

  • differences in the amount of program storage and static RAM usage
  • symbols that are only present in one of the two versions
  • symbols whose naming or call signature is similar in both versions, e.g. as a result of symbol renaming or subtly changing call signatures
  • assembly code discrepancies of functions with identical names and call signatures

As elf_diff operates on elf-files, it is fairly language and platform agnostic. All it requires to work is a suitable set of GNU Binutils for the target platform.

Introduction

This tool compares pairs of ELF binary files and provides information about differences in the contained symbols with respect to the space that they occupy in program memory (functions and global data) and in RAM (global data). Binary pairs that are passed to elf_diff are typically two versions of the same program/library/firmware. elf_diff can help you to find out about the impact of your changes on your code's resource consumption.

The differences between the binaries are summarized in tables that contain information about persisting, disappeared and new symbols. elf_diff also attempts to find pairs of matching symbols that might have been subject to renaming or signature changes (modified function arguments). Please be warned that the means to determine such symbol relations are very limited when working with binaries. False positives will result.

For all those symbols that have been subject to changes and also for the new and disappeared symbols, the tool provides diff-like comparisons of the disassembly.

All results are presented in either HTML or pdf files. HTML documents are cross-linked to conveniently allow jumping back and forth between bits of information, e.g. tabular information and symbol disassemblies. Du to the potentially large amount of information, some parts of the HTML reports are ommitted in the pdf files.

elf_diff has two modes of operation, pair-reports and mass-reports. While the former compares two binaries, the latter generates an overview-report for a set of binary-pairs. Such overview-reports list only the changes in terms of symbol sizes and the amount of symbols, no disassembly is provided to gain feasible document sizes.

Requirements

elf_diff is a Python script. It mostly uses standard libraries but also some non-standard packages (see the file requirements.txt) for more information.

elf_diff works and is automatically tested with Python 2 and 3.

Setup

The following procedure is required to install the elf_diff Python package.

  1. Install Python version >= 3.0
  2. Install the elf_diff package via one of the following commands
    • python3 -m pip install elf_diff (Linux)
    • py -m pip install elf_diff (Windows)

Alternatively when developing elf_diff, the following steps are required:

  1. Install Python version >= 3.0
  2. Clone the elf_diff repo from github.
  3. Install required packages via one of the following commands
    • python3 -m pip install -r requirements.txt (Linux)
    • py -m pip install -r requirements.txt (Windows)
  4. Add the bin directory of the elf_diff repo to your platform search path

To run elf_diff from the local git-sandbox, please use the script bin/elf_diff, e.g. as bin/elf_diff -h to display the help string.

Usage

There is a small difference between running Python on Linux and Windows. To display elf_diff's help page in a console window, type the following in a linux console

python3 -m elf_diff -h

or

py -m elf_diff -h

in a Windows console.

In the examples provided below, we go with the Linux syntax. Please replace the keyword python3 with py when executing the respective examples in a Windows environment.

Generating Pair-Reports

To generate a pair report, two binary files need to be passed to elf_diff via the command line. Let's assume those files are named my_old_binary.elf and my_new_binary.elf.

The following command will generate a multipage html report in a subdirectory of your current working directory.

python3 -m elf_diff my_old_binary.elf my_new_binary.elf

Generating Mass-Reports

Mass reports require a driver file (yaml syntax) that specifies a list of binaries to compare pair-wise.

Let's assume you have two pairs of binaries that reside in a directory /home/my_user.

binary_a_old.elf <-> binary_a_new.elf
binary_b_old.elf <-> binary_b_new.elf

A driver file (named my_elf_diff_driver.yaml) would then contain the following information:

binary_pairs:
    - old_binary: "/home/my_user/binary_a_old.elf"
      new_binary: "/home/my_user/binary_a_new.elf"
      short_name: "A short name"
    - old_binary: "/home/my_user/binary_b_old.elf"
      new_binary: "/home/my_user/binary_b_new.elf"
      short_name: "B short name"

The short_name parameters are used in the result tables to reference the respective binary pairs.

By using the driver file, we can now run a mass-report as

python3 -m elf_diff --mass_report --driver_file my_elf_diff_driver.yaml

This will generate a HTML file elf_diff_mass_report.html in your current working directory.

Generating pdf-Files

pdf files are generated by supplying the output file name using the parameter pdf_file either at the command line

python3 -m elf_diff --pdf_file my_pair_report.pdf my_old_binary.elf my_new_binary.elf

or from within a driver file, e.g.

pdf_file: "my_pair_report.pdf"

Specifying an Alternative HTML File Location

Similar to specifying an explicit filename for pdf files, the same can be done for our HTML output files, either via the command line

python3 -m elf_diff --html_file my_pair_report.hmtl my_old_binary.elf my_new_binary.elf

or from within a driver file, e.g.

html_file: "my_pair_report.html"

this will create a single file HTML report (with the exact same content as generated pdf files).

Specifying an Alternative HTML Directory

To generate a multi-page HTML report use the command line flag --html_dir to generate the HTML files e.g. in directory my_target_dir.

python3 -m elf_diff --html_dir my_target_dir my_pair_report.hmtl my_old_binary.elf my_new_binary.elf

Using Driver Files

The driver files that we already met when generating mass-reports can also generally be used to run elf_diff. Any parameters that can be passed as command line arguments to elf_diff can also occur in a driver file, e.g.

python3 -m elf_diff --mass_report --pdf_file my_file.pdf ...

In my_elf_diff_driver.yaml

mass_report: True
pdf_file: my_file.pdf
...

Supplying a Project Title

A project title could e.g. be a short name that summarizes the changes that you applied between the old and the new version of the compared binaries. Supply a title via the parameter project_title.

Adding Background Information

Additional information about the compared binaries can be added to pair-reports. Use the parameters old_info_file and new_info_file to supply filenames of text files whose content is supposed to be added to the report.

It is also possible to add general information to reports, e.g. about programming language or compiler version or about the build-system. This is supported through the build_info parameter which enables supplying a string that is added to the report. For longer strings, this can be conveniently done via the driver-file.

Everything that follows after build_info: > in the example will be added to the report.

build_info: >
  This build
  info is added to the report.
  The whitespaces in front of these lines are removed, the line breaks are
  preserved.

Using Alias Strings

If you want to obtain anonymized reports, it is not desirable to reveile details about your user name (home directory) or the directory structure. In such a case, the binary filenames can be replaced by alias wherever they would appear in the reports.

Supply alias names using the old_alias and new_alias parameters for the old or the new version of the binaries, respectively.

Working with Cross-Build Binaries

When working on firmware projects for embedded devices, you typically will be using a cross-build environment. If based on GNU gcc, such an environment usually not only ships with the necessary compilers but also with a set of additional tools called GNU Binutils.

elf_diff uses some of these tools to inspect binaries, namely nm, objdump and size. Although some information about binaries can be determined even with the host-version of these tools, it is e.g. not possible to retreive disassemblies.

In a cross-build environment, Binutils executable are usually bundled in a specific directory. They also often have a platform-specific prefix, to make them distinguishabel from their host-platform siblings. For the avr-version of Binutils e.g., that is shipped with the Arduino development suite, the prefix avr- is used. The respective commands are, thus, named avr-nm, avr-objdump and avr-size.

To make those dedicated binaries known to elf_diff, please add the binutils directory to the PATH environment variable, use the parameters bin_dir and bin_prefix or explicitly define the commands e.g. objdump_command (see command help).

A pair-report generation command for the avr-plattform would e.g. read

python3 -m elf_diff --bin_dir <path_to_avr_binaries> --bin_prefix "avr-" my_old_binary.elf my_new_binary.elf

The string <path_to_avr_binaries> in the above example would of course be replaced by the actual directory path where the binaries live.

Generating a Template Driver File

To generate a template driver file that can serve as a basis for your own driver files, just run elf_diff with the driver_template_file parameter, e.g. as

python3 -m elf_diff --driver_template_file my_template.yaml

Template files contain the default values of all available parameters, or - if the temple file is generated in the same session where a report was created - the template file will contain the actual settings used for the report generation.

Selecting and Excluding Symbols

By means of the command line arguments symbol_selection_regex and symbol_exclusion_regex, symbols can be explicitly selected and excluded. The specified regular expressions are applied to both the old and the old binary. For more fine grained selection, please used the *_old and *_new versions of the respective command line arguments.

Assembly Code

For most developers who are used to program in high level languages, assembly code is a mystery. Still, there is some information that an assembly-novice can gather from observing assembly code. Starting with the number of assembly code statements. Normally less means good. The more assembly statements there are representing a high level language statement, the more time the processor needs to process them. On the contrary, sometimes there may be a suspiciously low number of assembly statements which might indicate that the compiler has optimized away something that it shouldn't have.

All this, of course, relies on the knowledge about what assembly code is associated with which line of source. This information is not included in compiled binaries by default. The compiler must explicitly be told to export additional debugging information. For the gcc-compiler the flag -g, e.g., will cause this information to be emitted. But careful, some build systems when building debug versions replace optimization flags like -O3 with the debug flag -g. This is not what you want when looking at the performance of your code. Instead you want to add the -g flag and keep the optimization flag(s) in place. CMake, e.g. has a configuration variable CMAKE_BUILD_TYPE that can be set to the value RelWithDebInfo to enable a release build (with optimization enabled) that also comes with debug symbols.

For binaries with debug symbols included, elf_diff will annotate the assembly code by adding the high level language statements that it was generated from.

Examples

Simple Example

An example taken from the regression test bench that compares two binaries of two very simple C++ programs.

libstdc++

Comparison of two versions of libstdc++ shipping with gcc 4.8 and 5. There are vast differences between those two library versions which result in a great number of symbols being reported. The following command demonstrates how report generation can be resticted to a subset of symbols by using regular expressions. In the example we select only those symbols related to class std::string.

# Generated on Ubuntu 20.04 LTS 
python3 -m elf_diff \
   --symbol_selection_regex "^std::string::.*"   # select any symbol name starting with std::string:: \
   --pdf_file libstdc++_std_string_diff.pdf      # generate a pdf file \
   /usr/lib/gcc/x86_64-linux-gnu/4.8/libstdc++.a # path to old binary \
   /usr/lib/gcc/x86_64-linux-gnu/5/libstdc++.a   # path to new binary

TODO

The following is a list of features that would be nice to have and will or will not be added in the future:

  • read debug symbols from separate debug libraries and annotate assembly code

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

elf_diff-0.3.3.tar.gz (772.7 kB view details)

Uploaded Source

Built Distribution

elf_diff-0.3.3-py3-none-any.whl (75.2 kB view details)

Uploaded Python 3

File details

Details for the file elf_diff-0.3.3.tar.gz.

File metadata

  • Download URL: elf_diff-0.3.3.tar.gz
  • Upload date:
  • Size: 772.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.9.7

File hashes

Hashes for elf_diff-0.3.3.tar.gz
Algorithm Hash digest
SHA256 67e7b67d35ee2fc1ec414cd24ccd43e35b2a778937099646ec2d2438569a492e
MD5 ec7c8ee6003ef873ae5f4e708cd90de7
BLAKE2b-256 4026c745a0b0aa0c06bc86e67a45a3eac5911f57516e098c2dfbd113106d1334

See more details on using hashes here.

File details

Details for the file elf_diff-0.3.3-py3-none-any.whl.

File metadata

  • Download URL: elf_diff-0.3.3-py3-none-any.whl
  • Upload date:
  • Size: 75.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.9.7

File hashes

Hashes for elf_diff-0.3.3-py3-none-any.whl
Algorithm Hash digest
SHA256 00f47b05dbfac8e7351dc2c0caa21bc13391976c07f028368177ea4b426d64fc
MD5 5a3e2bc2941b57d125f01e7e4ff31151
BLAKE2b-256 d675db66a923dcc1df04eb57d277f6ef1c2734af737974512a834a5a16a1ea54

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page