GrammaTech Intermediate Representation for Binaries

GTIRB

The GrammaTech Intermediate Representation for Binaries (GTIRB) is a machine code analysis and rewriting data structure. It is intended to facilitate the communication of binary IR between programs performing binary disassembly, analysis, transformation, and pretty printing. GTIRB is modeled on LLVM-IR and seeks to serve a similar function: encouraging communication and interoperability between tools.

The remainder of this file describes the structure of GTIRB, how to install and build it, and how to use it.

Structure

GTIRB has the following structure. Solid lines denote inheritance. Dotted lines denote reference by UUID.

[Figure: GTIRB Data Structure]

IR

An instance of GTIRB may include multiple modules (Module), which represent loadable objects such as executables or libraries; an inter-procedural control flow graph (IPCFG); and Auxiliary Data tables (AuxData), which hold arbitrary analysis results in user-defined formats that can easily reference other elements of the IR. Each module holds information such as symbols (Symbol) and sections, which in turn hold the actual bytes and the data and code blocks of the module. The CFG consists of basic blocks (Block) and the control flow edges between them. Each data or code block references a range of bytes in a byte interval (ByteInterval). A section may hold one large byte interval containing all blocks, if the relative positions of blocks in that section are defined, or one byte interval per block, if the relative positions of blocks are not defined (e.g., for the code blocks in the .text section during program rewriting). Each symbol holds a pointer to the block or datum it references.
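The containment relationships described above can be sketched with a few plain Python dataclasses. This is a simplified illustration only, not the GTIRB API: the class and field names (`contents`, `address`, `offset`, `size`, `raw_bytes`) are assumptions chosen for readability, and the CFG, symbols, and AuxData are omitted.

```python
from dataclasses import dataclass, field
from typing import List, Optional
from uuid import UUID, uuid4


@dataclass
class ByteInterval:
    """A contiguous run of bytes; blocks refer to ranges inside it."""
    contents: bytes
    address: Optional[int] = None  # None when the final position is not fixed
    uuid: UUID = field(default_factory=uuid4)


@dataclass
class Block:
    """A code or data block: an (offset, size) range within one byte interval."""
    interval: ByteInterval
    offset: int
    size: int
    is_code: bool = True
    uuid: UUID = field(default_factory=uuid4)

    def raw_bytes(self) -> bytes:
        """Return the bytes this block covers in its interval."""
        return self.interval.contents[self.offset:self.offset + self.size]


@dataclass
class Section:
    name: str
    intervals: List[ByteInterval] = field(default_factory=list)


@dataclass
class Module:
    name: str
    sections: List[Section] = field(default_factory=list)


# Layout A: one large interval, so relative block positions are defined.
text = ByteInterval(contents=bytes.fromhex("554889e5c3"), address=0x1000)
prologue = Block(text, offset=0, size=4)
ret = Block(text, offset=4, size=1)

# Layout B: one interval per block, so relative positions are left open,
# as for the code blocks of .text during program rewriting.
floating = [ByteInterval(contents=b.raw_bytes()) for b in (prologue, ret)]

module = Module("example", [Section(".text", [text])])
```

In layout B each interval has no address, which is what lets a rewriting tool reorder or resize blocks before a final layout is chosen.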

Instructions

GTIRB explicitly does NOT represent instructions or instruction semantics but does provide symbolic operand information and access to the bytes. There are many intermediate languages (ILs) for representation of instruction semantics (e.g., BAP's BIL, Angr's Vex, or Ghidra's P-code). GTIRB works with these or any other IL by storing instructions generally and efficiently as raw machine-code bytes and separately storing the symbolic and control flow information. The popular Capstone/Keystone decoder/encoder pair provides an excellent option to read and write instructions from/to GTIRB's machine-code byte representation without committing to any particular semantic IL. By supporting multiple ILs and separate storage of analysis results in auxiliary data tables, GTIRB enables collaboration between independent binary analysis and rewriting teams and tools.

Auxiliary Data

GTIRB provides for the sharing of additional information, e.g. analysis results, in the form of AuxData objects. These can store maps and vectors of basic GTIRB types in a portable way. The GTIRB manual describes the structure for common types of auxiliary data such as function boundary information, type information, or results of common analyses in Standard AuxData Schemata.
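As a rough illustration of why maps and vectors of basic types are portable, the sketch below builds a function-boundary table (function UUID to the UUIDs of its blocks) and round-trips it through a text encoding. JSON is used here purely as a stand-in: GTIRB's actual AuxData encoding is different, and the names `function_blocks`, `encode`, and `decode` are hypothetical.

```python
import json
from uuid import UUID, uuid4

# Hypothetical stand-ins for IR elements: only their UUIDs matter here.
function_id = uuid4()
entry_block_id, exit_block_id = uuid4(), uuid4()

# A table of analysis results: function UUID -> UUIDs of its blocks.
function_blocks = {function_id: {entry_block_id, exit_block_id}}


def encode(table):
    """Flatten the table to JSON-friendly strings and sorted lists."""
    return json.dumps(
        {str(func): sorted(str(b) for b in blocks) for func, blocks in table.items()}
    )


def decode(text):
    """Rebuild the UUID-keyed table from its encoded form."""
    return {
        UUID(func): {UUID(b) for b in blocks}
        for func, blocks in json.loads(text).items()
    }


round_tripped = decode(encode(function_blocks))
```

Because the table contains only UUIDs inside standard containers, a consumer written in another language can rebuild it without knowing anything about the analysis that produced it.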

UUIDs

Every element of GTIRB---e.g., modules (Module), symbols (Symbol), and blocks (Block)---has a universally unique identifier (UUID). UUIDs allow both first-class IR components and AuxData tables to reference elements of the IR.

Instructions and symbolic operands can be addressed by the class Offset which encapsulates a UUID (that refers to the instruction's block) and an offset.
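The idea of addressing an instruction by a block UUID plus an offset can be sketched as follows. This is a minimal stand-alone model, not the library's class: the field names `element_id` and `displacement`, the `comments` table, and the helper `comment_at` are assumptions made for the example.

```python
from typing import Dict, NamedTuple
from uuid import UUID, uuid4


class Offset(NamedTuple):
    """Hypothetical model: an element's UUID plus a byte displacement into it."""
    element_id: UUID
    displacement: int


block_id = uuid4()
block_bytes = bytes.fromhex("e800000000c3")  # x86-64: call rel32; ret

# Per-instruction annotations, keyed by (block UUID, byte offset in the block).
comments: Dict[Offset, str] = {
    Offset(block_id, 0): "call target resolved through a symbol",
    Offset(block_id, 5): "function return",
}


def comment_at(element_id: UUID, displacement: int) -> str:
    """Look up the annotation for one instruction, or '' if there is none."""
    return comments.get(Offset(element_id, displacement), "")
```

Since instructions are not IR elements and have no UUID of their own, a key of this shape is what allows a table to point at a single instruction inside a block.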

Installing

Packages currently exist for easily installing GTIRB (and attendant tooling) on Ubuntu and Arch Linux. See below for instructions.

Ubuntu

Packages for Ubuntu 16 and 18 are available in the GTIRB apt repository. The GTIRB package has some dependencies which are only available in other PPAs. You will have to add these PPAs to your system in order to install the GTIRB package.

Instructions for adding the appropriate PPAs and installing GTIRB on each platform follow.

Ubuntu 16

sudo add-apt-repository ppa:maarten-fonville/protobuf
sudo add-apt-repository ppa:mhier/libboost-latest
echo "deb https://grammatech.github.io/gtirb/pkgs/xenial ./" | sudo tee -a /etc/apt/sources.list.d/gtirb.list
sudo apt-get update
sudo apt-get install --allow-unauthenticated gtirb

Ubuntu 18

sudo add-apt-repository ppa:mhier/libboost-latest
echo "deb [trusted=yes] https://grammatech.github.io/gtirb/pkgs/bionic ./" | sudo tee -a /etc/apt/sources.list.d/gtirb.list
sudo apt-get update
sudo apt-get install gtirb

Arch Linux

Arch packages are available for download from https://grammatech.github.io/gtirb/pkgs/arch/ and may be directly installed with pacman.

Additionally, the Arch User Repository (AUR) https://aur.archlinux.org/ has packages for GTIRB (gtirb-git), the GTIRB Pretty Printer (gtirb-pprinter-git), and the datalog disassembler (ddisasm-git). Note that installing ddisasm-git will also install the other two packages, since both are dependencies.

The following command will build and install GTIRB using the popular AUR helper yay.

yay gtirb-git

Building

GTIRB should build successfully as a 64-bit target with GCC, Clang, and Visual Studio compilers that support at least C++17. GTIRB uses CMake, which must be installed at version 3.10 or later.

The common build process looks like this:

mkdir build
cd build
# Note: You may wish to add some -D arguments to the next command. See below.
cmake <path/to/gtirb>
cmake --build .
# Run the test suite.
bin/TestGTIRB

For customizing the GTIRB build, you can get a list of customization options by navigating to your build directory and running:

cmake -LH

Requirements

To build and install GTIRB, the following requirements should be installed:

  • CMake, version 3.10.0 or higher.
    • Ubuntu 18 provides this version via the APT package cmake.
    • Ubuntu 16 and earlier provide out-of-date versions; build from source on those versions.
  • Protobuf, version 3.0.0 or later.
    • Ubuntu 18 provides this version via the APT packages libprotobuf-dev and protobuf-compiler.
    • Ubuntu 16 and earlier provide out-of-date versions; build from source on those versions.

Usage

GTIRB is designed to be serialized using Google's protocol buffers (i.e., protobuf), enabling easy and efficient use from any programming language.

GTIRB may also be used through a dedicated API implemented in multiple languages. The APIs provide efficient data structures suitable for use by binary analysis and rewriting applications; see below for details.

Using Serialized GTIRB Data

The serialized protobuf data produced by GTIRB allows for exploration and manipulation in the language of your choice. The Google protocol buffers homepage lists the languages in which protocol buffers can be used directly; users of other languages can convert the protobuf-formatted data to JSON format and then use the JSON data in their applications.

The proto directory in this repository contains the protocol buffer message type definitions for GTIRB. You can inspect these .proto files to determine the structure of the various GTIRB message types. The top-level message type is IR.

For more details, see Using Serialized GTIRB Data.
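For users who take the JSON route, the sketch below walks a JSON rendering of an IR message and lists the sections of each module. The sample document is hand-written and hypothetical: its field names (`modules`, `name`, `sections`) are illustrative assumptions, so check the .proto files for the actual message layout before relying on them.

```python
import json

# Hypothetical JSON rendering of a serialized IR message. The field names
# are illustrative; consult the .proto files for the real ones.
ir_json = """
{
  "modules": [
    {"name": "hello", "sections": [{"name": ".text"}, {"name": ".data"}]},
    {"name": "libhello.so", "sections": [{"name": ".text"}]}
  ]
}
"""

ir = json.loads(ir_json)


def section_names(ir_dict):
    """Map each module name to the names of its sections, in order."""
    return {
        module["name"]: [section["name"] for section in module.get("sections", [])]
        for module in ir_dict.get("modules", [])
    }


summary = section_names(ir)
```

Using `.get` with an empty default keeps the traversal from failing on modules that carry no sections.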

GTIRB API Implementations

The GTIRB API is currently available in C++, Python, and Common Lisp. For language-independent API information, see GTIRB Components. For information about a specific API implementation, see the documentation for that language.
