Clade is a tool for intercepting build commands (stuff like compilation, linking, mv, rm, and all other commands that are executed during build). Intercepted commands can be parsed (to search for input and output files, and options) and then used for various purposes:

  • generating compilation database;
  • obtaining information about dependencies between source and object files;
  • obtaining information about the source code (source code querying);
  • generating function call graph;
  • running software verification tools;
  • visualization of all collected information;
  • and for much more.

The interception of build commands is independent of the project type and the programming languages used. However, all other functionality available in Clade does depend on them. Currently only C projects are supported, but other languages and additional functionality can be supported through the built-in extension mechanism.


Prerequisites

An important part of Clade, the library that intercepts build commands, is written in C and must be compiled before use. Compilation is performed automatically at the installation stage, but you will need to install some prerequisites beforehand:

  • Python 3 (>=3.4)
  • cmake (>=3.3)
  • make
  • C and C++ compiler (gcc or clang)
  • Linux only: python3-dev (Ubuntu) or python3-devel (openSUSE) package

Optional dependencies:

  • For obtaining information about the C code you will need CIF installed. CIF is an interface to Aspectator, which in turn is a GCC-based tool that implements aspect-oriented programming for the C programming language. You may download a compiled version of CIF from the CIF releases page.
  • Graphviz for some visualization capabilities
  • Linux only: gcc-multilib (Ubuntu) or gcc-32bit (openSUSE) package to intercept build commands of projects leveraging multilib capabilities

Clade works on Linux and macOS. Partial support for Windows will be implemented soon.


Installation

To install the latest stable version, run the following command:

$ pip3 install clade

For development purposes you may install Clade in “editable” mode directly from the repository (clone it on your computer beforehand):

$ pip3 install -e .

You can check that Clade works as expected on your machine by running the test suite from the repository:

$ pytest

How to use

All functionality is available both as command-line scripts and as Python modules that you can import and use, so the following examples will include both use cases.

Build command intercepting

Clade can intercept the exec calls issued by the build tool for each build command. To do this we have developed a shared library (called libinterceptor) that redefines such exec functions: before creating a new process, our exec functions store the information about the command in a separate file. The library is then injected into the build process using the LD_PRELOAD (Linux) and DYLD_INSERT_LIBRARIES (macOS) mechanisms provided by the dynamic linker.

Figure: An explanation of LD_PRELOAD
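The injection mechanism can be illustrated with a minimal sketch that only prepares the environment for a child process. It is not Clade's implementation: the helper name preload_env and the library paths are hypothetical, and the only facts taken from the text are the two variable names.

```python
import os
import sys


def preload_env(library_path, platform=sys.platform, base_env=None):
    """Return a copy of the environment with the preload variable set."""
    env = dict(os.environ if base_env is None else base_env)
    # macOS uses DYLD_INSERT_LIBRARIES, Linux uses LD_PRELOAD.
    variable = "DYLD_INSERT_LIBRARIES" if platform == "darwin" else "LD_PRELOAD"
    env[variable] = library_path
    return env


# Hypothetical library locations, used only for illustration.
linux_env = preload_env("/tmp/libinterceptor.so", platform="linux", base_env={})
macos_env = preload_env("/tmp/libinterceptor.dylib", platform="darwin", base_env={})
print(linux_env)
print(macos_env)
```

Such a dictionary could be passed as the env argument of subprocess.run, so that the dynamic linker loads the library into the build tool and into every process it spawns.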

Intercepting build commands is quite easy: all you need to do is wrap your main build command like this:

$ clade-intercept make

where make should be replaced by your project build command. The output file called cmds.txt will be stored in the current directory and will contain all intercepted commands, one per line.

You can change the path to the file where intercepted commands will be saved using the -o (--output) option:

$ clade-intercept -o /work/cmds.txt make

In case the build process of your project consists of several independent steps, you can still create one single cmds.txt file using the -a (--append) option:

$ clade-intercept make step_one
$ clade-intercept -a make step_two

As a result, build commands of the second make command will be appended to the cmds.txt file created previously.

There is an alternative fallback intercepting method that is based on wrappers. It can be used when LD_PRELOAD is unavailable:

$ clade-intercept -f make

Unfortunately, for now wrappers can’t intercept commands that are executed bypassing the PATH environment variable: for example, a gcc command can be intercepted, but a direct call to /usr/bin/gcc cannot. We have plans to implement some workarounds to mitigate this issue.
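The limitation follows from how a bare command name is resolved. The sketch below is an illustration only, not Clade's wrapper code: it creates a hypothetical stub named gcc in a temporary directory, puts that directory first in the search path, and shows that the stub wins the lookup, while an absolute path never goes through the lookup at all.

```python
import os
import shutil
import stat
import tempfile


def make_wrapper_dir(names):
    """Create a temporary directory with one executable stub per name."""
    wrapper_dir = tempfile.mkdtemp(prefix="wrappers_")
    for name in names:
        path = os.path.join(wrapper_dir, name)
        with open(path, "w") as stub:
            stub.write("#!/bin/sh\nexit 0\n")
        os.chmod(path, os.stat(path).st_mode | stat.S_IXUSR)
    return wrapper_dir


wrapper_dir = make_wrapper_dir(["gcc"])
search_path = wrapper_dir + os.pathsep + os.environ.get("PATH", "")

# A bare name is resolved through the search path, so the stub is found first.
resolved = shutil.which("gcc", path=search_path)
print(resolved)

# An absolute path is used as is, so no stub can stand in for it.
print(os.path.isabs("/usr/bin/gcc"))

shutil.rmtree(wrapper_dir)
```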

You can intercept build commands from a Python script:

from clade.intercept import Interceptor
i = Interceptor(command=["make"], output="cmds.txt", append=False, fallback=False)

Content of cmds.txt file

Let’s look at the simple makefile:

all:
    gcc main.c -o main
    rm main

If we try to intercept make all command, the following cmds.txt file will be produced (on macOS):


You can try to use cmds.txt file directly, but its format is not quite user-friendly and is subject to change. It is a good idea not to rely on the format of cmds.txt file and use the interface module instead:

from clade.cmds import get_all_cmds
cmds = get_all_cmds("cmds.txt")

where cmds is a list of dictionaries representing each intercepted command. For example, the dictionary that represents the gcc command from the above makefile looks like this:

{
    "command": [
        "gcc",
        "main.c",
        "-o",
        "main"
    ],
    "cwd": "/work/simple_make",
    "id": "3",
    "pid": "2",
    "which": "/usr/bin/gcc"
}


  • command - the intercepted command itself;
  • cwd - the path to the directory where the command was executed;
  • id - a unique identifier assigned to the command;
  • pid - the identifier of the parent command (the command that executed the current one; in our example it is the identifier of the make command);
  • which - the path to the executable file that was executed as a result of this command.
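As a small illustration of working with these fields, the sketch below uses plain dictionaries shaped like the one above. The make and rm entries, and both helper names, are assumptions made for the example; they are not taken from a real cmds.txt file.

```python
# Hypothetical commands with the five fields described above.
cmds = [
    {"command": ["make", "all"], "cwd": "/work/simple_make",
     "id": "2", "pid": "1", "which": "/usr/bin/make"},
    {"command": ["gcc", "main.c", "-o", "main"], "cwd": "/work/simple_make",
     "id": "3", "pid": "2", "which": "/usr/bin/gcc"},
    {"command": ["rm", "main"], "cwd": "/work/simple_make",
     "id": "4", "pid": "2", "which": "/bin/rm"},
]


def children_of(cmds, parent_id):
    """Return the commands that were executed directly by parent_id."""
    return [cmd for cmd in cmds if cmd["pid"] == parent_id]


def executables(cmds):
    """Return the sorted set of executables that were actually run."""
    return sorted({cmd["which"] for cmd in cmds})


print([cmd["id"] for cmd in children_of(cmds, "2")])
print(executables(cmds))
```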

It should be noted that all other functionality available in Clade uses the cmds.txt file as input. Because of this, you do not need to rebuild your project every time you want to use it: you can just use a previously generated cmds.txt file.

Parsing of intercepted commands

Once build commands are intercepted, they can be parsed to search for input and output files, and options. Currently there are extensions in Clade for parsing the following commands:

  • C compilation commands (cc, gcc, clang, various cross compilers);
  • linker commands (ld);
  • assembler commands (as);
  • archive commands (ar);
  • move commands (mv);
  • object copy commands (objcopy, Linux only).

These extensions can be executed from the command line through the clade-cc, clade-ld, clade-as, clade-ar, clade-mv, and clade-objcopy commands respectively. They all share a similar input interface and output file format, so let’s just look at the clade-cc command. It can be executed as follows:

$ clade-cc cmds.txt

As a result, a working directory named clade will be created:

├── CC/
│   ├── cmds.json
│   ├── cmds/
│   ├── deps/
│   ├── opts/
│   └── unparsed/
├── PidGraph/
└── Storage/

Top-level directories are in turn working directories of corresponding extensions that were executed inside clade-cc command. CC extension is the one we wanted to execute, but there are also other extensions - PidGraph and Storage - that were executed implicitly by CC because it depends on the results of their work. Let’s skip them for now.

Inside CC directory there is a bunch of other directories and cmds.json file with parsed compilation commands. Again, it is a list of dictionaries representing each parsed command. Let’s look at the parsed command from the above example:


Its structure is quite simple: there is a list of input files, a list of output files, a list of options, and some other info that is self-explanatory.

The CC extension also identifies dependencies of the main source file for each compilation command. Dependencies are the names of all included header files, even ones included indirectly. Clade stores them inside the deps subfolder. For example, dependencies of the parsed command with id=”3” can be found in the deps/3.json file:


Apart from dependencies, the output for all other parsed commands (ld, mv, and so on) looks the same: a list of dictionaries representing each parsed command, with “command”, “id”, “in”, “opts” and “out” fields.

CC extension (and all others, of course) can also be imported and used as a Python module:

from clade.extensions.cc import CC

# Initialize extension with a path to the working directory
c = CC(work_dir="clade")

# Execute parsing of intercepted commands
# This step can be skipped if commands are already parsed
# and stored in the working directory

# Get a list of all parsed commands
parsed_cmds = c.load_all_cmds()
for cmd in parsed_cmds:
    # Get a list of dependencies
    deps = c.load_deps_by_id(cmd["id"])

Pid graph

Each intercepted command, except for the first one, is executed by another, parent command. For example, gcc internally executes cc1 and as commands, so gcc is their parent. Clade knows about this connection and tracks it by assigning to each intercepted command two attributes: a unique identifier (id) and identifier of its parent (pid). This information is stored in the pid graph and can be obtained using clade-pid-graph command line tool:

$ clade-pid-graph cmds.txt
$ tree clade -L 2

└── PidGraph
    ├── pid_by_id.json
    └── pid_graph.json

Two files will be generated. The first one, pid_by_id.json, is a simple mapping from ids to their pids and looks like this:

{
    "1": "0",
    "2": "1",
    "3": "2",
    "4": "2",
    "5": "1"
}

The other one, pid_graph.json, stores information about all parent commands for a given id:

{
    "1": ["0"],
    "2": ["1", "0"],
    "3": ["2", "1", "0"],
    "4": ["2", "1", "0"],
    "5": ["1", "0"]
}

Pid graph can be imported and used as a Python module:

from clade.extensions.pid_graph import PidGraph

# Initialize extension with a path to the working directory
c = PidGraph(work_dir="clade")

# Execute parsing of intercepted commands
# This step can be skipped if commands are already parsed
# and stored in the working directory

# Get all information
pid_by_id = c.load_pid_by_id()
pid_graph = c.load_pid_graph()
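The relationship between the two files can be made concrete with a short sketch that derives the content of pid_graph.json from the content of pid_by_id.json by walking up the chain of parents. The function name build_pid_graph is hypothetical and this is not necessarily how Clade computes the graph; the input is the mapping shown above.

```python
def build_pid_graph(pid_by_id):
    """Map each id to the list of all its ancestors, nearest parent first."""
    pid_graph = {}
    for cmd_id in pid_by_id:
        ancestors = []
        current = cmd_id
        # Walk up until we reach an id with no recorded parent ("0" here).
        while current in pid_by_id:
            current = pid_by_id[current]
            ancestors.append(current)
        pid_graph[cmd_id] = ancestors
    return pid_graph


pid_by_id = {"1": "0", "2": "1", "3": "2", "4": "2", "5": "1"}
pid_graph = build_pid_graph(pid_by_id)
for cmd_id, ancestors in pid_graph.items():
    print(cmd_id, ancestors)
```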

Other extensions use pid graph to filter duplicate commands. For example, on macOS executing “gcc main.c” command leads to the chain of execution of the following commands:

  • /usr/bin/gcc main.c
  • /Library/Developer/CommandLineTools/usr/bin/gcc main.c
  • /usr/bin/xcrun clang main.c
  • /Library/Developer/CommandLineTools/usr/bin/clang main.c
  • /Library/Developer/CommandLineTools/usr/bin/clang -cc1 …

So, for a single compilation command, several commands will actually be intercepted. You probably need only one of them (the very first one), so Clade filters out all duplicates using the pid graph: Clade simply does not parse child commands of an already parsed command. This behavior is of course configurable and can be disabled.
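The filtering rule can be sketched as follows. This is an illustration under stated assumptions, not Clade's code: the function name, the ids, and the sample pid graph are hypothetical, and candidates are assumed to be listed in interception order so that a parent always precedes its children.

```python
def filter_child_commands(candidate_ids, pid_graph):
    """Keep a candidate only if none of its ancestors was already kept."""
    kept = []
    kept_set = set()
    for cmd_id in candidate_ids:
        ancestors = pid_graph.get(cmd_id, [])
        if not kept_set.intersection(ancestors):
            kept.append(cmd_id)
            kept_set.add(cmd_id)
    return kept


# Hypothetical chain 1 -> 2 -> 3 -> 4 -> 5 plus an unrelated command 6.
pid_graph = {
    "1": ["0"],
    "2": ["1", "0"],
    "3": ["2", "1", "0"],
    "4": ["3", "2", "1", "0"],
    "5": ["4", "3", "2", "1", "0"],
    "6": ["0"],
}

# Ids of the commands that look like compiler invocations.
candidates = ["1", "2", "4", "5", "6"]
print(filter_child_commands(candidates, pid_graph))
```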

Pid graph can be visualized with Graphviz using one of the configuration options:

Figure: An example of the pid graph

Note: pid graph can be used with any project (not only with ones written in C).

Command graph

Clade can connect commands by their input and output files. This information is stored in the command graph and can be obtained using clade-cmd-graph command line tool.

To appear in the command graph, an intercepted command needs to be parsed to search for input and output files. By default, only commands parsed by the CC, LD and MV extensions appear in the command graph. This behavior can be changed via configuration, which will be described below.

Let’s consider the following makefile:

    gcc -S main.c -o main.s  # id = 1
    as main.s -o main.o      # id = 2
    mv main.o main           # id = 3

Using clade-cmd-graph these commands can be connected:

$ clade-cmd-graph cmds.txt

├── CmdGraph/
│   └── cmd_graph.json
├── CC/
├── LD/
├── MV/
├── PidGraph/
└── Storage/

where cmd_graph.json looks like this (commands are represented by their identifiers and the type of the extension that parsed them):

{
    "1": {
        "type": "CC",
        "used_by": ["2", "3"],
        "using": []
    },
    "2": {
        "type": "AS",
        "used_by": ["3"],
        "using": ["1"]
    },
    "3": {
        "type": "MV",
        "used_by": [],
        "using": ["1", "2"]
    }
}

Command graph can be imported and used as a Python module:

from clade.extensions.cmd_graph import CmdGraph

# Initialize extension with a path to the working directory
c = CmdGraph(work_dir="clade")

# Execute parsing of intercepted commands
# This step can be skipped if commands are already parsed
# and stored in the working directory

# Get the command graph
cmd_graph = c.load_cmd_graph()
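How commands get connected through their files can be sketched with a few lines of Python. This is a simplified illustration, not Clade's algorithm: the function name and the "type" field on the input dictionaries are assumptions, commands are assumed to be listed in execution order, and edges are propagated transitively so that the result matches the example above.

```python
def build_cmd_graph(parsed_cmds):
    """Connect commands whose input files were produced by earlier commands."""
    graph = {
        cmd["id"]: {"type": cmd["type"], "used_by": [], "using": []}
        for cmd in parsed_cmds
    }
    producer = {}  # output file -> id of the command that produced it
    for cmd in parsed_cmds:
        used = set()
        for in_file in cmd["in"]:
            if in_file in producer:
                parent = producer[in_file]
                # Inherit everything the producer itself depended on.
                used.add(parent)
                used.update(graph[parent]["using"])
        graph[cmd["id"]]["using"] = sorted(used, key=int)
        for parent in graph[cmd["id"]]["using"]:
            graph[parent]["used_by"].append(cmd["id"])
        for out_file in cmd["out"]:
            producer[out_file] = cmd["id"]
    return graph


# The three commands from the makefile above, in a hypothetical parsed form.
parsed_cmds = [
    {"id": "1", "type": "CC", "in": ["main.c"], "out": ["main.s"]},
    {"id": "2", "type": "AS", "in": ["main.s"], "out": ["main.o"]},
    {"id": "3", "type": "MV", "in": ["main.o"], "out": ["main"]},
]
cmd_graph = build_cmd_graph(parsed_cmds)
for cmd_id, node in cmd_graph.items():
    print(cmd_id, node)
```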

Command graph can be visualized with Graphviz using one of the configuration options:

Figure: An example of the command graph

Source graph

For a given source file Clade can show in which commands this file is compiled, and in which commands it is indirectly used. This information is called source graph and can be generated using clade-src-graph command line utility:

$ clade-src-graph cmds.txt

├── SrcGraph/
│   └── src_graph.json
├── CmdGraph/
├── CC/
├── LD/
├── MV/
├── PidGraph/
└── Storage/

The source graph for the makefile presented in the command graph section above is stored in the src_graph.json file and looks like this:

{
    "/usr/include/stdio.h": {
        "compiled_in": ["1"],
        "loc": 414,
        "used_by": ["2", "3"]
    },
    …: {
        "compiled_in": ["1"],
        "loc": 5,
        "used_by": ["2", "3"]
    },
    …: {
        "compiled_in": ["2"],
        "loc": 20,
        "used_by": ["3"]
    }
}

For simplicity information about other files has been removed from the presented source graph. As always, commands are represented through their unique identifiers. loc field contains information about the size of the source file: number of the lines of code.
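A source graph with this structure is easy to query once it is loaded as a dictionary. In the sketch below the helper names and the /work/main.c path are hypothetical; only the three field names and the stdio.h entry come from the example above.

```python
def commands_touching(src_graph, path):
    """Ids of all commands that compile the file or use it indirectly."""
    entry = src_graph[path]
    return sorted(set(entry["compiled_in"]) | set(entry["used_by"]), key=int)


def total_loc(src_graph):
    """Total number of lines of code over all files in the graph."""
    return sum(entry["loc"] for entry in src_graph.values())


src_graph = {
    "/usr/include/stdio.h": {"compiled_in": ["1"], "loc": 414, "used_by": ["2", "3"]},
    # Hypothetical path, used only for illustration.
    "/work/main.c": {"compiled_in": ["1"], "loc": 5, "used_by": ["2", "3"]},
}

print(commands_touching(src_graph, "/usr/include/stdio.h"))
print(total_loc(src_graph))
```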

Source graph can be imported and used as a Python module:

from clade.extensions.src_graph import SrcGraph

# Initialize extension with a path to the working directory
c = SrcGraph(work_dir="clade")

# Execute parsing of intercepted commands
# This step can be skipped if commands are already parsed
# and stored in the working directory

# Get the source graph
src_graph = c.load_src_graph()

Call graph

Clade can generate function call graph for a given project written in C. This requires CIF installed on your computer, and path to its bin directory added to the PATH environment variable.

Call graph can be generated through command line utility clade-callgraph:

$ clade-callgraph cmds.txt

├── Callgraph/
│   ├── callgraph/
│   ├── callgraph.json
│   ├── calls_by_ptr.json
│   ├── used_in.json
│   └── err.log
├── CC/
├── LD/
├── MV/
├── PidGraph/
├── Info/
├── Functions/
│   ├── functions_by_file/
│   ├── functions_by_file.json
│   └── functions.json
└── Storage/

The call graph itself is stored inside the callgraph.json file and can be rather large. Let’s look at a small part of the call graph generated for the Linux kernel:

{
    "drivers/net/usb/asix_common.c": {
        "asix_get_phy_addr": {
            "called_in": {
                "drivers/net/usb/asix_devices.c": {
                    "ax88172_bind": {
                        "242": {"match_type" : 1}
                    },
                    "ax88178_bind": {
                        "809": {"match_type" : 1}
                    }
                }
            },
            "calls": {
                "drivers/net/usb/asix_common.c": {
                    "asix_read_phy_addr": {
                        "235": {"match_type" : 5}
                    }
                }
            },
            "type": "global"
        }
    }
}

There is a “drivers/net/usb/asix_common.c” file with the definition of the “asix_get_phy_addr” function. This function is called in the “drivers/net/usb/asix_devices.c” file by the “ax88172_bind” function on line “242” and by the “ax88178_bind” function on line “809”. “match_type” is internal information needed for debugging purposes. This function also calls the “asix_read_phy_addr” function from the “drivers/net/usb/asix_common.c” file on line “235”.

All functions that call “asix_get_phy_addr” function or are called by it are also present in the call graph, but were excluded from the above example.
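Because of the nesting, a small generator makes the structure easier to consume. The sketch below flattens the “called_in” part of one entry into tuples; the function name iter_callers is hypothetical, and the data is the excerpt discussed above, written out as a Python dictionary.

```python
def iter_callers(callgraph, file, func):
    """Yield (caller_file, caller_func, line) for every call to func."""
    called_in = callgraph[file][func].get("called_in", {})
    for caller_file, caller_funcs in called_in.items():
        for caller_func, lines in caller_funcs.items():
            for line in lines:
                yield caller_file, caller_func, int(line)


callgraph = {
    "drivers/net/usb/asix_common.c": {
        "asix_get_phy_addr": {
            "called_in": {
                "drivers/net/usb/asix_devices.c": {
                    "ax88172_bind": {"242": {"match_type": 1}},
                    "ax88178_bind": {"809": {"match_type": 1}},
                }
            },
            "calls": {
                "drivers/net/usb/asix_common.c": {
                    "asix_read_phy_addr": {"235": {"match_type": 5}}
                }
            },
            "type": "global",
        }
    }
}

callers = sorted(
    iter_callers(callgraph, "drivers/net/usb/asix_common.c", "asix_get_phy_addr")
)
for caller in callers:
    print(caller)
```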

The Callgraph extension uses the “Functions” extension to get information about function definitions and declarations. They are stored in the functions.json file:

{
    "asix_get_phy_addr": {
        "drivers/net/usb/asix_common.c": {
            "declarations": {
                "drivers/net/usb/asix.h": {
                    "line": "204",
                    "signature": "int asix_get_phy_addr(struct usbnet *);",
                    "type": "global"
                }
            },
            "line": "232",
            "signature": "int asix_get_phy_addr(struct usbnet *dev);",
            "type": "global"
        }
    }
}

For each function definition there is information about corresponding declaration, line numbers in which the definition and declaration are located, function signature and type (global or static).

Callgraph and Functions can be imported and used as Python modules:

from clade.extensions.callgraph import Callgraph
from clade.extensions.functions import Functions

# Initialize extension with a path to the working directory
c = Callgraph(work_dir="clade")

# Execute parsing of intercepted commands
# This step can be skipped if commands are already parsed
# and stored in the working directory

# Get the call graph
callgraph = c.load_callgraph()

# Usage looks quite ugly, yes
# This will be improved
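for file in callgraph:
    for func in callgraph[file]:
        for caller_file in callgraph[file][func]["called_in"]:
            for caller_func in callgraph[file][func]["called_in"][caller_file]:
                for call_line in callgraph[file][func]["called_in"][caller_file][caller_func]:
                    # Report every place where func is called
                    print(func, "is called by", caller_func, "in", caller_file, "on line", call_line)

        for called_file in callgraph[file][func]["calls"]:
            for called_func in callgraph[file][func]["calls"][called_file]:
                for call_line in callgraph[file][func]["calls"][called_file][called_func]:
                    # Report every function that func calls
                    print(func, "calls", called_func, "from", called_file, "on line", call_line)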

f = Functions(work_dir="clade")
functions = f.load_functions()
# The usage is quite similar, so it is omitted

Compilation database

Command line tool for generating compilation database has a different interface, compared to most other command line tools available in Clade. In that regard it’s more like clade-intercept command. Compilation database can be generated using clade command:

$ clade make

where make should be replaced by your project build command. As a result, your project will be built and the compile_commands.json file will be created in the current directory.

If you have a cmds.txt file, you can skip the build process and get compile_commands.json much faster:

$ clade -c cmds.txt

Other options are available through the --help option.
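For readers unfamiliar with the format, a compilation database is a JSON list with one entry per compiled file. The sketch below turns an intercepted command dictionary into such an entry using the “directory”, “arguments” and “file” fields of the JSON Compilation Database format. The function name and the rule for picking the source file are simplifying assumptions, and the entries Clade actually writes may differ.

```python
import json


def to_cdb_entry(cmd):
    """Build a compilation database entry, or None if no C source is found."""
    sources = [arg for arg in cmd["command"][1:] if arg.endswith(".c")]
    if not sources:
        return None
    return {
        "directory": cmd["cwd"],
        "arguments": cmd["command"],
        "file": sources[0],
    }


gcc_cmd = {"command": ["gcc", "main.c", "-o", "main"], "cwd": "/work/simple_make",
           "id": "3", "pid": "2", "which": "/usr/bin/gcc"}
rm_cmd = {"command": ["rm", "main"], "cwd": "/work/simple_make",
          "id": "4", "pid": "2", "which": "/bin/rm"}

entries = [entry for entry in map(to_cdb_entry, [gcc_cmd, rm_cmd]) if entry]
print(json.dumps(entries, indent=4))
```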

Compilation database can be imported and used as a Python module:

from clade.intercept import Interceptor
from clade.extensions.cdb import CDB

# Initialize extension with a path to the working directory
c = CDB(work_dir="clade")

# Intercept build commands
# This step can be skipped if build commands are already intercepted
cmds_txt = "cmds.txt"
i = Interceptor(command=["make"], output=cmds_txt)

# Generate compilation database
# This step can be skipped if compilation database is already generated
# and stored in the working directory

# Get generated compilation database
compilation_database = c.load_cdb()


Configuration

There is a bunch of options that can be changed to alter the behaviour of various tools available in Clade. If you execute these tools from the command line (tools like clade-cc, clade-callgraph, clade-cmd-graph, and so on), then the configuration can be passed via the “-c” option like this:

$ clade-cc -c conf.json cmds.txt

where conf.json is a JSON file with some configuration options:

{
    "PidGraph.as_picture": true,
    "CmdGraph.requires": [
        …
    ],
    "CC.which_list": ["/usr.bin.gcc", "^.*clang$"]
}

The configuration can be also passed as a Python dictionary:

from clade.extensions.cc import CC

conf = {"PidGraph.as_picture": True}
c = CC(work_dir="clade", conf=conf)
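Since the configuration file is plain JSON, the same kind of dictionary can also be written to disk and then passed to the command-line tools with “-c”. The sketch below uses only the standard json module; the temporary file location is arbitrary and the option values are just examples taken from this section.

```python
import json
import os
import tempfile

conf = {
    "PidGraph.as_picture": True,
    "CC.which_list": ["^.*clang$"],
}

# Write the configuration to a temporary conf.json file.
conf_path = os.path.join(tempfile.mkdtemp(), "conf.json")
with open(conf_path, "w") as conf_file:
    json.dump(conf, conf_file, indent=4, sort_keys=True)

# Read it back to check that Python's True became JSON's true.
with open(conf_path) as conf_file:
    raw_text = conf_file.read()
loaded = json.loads(raw_text)
print(raw_text)
```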

which list

Let’s highlight some notable configuration options, starting with options for extensions that parse intercepted commands to search for input and output files, and options. These extensions need to know which commands to parse. They have a list of predefined regular expressions that they try to match with the which field of an intercepted command. For example, the CC extension has the following list:

which_list = [

Obviously, execution of /usr/bin/gcc will be matched, as well as /usr/bin/clang, or /usr/local/bin/powerpc-elf-gcc-7, so all such commands will be treated as compilation commands and parsed accordingly. Sometimes this list is not enough, so there is an option to change it:

"CC.which_list": ["regexp_to_match_your_compiler"]

Options for other such extensions look the same: just replace CC with the name of the extension. For example, “LD.which_list” is the option to change the list of regexes for the LD extension.
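The effect of such a list can be tried out with the standard re module before putting it into the configuration. This sketch assumes that a command is accepted when any regex matches its which field from the start of the string; whether Clade uses exactly this matching call is an assumption, and the third pattern is a hypothetical example for versioned cross compilers.

```python
import re


def matches_which_list(which, which_list):
    """Return True if any regex in which_list matches the executable path."""
    return any(re.match(regex, which) for regex in which_list)


which_list = ["^.*gcc$", "^.*clang$"]
print(matches_which_list("/usr/bin/gcc", which_list))
print(matches_which_list("/usr/local/bin/powerpc-elf-gcc-7", which_list))

# Hypothetical extra pattern that also accepts a version suffix such as "-7".
extended_list = which_list + [r"^.*gcc(-?\d+(\.\d+){0,2})?$"]
print(matches_which_list("/usr/local/bin/powerpc-elf-gcc-7", extended_list))
```

A quick check like this shows why a pattern anchored with `$` right after the compiler name misses executables that carry a version suffix.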

Visualization options

Currently there are two small options to visualize pid graph and cmd graph using Graphviz:

    "PidGraph.as_picture": true,
    "CmdGraph.as_picture": true

If they are set, PDF files containing the Graphviz output will appear next to the pid_graph.json and cmd_graph.json files respectively.

List of commands to parse

If you want to generate a command graph, source graph, or call graph, then you need to specify which commands to parse via the “CmdGraph.requires” option. If you want to parse all commands that are currently supported, then the value of this option will be:

    "CmdGraph.requires": ["CC", "LD", "MV", "AR", "Objcopy"]


Presets

There is a predefined set of options for the following projects that can be used in addition to user-defined configuration:

  • Linux kernel (preset linux_kernel)
  • Busybox (presets busybox_linux, busybox_macos)
  • Apache (presets apache_linux, apache_macos)

If you want to execute Clade on one of these projects, then it might be a good idea to use these presets, since they will save you from having to deal with various problems and mess with the configuration:

$ clade-cc -p linux_kernel cmds.txt


from clade.extensions.cc import CC

c = CC(work_dir="clade", preset="linux_kernel")


Troubleshooting

File with intercepted commands is empty

Access control mechanisms on different operating systems might disable library injection that is used by Clade to intercept build commands:

  • SELinux on Fedora, CentOS, RHEL;
  • System Integrity Protection on macOS;
  • Mandatory Integrity Control on Windows (disables similar mechanisms)

A solution is to use the fallback intercepting mechanism that is based on wrappers.

File with intercepted commands is not complete

Sometimes some commands are intercepted, so the cmds.txt file is present and not empty, but other commands are clearly missing. Such behaviour should be reported so the issue can be fixed, but until then you can try to use the fallback intercepting mechanism that is based on wrappers.

Wrong ELF class

Build command intercepting may result in the following error:

ERROR: object '' from LD_PRELOAD cannot be preloaded (wrong ELF class: ELFCLASS64): ignored.

This happens because your project leverages multilib capabilities, but the libinterceptor library that is used to intercept build commands was compiled without multilib support. You need to install the gcc-multilib (Ubuntu) or gcc-32bit (openSUSE) package and reinstall Clade. The libinterceptor library will be recompiled and the issue will be fixed.
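To see which ELF class a binary or shared library has, you can inspect the fifth byte of its header (EI_CLASS in the ELF specification: 1 means 32-bit, 2 means 64-bit). The sketch below is a diagnostic aid, not part of Clade; the function name is hypothetical, and the demonstration files contain only a fake five-byte header.

```python
import os
import tempfile

ELF_CLASSES = {1: "ELFCLASS32", 2: "ELFCLASS64"}


def elf_class(path):
    """Return the ELF class name of a file, or None if it is not an ELF file."""
    with open(path, "rb") as binary:
        header = binary.read(5)
    if len(header) < 5 or header[:4] != b"\x7fELF":
        return None
    return ELF_CLASSES.get(header[4])


def write_sample(content):
    """Write bytes to a temporary file and return its path (demo only)."""
    handle, path = tempfile.mkstemp()
    with os.fdopen(handle, "wb") as sample:
        sample.write(content)
    return path


sample_64 = write_sample(b"\x7fELF\x02")
sample_32 = write_sample(b"\x7fELF\x01")
not_elf = write_sample(b"#!/bin/sh\n")

print(elf_class(sample_64), elf_class(sample_32), elf_class(not_elf))
```

Running elf_class on the program that fails and on the preloaded library shows whether the two classes differ, which is the situation the error message describes.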

Not all intercepted compilation commands are parsed

The reason is that the CC extension, which parses intercepted commands, cannot identify a command as a compilation command. You can help it by specifying the “CC.which_list” configuration option, in which you should write a list of regexes that will match your compiler. For example, if the path to your compiler is ~/.local/bin/c_compiler, then “CC.which_list” may be set like this:

"CC.which_list": ["^.*?c_compiler$"]

If you want to parse not only commands executed by your compiler, but by the system gcc as well, then you can add it to the list too:

"CC.which_list": ["^.*?c_compiler$", "^.*gcc$"]

How to set configuration options is described in the Configuration section of this readme.

Compilation database misses some commands

Same as above.

Command graph is not connected properly

Most likely this is because some type of command is left unparsed. If there is an extension in Clade that can parse such commands, then you will need to specify it via the “CmdGraph.requires” option:

    "CmdGraph.requires": ["CC", "LD", "MV", "AR", "Objcopy"]

Otherwise such an extension needs to be developed.

Similar problems with the source graph and the call graph can be fixed via the same option, since they use the command graph internally.


Acknowledgments

Clade is inspired by the Bear project created by László Nagy.

Download files

Filename | Size | File type | Python version | Upload date
ldv_clade-2.0.post2-py3.6.egg | 164.9 kB | Egg | 3.6 | Oct 24, 2018
ldv-clade-2.0.post2.tar.gz | 977.0 kB | Source | None | Oct 24, 2018
