Non-intrusive bazel compile_commands.json extractor

Yacce is a non-intrusive compile_commands.json extractor for Bazel (experimental, local compilation, Linux only)

Yacce extracts compile_commands.json and build system insights from a build system by supervising the local compilation process with strace. Yacce primarily supports Bazel (other build systems might be added later).
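For readers unfamiliar with the target format, here is a minimal sketch of what one compile_commands.json entry looks like, following the Clang JSON compilation database layout (directory, file, arguments, optional output). The helper name make_entry and all paths and flags are made up for illustration; this is not yacce's code.

```python
import json


def make_entry(directory, source_file, argv, output=None):
    """Build one compilation database entry in the Clang JSON format."""
    entry = {
        "directory": directory,   # working directory of the compiler process
        "file": source_file,      # the translation unit being compiled
        "arguments": list(argv),  # full argv, compiler executable first
    }
    if output is not None:
        entry["output"] = output  # optional: the object file produced
    return entry


# Hypothetical project path and flags, for illustration only.
database = [
    make_entry(
        "/home/user/project",
        "src/main.cc",
        ["gcc", "-std=c++17", "-Iinclude", "-c", "src/main.cc", "-o", "main.o"],
        output="main.o",
    )
]

text = json.dumps(database, indent=2)
print(text)
```

The whole file is just a JSON array of such entries, one per compiled source file.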

Motivation

The open-source history of Bazel alone spans over a decade, and it has a ton of C++-specific features, yet one very important one, generation of compile_commands.json, is still not there. The situation is so ridiculous that even Google's own teams had to invent and support their own "wheels" to make compile_commands for their projects (sample refs: 1, 2).

But there already exist several decent generic compile_commands.json extractors, external to Bazel, with hedronvision/bazel-compile-commands-extractor being the most well-known and, probably, respected. Why bother?

There are several reasons:

  • their usability is horrible: the extractors I've seen (I don't claim I saw all of them in existence!) require one to make a certain nontrivial modification of the build system and specifically list there which targets are going to be compiled, and how exactly, just to spew the damn compile_commands!
    • what if I'm supporting a complex project spanning multiple code bases that don't employ such an extractor, and I have to work on many code branches across many different remote machines? I'd have to first extract potentially branch-specific build targets, and then manually inject the extractor's code into the build system. Do this a few times a week, and you'll start to genuinely dislike Bazel (if you don't yet).
    • why can't it be made as simple as, for example, in CMake with its -DCMAKE_EXPORT_COMPILE_COMMANDS=1?
  • completely orthogonal to usability, there is an InfoSec consideration: what if I don't want to add a 3rd party, potentially compromisable dependency to my project? I have no idea what it does internally and what it could inject into my binaries under the hood. Why does an extractor have to be intrusive?

Benefits of yacce

Supervising a build system doing compilation with a standard system tool has several great benefits:

  • Yacce is super user-friendly and simple to use. It's basically a drop-in prefix for a shell command you could use to build the project, be it bazel build ..., bazel run ..., or even MY_ENV_VAR="value" ./build/compile.sh arg1 .. argn. Just prepend your build command with yacce -- and hit enter.
  • strace lets yacce see the real compiler invocations, hence a compile_commands.json made from the strace log reflects precisely the way you build the project, with all the custom configuration details you might have used, and independently of what the build system does or doesn't let you know about that.
  • Compilation of all external dependencies, as well as linking commands, is automatically included (with microsecond timing resolution, if needed).
  • There are just no InfoSec risks by design (of course, beyond running yacce's own code, though it's rather small and easy to verify). Yacce is completely external to the build system and doesn't interfere with it in any way.
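To make the supervision idea concrete, here is a rough sketch of how a build command could be wrapped in strace so that every program launch is logged. It only uses standard strace options (-f, -e trace=execve, -s, -tt, -o); the function name strace_command and the log path are hypothetical, and the exact options yacce passes are not documented here, so treat this as an assumption. It also ignores the fact that Bazel compiles from a long-lived server process, which needs separate handling.

```python
import shlex


def strace_command(build_argv, log_path):
    """Wrap a build command so that strace records every execve call."""
    return [
        "strace",
        "-f",                   # follow child processes created by fork/clone
        "-e", "trace=execve",   # record only program launches
        "-s", "65536",          # raise the limit on printed string length
        "-tt",                  # wall-clock timestamps with microseconds
        "-o", log_path,         # write the trace to a file instead of stderr
        *build_argv,            # the unmodified build command
    ]


# Hypothetical build command and log location.
cmd = strace_command(["./build/compile.sh", "arg1"], "/tmp/build.strace.log")
print(shlex.join(cmd))
```

The printed line is something one could paste into a shell; the build command itself stays untouched, which is the whole point of the non-intrusive approach.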

Limitations

However, the supervising approach has some intrinsic limitations, which make it unsuitable for some use cases supported by Bazel:

  • strace needs to be installed (apt install strace), which limits yacce to basically Linux only.
  • compilation can only happen locally, on the very same machine on which yacce runs. This leaves out Bazel RBE and requires building the project from an empty cache, if a cache is used.
  • while yacce doesn't care how you launch the build system and lets you use any script or a command you like, eventually, it should build only one Bazel workspace. Yacce does not check if this limitation is respected by a user, though typically, it's easy to fulfil.

If this is a hard no-go for you, consider another extractor, such as the above-mentioned hedronvision tool.

There are some "soft" limitations that might be removed in the future, such as:

  1. currently yacce does not support incremental builds (i.e. you'd have to fully recompile the project to update compile_commands.json). The fix for that is simple and just a matter of implementation.
  2. It looks like strace might sometimes produce... malformed logs. I always get what I expect on Debian 12-13, but I had to implement special handling for unexpected line breaks it sometimes produces on Ubuntu 22.04. I can't guarantee that there are no other quirks that could break log parsing.
  3. Bazel is monstrous. While yacce works nicely with some code bases, there might be edge cases, that aren't properly handled.
  4. One can't just take all the compiler invocations a build system does and simply dump them to a compile_commands.json. A certain amount of filtering is mandatory, and that requires parsing the compiler's arguments:
    • gcc- and clang-compatible compilers are the only ones supported.
    • 100% correct parsing of compiler arguments requires implementing 100% of the compiler's own CLI parser, which is not done and will never be done. Yacce's parser is good enough for many uses, but certainly not for all. Yacce can diagnose some edge cases and warn of potentially incorrect results, but, again, certainly not all edge cases are covered by the diagnostics.

You're unlikely to hit the last two. However, if you do, you know what to do (please file a bug report, or better, submit a PR).
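To illustrate why filtering needs argument parsing, here is a deliberately simplified sketch of a heuristic that tells compile invocations apart from link steps and unrelated programs. It is not yacce's parser: the names COMPILER_RE, OPTS_WITH_VALUE, source_files and is_compile_invocation are invented for this example, and the option and extension lists are far from complete.

```python
import os
import re

# gcc/clang-style executable names, with an optional target prefix and
# version suffix, e.g. "x86_64-linux-gnu-g++-12" or "clang++-17".
COMPILER_RE = re.compile(r"(.*-)?(gcc|g\+\+|clang\+\+|clang|cc|c\+\+)(-[\d.]+)?")

# A few options whose value is passed as the next, separate argument.
OPTS_WITH_VALUE = {"-o", "-I", "-D", "-U", "-include", "-isystem",
                   "-iquote", "-MF", "-MT", "-MQ", "-x"}

SOURCE_EXTS = {".c", ".cc", ".cpp", ".cxx", ".C"}


def source_files(argv):
    """Return positional arguments that look like C/C++ source files."""
    files = []
    skip_next = False
    for arg in argv[1:]:
        if skip_next:                # this argument is an option's value
            skip_next = False
        elif arg in OPTS_WITH_VALUE:
            skip_next = True
        elif not arg.startswith("-") and os.path.splitext(arg)[1] in SOURCE_EXTS:
            files.append(arg)
    return files


def is_compile_invocation(argv):
    """Heuristic: a known compiler name, a -c flag, and a source file."""
    if not argv:
        return False
    name = os.path.basename(argv[0])
    return (COMPILER_RE.fullmatch(name) is not None
            and "-c" in argv[1:]
            and bool(source_files(argv)))


print(is_compile_invocation(["gcc", "-c", "a.cc", "-o", "a.o"]))
print(is_compile_invocation(["gcc", "a.o", "b.o", "-o", "app"]))
```

Even this toy version has to know which options consume the following argument (otherwise the file passed to -include would be mistaken for a translation unit), which hints at why a complete parser is impractical.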

Give yacce a try with pip install yacce! Prepend the build command with yacce -- and let me know how it goes!

Examples of extracting compile_commands from Bazel

First, install yacce with pip install yacce. Python 3.10+ is supported.

Second, ensure you have strace installed with sudo apt install strace. Some distributions have it installed by default.

1. Extracting compile_commands.json for JAX (jaxlib wheel)

JAX is one of Google's machine learning frameworks. Its interface code is written in Python, while most of the high-performance code is in C++, exposed through Python bindings. The compiled part is called jaxlib and is responsible, beyond some general JAX parts, for a CPU-based execution backend. We'll be using the current latest JAX, v0.7.2, here.

Compiling jaxlib is a good first example for yacce, because it has quite a large code base with at least one dependency, XLA (a machine learning compiler), that is almost always being worked on in parallel with jaxlib itself. By default, JAX's build system will fetch XLA from a pinned commit, but since we emulate real developer work here, we'll also check out that pinned commit to a local directory, so we can work on it, and then tell JAX's build system to use that local directory instead of the pinned commit. Yacce will automatically generate a single compile_commands.json for both jaxlib and XLA.

First, let's set up the workspace:

mkdir /src_jax && cd /src_jax # the dir for both JAX and XLA sources
( git clone https://github.com/openxla/xla && cd ./xla \
  && git checkout 0fccb8a6037019b20af2e502ba4b8f5e0f98c8f6 )
git clone --branch jax-v0.7.2 --depth 1 https://github.com/jax-ml/jax

Now we have the /src_jax/jax directory with the JAX v0.7.2 commit checked out, and /src_jax/xla with the XLA commit that JAX v0.7.2 is pinned to. Time to build!

Without yacce, we'd use the following command inside the ./jax directory:

python3 ./build/build.py build --wheels=jaxlib --verbose --use_clang false \
  --target_cpu_features=native --bazel_options=--override_repository=xla=../xla

With yacce, we just prepend the command with yacce -- like this:

cd ./jax # since we didn't change the dir yet
yacce -- python3 ./build/build.py build --wheels=jaxlib --verbose --use_clang false \
  --target_cpu_features=native --bazel_options=--override_repository=xla=../xla

At the start, yacce will test whether strace and bazel are available, and then it will ask your permission to execute the bazel clean command. Starting from a clean state is mandatory for yacce to capture all compilation commands, but since cleaning and rebuilding from scratch might be expensive, yacce tries to prevent accidental harm by asking for permission. You can authorize it up front from the command line by invoking yacce with the --clean always argument, like this: yacce --clean always -- .
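If you want to run the same kind of availability check yourself before a long build, a minimal sketch could look like the following. The function name missing_tools is hypothetical, and this only checks that the executables are on PATH; how yacce performs its own check is not described here.

```python
import shutil


def missing_tools(tools=("strace", "bazel")):
    """Return the names of required executables that are not on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


missing = missing_tools()
if missing:
    print("missing tools:", ", ".join(missing))
else:
    print("strace and bazel are available")
```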

After doing bazel clean, yacce will set up strace supervision over Bazel's server execution and then launch the build script. When the build finishes, yacce will start processing the strace log, and in a few seconds it'll write /src_jax/jax/compile_commands.json, covering all C++ source files used for jaxlib and for the parts of XLA that jaxlib requires.
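As a rough idea of what strace log processing involves, here is a sketch that parses a single execve line of the form strace writes with -f, -tt and -o. The sample line, the regular expressions and the function names are assumptions made for this example; the sketch ignores split "unfinished/resumed" lines, leaves octal escapes undecoded, and is not yacce's parser.

```python
import re

# PID, optional HH:MM:SS.micros timestamp, then execve("path", [argv], env) = ret
EXECVE_RE = re.compile(
    r'^(?P<pid>\d+)\s+(?:(?P<time>\d\d:\d\d:\d\d\.\d+)\s+)?'
    r'execve\("(?P<path>(?:[^"\\]|\\.)*)", \[(?P<args>.*)\], .*\) = (?P<ret>-?\d+)'
)
# One C-style quoted string, allowing backslash escapes inside.
STRING_RE = re.compile(r'"((?:[^"\\]|\\.)*)"')
SIMPLE_ESCAPES = {"n": "\n", "t": "\t", "r": "\r", '"': '"', "\\": "\\"}


def unescape(text):
    """Decode the simple backslash escapes; anything else is left as-is."""
    return re.sub(r"\\(.)",
                  lambda m: SIMPLE_ESCAPES.get(m.group(1), m.group(0)),
                  text)


def parse_execve(line):
    """Return pid, time, path and argv of a successful execve, else None."""
    match = EXECVE_RE.match(line)
    if match is None or match.group("ret") != "0":
        return None  # not an execve line, or the call failed
    return {
        "pid": int(match.group("pid")),
        "time": match.group("time"),
        "path": unescape(match.group("path")),
        "argv": [unescape(s) for s in STRING_RE.findall(match.group("args"))],
    }


# Made-up log line for illustration.
SAMPLE = (r'4242 12:34:56.789012 execve("/usr/bin/gcc", '
          r'["gcc", "-c", "-DNAME=\"x\"", "main.c"], '
          r'0x7ffd5e0 /* 31 vars */) = 0')
print(parse_execve(SAMPLE))
```

Each successful record gives the process id, the timestamp and the full argv, which is the raw material for a compile_commands.json entry once the working directory of the process is known.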

Now fire up your IDE and point clangd to that file, so it starts indexing it. In VSCode with the clangd extension installed, if /src_jax is the main opened directory (workspace), open Settings / Extensions / clangd, click "Add Item" for the clangd.arguments setting, put --compile-commands-dir=${workspaceFolder}/jax there, and then press Ctrl+Shift+P and run "clangd.restart".
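Before pointing an IDE at the result, it can be handy to sanity-check the generated file. The following sketch loads a compilation database and counts entries per source file extension; the function name summarize is hypothetical, and the path in the usage comment is the one from this example.

```python
import json
from collections import Counter
from pathlib import Path


def summarize(db_path):
    """Return (number of entries, Counter of source file extensions)."""
    entries = json.loads(Path(db_path).read_text())
    by_extension = Counter(Path(entry["file"]).suffix for entry in entries)
    return len(entries), by_extension


# Usage, assuming the file from the walkthrough above exists:
#   total, by_extension = summarize("/src_jax/jax/compile_commands.json")
#   print(total, dict(by_extension))
```

An unexpectedly small entry count is a hint that part of the build was served from a cache rather than compiled locally.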
