A tool to recover a fully analyzable .ELF from a raw kernel by extracting the kernel symbol table (kallsyms)

vmlinux-to-elf

This tool lets you obtain a fully analyzable .ELF file from a vmlinux/vmlinuz/bzImage/zImage kernel image (either a raw binary blob or a preexisting but stripped .ELF file), with recovered function and variable symbols.

To do this, it scans your kernel for the kernel symbol table (kallsyms), a compressed symbol table that is present, mostly unaltered, in almost every kernel.

Because this symbol table is stored compressed, the tool can recover strings that are not visible in the original binary.

It produces an .ELF file that you can analyze using IDA Pro or Ghidra, which makes this tool useful for reverse engineering embedded systems.

Usage:

# Command line:
vmlinux-to-elf <input_kernel.bin> <output_kernel.elf>

# Command line, list symbol addresses only:
kallsyms-finder <input_kernel.bin> # If installed with uv
vmlinux-to-elf.kallsyms-finder # If installed with snap

# Command line, just decompress the kernel:
vmlinuz-decompressor <input_kernel.bin> <output_kernel.bin> # If installed with uv
vmlinux-to-elf.vmlinuz-decompressor # If installed with snap

# Graphical:
vmlinux-to-elf-gui # If installed with uv
vmlinux-to-elf.gui # If installed with snap
flatpak run re.fossplant.vmlinux-to-elf # If installed with flatpak

Installation:

# Install CLI+GUI with Snap (recommended on Ubuntu)
sudo snap install vmlinux-to-elf

# Install CLI+GUI with yay (recommended on Arch, Manjaro)
yay -S vmlinux-to-elf

# Install CLI+GUI with uv (example with Fedora)
sudo dnf install -y uv glib2-devel libadwaita-devel gtk4-devel
uv tool install vmlinux-to-elf[gui]
vmlinux-to-elf-gui --install-metadata # Install .desktop file

# Install CLI with uv and GUI with Flatpak (recommended on
# distributions with libadwaita < 1.6)
sudo dnf install -y uv flatpak
uv tool install vmlinux-to-elf

flatpak remote-add --user --if-not-exists flathub https://flathub.org/repo/flathub.flatpakrepo
flatpak install re.fossplant.vmlinux-to-elf

Local development environment setup:

sudo snap install --classic astral-uv
sudo apt install git
git clone git@github.com:marin-m/vmlinux-to-elf.git

# Dependencies for the GTK-4 GUI
sudo apt install libgirepository-2.0-dev libgtk-4-dev libadwaita-1-dev \
    gir1.2-adw-1 gir1.2-gtk-4.0 python3-dev glib-compile-resources

cd vmlinux-to-elf
# Download Python modules and initialize virtualenv (creates ".venv",
# call "source .venv/bin/activate" to set up)
uv sync --extra gui
# Add vmlinux-to-elf to $PATH, so that the commands are callable
# system-wide (creates a symlink to the source in "~/.local/bin")
uv tool install -e .[gui]

Features

  • Take a raw binary blob or ELF kernel file as an input [OK]
  • Automatically detect and unpack the main compression formats used for the Linux kernel [OK]
  • Find and extract the embedded kernel symbols table (kallsyms) from the input file [OK]
  • Infer the instruction set architecture, endianness and bit size, relying, among other things, on common function prologue signatures [OK]
  • Infer the entry point of the kernel from the symbols contained in the kallsyms table [OK]
  • Provide basic inference for the kernel base address [OK] (for now, it is taken to be the address of the first "TEXT" symbol of the binary with the lower 0xfff bits cleared, which seems to work well enough)
  • Unpack certain types of Android boot.img files, starting with an ANDROID! or UNCOMPRESSED_IMG magic [OK]
  • Produce an .ELF file fully analyzable with IDA Pro or Ghidra as an output [OK]
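
The base address heuristic mentioned in the list above can be illustrated with a minimal Python sketch. This is not the tool's actual code: the function name guess_base_address, the tuple layout and the symbol names are hypothetical, and "first" is assumed to mean first in table order.

```python
def guess_base_address(symbols):
    """Return the first text symbol address with its lower 0xfff bits cleared.

    `symbols` is an iterable of (address, type, name) tuples; this layout is
    an assumption made for the example, not the tool's internal structure.
    """
    for address, sym_type, _name in symbols:
        # 't' and 'T' designate text (code) symbols, as in the nm utility
        if sym_type in ('t', 'T'):
            return address & ~0xFFF
    raise ValueError('no text symbol found')


# Hypothetical symbols; the first address reuses the sample bytes
# "80 82 00 C0" read as a little-endian 32-bit value.
sample_symbols = [
    (0xC0008280, 'T', 'stext'),
    (0xC000840C, 't', 'do_one'),
]
base_address = guess_base_address(sample_symbols)
print(hex(base_address))
```

With these sample values the sketch prints 0xc0008000.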

How does it work, really?

A brief history of the "kallsyms" symbol table can be found at the top of the "kallsyms.py" file. In short, it was introduced in its current form in the Linux kernel circa 2004 and is used to print the "Kernel oops" messages, among other things.

It contains tuples of "symbol name", "symbol address" and "symbol type" (symbol types being designated by a single letter, in a fashion similar to the nm utility). This information is tightly packed using a simple compression algorithm.

The schema below shows how this information is serialized into the kernel. vmlinux-to-elf detects the offset of each respective structure through heuristics:

Array name: kallsyms_addresses (or kallsyms_offsets + kallsyms_relative_base)
Description: The addresses (or offsets relative to a base, in recent kernels) of each symbol, as an array
Sample contents: 80 82 00 C0 80 82 00 C0 80 82 00 C0 0C 84 00 C0 B4 84 00 C0 5C 85 00 C0 60 85 00 C0 60 85 00 C0 ...

Array name: kallsyms_num_syms
Description: The total number of symbols, as an integer (useful for checking for endianness, alignment, correct decoding of the symbols table)
Sample contents: 54 D4 00 00

Array name: kallsyms_names
Description: The compressed, length-separated symbol names themselves. Each byte in the compressed symbol strings references an index in the "kallsyms_token_index" array, that itself references the offset of a character or string fragment in the "kallsyms_token_table" array.
Sample contents: 09 54 64 6F 5F E1 F1 66 F5 25 05 54 F3 74 AB 74 0E 54 FF AB ...

Array name: kallsyms_markers
Description: A lookup table used to quickly find the approximate offset of a compressed symbol name in "kallsyms_names": every 256 symbols, an offset to the concerned symbol in "kallsyms_names" is added as a long to this table.
Sample contents: 00 00 00 00 03 0C 00 00 0C 18 00 00 1B 24 00 00 0F 31 00 00 DA 3D 00 00 CF 4A 00 00 ...

Array name: kallsyms_seqs_of_names
Description: This lookup table (present in 6.2+ kernels only) contains an array sequence of packed 3-byte integers, where array indexes match the alphanumeric order for a given symbol name, and array values match the corresponding entry indexes in the kallsyms_addresses and kallsyms_names arrays

Array name: kallsyms_token_table
Description: Null-terminated string fragments or characters that may be contained in kernel symbol names. This can contain at most 256 string fragments or characters. Indexes corresponding to ASCII code points which are actually used in any kernel symbol will correspond to the concerned ASCII character, other positions will contain a statistically chosen string fragment. This tool first tries to heuristically find this array in the passed file in order to locate the kallsyms symbols table.
Sample contents: 73 69 00 67 70 00 74 74 00 79 6E 00 69 6E 74 5F 00 66 72 00 ...

Array name: kallsyms_token_index
Description: 256 words, each mapping to the offsets of the characters or string fragments designated by their respective indexes in "kallsyms_token_table".
Sample contents: 00 00 03 00 06 00 09 00 0C 00 11 00 14 00 1B 00 1E 00 22 00 2C 00 30 00 35 00 38 00 ...
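
To make the relationship between kallsyms_names, kallsyms_token_index, kallsyms_token_table and kallsyms_markers more concrete, here is a minimal Python sketch of the decoding. It is an illustration under assumptions, not the tool's implementation: the helper names and the toy token table are hypothetical, each name is assumed to be prefixed by a single length byte, and the first expanded character is assumed to be the symbol type.

```python
def expand_names(names, token_table, token_index, num_syms):
    """Decode `num_syms` entries of a kallsyms_names-like byte string."""
    symbols = []
    pos = 0
    for _ in range(num_syms):
        length = names[pos]  # assumption: one length byte per entry
        pos += 1
        expanded = b''
        for token_id in names[pos:pos + length]:
            start = token_index[token_id]
            end = token_table.index(b'\x00', start)  # fragments are null-terminated
            expanded += token_table[start:end]
        pos += length
        text = expanded.decode('ascii')
        symbols.append((text[0], text[1:]))  # (type letter, symbol name)
    return symbols


def compute_markers(names, num_syms):
    """Offset of every 256th entry in `names`, as kallsyms_markers stores them."""
    markers = []
    pos = 0
    for index in range(num_syms):
        if index % 256 == 0:
            markers.append(pos)
        pos += names[pos] + 1
    return markers


# Toy token table: ASCII positions map to themselves, two other positions
# hold hypothetical fragments, the remaining ones hold a placeholder.
tokens = [b'?'] * 256
for code in range(0x21, 0x7F):
    tokens[code] = bytes([code])
tokens[0xE1] = b'sys_'
tokens[0xF1] = b'read'

token_table = b''.join(token + b'\x00' for token in tokens)
token_index = []
offset = 0
for token in tokens:
    token_index.append(offset)
    offset += len(token) + 1

# Two entries: "T" + "sys_" + "read", then "t" + "d" + "o" + "_"
names = bytes([3, 0x54, 0xE1, 0xF1, 4, 0x74, 0x64, 0x6F, 0x5F])
symbols = expand_names(names, token_table, token_index, 2)
markers = compute_markers(names, 2)
print(symbols, markers)
```

In real kernels the length encoding and word sizes can differ between versions and architectures, so the single length byte above should be treated as a simplification.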

The alignment and size of these fields are variable, and the field sizes may also vary with the architecture and kernel version. For this reason, vmlinux-to-elf has been tested against a variety of cases.
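
As an illustration of how kallsyms_num_syms can help with checking endianness, the following Python sketch decodes the sample bytes 54 D4 00 00 in both byte orders and keeps the plausible reading. The function name guess_byte_order, the 32-bit field size and the plausibility bound of one million symbols are assumptions made for the example, not values taken from the tool.

```python
import struct


def guess_byte_order(raw, max_plausible=1_000_000):
    """Return (byte order, symbol count) if exactly one reading is plausible."""
    candidates = {}
    for label, fmt in (('little', '<I'), ('big', '>I')):
        value = struct.unpack(fmt, raw[:4])[0]
        # Assumed plausibility bound: a kernel has at least one symbol and
        # fewer than `max_plausible` of them.
        if 0 < value <= max_plausible:
            candidates[label] = value
    if len(candidates) == 1:
        return next(iter(candidates.items()))
    return None  # ambiguous or implausible, other checks are needed


result = guess_byte_order(bytes.fromhex('54 D4 00 00'))
print(result)
```

For the sample bytes, only the little-endian reading (54356 symbols) falls within the assumed bound, so the sketch prints ('little', 54356).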

Since 2013, OpenWRT has carried a patch that removes compression of the kallsyms table by default (when the user has enabled building kallsyms). This is done in order to save space when the kernel as a whole is subsequently compressed using LZMA.

This means that the kallsyms_token_table and kallsyms_token_index entries disappear, and that the symbol names use plain text ASCII instead. This case is supported too.

In standard Linux kernels before 6.4, kallsyms arrays are encoded in the following order:

  1. kallsyms_addresses (or kallsyms_offsets + kallsyms_relative_base)
  2. kallsyms_num_syms
  3. kallsyms_names
  4. kallsyms_markers
  5. kallsyms_seqs_of_names (6.2+ only)
  6. kallsyms_token_table
  7. kallsyms_token_index

For Linux 6.4+ kernels, this layout is changed to:

  1. kallsyms_num_syms
  2. kallsyms_names
  3. kallsyms_markers
  4. kallsyms_token_table
  5. kallsyms_token_index
  6. kallsyms_addresses (or kallsyms_offsets + kallsyms_relative_base)
  7. kallsyms_seqs_of_names

vmlinux-to-elf's parsing algorithm, however, parses them in the following order:

  1. kallsyms_token_table (before-last structure)
  2. kallsyms_token_index (last structure, forwards)
  3. kallsyms_markers (backwards)
  4. kallsyms_names (backwards again)
  5. kallsyms_num_syms (backwards again)
  6. kallsyms_addresses (or kallsyms_offsets + kallsyms_relative_base) (backwards again)

Kernels support

It should support kernels from version 2.6.10 (December 2004) up to 6.4, the current version as of August 2023. Only kernels explicitly configured without CONFIG_KALLSYMS are not expected to be supported. If this kernel configuration option was not set at build time, you will get: KallsymsNotFoundException: No embedded symbol table found in this kernel.

For raw kernels, the following architectures can be detected (using magics from binwalk): MIPSEL, MIPSEB, ARMEL, ARMEB, PowerPC, SPARC, x86, x86-64, ARM64, MIPS64, SuperH, ARC.

The following kernel compression formats can be automatically detected: XZ, LZMA, GZip, BZ2, LZ4, LZO and Zstd.
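
As a rough illustration of signature-based detection, the Python sketch below searches a blob for commonly documented magic numbers of these formats and reports the earliest hit. The signature table and the function name are assumptions made for the example. They are not necessarily the signatures or the logic used by vmlinux-to-elf, and short magics such as the LZMA one are prone to false positives.

```python
import gzip

# Commonly documented magic numbers, assumed here for illustration only
MAGICS = {
    'xz': b'\xfd7zXZ\x00',
    'gzip': b'\x1f\x8b\x08',
    'bzip2': b'BZh',
    'lz4-legacy': b'\x02\x21\x4c\x18',
    'lzo': b'\x89LZO\x00\r\n\x1a\n',
    'zstd': b'\x28\xb5\x2f\xfd',
    'lzma': b'\x5d\x00\x00',  # weak signature: typical properties byte + zeros
}


def find_first_compressed_stream(blob):
    """Return (offset, format name) of the earliest magic found, or None."""
    hits = []
    for name, magic in MAGICS.items():
        offset = blob.find(magic)
        if offset != -1:
            hits.append((offset, name))
    return min(hits) if hits else None


# Toy input: 16 bytes of padding followed by a gzip stream
blob = b'\x00' * 16 + gzip.compress(b'kernel payload', mtime=0)
hit = find_first_compressed_stream(blob)
payload = gzip.decompress(blob[hit[0]:])
print(hit, payload)
```

A real detector would typically try to decompress at each candidate offset and discard candidates that fail, rather than trusting the first signature match.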

Advanced usage

You can also obtain a text-only output of the kernel's symbol names, addresses and types using the kallsyms-finder utility, which is bundled with this tool. The format of its output is similar to that of the /proc/kallsyms procfs file.
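
If you want to post-process such output, a small parser is enough. The Python sketch below assumes lines of the form "address type name", as in /proc/kallsyms. The exact output of kallsyms-finder may contain additional lines, which this sketch skips, and both the function name and the sample text are hypothetical.

```python
def parse_kallsyms_text(text):
    """Parse '/proc/kallsyms'-like lines into (address, type, name) tuples."""
    symbols = []
    for line in text.splitlines():
        fields = line.split()
        # Expect at least "address type name", with a one-letter type
        if len(fields) < 3 or len(fields[1]) != 1:
            continue
        try:
            address = int(fields[0], 16)
        except ValueError:
            continue  # not a symbol line (e.g. informational output)
        symbols.append((address, fields[1], fields[2]))
    return symbols


# Hypothetical sample in the assumed format
sample_output = """\
Some informational line that is not a symbol
c0008280 T stext
c000840c t do_one
"""
parsed = parse_kallsyms_text(sample_output)
print(parsed)
```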

Some parameters that are normally inferred automatically by the tool (such as the instruction set or base address) may be overridden in case of issues. The full list of arguments that allow this is presented below:

$ vmlinux-to-elf -h
usage: vmlinux-to-elf [-h] [--e-machine DECIMAL_NUMBER] [--bit-size BIT_SIZE]
                      [--file-offset HEX_NUMBER] [--base-address HEX_NUMBER]
                      [--bss-size BSS_SIZE] [--use-absolute]
                      input_file output_file

Turn a raw or compressed kernel binary, or a kernel ELF without symbols, into a fully analyzable ELF whose symbols were extracted from the kernel symbol table

positional arguments:
  input_file            Path to the vmlinux/vmlinuz/zImage/bzImage/kernel.bin/kernel.elf file to make into an analyzable .ELF
  output_file           Path to the analyzable .ELF to output

options:
  -h, --help            show this help message and exit
  --e-machine DECIMAL_NUMBER
                        Force overriding the output ELF "e_machine" field with this integer value (rather than auto-detect)
  --bit-size BIT_SIZE   Force overriding the input kernel bit size, providing 32 or 64 bit (rather than auto-detect)
  --file-offset HEX_NUMBER
                        Consider that the raw kernel starts at this offset of the provided raw file or compressed stream (rather than 0, or the beginning of the ELF
                        sections if an ELF header was present in the input)
  --base-address HEX_NUMBER
                        Force overriding the output ELF base address field with this integer value (rather than auto-detect)
  --bss-size BSS_SIZE   Size in megabytes of the .bss section in the binary
  --use-absolute        Assume kallsyms offsets are absolute addresses

How is the source code organized?

[Figure: source code chart]

Bug fixes, improvements, etc.

Don't hesitate to open an issue to suggest an improvement.

Please use the issues and pull requests of the current GitHub repository as the primary channel for reporting bugs, asking questions, etc.

Alternatively, you can use this Matrix channel if you need to contact the author of the project directly, but please reserve it as a secondary channel, e.g. for sending kernel samples. Anything else posted there is more likely to be lost.
