Recursive extraction and parsing of firmware and partition images
REAP
Recursive Extraction And Parsing — a general-purpose CLI tool for identifying and recursively extracting firmware and partition images. Works with raw eMMC/flash dumps, individual partition images, full-disk images (GPT or Rockchip PARM), and forensic disk images from a wide range of embedded Linux and Android devices.
Pure Python. No root, no FUSE, no mounting, no Linux kernel modules. Runs on macOS, Linux, and Windows.
What it does
Point it at a directory of partition .bin files, a single image, or a set of 7z archives and it will:
- Identify each image's format via magic bytes (37 format signatures)
- Annotate what partition it is (boot, recovery, system, userdata, etc.) by reading ext4 superblock metadata and analyzing ramdisk contents
- Extract contents recursively -- e.g. a boot image yields a kernel + ramdisk; the ramdisk decompresses to a cpio archive; the cpio extracts to a filesystem tree
- Analyze kernels (version, config, build paths, kallsyms symbol table), bootloaders (U-Boot environment, embedded DTBs), and unknown partitions (forensic hex dump, strings, SHA256)
- Report everything found in human-readable text and machine-readable JSON, including FBE encryption detection
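As a rough illustration of magic-byte identification, the sketch below checks a handful of the signatures listed under "Supported formats". It is not the tool's actual `identify.py`; the `SIGNATURES` table and the `identify` function are hypothetical names, and only a small subset of the 37 signatures is shown.

```python
from typing import Optional

# (label, offset, magic) for a few formats from the tables in this README.
# Multi-byte values are spelled out in their on-disk byte order.
SIGNATURES = [
    ("android_boot", 0, b"ANDROID!"),
    ("gzip", 0, bytes.fromhex("1f8b")),
    ("xz", 0, bytes.fromhex("fd377a585a00")),
    ("bzip2", 0, b"BZh"),
    ("dtb", 0, bytes.fromhex("d00dfeed")),      # 0xD00DFEED, big-endian
    ("uimage", 0, bytes.fromhex("27051956")),   # 0x27051956, big-endian
    ("ext4", 0x438, bytes.fromhex("53ef")),     # 0xEF53, little-endian
    ("gpt", 0x200, b"EFI PART"),                # 512-byte sectors
    ("gpt", 0x1000, b"EFI PART"),               # 4K (UFS) sectors
]


def identify(data: bytes) -> Optional[str]:
    """Return the first matching label, or None if nothing matches."""
    for label, offset, magic in SIGNATURES:
        if data[offset:offset + len(magic)] == magic:
            return label
    return None
```

Only the first 4 KB or so of an image is needed for these checks, so a caller can read a small prefix instead of the whole file.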
Supported formats
Partition tables and disk layouts
| Format | Detection | Extraction |
|---|---|---|
| GPT partition table | EFI PART at offset 0x200 or 0x1000 (UFS 4K sectors) | Individual partition images |
| Rockchip PARM partition table | PARM at offset 0 | Individual partition images (RK29xx/RK3xxx flash dumps) |
| Android super.img (LP metadata) | 0x67446C70 at offset 0x1000 | Logical partition images (system, vendor, product, etc.) |
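To make the GPT row concrete, here is a minimal sketch of locating the header at either sector size and listing partition entries. The field offsets follow the UEFI GPT layout; the function names are hypothetical and the whole image is held in memory for simplicity, which a real tool would likely avoid.

```python
import struct
from typing import List, Optional, Tuple


def find_gpt_sector_size(image: bytes) -> Optional[int]:
    """The GPT header sits at LBA 1, so its offset equals the sector size."""
    for sector_size in (0x200, 0x1000):
        if image[sector_size:sector_size + 8] == b"EFI PART":
            return sector_size
    return None


def list_gpt_partitions(image: bytes) -> List[Tuple[str, int, int]]:
    """Return (name, byte_offset, byte_length) for each used entry."""
    sector = find_gpt_sector_size(image)
    if sector is None:
        return []
    # Header offset 72: entry array LBA (u64), entry count (u32), entry size (u32).
    entries_lba, count, entry_size = struct.unpack_from("<QII", image, sector + 72)
    partitions = []
    for index in range(count):
        start = entries_lba * sector + index * entry_size
        entry = image[start:start + entry_size]
        if len(entry) < 128:
            break  # truncated image
        if entry[:16] == bytes(16):
            continue  # all-zero type GUID marks an unused slot
        first_lba, last_lba = struct.unpack_from("<QQ", entry, 32)
        name = entry[56:128].decode("utf-16-le", "replace").split("\x00")[0]
        partitions.append((name, first_lba * sector, (last_lba - first_lba + 1) * sector))
    return partitions
```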
Android boot and kernel
| Format | Detection | Extraction |
|---|---|---|
| Android Boot Image (v0--v4) | ANDROID! magic | Kernel, ramdisk, second-stage, recovery DTBO, DTB |
| ARM zImage | 0x016F2818 at offset 0x24 | Decompressed vmlinux, kernel config, version string, source paths, kallsyms, all strings |
| ARM64 Image | ARM\x64 at offset 0x38 | Kernel config, version string, source paths, kallsyms, all strings |
| Raw ARM kernel binary | MSR CPSR instruction + Linux version string | Kernel config, version string, source paths, kallsyms, all strings |
| Device Tree Blob (DTB) | 0xD00DFEED | Extracted DTB, optional dtc decompile to DTS |
| DTBO container | 0xD7B7AB1E | Individual DT overlay entries |
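For the Android boot image row, the following sketch shows how the kernel and ramdisk offsets can be derived from a v0 to v2 header, assuming the standard AOSP field layout (sizes and addresses as little-endian u32 values after the magic, `header_version` at offset 40). The function name is hypothetical, and v3/v4 headers, which use a different layout, are deliberately not handled.

```python
import struct


def boot_image_layout(header: bytes) -> dict:
    """Compute section offsets for an Android boot image with header v0-v2."""
    if len(header) < 44 or header[:8] != b"ANDROID!":
        raise ValueError("not an Android boot image")
    (header_version,) = struct.unpack_from("<I", header, 40)
    if header_version > 2:
        raise NotImplementedError("v3/v4 headers use a different layout")
    (kernel_size, _kernel_addr, ramdisk_size, _ramdisk_addr,
     second_size, _second_addr, _tags_addr, page_size) = struct.unpack_from("<8I", header, 8)
    if page_size == 0:
        raise ValueError("invalid page size")

    def align(n: int) -> int:
        # Each section is padded up to a whole number of pages.
        return (n + page_size - 1) // page_size * page_size

    kernel_offset = page_size  # the header occupies the first page
    ramdisk_offset = kernel_offset + align(kernel_size)
    second_offset = ramdisk_offset + align(ramdisk_size)
    return {
        "header_version": header_version,
        "kernel": (kernel_offset, kernel_size),
        "ramdisk": (ramdisk_offset, ramdisk_size),
        "second": (second_offset, second_size),
    }
```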
Bootloaders and firmware
| Format | Detection | Extraction |
|---|---|---|
| U-Boot uImage | 0x27051956 | Unwrapped payload (kernel, ramdisk, firmware, device tree, etc.) |
| U-Boot binary | U-Boot <version> string, 64 KB--4 MB | Default environment, embedded DTBs, strings |
| U-Boot environment | CRC32 + key=value pairs, power-of-2 size | Parsed environment variables |
| Samsung Exynos boot partition | BL1 header pointer + Exynos BL label | bl1.bin, u-boot.bin, tzsw.bin |
| Rockchip KRNL wrapper | KRNL at offset 0 | Unwrapped payload (re-identified as gzip, zImage, etc.) |
| ELF binary | \x7fELF magic | Metadata dump (class, machine, entry point), strings |
| AVB vbmeta | AVB0 / AVBf | Metadata dump (version, algorithm, rollback index, flags) |
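The U-Boot environment row ("CRC32 + key=value pairs") can be sketched as below. This assumes the simple non-redundant layout (a 4-byte CRC32 followed directly by NUL-separated entries) and a little-endian CRC; redundant environments carry an extra flags byte and big-endian targets store the CRC differently, neither of which is handled here. The function name is hypothetical.

```python
import struct
import zlib


def parse_uboot_env(block: bytes) -> dict:
    """Validate the CRC32 and split the payload into key/value pairs."""
    if len(block) < 5:
        raise ValueError("block too small")
    (stored_crc,) = struct.unpack_from("<I", block, 0)
    payload = block[4:]
    if zlib.crc32(payload) & 0xFFFFFFFF != stored_crc:
        raise ValueError("CRC32 mismatch")
    env = {}
    for entry in payload.split(b"\x00"):
        if not entry:
            break  # an empty entry (double NUL) terminates the list
        key, sep, value = entry.partition(b"=")
        if sep:
            env[key.decode("ascii", "replace")] = value.decode("ascii", "replace")
    return env
```

Checking the CRC before parsing is what lets a detector distinguish a real environment block from arbitrary data that happens to contain `key=value` strings.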
Encrypted firmware containers
| Format | Detection | Extraction |
|---|---|---|
| IM*H firmware container | IM*H at offset 0 or 0x400 | Header parse (version, module name/type, chunk table, key family). Encrypted chunks (RTOS, kernel, TZOS, DTB, etc.) extracted as raw .bin files. Decryption is out of scope for this tool. |
| Ambarella environment (UNR0) | UNR0 + 0x5AA5 flags | Boot config, A/B slot status, firmware versions, bootloader logs |
Filesystems
| Format | Detection | Extraction |
|---|---|---|
| ext4 | 0xEF53 at offset 0x438 | Full filesystem tree with FBE encryption detection |
| FAT12/16/32 | 0xEB/0xE9 + 0x55AA at 510 | Full filesystem tree (LFN support) |
| exFAT | EXFAT OEM ID at offset 3 | Identified only (no extraction yet) |
| EROFS | 0xE0F5E1E2 at offset 0x400 | Identified only (no extraction yet) |
| F2FS | 0xF2F52010 at offset 0x400 | Identified only (no extraction yet) |
Compression and archives
| Format | Detection | Extraction |
|---|---|---|
| gzip | 1F 8B | Decompressed content |
| LZ4 frame | 04 22 4D 18 | Decompressed content |
| LZ4 legacy | 02 21 4C 18 | Decompressed content (Android ramdisk format) |
| LZMA | 5D 00 00 | Decompressed content |
| bzip2 | BZh | Decompressed content |
| XZ | FD 37 7A 58 5A 00 | Decompressed content |
| cpio newc | 070701 / 070702 | Files, directories, symlinks (as text files with -> target) |
| 7z archive | 37 7A BC AF 27 1C | Full decompression (supports split .7z.001 parts) |
| Android sparse image | 0xED26FF3A | Converted to raw image, then re-identified and extracted |
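The sparse-to-raw conversion in the last row can be sketched as follows, using the header and chunk layout from AOSP's sparse image format. This is an in-memory illustration with a hypothetical function name, not the tool's `sparse_img.py`; a real converter would stream to disk and could verify the CRC32 chunks instead of skipping them.

```python
import io
import struct

SPARSE_MAGIC = 0xED26FF3A
CHUNK_RAW, CHUNK_FILL, CHUNK_DONT_CARE, CHUNK_CRC32 = 0xCAC1, 0xCAC2, 0xCAC3, 0xCAC4


def unsparse(data: bytes) -> bytes:
    """Expand an Android sparse image into the raw image it describes."""
    (magic, _major, _minor, file_hdr_sz, chunk_hdr_sz,
     blk_sz, _total_blks, total_chunks, _checksum) = struct.unpack_from("<IHHHHIIII", data, 0)
    if magic != SPARSE_MAGIC:
        raise ValueError("not an Android sparse image")
    out = io.BytesIO()
    pos = file_hdr_sz
    for _ in range(total_chunks):
        chunk_type, _reserved, chunk_blocks, total_sz = struct.unpack_from("<HHII", data, pos)
        body = data[pos + chunk_hdr_sz:pos + total_sz]
        size = chunk_blocks * blk_sz
        if chunk_type == CHUNK_RAW:
            out.write(body[:size])
        elif chunk_type == CHUNK_FILL:
            out.write(body[:4] * (size // 4))  # 4-byte pattern repeated
        elif chunk_type == CHUNK_DONT_CARE:
            out.write(bytes(size))             # holes become zeros
        elif chunk_type == CHUNK_CRC32:
            pass                               # checksum only, no output
        else:
            raise ValueError(f"unknown chunk type {chunk_type:#x}")
        pos += total_sz
    return out.getvalue()
```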
Device-specific partitions
| Format | Detection | Extraction |
|---|---|---|
| Android devinfo | ANDROID-BOOT! magic | Lock status, tamper flags |
| Qualcomm modemst (EFS) | IMGEFS marker in first 64 bytes | Forensic scan (SHA256, strings, hex dump) |
| BMP image | BM + valid DIB header | Trimmed BMP (strips partition padding) |
| Boot logo container | ASCII count/sizes header + BMP at 0x200 | Individual BMP images |
| Empty / zeroed | All-zero content | Verified-empty marker with likely purpose annotation |
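As an example of the "BM + valid DIB header" check and the padding trim, the sketch below relies on the file size stored in the BMP file header. The set of accepted DIB header sizes and the function name are assumptions for illustration; some writers leave the size field at zero, in which case this approach rejects the input.

```python
import struct

# Common DIB header sizes (BITMAPCOREHEADER through BITMAPV5HEADER).
KNOWN_DIB_SIZES = {12, 40, 52, 56, 64, 108, 124}


def trim_bmp(partition: bytes) -> bytes:
    """Return only the BMP bytes from a partition that has trailing padding."""
    if len(partition) < 18 or partition[:2] != b"BM":
        raise ValueError("not a BMP image")
    (file_size,) = struct.unpack_from("<I", partition, 2)   # total file size
    (dib_size,) = struct.unpack_from("<I", partition, 14)   # DIB header size
    if dib_size not in KNOWN_DIB_SIZES:
        raise ValueError("unrecognized DIB header")
    if not 18 <= file_size <= len(partition):
        raise ValueError("implausible file size field")
    return partition[:file_size]
```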
Installation
Requires Python 3.10+ (tested with 3.11).
From PyPI:
pip install reap-cli
The PyPI distribution is `reap-cli` because the bare `reap` name on PyPI is held by an unrelated, long-abandoned 2012 package. We are pursuing a PEP 541 transfer. The installed CLI command is `reap` regardless.
From source (for development):
git clone https://gitlab.com/blackbox-research/reap
cd reap
python3 -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"
Dependencies (installed automatically):
- `ext4` -- pure-Python ext4 filesystem reader (no FUSE/mounting)
- `lz4` -- LZ4 decompression for Android ramdisks
Third-party plugins that add format handlers or detectors are discovered automatically via the reap.plugins entry point group. See the architecture section below.
Usage
reap <input_path> [options]
input_path can be a single image file, a directory containing partition images, or a set of 7z archives.
Options
| Flag | Description |
|---|---|
| `-o DIR` | Output directory (default: `<input>_unpacked/`) |
| `--identify-only` | Print format identification only, no extraction |
| `--skip-ext4` | Skip ext4 filesystem extraction (useful for huge partitions) |
| `--skip-archives` | Skip 7z archive extraction |
| `--force-archives` | Force archive extraction even when `physicalImage/` already exists |
| `--no-recursive` | Don't recurse into extracted children |
| `--max-depth N` | Maximum recursion depth (default: 10) |
| `-j, --jobs N` | Parallel extraction workers (0=auto, 1=sequential; default: auto) |
| `-v` | Verbose output (INFO level) |
| `-vv` | Debug output |
| `--report text\|json\|both` | Report format (default: both) |
Examples
Identify all partitions in a dump:
reap ./physicalImage --identify-only
Full extraction (skip large ext4 partitions):
reap ./physicalImage --skip-ext4 -v
Extract a single boot image:
reap boot.img -o ./boot_extracted -v
Extract a directory of 7z archives (split parts supported):
reap ./archives/ -v
Parallel extraction with 4 workers:
reap ./physicalImage -j 4 -v
Output structure
For a boot image, the recursive extraction produces:
boot_unpacked/
    kernel_info.txt            # Kernel analysis summary
    kernel_config.txt          # Build-time .config (if IKCONFIG enabled)
    kernel_source_paths.txt    # Build-time source paths
    kernel_strings.txt         # All embedded ASCII strings
    kallsyms.txt               # Kernel symbol table (if present)
    vmlinux                    # Decompressed kernel binary
    ramdisk_unpacked/
        ramdisk_unpacked/      # cpio filesystem tree
            init
            init.rc
            fstab.*
            sbin/
            ...
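Two of the files above, the kernel version and `kernel_config.txt`, come from data embedded in the kernel binary. A minimal sketch of how that can be recovered: kernels built with IKCONFIG embed a gzip stream after the `IKCFG_ST` marker, and the banner string starts with `Linux version`. The function names are hypothetical and this is not the tool's `_kernel_utils.py`.

```python
import re
import zlib
from typing import Optional


def find_kernel_version(kernel: bytes) -> Optional[str]:
    """Return the release string from the 'Linux version ...' banner."""
    match = re.search(rb"Linux version (\S+)", kernel)
    return match.group(1).decode("ascii", "replace") if match else None


def extract_ikconfig(kernel: bytes) -> Optional[str]:
    """Return the embedded .config text, or None if IKCONFIG is absent."""
    start = kernel.find(b"IKCFG_ST\x1f\x8b\x08")
    if start < 0:
        return None
    # wbits=31 decodes a gzip stream; bytes after the stream end
    # (the IKCFG_ED marker and the rest of the kernel) are ignored.
    decoder = zlib.decompressobj(31)
    config = decoder.decompress(kernel[start + 8:])
    return config.decode("utf-8", "replace")
```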
For a directory of partitions, you get a subdirectory per partition plus reports:
physicalImage_unpacked/
    report.txt      # Human-readable report
    report.json     # Machine-readable report
    mmcblk0p1/      # boot image contents
    mmcblk0p2/      # DTB contents
    mmcblk0p3/      # recovery image contents
    mmcblk0p4/      # system filesystem tree
    ...
Partition annotation
The tool automatically identifies partition roles by:
- Reading the ext4 superblock `s_last_mounted` field (e.g. `/system`, `/data`, `/cache`)
- Analyzing boot image ramdisks for `/sbin/recovery` to distinguish boot vs recovery
- Parsing U-Boot uImage type fields (kernel, ramdisk, firmware, device tree)
- Parsing IM*H firmware module names and types (bootloader, kernel, RTOS)
- Recognizing format-specific roles (DTB, vbmeta, DTBO, sparse, super, modemst)
- Inferring empty partition purpose from size (<=4 MB zeroed = likely misc or metadata)
Annotations appear in reports and verbose output as labels like (recovery), (system), (userdata), etc.
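Two of these heuristics are simple enough to sketch: reading `s_last_mounted` (a 64-byte field at offset 0x88 of the ext4 superblock, which itself starts at 0x400) and the size-based guess for zeroed partitions. The mount-point-to-label mapping and the function names below are assumptions for illustration, not the contents of `annotate.py`.

```python
from typing import Optional

# Hypothetical mapping from last mount point to a report label.
ROLE_BY_MOUNT = {"/system": "system", "/data": "userdata", "/cache": "cache"}


def ext4_last_mounted(image: bytes) -> Optional[str]:
    """Return s_last_mounted, or None if the image is not ext4."""
    superblock = image[0x400:0x800]
    if len(superblock) < 0xC8 or superblock[0x38:0x3A] != b"\x53\xef":
        return None
    raw = superblock[0x88:0x88 + 64]
    return raw.split(b"\x00", 1)[0].decode("utf-8", "replace")


def annotate(image: bytes) -> Optional[str]:
    """Guess a partition role from ext4 metadata or from being empty."""
    mount = ext4_last_mounted(image)
    if mount is not None:
        return ROLE_BY_MOUNT.get(mount)
    if image and len(image) <= 4 * 1024 * 1024 and image.count(0) == len(image):
        return "likely misc or metadata"
    return None
```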
FBE encryption detection
When extracting ext4 filesystems with File-Based Encryption (FBE), the tool:
- Detects the encryption superblock flag and per-inode encryption flags
- Hex-encodes encrypted filenames for safe extraction
- Writes `encrypted_paths.txt` listing all encrypted files and directories
- Reports encryption algorithms (AES-256-XTS, AES-256-GCM, etc.) in JSON output
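The flag checks in the first bullet correspond to two constants from the ext4 on-disk format: the `ENCRYPT` bit in the superblock's incompatible-feature word (offset 0x60) and the per-inode `EXT4_ENCRYPT_FL` flag. The sketch below shows both, plus one plausible way to hex-encode a name; the exact naming scheme the tool uses is not documented here, so `hex_name` is an assumption.

```python
import struct

EXT4_FEATURE_INCOMPAT_ENCRYPT = 0x10000  # superblock s_feature_incompat bit
EXT4_ENCRYPT_FL = 0x800                  # inode i_flags bit


def superblock_has_encryption(image: bytes) -> bool:
    """True if the ext4 superblock advertises the encrypt feature."""
    superblock = image[0x400:0x800]
    if len(superblock) < 0x64 or superblock[0x38:0x3A] != b"\x53\xef":
        return False
    (incompat,) = struct.unpack_from("<I", superblock, 0x60)
    return bool(incompat & EXT4_FEATURE_INCOMPAT_ENCRYPT)


def inode_is_encrypted(i_flags: int) -> bool:
    """True if an inode's flags mark its contents or names as encrypted."""
    return bool(i_flags & EXT4_ENCRYPT_FL)


def hex_name(raw_name: bytes) -> str:
    """Encode an encrypted (binary) filename as filesystem-safe hex."""
    return raw_name.hex()
```

Encrypted directory entries are arbitrary bytes and may contain `/` or NUL, which is why they cannot be written to the output tree verbatim.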
Architecture
reap/
    cli.py                  # Argument parsing, entry point
    identify.py             # Magic-byte format detection (37 signatures)
    annotate.py             # Partition role inference
    pipeline.py             # Recursive extraction orchestrator (parallel workers)
    report.py               # Text + JSON report generation (FBE-aware)
    handlers/
        __init__.py         # BaseHandler ABC, handler registry
        ambarella_env.py    # Ambarella UNR0 boot environment
        avb.py              # AVB vbmeta metadata
        bmp.py              # BMP image (partition padding trim)
        boot_img.py         # Android boot image (v0--v4)
        bootlogo.py         # Boot logo container (multiple BMPs)
        compression.py      # gzip, LZ4, LZMA, bzip2, XZ
        cpio_handler.py     # cpio newc archives
        devinfo.py          # Android devinfo (lock status)
        dji_imah.py         # IM*H encrypted firmware container (header parse, encrypted chunks)
        dtb.py              # Device Tree Blob
        dtbo.py             # DTBO container
        elf.py              # ELF binary metadata + strings
        ext4_handler.py     # ext4 filesystem (FBE detection, dir_index fallback)
        exynos_boot.py      # Samsung Exynos eMMC boot partition
        fat.py              # FAT12/16/32 filesystem
        gpt.py              # GPT partition table (512-byte + 4K UFS sectors)
        modemst.py          # Qualcomm modem EFS partition
        raw.py              # Empty + unknown fallback (forensic scan)
        raw_kernel.py       # Raw ARM kernel binary
        rk_krnl.py          # Rockchip KRNL wrapper
        rkparm.py           # Rockchip PARM partition table
        seven_zip.py        # 7z archive (split-part support)
        sparse_img.py       # Android sparse -> raw conversion
        super_img.py        # super.img LP metadata
        uboot_bin.py        # U-Boot binary (environment, embedded DTBs)
        uboot_env.py        # U-Boot environment block
        uimage.py           # U-Boot uImage wrapper
        zimage.py           # ARM zImage / ARM64 Image kernel extraction
        _kernel_utils.py    # Shared kernel analysis (version, config, kallsyms)
Each handler implements BaseHandler.extract() and returns an ExtractionResult with optional children for recursive processing. Handlers register themselves at import time via register_handler().
The pipeline orchestrator (pipeline.py) drives the flow: identify -> annotate -> dispatch to handler -> recurse into children. Children can be processed in parallel via ThreadPoolExecutor. Adding new formats is straightforward -- write a handler, register it for a Format enum value, and the pipeline picks it up automatically.
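The identify, dispatch, and recurse flow can be illustrated with a toy version. The names `BaseHandler`, `ExtractionResult`, `register_handler`, and `Format` come from this README, but their fields and signatures below are assumptions, children are passed as in-memory `(name, bytes)` pairs rather than files, and the sketch runs sequentially instead of using a thread pool.

```python
import gzip
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from enum import Enum, auto


class Format(Enum):
    GZIP = auto()
    UNKNOWN = auto()


@dataclass
class ExtractionResult:
    summary: str
    children: list = field(default_factory=list)  # (name, data) pairs


class BaseHandler(ABC):
    @abstractmethod
    def extract(self, name: str, data: bytes) -> ExtractionResult: ...


_REGISTRY: dict = {}


def register_handler(fmt: Format, handler: BaseHandler) -> None:
    _REGISTRY[fmt] = handler


class GzipHandler(BaseHandler):
    def extract(self, name: str, data: bytes) -> ExtractionResult:
        return ExtractionResult("gzip stream", [(name + ".out", gzip.decompress(data))])


register_handler(Format.GZIP, GzipHandler())


def identify(data: bytes) -> Format:
    return Format.GZIP if data[:2] == b"\x1f\x8b" else Format.UNKNOWN


def run(name: str, data: bytes, depth: int = 0, max_depth: int = 10) -> list:
    """Identify, dispatch to a handler, then recurse into its children."""
    fmt = identify(data)
    log = [f"{'  ' * depth}{name}: {fmt.name}"]
    handler = _REGISTRY.get(fmt)
    if handler is None or depth >= max_depth:
        return log
    for child_name, child_data in handler.extract(name, data).children:
        log += run(child_name, child_data, depth + 1, max_depth)
    return log
```

Because handlers only return children and never call each other, the recursion limit and any parallelism stay in one place, the orchestrator.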
Symlink handling
Symlinks found inside ext4 filesystems and cpio archives are not created as OS symlinks (which can cause issues on some platforms and create security risks with path traversal). Instead, they're written as small text files containing -> target and recorded in the extraction metadata / JSON report.
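A minimal sketch of that policy, assuming the archive reader supplies each entry's path, mode, and payload (for a cpio symlink the payload is the link target). The `write_entry` helper and its containment check are illustrative assumptions, not the tool's code.

```python
import stat
from pathlib import Path


def write_entry(root: Path, rel_path: str, mode: int, payload: bytes) -> Path:
    """Write one archive entry under root, recording symlinks as text."""
    base = root.resolve()
    dest = (base / rel_path.lstrip("/")).resolve()
    # Refuse entries such as "../../etc/passwd" that would escape the output tree.
    if not dest.is_relative_to(base):
        raise ValueError(f"path escapes output directory: {rel_path}")
    dest.parent.mkdir(parents=True, exist_ok=True)
    if stat.S_ISLNK(mode):
        # Record the target instead of creating an OS-level symlink.
        dest.write_text("-> " + payload.decode("utf-8", "replace"))
    else:
        dest.write_bytes(payload)
    return dest
```

For example, `write_entry(out, "sbin/sh", 0o120777, b"/system/bin/sh")` produces a regular file whose content is `-> /system/bin/sh`.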
Running tests
pip install -e ".[dev]"
pytest tests/ -v
The suite contains 110 tests covering format detection, handler extraction, kernel analysis (kallsyms), pipeline orchestration, and forensic scanning. All tests use synthetic data, so no real image files are needed.
Known limitations
- EROFS, F2FS, and exFAT: Identified but not yet extracted (no pure-Python reader available).
- Encrypted partitions: FBE-encrypted ext4 partitions are detected and documented, but file contents remain encrypted. The tool does not perform Android FDE/FBE decryption.
- IM*H decryption: Out of scope. The core tool parses IM*H headers and extracts encrypted chunks as raw `.bin` files. Producing plaintext requires AES keys that are not distributed with this tool.
- Large partitions: Extracting a 54 GB ext4 partition takes time and disk space. Use `--skip-ext4` to skip these, or extract individual partitions as needed.
- 7z extraction: Requires system `7z` binary for split archives; falls back to `py7zr` for single files.
- Symlinks: Recorded as text files, not created as actual OS symlinks.
- Text files: Plain-text metadata files (.txt, .sha256, .xml, README) in the input directory are detected and skipped rather than subjected to forensic extraction.