Skip to main content

Dynamically extract native .so libraries from Android apps via Frida

Project description

soxtract

Dynamically extract native .so libraries from running Android applications using Frida. Extracted files are automatically repaired into valid ELF binaries ready for analysis in Ghidra, Binary Ninja, IDA Pro, or similar tools.


How it works

soxtract injects a Frida agent into the target process and captures native libraries through four complementary methods:

  1. Initial enumeration — dumps all .so files already loaded at attach time
  2. dlopen hooks — intercepts dlopen and android_dlopen_ext to catch libraries loaded at runtime
  3. Periodic memory scan — scans executable memory for ELF magic bytes every 5 seconds, catching libraries loaded through custom loaders
  4. Disk fallback — if a memory read fails completely, reads the library directly from the device filesystem

After extraction, each dump is validated and repaired with a built-in ELF fixer before being saved as a proper .so file alongside a JSON metadata sidecar.


Requirements

  • Rooted Android device (or emulator) with frida-server running
  • USB debugging enabled, device visible to adb
  • Python 3.10+
  • Node.js 18+ (to build the Frida agent — only needed once)

Installation

1. Build the Frida agent

cd agent
npm install
npm run build
cd ..

This compiles the TypeScript agent to agent/dist/agent.js.

2. Install the Python package

pip install -e .

Usage

# Attach to a running app
soxtract com.example.app

# Spawn the app from scratch (captures libs loaded at startup)
soxtract com.example.app --spawn

# Attach by PID
soxtract 1234

# Stop automatically after 60 seconds
soxtract com.example.app --timeout 60

# Save to a custom directory
soxtract com.example.app --output-dir /tmp/dumps

Output structure

soxtract_out/
└── com.example.app/
    └── 20260426_143022/
        ├── libs/               ← repaired .so files + metadata
        │   ├── libfoo_1b2c3d00.so
        │   ├── libfoo_1b2c3d00.meta.json
        │   ├── libbar_2c3d4e00.so
        │   └── libbar_2c3d4e00.meta.json
        └── raw/                ← original memory dumps (backup)
            ├── libfoo_1b2c3d00.so.raw
            └── libbar_2c3d4e00.so.raw

File naming format: {library_name}_{lower_32_bits_of_base_address}.so

Each .meta.json sidecar records:

{
  "library_name": "libfoo.so",
  "base_address": "0x7a1b2c3d00",
  "size_bytes": 2097152,
  "extraction_method": "dlopen_hook",
  "timestamp_utc": "2026-04-26T14:30:22Z",
  "package": "com.example.app",
  "elf_valid": true,
  "elf_bitness": 64,
  "elf_abi": "aarch64",
  "elf_repaired": true,
  "repair_changes": ["ph[1]: p_offset 0x0 → 0x6000", "zeroed e_shnum/e_shstrndx"]
}

CLI options

Flag Default Description
--output-dir DIR soxtract_out Root directory for output
--spawn off Spawn the app instead of attaching to a running process
--timeout S 0 (unlimited) Stop after N seconds
--no-fix off Skip ELF repair; save raw memory dump as-is
--chunk-size KB 256 Memory read chunk size
--scan-interval MS 5000 How often to scan memory for new libraries
--loader-delay MS 250 Delay after dlopen returns before reading the new module
--retries N 3 Memory read retry attempts before falling back to disk
--config FILE Load settings from a TOML file
--log-level LEVEL INFO DEBUG / INFO / WARNING / ERROR

Configuration file

Copy config.example.toml and pass it with --config:

soxtract com.example.app --config my_config.toml
[soxtract]
chunk_size     = 262144   # bytes
scan_interval  = 5000     # ms
loader_delay   = 250      # ms
retries        = 3
retry_backoff_ms = 500
fix_elf        = true
spawn          = false
timeout        = 0
log_level      = "INFO"

CLI flags override config file values.


ELF repair

Memory-dumped .so files have incorrect program-header file offsets (p_offset) because the runtime layout does not match the on-disk layout. soxtract repairs this automatically:

  • For each program header segment: p_offset = p_vaddr − min(PT_LOAD p_vaddr)
  • Section header table info is zeroed out if absent (Android strips it from memory)
  • The repaired file is validated before saving; if repair fails, the raw dump is kept

Use --no-fix to skip repair and save the raw dump directly.


Known limitations

Scenario Behaviour
Packed / encrypted .so Memory content differs from the original file. The dump is saved as-is; the meta JSON notes it as a possible packed library.
Custom native loaders If a library is mapped anonymously (no path), the disk fallback is skipped and only the in-memory content is captured.
Very large libraries (>100 MB) Transfer is slow due to Frida's message channel. Increase --chunk-size to reduce overhead.
Non-rooted device Not supported — frida-server requires root to inject into arbitrary processes.

Project structure

soxtract/
├── soxtract/
│   ├── cli.py           # Entry point and argument parsing
│   ├── config.py        # Configuration (CLI + TOML merge)
│   ├── session.py       # Frida session lifecycle
│   ├── extractor.py     # Message dispatch, chunk reassembly, finalization
│   ├── elf_validator.py # ELF header inspection (no external deps)
│   ├── elf_fixer.py     # In-house ELF memory-dump repair
│   ├── metadata.py      # JSON sidecar model
│   └── dedup.py         # Deduplication by (name, base_address)
├── agent/
│   ├── src/
│   │   ├── index.ts     # Agent entry point + RPC exports
│   │   ├── reader.ts    # Chunked memory reader with retry + disk fallback
│   │   ├── scanner.ts   # Incremental memory scanner
│   │   ├── modules.ts   # Initial enumeration + dlopen hooks
│   │   ├── config.ts    # Runtime-configurable agent settings
│   │   └── types.ts     # Shared message protocol types
│   └── dist/
│       └── agent.js     # Compiled agent (run `npm run build` to regenerate)
├── tests/
│   ├── test_elf_validator.py
│   └── test_elf_fixer.py
├── config.example.toml
└── pyproject.toml

Running tests

python3 -m pytest tests/

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

soxtract-0.1.0.tar.gz (22.7 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

soxtract-0.1.0-py3-none-any.whl (17.6 kB view details)

Uploaded Python 3

File details

Details for the file soxtract-0.1.0.tar.gz.

File metadata

  • Download URL: soxtract-0.1.0.tar.gz
  • Upload date:
  • Size: 22.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for soxtract-0.1.0.tar.gz
Algorithm Hash digest
SHA256 ee8e64561eb74da2338e14d3bd224f04a68a1a2118129f0eed8457987aa8f858
MD5 2823a66a35f53609a2ed15d25bce048e
BLAKE2b-256 128a51c80a4d2726cf534a4efff90fd19e01ba33df5b261a70341eb36e9e60a1

See more details on using hashes here.

File details

Details for the file soxtract-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: soxtract-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 17.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for soxtract-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 39a215442d03a202edc56e2e3d5f7fdef6fd43305699ef24a7a0cdc2fcb69014
MD5 e88d76fb9e1e9c106fb74d3a585c5472
BLAKE2b-256 a2dfdadae90eb0365800504b319ce6f676d3ed1727074850c9052a9a91cc977f

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page