Skip to main content

ExplodeThoseBits - CUDA-accelerated exhaustive bit tree analysis

Project description

ExplodeThoseBits! // ETB

Exhaustive bit-tree traversal library. Takes bytes, explodes them into bits, walks every forward-only path through the bit space, reconstructs candidate byte sequences, scores them against known file signatures and heuristics. CUDA-accelerated for Hopper/Blackwell GPUs.

What it does

Given n input bytes, there are 9^n - 1 possible forward-only paths through the bit coordinate space (each byte has 8 bit positions, plus the choice to skip). The library systematically generates these paths, extracts bits along each path, packs them back into bytes, and evaluates whether the result looks like valid data.

The search space is obviously intractable for any reasonable input size. A 16-byte input has ~1.8 × 10^15 paths. So the library implements multi-level early stopping that prunes branches at 4, 8, and 16 bytes based on:

  • Signature prefix matching (does it start like a PNG, JPEG, PDF, etc.)
  • Shannon entropy bounds (too low = repeated garbage, too high = random/encrypted)
  • Printable ASCII ratio
  • UTF-8 validity
  • Control character density
  • Null byte run length

This reduces effective complexity from O(8^n) to O(k^d) where k is the effective branching factor after pruning and d is the depth at which most paths get killed.

Installation

pip install explodethosebits

Or build from source:

git clone https://github.com/odinglyn0/etb
cd etb
pip install .

Requires CUDA 12.x for GPU acceleration. Falls back to CPU if unavailable.

Usage

Basic extraction

import etb

# Extract from bytes
data = open("suspicious.bin", "rb").read()
candidates = etb.extract(data)

for c in candidates:
    print(f"{c.format_name}: {c.confidence:.2%}")
    print(f"  Entropy: {c.heuristics.entropy:.2f}")
    print(f"  Path length: {c.path.length()}")

Extract from file

candidates = etb.extract_from_file("suspicious.bin")

With metrics

result = etb.extract_with_metrics(data)

print(f"Evaluated {result.metrics.paths_evaluated} paths")
print(f"Prune rate: {result.metrics.prune_rate:.1%}")
print(f"Effective branching factor: {result.metrics.effective_branching_factor:.2f}")
print(f"Time: {result.metrics.wall_clock_seconds:.2f}s")

Custom configuration

config = etb.EtbConfig()

# Adjust early stopping thresholds
config.early_stopping.level1_threshold = 0.3  # More aggressive pruning
config.early_stopping.level2_threshold = 0.4
config.early_stopping.adaptive_thresholds = True

# Entropy bounds
config.entropy_min = 0.1  # Below this = repeated pattern
config.entropy_max = 7.9  # Above this = random/encrypted

# Bit pruning mode
config.bit_pruning = etb.BitPruningConfig(etb.BitPruningMode.MSB_ONLY)  # O(4^d) instead of O(8^d)

# Output options
config.output.top_n_results = 20
config.output.include_paths = True

candidates = etb.extract(data, config)

Load config from file

etb.load_config("config.yaml")
config = etb.ConfigManager.instance().get_config()

Low-level API

# Bit coordinates
coord = etb.BitCoordinate(byte_index=0, bit_position=3)

# Paths (forward-only constraint enforced)
path = etb.Path()
path.add(etb.BitCoordinate(0, 3))
path.add(etb.BitCoordinate(1, 5))
path.add(etb.BitCoordinate(2, 1))
assert path.is_valid()  # byte indices must be strictly increasing

# Extract bits along a path
bits = etb.extract_bits_from_path(data, path)

# Pack bits back to bytes
reconstructed = etb.bits_to_bytes(bits)

# Or do both at once
reconstructed = etb.path_to_bytes(data, path)

Path generation

# Generate all paths for small inputs (careful with this)
gen = etb.PathGenerator(input_length=4)
for path in gen:
    # 9^4 - 1 = 6560 paths
    pass

# With bit pruning
config = etb.PathGeneratorConfig(input_length=4)
config.bit_mask = 0xF0  # Only MSB positions
gen = etb.PathGenerator(config)
# Now 5^4 - 1 = 624 paths

Signature matching

# Load signature dictionary
sigs = etb.SignatureDictionary()
sigs.load_from_json("signatures.json")

# Match against data
matcher = etb.SignatureMatcher(sigs)
match = matcher.match(data)

if match.matched:
    print(f"Detected: {match.format_name} ({match.category})")
    print(f"Confidence: {match.confidence:.2%}")
    print(f"Header matched: {match.header_matched}")
    print(f"Footer matched: {match.footer_matched}")

Heuristics analysis

engine = etb.HeuristicsEngine()
result = engine.analyze(data)

print(f"Entropy: {result.entropy:.2f} bits/byte")
print(f"Printable ratio: {result.printable_ratio:.2%}")
print(f"Control char ratio: {result.control_char_ratio:.2%}")
print(f"Max null run: {result.max_null_run}")
print(f"UTF-8 validity: {result.utf8_validity:.2%}")
print(f"Composite score: {result.composite_score:.2f}")

Early stopping

controller = etb.EarlyStoppingController()

# Check if a partial reconstruction should be abandoned
decision = controller.should_stop(partial_data)
if decision.should_stop:
    print(f"Stopped at level {decision.level}: {decision.reason}")

Prefix trie (for deduplication)

trie = etb.PrefixTrie()

# Insert evaluated prefixes
trie.insert(prefix_bytes, etb.PrefixStatus.VALID, score=0.8)
trie.insert(bad_prefix, etb.PrefixStatus.PRUNED, score=0.1)

# Check before evaluating
if trie.is_pruned(candidate_prefix):
    # Skip - ancestor was already pruned
    pass

# Stats
print(f"Effective branching factor: {trie.get_effective_branching_factor():.2f}")

Memoization cache

cache = etb.PrefixCache()

# Check cache before expensive evaluation
entry = cache.lookup(prefix)
if entry is not None:
    score = entry.score
    should_prune = entry.should_prune
else:
    # Evaluate and cache
    result = engine.analyze(prefix)
    cache.insert(prefix, result, score, should_prune)

print(f"Cache hit rate: {cache.hit_rate():.1%}")

How the exhaustive search works

Bit coordinate system

Each bit in the input is addressed by (byte_index, bit_position) where bit_position ∈ [0,7] and 0 is LSB.

Forward-only paths

A path is a sequence of bit coordinates where each coordinate's byte_index is strictly greater than the previous. This constraint:

  1. Ensures paths are acyclic
  2. Limits path length to at most n (input length)
  3. Makes the search space finite (though still exponential)

Path enumeration

For n bytes with 8 bits each, the number of paths is:

Σ(k=1 to n) C(n,k) × 8^k = 9^n - 1

The library generates paths lazily using depth-first traversal with backtracking.

Multi-level early stopping

The key to tractability. At each checkpoint depth:

Level 1 (4 bytes):

  • Check signature prefix (first 4 bytes of PNG, JPEG, etc.)
  • Basic entropy check
  • Reject repeated byte patterns
  • Threshold: 0.2 composite score

Level 2 (8 bytes):

  • Full entropy bounds check
  • Printable ratio for text detection
  • Control character density
  • Threshold: 0.3 composite score

Level 3 (16 bytes):

  • Structural coherence
  • UTF-8 validity
  • Null run length
  • Threshold: 0.4 composite score

Paths failing any level are pruned along with all descendants.

Adaptive thresholds

When a high-scoring candidate is found (>0.8), thresholds tighten to 0.6. When the best score is low (<0.3), thresholds relax to 0.2. This focuses search on promising regions.

Bit pruning modes

Reduce branching factor by limiting which bit positions are explored:

  • EXHAUSTIVE: All 8 positions. O(8^d). Most thorough.
  • MSB_ONLY: Positions 4-7 only. O(4^d). Good for most formats.
  • SINGLE_BIT: 2 positions. O(2^d). Fastest.
  • CUSTOM: Arbitrary bitmask.

Scoring

Candidates are scored by weighted combination:

  • Signature match: 40%
  • Heuristic score: 30%
  • Length score: 15%
  • Structural validity: 15%

Top-k candidates are maintained in a min-heap for efficient tracking.

CUDA acceleration

GPU kernels for:

  • Path generation: Work-stealing across thread blocks, warp-level cooperative exploration
  • Heuristics: Shared memory histograms for entropy calculation
  • Signature matching: Constant memory broadcast for signature data
  • Prefix pruning: Atomic trie updates

Optimized for SM 90 (Hopper) and SM 100 (Blackwell) with architecture-specific tuning.

Configuration reference

See data/config.yaml for all options with documentation.

Key parameters:

Parameter Default Description
early_stopping.level1_threshold 0.2 Min score to continue at 4 bytes
early_stopping.level2_threshold 0.3 Min score to continue at 8 bytes
early_stopping.level3_threshold 0.4 Min score to continue at 16 bytes
entropy_min 0.1 Below = repeated pattern
entropy_max 7.9 Above = random/encrypted
bit_pruning.mode exhaustive Bit position selection
memoization.max_cache_size_mb 1024 LRU cache size
output.top_n_results 10 Candidates to return
performance.batch_size 65536 Paths per kernel launch

Supported formats

Signature detection for:

  • Images: PNG, JPEG, GIF, BMP
  • Documents: PDF
  • Archives: ZIP, RAR, 7Z
  • Executables: ELF, PE
  • Audio: MP3
  • Video: MP4

Add custom signatures via JSON:

{
  "format": "CUSTOM",
  "category": "custom",
  "signatures": [
    {"magic": "DEADBEEF", "offset": 0, "confidence": 0.9}
  ],
  "footer": {
    "magic": "CAFEBABE",
    "required": false
  }
}

Building from source

mkdir build && cd build
cmake ..
cmake --build .
ctest --output-on-failure

Requirements:

  • CMake 3.24+
  • CUDA Toolkit 12.x
  • C++17 compiler
  • Python 3.8+ (for bindings)

License

MIT // See: LICENSE

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

explodethosebits-0.2.8.tar.gz (130.6 kB view details)

Uploaded Source

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

explodethosebits-0.2.8-cp312-abi3-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (1.5 MB view details)

Uploaded CPython 3.12+manylinux: glibc 2.27+ x86-64manylinux: glibc 2.28+ x86-64

explodethosebits-0.2.8-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (1.5 MB view details)

Uploaded CPython 3.11manylinux: glibc 2.27+ x86-64manylinux: glibc 2.28+ x86-64

explodethosebits-0.2.8-cp310-cp310-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (1.5 MB view details)

Uploaded CPython 3.10manylinux: glibc 2.27+ x86-64manylinux: glibc 2.28+ x86-64

explodethosebits-0.2.8-cp39-cp39-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (1.5 MB view details)

Uploaded CPython 3.9manylinux: glibc 2.27+ x86-64manylinux: glibc 2.28+ x86-64

explodethosebits-0.2.8-cp38-cp38-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (1.5 MB view details)

Uploaded CPython 3.8manylinux: glibc 2.27+ x86-64manylinux: glibc 2.28+ x86-64

File details

Details for the file explodethosebits-0.2.8.tar.gz.

File metadata

  • Download URL: explodethosebits-0.2.8.tar.gz
  • Upload date:
  • Size: 130.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for explodethosebits-0.2.8.tar.gz
Algorithm Hash digest
SHA256 209948c203b660d05844421d67ebcf0a596b90a4ff4c145b98ed83bcaa0484d6
MD5 2341e9f6ac6b5cfe7b5bc3cdbb193492
BLAKE2b-256 7bea5c14af2bef67963c09741786dd3f8ae02dcc31f68b16cbfd17c059de5c5a

See more details on using hashes here.

Provenance

The following attestation bundles were made for explodethosebits-0.2.8.tar.gz:

Publisher: publish.yml on odinglyn0/etb

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file explodethosebits-0.2.8-cp312-abi3-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.

File metadata

File hashes

Hashes for explodethosebits-0.2.8-cp312-abi3-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 41aa4275564f17d3849acdbad41fdadf190c32687eec652649afb66466282f14
MD5 437384f5478a0d16e046906c6e0d754b
BLAKE2b-256 12fcd7fc62a8df9d0094e81ec91c3c3340aefa8314d76de6335ce124e6ca1e53

See more details on using hashes here.

Provenance

The following attestation bundles were made for explodethosebits-0.2.8-cp312-abi3-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl:

Publisher: publish.yml on odinglyn0/etb

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file explodethosebits-0.2.8-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.

File metadata

File hashes

Hashes for explodethosebits-0.2.8-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 4e9124518584e35e76adb4815235a60ab8c8fdc16041388e41285ee36ada5af5
MD5 77c60002af1dc8dbdc83955df63ab37c
BLAKE2b-256 b71999b4ecc0ccf922487e631d3aa09df84847dbc85042d698cf7eeb322a0f95

See more details on using hashes here.

Provenance

The following attestation bundles were made for explodethosebits-0.2.8-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl:

Publisher: publish.yml on odinglyn0/etb

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file explodethosebits-0.2.8-cp310-cp310-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.

File metadata

File hashes

Hashes for explodethosebits-0.2.8-cp310-cp310-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 2152a97a2b92b86dad04b1345d44e6bc0d79161853e07cba8ee07dbb3f9b8cff
MD5 4e54ec035fe02235b6d65b5b48590585
BLAKE2b-256 31f7def361a58bb92b2454c9d59f0011a7b640e6157332899f62b5d7de29d40b

See more details on using hashes here.

Provenance

The following attestation bundles were made for explodethosebits-0.2.8-cp310-cp310-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl:

Publisher: publish.yml on odinglyn0/etb

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file explodethosebits-0.2.8-cp39-cp39-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.

File metadata

File hashes

Hashes for explodethosebits-0.2.8-cp39-cp39-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 7d300e3be435d580115d4dd92e211ea084462d27d085bd9d8871c15bada1e4af
MD5 605104247bc94032912712da2111d2c9
BLAKE2b-256 3cbc82fb7dccb7e2d8fa6651551c2d294722079beb5b65349cc58869a71dde14

See more details on using hashes here.

Provenance

The following attestation bundles were made for explodethosebits-0.2.8-cp39-cp39-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl:

Publisher: publish.yml on odinglyn0/etb

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file explodethosebits-0.2.8-cp38-cp38-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.

File metadata

File hashes

Hashes for explodethosebits-0.2.8-cp38-cp38-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 577a2d41617e36dd421070e65c6d8011d990df8a8471a301d1b577c7b5657297
MD5 75cb4621d9bb705dea334b088546da19
BLAKE2b-256 423da604659153697399addcf54c04798692eb308603131ab03cd54490384d81

See more details on using hashes here.

Provenance

The following attestation bundles were made for explodethosebits-0.2.8-cp38-cp38-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl:

Publisher: publish.yml on odinglyn0/etb

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page