
Toolkit for managing large binary files through chunking and metadata descriptors


Blob Descriptor 🗃️


A toolkit for managing large binary files: it splits them into chunks and records their layout and checksums in a metadata descriptor.

☕ Support

If you find this project helpful, consider supporting me:

ko-fi

Features ✨

  • Smart Chunking - Split large files into manageable chunks
  • Metadata Tracking - Maintain comprehensive file descriptors
  • Integrity Verification - MD5 checksums at file and chunk levels
  • Flexible Sources - Handle local files and remote URLs
  • Efficient Transfer - Parallel chunk processing
  • CLI Interface - Easy command-line control
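
To make the chunking and checksum features concrete, here is a minimal sketch of splitting a stream into fixed-size chunks while computing an MD5 per chunk and one for the whole file. It is not the package's implementation: the function names `iter_chunks` and `describe` and the layout of the returned dictionary are assumptions made for illustration, and only the standard library is used.

```python
import hashlib
import io
from typing import BinaryIO, Iterator


def iter_chunks(stream: BinaryIO, chunk_size: int) -> Iterator[bytes]:
    """Yield successive blocks of at most chunk_size bytes until EOF."""
    while True:
        block = stream.read(chunk_size)
        if not block:
            break
        yield block


def describe(stream: BinaryIO, chunk_size: int) -> dict:
    """Build a hypothetical descriptor: one file-level MD5 plus one MD5 per chunk."""
    file_md5 = hashlib.md5()
    chunks = []
    total = 0
    for index, block in enumerate(iter_chunks(stream, chunk_size)):
        file_md5.update(block)  # file-level checksum, updated incrementally
        chunks.append(
            {"index": index, "size": len(block), "md5": hashlib.md5(block).hexdigest()}
        )
        total += len(block)
    return {
        "size": total,
        "block_size": chunk_size,
        "md5": file_md5.hexdigest(),
        "chunks": chunks,
    }


data = b"abcdefghij" * 100  # 1000 bytes of sample data
desc = describe(io.BytesIO(data), chunk_size=256)
print(len(desc["chunks"]), desc["size"])
```

Hashing incrementally means the file never has to be held in memory as a whole, which is the point of chunking large blobs.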

Installation 📦

pip install blob-descriptor

Or for the latest development version:

pip install git+https://github.com/jet-logic/blob_descriptor.git

Usage 🚀

Basic Workflow

  1. Create descriptor and chunks:

    blob-descriptor create --cw 1M,./chunks large_file.dat
    
  2. Verify integrity:

    blob-descriptor verify descriptor.bd ./chunks
    
  3. Reassemble files:

    blob-descriptor assemble descriptor.bd --sink reconstructed.dat
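
The verify and assemble steps can be pictured with the following sketch, which checks each chunk against an expected MD5 and then joins the chunks and checks the file-level MD5. The functions `verify_chunks` and `assemble` are hypothetical helpers written for this illustration, not part of the `blob-descriptor` API, and the chunks are kept in memory for brevity.

```python
import hashlib
from typing import List


def verify_chunks(chunks: List[bytes], expected_md5s: List[str]) -> List[int]:
    """Return the indexes of chunks whose MD5 differs from the expected digest."""
    if len(chunks) != len(expected_md5s):
        raise ValueError("chunk count does not match the descriptor")
    return [
        index
        for index, (block, expected) in enumerate(zip(chunks, expected_md5s))
        if hashlib.md5(block).hexdigest() != expected
    ]


def assemble(chunks: List[bytes], expected_file_md5: str) -> bytes:
    """Join chunks in order and confirm the file-level MD5 before returning."""
    joined = b"".join(chunks)
    if hashlib.md5(joined).hexdigest() != expected_file_md5:
        raise ValueError("file-level MD5 mismatch")
    return joined


original = b"hello world!"
chunks = [original[i:i + 5] for i in range(0, len(original), 5)]
chunk_md5s = [hashlib.md5(c).hexdigest() for c in chunks]
file_md5 = hashlib.md5(original).hexdigest()

print(verify_chunks(chunks, chunk_md5s))       # no mismatching chunks
print(assemble(chunks, file_md5) == original)  # round trip succeeds
```

Checking per-chunk digests first pinpoints which chunk is damaged, while the file-level digest guards against chunks that are individually valid but missing or out of order.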
    

Command Reference

| Command  | Description                    | Example                             |
|----------|--------------------------------|-------------------------------------|
| create   | Generate descriptor and chunks | create --cs 5M bigfile.iso          |
| verify   | Check file integrity           | verify desc.bd ./chunks             |
| check    | Validate chunk availability    | check desc.bd --search ./backups    |
| assemble | Reconstruct files from chunks  | assemble desc.bd --sink output.img  |

Advanced Options

# Process only specific chunks (0-5 and 10-15)
blob-descriptor create \
  --cc "5M,./temp,upload.sh {file},0-5,10-15" \
  huge_dataset.bin

# Use custom naming pattern (mask 3)
blob-descriptor create -m 3 --cs 10M data.tar
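
As an illustration of how a value such as `5M,./temp,upload.sh {file},0-5,10-15` might be interpreted, the sketch below splits it into a chunk size, a directory, a command, and a set of chunk indexes. The field order is inferred only from the example above, the ranges are assumed to be inclusive, and `K`/`M`/`G` are assumed to be binary multiples (consistent with `512K` standing for 524288 bytes elsewhere in this README). The helper names are hypothetical; the real CLI may parse this differently.

```python
from typing import List, Set, Tuple

# Assumption: size suffixes are powers of 1024.
UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_size(text: str) -> int:
    """Convert '5M' or '512K' to bytes; plain digits are taken as bytes."""
    text = text.strip().upper()
    if text and text[-1] in UNITS:
        return int(text[:-1]) * UNITS[text[-1]]
    return int(text)


def parse_ranges(parts: List[str]) -> Set[int]:
    """Expand items like '0-5' or '7' into a set of chunk indexes (inclusive)."""
    selected: Set[int] = set()
    for part in parts:
        start, sep, end = part.partition("-")
        if sep:
            selected.update(range(int(start), int(end) + 1))
        else:
            selected.add(int(start))
    return selected


def parse_cc(value: str) -> Tuple[int, str, str, Set[int]]:
    """Split an assumed 'size,dir,command,range,...' value into its fields."""
    fields = value.split(",")
    size, directory, command = fields[:3]
    return parse_size(size), directory, command, parse_ranges(fields[3:])


size, directory, command, indexes = parse_cc("5M,./temp,upload.sh {file},0-5,10-15")
print(size, directory, command, sorted(indexes))
```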

API Example 🐍

from blob_descriptor import BlobDescriptor

# Create descriptor
bd = BlobDescriptor()
bd.add_file("large_file.bin")
desc = bd.make_descriptor(block_size=8192)

# Save to file
bd.save("descriptor.bd")


Naming Pattern Masks 🏷️

Blob Descriptor provides flexible naming patterns for generated chunks through four mask options:

Available Masks

| Mask | Pattern Format                                                       | Example Output                   |
|------|----------------------------------------------------------------------|----------------------------------|
| 1    | {md5:.5}_{total_size}_{block_size}_{index:0{block_ipad}d}_{md5:.5}   | 1a3f5_1048576_524288_0001_1a3f5  |
| 2    | {md5:.5}_{block_size}{index:0{block_ipad}d}_{md5:.5}_{total_size}    | 1a3f5_512K0001_1a3f5_1048576     |
| 3    | {md5:.5}_{block_size}{index:0{block_ipad}d}_{total_size}             | 1a3f5_512K0001_1048576           |
| 4    | {md5:.5}_{block_size}{index:0{block_ipad}d}                          | 1a3f5_512K0001                   |

Format Variables

  • {md5:.5}: First 5 chars of MD5 hash
  • {total_size}: Full blob size in bytes
  • {block_size}: Chunk size in bytes (mask 1) or human-readable (masks 2-4)
  • {index}: Chunk sequence number
  • {block_ipad}: Zero-padding width based on total chunks
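
The patterns use Python's `str.format` syntax, so their behavior can be reproduced with plain format strings. In the sketch below, the `MASKS` dictionary copies the four patterns from the table, while `human_size` and `chunk_name` are hypothetical helpers written for this example rather than functions exported by the package. The padding width `block_ipad=4` is passed in by hand to match the example outputs.

```python
MASKS = {
    1: "{md5:.5}_{total_size}_{block_size}_{index:0{block_ipad}d}_{md5:.5}",
    2: "{md5:.5}_{block_size}{index:0{block_ipad}d}_{md5:.5}_{total_size}",
    3: "{md5:.5}_{block_size}{index:0{block_ipad}d}_{total_size}",
    4: "{md5:.5}_{block_size}{index:0{block_ipad}d}",
}


def human_size(n: int) -> str:
    """Render 524288 as '512K'; assumes binary multiples (hypothetical helper)."""
    for suffix, factor in (("G", 1024 ** 3), ("M", 1024 ** 2), ("K", 1024)):
        if n >= factor and n % factor == 0:
            return f"{n // factor}{suffix}"
    return str(n)


def chunk_name(mask: int, md5: str, total_size: int, block_size: int,
               index: int, block_ipad: int) -> str:
    """Fill in one mask; mask 1 uses raw bytes, masks 2-4 a human-readable size."""
    size_field = block_size if mask == 1 else human_size(block_size)
    return MASKS[mask].format(
        md5=md5,                # '{md5:.5}' keeps only the first 5 characters
        total_size=total_size,
        block_size=size_field,
        index=index,
        block_ipad=block_ipad,  # nested field sets the zero-padding width
    )


md5 = "1a3f5" + "0" * 27  # placeholder 32-character digest
for mask in (1, 2, 3, 4):
    print(mask, chunk_name(mask, md5, 1048576, 524288, index=1, block_ipad=4))
# 1 1a3f5_1048576_524288_0001_1a3f5
# 2 1a3f5_512K0001_1a3f5_1048576
# 3 1a3f5_512K0001_1048576
# 4 1a3f5_512K0001
```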

Usage Examples

Set the mask when creating a descriptor:

blob-descriptor create -m 3 --cs 10M bigfile.iso

Programmatic configuration:

from blob_descriptor import set_mask, mask3
set_mask(mask3)  # Use mask pattern 3

Mask Comparison

# Mask 1 (Default)
1a3f5_1048576_524288_0001_1a3f5

# Mask 2 (Size suffix)
1a3f5_512K0001_1a3f5_1048576

# Mask 3 (Compact)
1a3f5_512K0001_1048576

# Mask 4 (Minimal)
1a3f5_512K0001

Choose masks based on your needs:

  • Mask 1: Full technical details
  • Mask 2: Balanced readability
  • Mask 3: Clean with size info
  • Mask 4: Minimal footprint

The pattern affects:

  • Chunk filenames
  • Descriptor metadata
  • Verification outputs
