Toolkit for managing large binary files through chunking and metadata descriptors

Blob Descriptor 🗃️

A toolkit for managing large binary files by splitting them into chunks and tracking them with metadata descriptors.

Features ✨

  • Smart Chunking - Split large files into manageable chunks
  • Metadata Tracking - Maintain comprehensive file descriptors
  • Integrity Verification - MD5 checksums at file and chunk levels
  • Flexible Sources - Handle local files and remote URLs
  • Efficient Transfer - Parallel chunk processing
  • CLI Interface - Easy command-line control
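
To make the chunking and two-level checksum idea concrete, here is a minimal standard-library sketch. It is not the package's implementation: the function name `split_with_checksums`, the `chunk_NNNN` file names, and the dictionary layout are all assumptions made for illustration, and the real descriptor format is not shown in this README.

```python
import hashlib
from pathlib import Path


def split_with_checksums(source, chunk_dir, chunk_size=1024 * 1024):
    """Split `source` into fixed-size chunks and record MD5 digests.

    Returns a plain dict used here as a stand-in descriptor (hypothetical
    layout, not the format blob-descriptor writes).
    """
    chunk_dir = Path(chunk_dir)
    chunk_dir.mkdir(parents=True, exist_ok=True)

    file_md5 = hashlib.md5()  # digest of the whole file
    chunks = []
    with open(source, "rb") as src:
        index = 0
        while True:
            data = src.read(chunk_size)
            if not data:
                break
            file_md5.update(data)
            name = f"chunk_{index:04d}"  # hypothetical name, not one of the masks
            (chunk_dir / name).write_bytes(data)
            chunks.append(
                {
                    "name": name,
                    "size": len(data),
                    "md5": hashlib.md5(data).hexdigest(),  # per-chunk digest
                }
            )
            index += 1

    return {
        "md5": file_md5.hexdigest(),
        "size": sum(c["size"] for c in chunks),
        "block_size": chunk_size,
        "chunks": chunks,
    }
```

Keeping a digest per chunk as well as one for the whole file means a damaged chunk can be identified and refetched on its own, without rehashing everything else.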

Installation 📦

pip install blob-descriptor

Or for the latest development version:

pip install git+https://github.com/yourusername/blob_descriptor.git

Usage 🚀

Basic Workflow

  1. Create descriptor and chunks:

    blob-descriptor create --cw 1M,./chunks large_file.dat
    
  2. Verify integrity:

    blob-descriptor verify descriptor.bd ./chunks
    
  3. Reassemble files:

    blob-descriptor assemble descriptor.bd --sink reconstructed.dat
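
For readers who want to see what the verify and assemble steps amount to, the following sketch does both with the standard library only. It assumes a hypothetical list of chunk records, each with `name` and `md5` keys; the helper names `verify_chunks` and `assemble` are illustrative and are not the package's API.

```python
import hashlib
from pathlib import Path


def verify_chunks(chunks, chunk_dir):
    """Return the names of chunks that are missing or fail their MD5 check."""
    bad = []
    for entry in chunks:
        path = Path(chunk_dir) / entry["name"]
        if not path.is_file():
            bad.append(entry["name"])
            continue
        if hashlib.md5(path.read_bytes()).hexdigest() != entry["md5"]:
            bad.append(entry["name"])
    return bad


def assemble(chunks, chunk_dir, sink, expected_md5):
    """Concatenate chunks in order into `sink`.

    Returns True when the MD5 of the reassembled data matches `expected_md5`.
    """
    digest = hashlib.md5()
    with open(sink, "wb") as out:
        for entry in chunks:
            data = (Path(chunk_dir) / entry["name"]).read_bytes()
            digest.update(data)
            out.write(data)
    return digest.hexdigest() == expected_md5
```

Running the chunk check before assembling avoids writing a large output file that would fail the final whole-file comparison anyway.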
    

Command Reference

Command   Description                     Example
create    Generate descriptor and chunks  create --cs 5M bigfile.iso
verify    Check file integrity            verify desc.bd ./chunks
check     Validate chunk availability     check desc.bd --search ./backups
assemble  Reconstruct files from chunks   assemble desc.bd --sink output.img

Advanced Options

# Process only specific chunks (0-5 and 10-15)
blob-descriptor create \
  --cc "5M,./temp,upload.sh {file},0-5,10-15" \
  huge_dataset.bin

# Use custom naming pattern (mask 3)
blob-descriptor create -m 3 --cs 10M data.tar

API Example 🐍

from blob_descriptor import BlobDescriptor

# Create descriptor
bd = BlobDescriptor()
bd.add_file("large_file.bin")
desc = bd.make_descriptor(block_size=8192)

# Save to file
bd.save("descriptor.bd")

Naming Pattern Masks 🏷️

Blob Descriptor provides flexible naming patterns for generated chunks through four mask options:

Available Masks

Mask  Pattern Format                                                      Example Output
1     {md5:.5}_{total_size}_{block_size}_{index:0{block_ipad}d}_{md5:.5}  1a3f5_1048576_524288_0001_1a3f5
2     {md5:.5}_{block_size}{index:0{block_ipad}d}_{md5:.5}_{total_size}   1a3f5_512K0001_1a3f5_1048576
3     {md5:.5}_{block_size}{index:0{block_ipad}d}_{total_size}            1a3f5_512K0001_1048576
4     {md5:.5}_{block_size}{index:0{block_ipad}d}                         1a3f5_512K0001

Format Variables

  • {md5:.5}: First 5 chars of MD5 hash
  • {total_size}: Full blob size in bytes
  • {block_size}: Chunk size in bytes (mask 1) or human-readable (masks 2-4)
  • {index}: Chunk sequence number
  • {block_ipad}: Zero-padding width based on total chunks
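
The patterns above are ordinary Python format strings, so their behaviour can be checked with `str.format` alone. The sketch below renders masks 1 and 4 using the values from the table. The 32-character digest is made up, `human_size` is a hypothetical helper for the human-readable form, and `block_ipad=4` is simply chosen to match the example output; none of this is taken from the package's code.

```python
# Patterns copied from the mask table.
MASK_1 = "{md5:.5}_{total_size}_{block_size}_{index:0{block_ipad}d}_{md5:.5}"
MASK_4 = "{md5:.5}_{block_size}{index:0{block_ipad}d}"


def human_size(n):
    """Render a byte count as e.g. 512K or 5M when it divides evenly.

    Hypothetical helper; the package may format sizes differently.
    """
    for unit, factor in (("G", 1024**3), ("M", 1024**2), ("K", 1024)):
        if n >= factor and n % factor == 0:
            return f"{n // factor}{unit}"
    return str(n)


fields = dict(
    md5="1a3f5c0de9b27a4466f0e8d1c2b3a495",  # made-up digest for illustration
    total_size=1048576,
    index=1,
    block_ipad=4,  # assumed padding width, matching the table's 0001
)

name_1 = MASK_1.format(block_size=524288, **fields)
name_4 = MASK_4.format(block_size=human_size(524288), **fields)

print(name_1)  # 1a3f5_1048576_524288_0001_1a3f5
print(name_4)  # 1a3f5_512K0001
```

Two format-spec details do the work here: `.5` applied to a string truncates it to five characters, and the nested `{block_ipad}` field is substituted first, so `{index:0{block_ipad}d}` becomes `{index:04d}`.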

Usage Examples

Set mask when creating descriptor:

blob-descriptor create -m 3 --cs 10M bigfile.iso

Programmatic configuration:

from blob_descriptor import set_mask, mask3
set_mask(mask3)  # Use mask pattern 3

Mask Comparison

# Mask 1 (Default)
1a3f5_1048576_524288_0001_1a3f5

# Mask 2 (Size suffix)
1a3f5_512K0001_1a3f5_1048576

# Mask 3 (Compact)
1a3f5_512K0001_1048576

# Mask 4 (Minimal)
1a3f5_512K0001

Choose masks based on your needs:

  • Mask 1: Full technical details
  • Mask 2: Balanced readability
  • Mask 3: Clean with size info
  • Mask 4: Minimal footprint

The pattern affects:

  • Chunk filenames
  • Descriptor metadata
  • Verification outputs
