Toolkit for managing large binary files through chunking and metadata descriptors

Blob Descriptor 🗃️

A toolkit for managing large binary files: it splits them into chunks and records the layout and checksums in a metadata descriptor.

Features ✨

  • Smart Chunking - Split large files into manageable chunks
  • Metadata Tracking - Maintain comprehensive file descriptors
  • Integrity Verification - MD5 checksums at file and chunk levels
  • Flexible Sources - Handle local files and remote URLs
  • Efficient Transfer - Parallel chunk processing
  • CLI Interface - Easy command-line control
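To make the chunking and checksum features concrete, here is a minimal standard-library sketch of the general technique. It is not the package's implementation: the function name `split_with_checksums`, the dictionary layout, and the chunk file naming are all hypothetical and chosen only for illustration.

```python
import hashlib
from pathlib import Path


def split_with_checksums(source: Path, out_dir: Path, chunk_size: int = 1024 * 1024) -> dict:
    """Split `source` into fixed-size chunks and record MD5s (illustrative only)."""
    out_dir.mkdir(parents=True, exist_ok=True)
    file_md5 = hashlib.md5()  # checksum over the whole file
    chunks = []
    with source.open("rb") as fh:
        index = 0
        while True:
            data = fh.read(chunk_size)
            if not data:
                break
            file_md5.update(data)
            # Hypothetical naming scheme, unrelated to the package's masks.
            chunk_path = out_dir / f"{source.name}.{index:04d}"
            chunk_path.write_bytes(data)
            chunks.append({
                "index": index,
                "name": chunk_path.name,
                "size": len(data),
                "md5": hashlib.md5(data).hexdigest(),  # per-chunk checksum
            })
            index += 1
    return {
        "name": source.name,
        "size": sum(c["size"] for c in chunks),
        "md5": file_md5.hexdigest(),
        "chunk_size": chunk_size,
        "chunks": chunks,
    }
```

Keeping a checksum per chunk as well as per file means a damaged chunk can be identified and re-fetched on its own, without re-reading the entire blob.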

Installation 📦

pip install blob-descriptor

Or for the latest development version:

pip install git+https://github.com/yourusername/blob_descriptor.git

Usage 🚀

Basic Workflow

  1. Create descriptor and chunks:

    blob-descriptor create --cw 1M,./chunks large_file.dat
    
  2. Verify integrity:

    blob-descriptor verify descriptor.bd ./chunks
    
  3. Reassemble files:

    blob-descriptor assemble descriptor.bd --sink reconstructed.dat
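As a rough picture of what the verify and assemble steps involve, the sketch below checks each chunk against an expected MD5 and concatenates the chunks into a sink file. It uses only the standard library and does not read the `.bd` descriptor format; the function names and the `(path, md5)` input shape are assumptions made for this example.

```python
import hashlib
from pathlib import Path


def verify_chunk(path: Path, expected_md5: str) -> bool:
    """Return True if the chunk's MD5 matches the expected digest."""
    return hashlib.md5(path.read_bytes()).hexdigest() == expected_md5


def assemble(chunks: list[tuple[Path, str]], sink: Path) -> str:
    """Concatenate chunks in order into `sink`; return the MD5 of the result.

    `chunks` is a hypothetical list of (chunk_path, expected_md5) pairs,
    already sorted by chunk index.
    """
    whole = hashlib.md5()
    with sink.open("wb") as out:
        for path, expected in chunks:
            data = path.read_bytes()
            if hashlib.md5(data).hexdigest() != expected:
                raise ValueError(f"checksum mismatch for {path}")
            out.write(data)
            whole.update(data)
    return whole.hexdigest()
```

Comparing the returned digest with the file-level MD5 recorded at creation time confirms that the reassembled file matches the original.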
    

Command Reference

Command   Description                     Example
create    Generate descriptor and chunks  create --cs 5M bigfile.iso
verify    Check file integrity            verify desc.bd ./chunks
check     Validate chunk availability     check desc.bd --search ./backups
assemble  Reconstruct files from chunks   assemble desc.bd --sink output.img

Advanced Options

# Process only specific chunks (0-5 and 10-15)
blob-descriptor create \
  --cc "5M,./temp,upload.sh {file},0-5,10-15" \
  huge_dataset.bin

# Use custom naming pattern (mask 3)
blob-descriptor create -m 3 --cs 10M data.tar
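The chunk selection in the first example is written as ranges such as `0-5` and `10-15`. The sketch below shows one way such a specification could be expanded into chunk indices. It assumes the ranges are inclusive and comma-separated, which the example suggests but does not state; `parse_ranges` is a hypothetical helper, not part of the package.

```python
def parse_ranges(spec: str) -> list[int]:
    """Expand a spec like "0-5,10-15" into a sorted list of chunk indices.

    Assumes inclusive ranges; single numbers such as "7" are also accepted.
    """
    indices: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start, end = part.split("-", 1)
            indices.update(range(int(start), int(end) + 1))
        else:
            indices.add(int(part))
    return sorted(indices)


print(parse_ranges("0-5,10-15"))
# [0, 1, 2, 3, 4, 5, 10, 11, 12, 13, 14, 15]
```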

API Example 🐍

from blob_descriptor import BlobDescriptor

# Create descriptor
bd = BlobDescriptor()
bd.add_file("large_file.bin")
desc = bd.make_descriptor(block_size=8192)

# Save to file
bd.save("descriptor.bd")

Naming Pattern Masks 🏷️

Blob Descriptor provides flexible naming patterns for generated chunks through four mask options:

Available Masks

Mask  Pattern Format                                                      Example Output
1     {md5:.5}_{total_size}_{block_size}_{index:0{block_ipad}d}_{md5:.5}  1a3f5_1048576_524288_0001_1a3f5
2     {md5:.5}_{block_size}{index:0{block_ipad}d}_{md5:.5}_{total_size}   1a3f5_512K0001_1a3f5_1048576
3     {md5:.5}_{block_size}{index:0{block_ipad}d}_{total_size}            1a3f5_512K0001_1048576
4     {md5:.5}_{block_size}{index:0{block_ipad}d}                         1a3f5_512K0001

Format Variables

  • {md5:.5}: First 5 chars of MD5 hash
  • {total_size}: Full blob size in bytes
  • {block_size}: Chunk size in bytes (mask 1) or human-readable (masks 2-4)
  • {index}: Chunk sequence number
  • {block_ipad}: Zero-padding width based on total chunks
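The mask patterns are valid Python format strings, so they can be tried out directly with `str.format`. The sketch below reproduces the example outputs for masks 1 and 4. The `human_size` helper and the placeholder MD5 value are assumptions for illustration; how the package itself produces the human-readable size is not documented here.

```python
MASK_1 = "{md5:.5}_{total_size}_{block_size}_{index:0{block_ipad}d}_{md5:.5}"
MASK_4 = "{md5:.5}_{block_size}{index:0{block_ipad}d}"


def human_size(n: int) -> str:
    """Hypothetical helper: render exact multiples of 1024 as K/M/G."""
    for unit, factor in (("G", 1024 ** 3), ("M", 1024 ** 2), ("K", 1024)):
        if n >= factor and n % factor == 0:
            return f"{n // factor}{unit}"
    return str(n)


md5 = "1a3f5" + "0" * 27  # placeholder 32-character digest

# {md5:.5} truncates the digest to 5 characters;
# {index:0{block_ipad}d} zero-pads the index to `block_ipad` digits.
name_1 = MASK_1.format(md5=md5, total_size=1048576, block_size=524288,
                       index=1, block_ipad=4)
name_4 = MASK_4.format(md5=md5, block_size=human_size(524288),
                       index=1, block_ipad=4)

print(name_1)  # 1a3f5_1048576_524288_0001_1a3f5
print(name_4)  # 1a3f5_512K0001
```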

Usage Examples

Set mask when creating descriptor:

blob-descriptor create -m 3 --cs 10M bigfile.iso

Programmatic configuration:

from blob_descriptor import set_mask, mask3
set_mask(mask3)  # Use mask pattern 3

Mask Comparison

# Mask 1 (Default)
1a3f5_1048576_524288_0001_1a3f5

# Mask 2 (Size suffix)
1a3f5_512K0001_1a3f5_1048576

# Mask 3 (Compact)
1a3f5_512K0001_1048576

# Mask 4 (Minimal)
1a3f5_512K0001

Choose masks based on your needs:

  • Mask 1: Full technical details
  • Mask 2: Balanced readability
  • Mask 3: Clean with size info
  • Mask 4: Minimal footprint

The pattern affects:

  • Chunk filenames
  • Descriptor metadata
  • Verification outputs
