Toolkit for managing large binary files through chunking and metadata descriptors
Blob Descriptor 🗃️
A robust toolkit for managing large binary files through intelligent chunking and metadata descriptors.
Features ✨
- Smart Chunking - Split large files into manageable chunks
- Metadata Tracking - Maintain comprehensive file descriptors
- Integrity Verification - MD5 checksums at file and chunk levels
- Flexible Sources - Handle local files and remote URLs
- Efficient Transfer - Parallel chunk processing
- CLI Interface - Easy command-line control
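The package's own hashing code is not shown on this page. Below is a minimal standard-library sketch of what "MD5 checksums at file and chunk levels" and "parallel chunk processing" can look like together. The names `md5_of_range` and `file_and_chunk_md5` are hypothetical and are not part of the `blob_descriptor` API.

```python
import hashlib
import os
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def md5_of_range(path: str, offset: int, length: int) -> str:
    """Return the hex MD5 of `length` bytes of `path`, starting at `offset`."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        f.seek(offset)
        remaining = length
        while remaining > 0:
            buf = f.read(min(1 << 20, remaining))  # read at most 1 MiB at a time
            if not buf:  # reached end of file (last, shorter chunk)
                break
            h.update(buf)
            remaining -= len(buf)
    return h.hexdigest()


def file_and_chunk_md5(path: str, chunk_size: int, workers: int = 4) -> Tuple[str, List[str]]:
    """Return (file_md5, [chunk_md5, ...]); chunk hashes are computed in a thread pool."""
    size = os.path.getsize(path)
    offsets = range(0, size, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so chunk_md5s[i] belongs to chunk i
        chunk_md5s = list(pool.map(lambda off: md5_of_range(path, off, chunk_size), offsets))
    file_md5 = md5_of_range(path, 0, size)  # whole-file checksum
    return file_md5, chunk_md5s
```

For example, `file_and_chunk_md5("large_file.dat", 1024 * 1024)` returns one file-level digest plus one digest per 1 MiB chunk. Threads help here because `hashlib` releases the GIL while hashing large buffers.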
Installation 📦
```bash
pip install blob-descriptor
```
Or, for the latest development version:
```bash
pip install git+https://github.com/jet-logic/blob_descriptor.git
```
Usage 🚀
Basic Workflow
- Create descriptor and chunks:
  ```bash
  blob-descriptor create --cw 1M,./chunks large_file.dat
  ```
- Verify integrity:
  ```bash
  blob-descriptor verify descriptor.bd ./chunks
  ```
- Reassemble files:
  ```bash
  blob-descriptor assemble descriptor.bd --sink reconstructed.dat
  ```
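To make the three steps concrete, here is a self-contained standard-library sketch of the same create, verify, assemble round trip. It is not the package's implementation. The descriptor is a plain dict (the real `.bd` format is not documented here), chunk names follow a hypothetical `chunk_NNNN` scheme rather than the naming masks, and `create`, `verify` and `assemble` are illustrative functions, not the `blob_descriptor` API.

```python
import hashlib
import os
from typing import List


def create(src: str, chunk_size: int, out_dir: str) -> dict:
    """Split `src` into chunk files under `out_dir` and return a descriptor dict."""
    os.makedirs(out_dir, exist_ok=True)
    whole = hashlib.md5()  # file-level checksum
    chunks = []
    total = 0
    with open(src, "rb") as f:
        for index, data in enumerate(iter(lambda: f.read(chunk_size), b"")):
            whole.update(data)
            total += len(data)
            name = f"chunk_{index:04d}"  # hypothetical naming; the real tool uses masks
            with open(os.path.join(out_dir, name), "wb") as out:
                out.write(data)
            chunks.append({"name": name, "md5": hashlib.md5(data).hexdigest()})
    return {"size": total, "chunk_size": chunk_size, "md5": whole.hexdigest(), "chunks": chunks}


def verify(desc: dict, chunk_dir: str) -> List[str]:
    """Return names of chunks that are missing or whose MD5 does not match."""
    bad = []
    for c in desc["chunks"]:
        path = os.path.join(chunk_dir, c["name"])
        if not os.path.isfile(path):
            bad.append(c["name"])
            continue
        with open(path, "rb") as f:
            if hashlib.md5(f.read()).hexdigest() != c["md5"]:
                bad.append(c["name"])
    return bad


def assemble(desc: dict, chunk_dir: str, sink: str) -> None:
    """Concatenate chunks in order into `sink`, then check the file-level MD5."""
    whole = hashlib.md5()
    with open(sink, "wb") as out:
        for c in desc["chunks"]:
            with open(os.path.join(chunk_dir, c["name"]), "rb") as f:
                data = f.read()  # each chunk is at most chunk_size bytes
            whole.update(data)
            out.write(data)
    if whole.hexdigest() != desc["md5"]:
        raise ValueError("reassembled file does not match the descriptor MD5")
```

Keeping both a per-chunk and a whole-file digest means `verify` can pinpoint exactly which chunk is damaged, while `assemble` still confirms the final result end to end.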
Command Reference
| Command | Description | Example |
|---|---|---|
| create | Generate descriptor and chunks | create --cs 5M bigfile.iso |
| verify | Check file integrity | verify desc.bd ./chunks |
| check | Validate chunk availability | check desc.bd --search ./backups |
| assemble | Reconstruct files from chunks | assemble desc.bd --sink output.img |
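This page describes `check --search` only as validating chunk availability and does not say how the search works. One plausible reading, sketched below, is: for every chunk the descriptor lists, look through each search directory in order and report the chunks that cannot be found. `locate_chunks` and `missing_chunks` are hypothetical helpers, not the tool's code.

```python
import os
from typing import Dict, Iterable, List, Optional


def locate_chunks(names: Iterable[str], search_dirs: List[str]) -> Dict[str, Optional[str]]:
    """Map each chunk name to the first matching file path in `search_dirs`, or None."""
    found: Dict[str, Optional[str]] = {}
    for name in names:
        found[name] = None
        for d in search_dirs:  # earlier directories take priority
            candidate = os.path.join(d, name)
            if os.path.isfile(candidate):
                found[name] = candidate
                break
    return found


def missing_chunks(names: Iterable[str], search_dirs: List[str]) -> List[str]:
    """Return the chunk names not present in any search directory."""
    return [n for n, p in locate_chunks(names, search_dirs).items() if p is None]
```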
Advanced Options
```bash
# Process only specific chunks (0-5 and 10-15)
blob-descriptor create \
  --cc "5M,./temp,upload.sh {file},0-5,10-15" \
  huge_dataset.bin

# Use custom naming pattern (mask 3)
blob-descriptor create -m 3 --cs 10M data.tar
```
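The chunk selection in the `--cc` example ("0-5,10-15") can be illustrated by a small parser. This is a sketch under assumptions: range ends are treated as inclusive, and the meaning of the first three comma-separated fields (size, directory, command) is inferred from the single example above. `parse_ranges` and `split_cc` are hypothetical names, not the tool's parser.

```python
from typing import List, Set, Tuple


def parse_ranges(items: List[str]) -> Set[int]:
    """Expand items like "0-5" or "7" into chunk indices (range ends assumed inclusive)."""
    selected: Set[int] = set()
    for item in items:
        item = item.strip()
        if "-" in item:
            lo, hi = (int(x) for x in item.split("-", 1))
            if lo > hi:
                raise ValueError(f"bad range: {item!r}")
            selected.update(range(lo, hi + 1))
        else:
            selected.add(int(item))
    return selected


def split_cc(value: str) -> Tuple[str, str, str, Set[int]]:
    """Split a --cc value into (size, directory, command, indices).

    Field meanings are inferred from one example; a command containing
    commas would break this naive split.
    """
    size, directory, command, *ranges = value.split(",")
    return size, directory, command, parse_ranges(ranges)
```

With the example value, `split_cc("5M,./temp,upload.sh {file},0-5,10-15")` yields the indices 0 to 5 and 10 to 15.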
API Example 🐍
```python
from blob_descriptor import BlobDescriptor

# Create descriptor
bd = BlobDescriptor()
bd.add_file("large_file.bin")
desc = bd.make_descriptor(block_size=8192)

# Save to file
bd.save("descriptor.bd")
```
Naming Pattern Masks 🏷️
Blob Descriptor provides flexible naming patterns for generated chunks through four mask options:
Available Masks
| Mask | Pattern Format | Example Output |
|---|---|---|
| 1 | `{md5:.5}_{total_size}_{block_size}_{index:0{block_ipad}d}_{md5:.5}` | `1a3f5_1048576_524288_0001_1a3f5` |
| 2 | `{md5:.5}_{block_size}{index:0{block_ipad}d}_{md5:.5}_{total_size}` | `1a3f5_512K0001_1a3f5_1048576` |
| 3 | `{md5:.5}_{block_size}{index:0{block_ipad}d}_{total_size}` | `1a3f5_512K0001_1048576` |
| 4 | `{md5:.5}_{block_size}{index:0{block_ipad}d}` | `1a3f5_512K0001` |
Format Variables
- `{md5:.5}`: First 5 characters of the MD5 hash
- `{total_size}`: Full blob size in bytes
- `{block_size}`: Chunk size in bytes (mask 1) or human-readable, e.g. `512K` (masks 2-4)
- `{index}`: Chunk sequence number
- `{block_ipad}`: Zero-padding width, based on the total number of chunks
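The patterns appear to be Python format strings, so the example outputs in the table can be reproduced with `str.format`. In the sketch below, `human_size` is a hypothetical stand-in for however the library renders `512K`, `chunk_name` is an illustrative helper rather than part of the package, and `index=1, block_ipad=4` are chosen only to match the table's examples.

```python
MASKS = {
    1: "{md5:.5}_{total_size}_{block_size}_{index:0{block_ipad}d}_{md5:.5}",
    2: "{md5:.5}_{block_size}{index:0{block_ipad}d}_{md5:.5}_{total_size}",
    3: "{md5:.5}_{block_size}{index:0{block_ipad}d}_{total_size}",
    4: "{md5:.5}_{block_size}{index:0{block_ipad}d}",
}


def human_size(n: int) -> str:
    """Hypothetical formatter: 524288 -> "512K"; stops at the first non-multiple of 1024."""
    for unit in ("", "K", "M", "G", "T"):
        if n < 1024 or n % 1024:
            return f"{n}{unit}"
        n //= 1024
    return f"{n}P"


def chunk_name(mask: int, md5: str, total_size: int, block_size: int,
               index: int, block_ipad: int) -> str:
    """Render a chunk name; mask 1 uses raw bytes, masks 2-4 a human-readable size."""
    size_field = block_size if mask == 1 else human_size(block_size)
    # "{md5:.5}" truncates the hex digest; "{index:0{block_ipad}d}" nests the pad width
    return MASKS[mask].format(md5=md5, total_size=total_size, block_size=size_field,
                              index=index, block_ipad=block_ipad)
```

Example: `chunk_name(3, "1a3f5" + "0" * 27, 1048576, 524288, 1, 4)` gives `1a3f5_512K0001_1048576`, matching the mask 3 row.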
Usage Examples
Set the mask when creating a descriptor:
```bash
blob-descriptor create -m 3 --cs 10M bigfile.iso
```
Programmatic configuration:
```python
from blob_descriptor import set_mask, mask3

set_mask(mask3)  # Use mask pattern 3
```
Mask Comparison
```text
# Mask 1 (Default)
1a3f5_1048576_524288_0001_1a3f5
# Mask 2 (Size suffix)
1a3f5_512K0001_1a3f5_1048576
# Mask 3 (Compact)
1a3f5_512K0001_1048576
# Mask 4 (Minimal)
1a3f5_512K0001
```
Choose masks based on your needs:
- Mask 1: Full technical details
- Mask 2: Balanced readability
- Mask 3: Clean with size info
- Mask 4: Minimal footprint
The selected pattern affects:
- Chunk filenames
- Descriptor metadata
- Verification outputs
Download files
File details
Details for the file blob_descriptor-0.0.3.tar.gz.
File metadata
- Download URL: blob_descriptor-0.0.3.tar.gz
- Upload date:
- Size: 69.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.10.17
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `53a46b458346fe44613fd185091516c28f0c890d7018dcaabe112fd1fe5afb63` |
| MD5 | `0e987d80bd160bac292aef5bea2f9690` |
| BLAKE2b-256 | `b0aa91ef6b1c9ddc886edfabb8230c1fa0254a970e2b32ee36cda66fde988c31` |
File details
Details for the file blob_descriptor-0.0.3-py3-none-any.whl.
File metadata
- Download URL: blob_descriptor-0.0.3-py3-none-any.whl
- Upload date:
- Size: 55.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.10.17
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `19cc130a791d6cbb60b5f557b698dad83cd0e608089022df4e22d33cac9f9b72` |
| MD5 | `7d4bab5bffebafcff3e147b82fab5c25` |
| BLAKE2b-256 | `af18e4b3bfd341696b18cab278d67393c845070f2217a6ae47eb7c4148681743` |
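The SHA256 digests listed above can be used to check a manually downloaded 0.0.3 file before installing it. The sketch below uses only `hashlib`. `sha256_of` and `EXPECTED` are illustrative names, and the digests are copied from the tables above.

```python
import hashlib

# SHA256 digests copied from the file-hash tables above (version 0.0.3)
EXPECTED = {
    "blob_descriptor-0.0.3.tar.gz": "53a46b458346fe44613fd185091516c28f0c890d7018dcaabe112fd1fe5afb63",
    "blob_descriptor-0.0.3-py3-none-any.whl": "19cc130a791d6cbb60b5f557b698dad83cd0e608089022df4e22d33cac9f9b72",
}


def sha256_of(path: str, bufsize: int = 1 << 20) -> str:
    """Stream a file through SHA-256 in 1 MiB blocks and return the hex digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(bufsize), b""):
            h.update(block)
    return h.hexdigest()
```

For example, `sha256_of("blob_descriptor-0.0.3.tar.gz") == EXPECTED["blob_descriptor-0.0.3.tar.gz"]` should be `True` for an intact download. Alternatively, pip's hash-checking mode (`--hash=sha256:...` entries in a requirements file) performs the same check during installation.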