Zeroc: High-Performance API Compression Protocol
Zeroc Python Implementation
Python reference implementation of the Zeroc compression protocol.
Features
- ✅ Complete wire format implementation (12-byte header + varint + payload)
- ✅ Dictionary loader with 132-byte header validation
- ✅ CRC32C checksum support
- ✅ Dictionary caching for performance
- ✅ Comprehensive test suite
- ✅ Type annotations for IDE support
Installation
From Source
cd implementations/python
pip install -e .
Dependencies
pip install zstandard crc32c protobuf
Quick Start
Basic Compression (No Dictionary)
from zeroc import encode_frame, decode_frame, decompress_payload
# Encode protobuf to Zeroc frame
proto_bytes = b"Your protobuf serialized data"
frame = encode_frame(proto_bytes, compress=True, checksum=True)
# Decode frame
compressed, metadata = decode_frame(frame)
# Decompress
decompressed = decompress_payload(compressed, 0)
With Trained Dictionary
from zeroc import DictionaryLoader, encode_frame, decode_frame
# Load dictionary
loader = DictionaryLoader()
metadata, dict_obj = loader.load("dictionaries/formats/Order-1.0.0.zdict")
# Get compressor/decompressor
compressor = loader.get_compressor("dictionaries/formats/Order-1.0.0.zdict")
decompressor = loader.get_decompressor("dictionaries/formats/Order-1.0.0.zdict")
# Encode with dictionary
frame = encode_frame(
    proto_bytes,
    dictionary_id=metadata['dictionary_id'],
    compress=True,
    checksum=True,
    compressor=compressor,
)
# Decode and decompress
compressed, frame_meta = decode_frame(frame)
decompressed = decompressor.decompress(compressed)
API Reference
Wire Format
encode_frame(proto_bytes, dictionary_id=0, schema_hash=0, compress=True, checksum=False, compressor=None)
Encode protobuf bytes to Zeroc frame.
Parameters:
- proto_bytes (bytes): Protobuf binary data
- dictionary_id (int): Dictionary ID (CRC32), 0 for no dictionary
- schema_hash (int): Schema hash (CRC32)
- compress (bool): Compress the payload (True) or emit an identity frame (False)
- checksum (bool): Whether to include a CRC32C checksum
- compressor (ZstdCompressor): Required if dictionary_id > 0
Returns:
bytes: Complete Zeroc frame
Example:
frame = encode_frame(proto_bytes, compress=True, checksum=True)
decode_frame(frame)
Decode Zeroc frame to payload and metadata.
Parameters:
- frame (bytes): Complete Zeroc frame
Returns:
(bytes, dict): Tuple of (payload, metadata)
Metadata Fields:
- version: Protocol version byte
- major_version: Major version (1)
- minor_version: Minor version (0)
- flags: Flags byte
- dictionary_id: Dictionary ID
- schema_hash: Schema hash
- compressed_size: Payload size in bytes
- compression_enabled: Boolean
- dictionary_used: Boolean
- checksum_included: Boolean
Raises:
ValueError: If frame is invalid or malformed
Example:
compressed, metadata = decode_frame(frame)
print(f"Dictionary ID: 0x{metadata['dictionary_id']:08x}")
decompress_payload(compressed, dictionary_id, decompressor=None)
Decompress payload with optional dictionary.
Parameters:
- compressed (bytes): Compressed bytes
- dictionary_id (int): Dictionary ID (0 for no dictionary)
- decompressor (ZstdDecompressor): Required if dictionary_id > 0
Returns:
bytes: Decompressed protobuf bytes
Example:
proto_bytes = decompress_payload(compressed, 0) # No dictionary
proto_bytes = decompress_payload(compressed, dict_id, decompressor) # With dictionary
Dictionary Loader
load_dictionary(filepath)
Load Zeroc dictionary from .zdict file.
Parameters:
- filepath (str): Path to .zdict file
Returns:
(dict, ZstdCompressionDict): Tuple of (metadata, dictionary object)
Metadata Fields:
- version: SemVer string (e.g., "1.0.0")
- schema_name: Schema name (e.g., "ecommerce.v1.Order")
- dictionary_id: Dictionary ID (CRC32)
- sample_count: Number of training samples
- created: Unix timestamp
- compression_level: Zstd compression level
- dict_size: Dictionary size in bytes
- min_size: Minimum protobuf size
- max_size: Maximum protobuf size
- sha256_prefix: First 8 bytes of SHA256
Raises:
- FileNotFoundError: If file doesn't exist
- ValueError: If dictionary format is invalid
Example:
metadata, dict_obj = load_dictionary("Order-1.0.0.zdict")
print(f"Dictionary ID: 0x{metadata['dictionary_id']:08x}")
DictionaryLoader
Dictionary loader with caching.
Methods:
load(filepath)
Load dictionary with caching.
loader = DictionaryLoader()
metadata, dict_obj = loader.load("Order-1.0.0.zdict")
get_compressor(filepath, level=3)
Get ZstdCompressor for dictionary.
compressor = loader.get_compressor("Order-1.0.0.zdict", level=3)
compressed = compressor.compress(proto_bytes)
get_decompressor(filepath)
Get ZstdDecompressor for dictionary.
decompressor = loader.get_decompressor("Order-1.0.0.zdict")
decompressed = decompressor.decompress(compressed)
clear_cache()
Clear dictionary cache.
loader.clear_cache()
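The caching behaviour described above can be illustrated with a small stand-alone sketch. It is not the actual DictionaryLoader implementation: the class name CachingLoader, the injected load_fn callable, and the choice to key the cache by absolute path are all assumptions made for illustration.

```python
import os
from typing import Any, Callable, Dict


class CachingLoader:
    """Hypothetical sketch of a path-keyed dictionary cache."""

    def __init__(self, load_fn: Callable[[str], Any]) -> None:
        # load_fn stands in for whatever parses a .zdict file.
        self._load_fn = load_fn
        self._cache: Dict[str, Any] = {}

    def load(self, filepath: str) -> Any:
        # Assumption: normalise the path so "a.zdict" and "./a.zdict" share an entry.
        key = os.path.abspath(filepath)
        if key not in self._cache:
            self._cache[key] = self._load_fn(filepath)
        return self._cache[key]

    def clear_cache(self) -> None:
        # Drop every cached entry; the next load() re-reads the file.
        self._cache.clear()


# Usage with a fake loader that records how often it is called.
calls = []


def fake_load(path: str) -> str:
    calls.append(path)
    return "parsed:" + os.path.basename(path)


loader = CachingLoader(fake_load)
first = loader.load("Order-1.0.0.zdict")
second = loader.load("./Order-1.0.0.zdict")  # served from the cache
```

Keying by absolute path avoids loading the same file twice under different relative spellings; a real loader might also want to invalidate entries when the file changes on disk.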
Constants
from zeroc import (
    MAGIC_BYTES,               # b'PZ'
    PROTOCOL_VERSION,          # 0x10 (v1.0)
    FLAG_COMPRESSION_ENABLED,
    FLAG_DICTIONARY_USED,
    FLAG_CHECKSUM_INCLUDED,
)
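The version byte 0x10 is described as v1.0, and decode_frame reports separate major_version and minor_version fields. One encoding consistent with that is major in the high nibble and minor in the low nibble. The sketch below assumes that layout; the helper names pack_version and split_version are hypothetical and not part of the zeroc package.

```python
from typing import Tuple


def pack_version(major: int, minor: int) -> int:
    """Pack major/minor into one byte (assumed layout: high nibble = major)."""
    if not (0 <= major <= 15 and 0 <= minor <= 15):
        raise ValueError("major and minor must each fit in 4 bits")
    return (major << 4) | minor


def split_version(version_byte: int) -> Tuple[int, int]:
    """Split a version byte back into (major, minor) under the same assumption."""
    if not 0 <= version_byte <= 0xFF:
        raise ValueError("version must fit in one byte")
    return version_byte >> 4, version_byte & 0x0F


version_byte = pack_version(1, 0)
major, minor = split_version(version_byte)
```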
Examples
See examples/basic_usage.py for comprehensive examples:
cd implementations/python
python examples/basic_usage.py
Examples include:
- Compression without dictionary
- Compression with trained dictionary
- Identity frame (no compression)
- Batch compression (100 messages)
Testing
Run Tests
# Install test dependencies
pip install pytest
# Run all tests
cd implementations/python
pytest tests/ -v
# Run specific test
pytest tests/test_wire_format.py::TestFrameEncoding::test_identity_frame -v
# Run with coverage
pip install pytest-cov
pytest tests/ --cov=zeroc --cov-report=html
Test Coverage
- Wire format: varint encoding/decoding, frame encoding/decoding, error handling
- Dictionary loader: format validation, caching, real dictionaries
- Round-trip tests: encode → decode → decompress
Performance
Typical performance on modern hardware (approximate; results vary with message size and machine):
| Operation | Latency | Throughput |
|---|---|---|
| Encode (no dict) | ~0.001ms | ~1M ops/sec |
| Encode (with dict) | ~0.002ms | ~500K ops/sec |
| Decode (no dict) | <0.001ms | ~2M ops/sec |
| Decode (with dict) | <0.001ms | ~2M ops/sec |
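To check figures like these on your own machine, a small timing harness is enough. The sketch below is not the benchmark that produced the table; the measure helper is hypothetical, and zlib.compress is used only as a stand-in callable so the example runs without the zeroc package. Swap in a lambda around encode_frame or decode_frame to time the real calls.

```python
import time
import zlib
from typing import Callable, Tuple


def measure(fn: Callable[[bytes], object], payload: bytes,
            iterations: int = 10_000) -> Tuple[float, float]:
    """Return (mean latency in ms, throughput in ops/sec) for fn(payload)."""
    fn(payload)  # warm-up call so one-time setup is not timed
    start = time.perf_counter()
    for _ in range(iterations):
        fn(payload)
    # Guard against a zero reading on very coarse clocks.
    elapsed = max(time.perf_counter() - start, 1e-9)
    latency_ms = elapsed / iterations * 1000.0
    ops_per_sec = iterations / elapsed
    return latency_ms, ops_per_sec


# Stand-in workload: zlib, NOT Zstd and NOT the zeroc encoder.
sample = b"example protobuf-like payload " * 8
latency_ms, ops_per_sec = measure(zlib.compress, sample, iterations=1000)
print(f"{latency_ms:.4f} ms/op, {ops_per_sec:,.0f} ops/sec")
```

Latency and throughput here are two views of the same measurement (throughput is 1000 divided by the latency in milliseconds), which matches how the table rows relate to each other.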
Wire Format Specification
Zeroc frame structure:
+-------------------+
| Magic "PZ" (2B) | Fixed header
+-------------------+
| Version (1B) | 0x10 = v1.0
+-------------------+
| Flags (1B) | Compression/Dict/Checksum
+-------------------+
| Dict ID (4B) | CRC32 of dictionary (big-endian)
+-------------------+
| Schema Hash (4B) | CRC32 of schema (big-endian)
+-------------------+
| Payload Len | LEB128 varint
+-------------------+
| Compressed Data | Zstd compressed protobuf
+-------------------+
| CRC32C (4B) | Optional checksum (if flag set)
+-------------------+
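The layout above can be walked with the standard library alone. This is a minimal, unofficial sketch rather than the package's own decoder: the names HEADER, encode_varint, decode_varint, and parse_frame are hypothetical, the varint is assumed to be unsigned LEB128, and the flags byte is returned uninterpreted because the individual bit values are not listed here. Any bytes after the payload are returned as a trailer, which is where the optional CRC32C would sit.

```python
from __future__ import annotations

import struct

MAGIC = b"PZ"
# Big-endian: 2-byte magic, version, flags, 4-byte dict ID, 4-byte schema hash.
HEADER = struct.Struct(">2sBBII")  # 12 bytes in total


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as an unsigned LEB128 varint."""
    if value < 0:
        raise ValueError("varint must be non-negative")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(buf: bytes, offset: int = 0) -> tuple[int, int]:
    """Decode an unsigned LEB128 varint; return (value, next_offset)."""
    result = 0
    shift = 0
    while True:
        if offset >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[offset]
        offset += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, offset
        shift += 7


def parse_frame(frame: bytes) -> dict:
    """Split a frame into header fields, payload, and trailing bytes."""
    if len(frame) < HEADER.size:
        raise ValueError("frame shorter than the 12-byte header")
    magic, version, flags, dict_id, schema_hash = HEADER.unpack_from(frame, 0)
    if magic != MAGIC:
        raise ValueError(f"bad magic: {magic!r}")
    length, offset = decode_varint(frame, HEADER.size)
    payload = frame[offset:offset + length]
    if len(payload) != length:
        raise ValueError("truncated payload")
    return {
        "version": version,
        "flags": flags,  # raw byte; bit meanings are not assumed here
        "dictionary_id": dict_id,
        "schema_hash": schema_hash,
        "compressed_size": length,
        "payload": payload,
        "trailer": frame[offset + length:],  # optional CRC32C would be here
    }


# Hand-built frame: version 0x10, all flags clear, no dictionary, 300-byte payload.
sample_frame = HEADER.pack(MAGIC, 0x10, 0, 0, 0) + encode_varint(300) + bytes(300)
parsed = parse_frame(sample_frame)
```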
Dictionary Format Specification
Zeroc dictionary file (.zdict):
+----------------------+
| Magic "PZSTDICT" (8B)| 132-byte header
+----------------------+
| Version (12B) | SemVer, null-padded
+----------------------+
| Schema Name (64B) | Schema name, null-padded
+----------------------+
| Dict ID (4B) | CRC32 of dictionary data
+----------------------+
| Sample Count (4B) | Training sample count
+----------------------+
| Created (8B) | Unix timestamp
+----------------------+
| Compression Lvl (4B) | Zstd level used
+----------------------+
| Dict Size (4B) | Dictionary data size
+----------------------+
| Min Protobuf (4B) | Min protobuf size
+----------------------+
| Max Protobuf (4B) | Max protobuf size
+----------------------+
| SHA256 Prefix (8B) | First 8 bytes of SHA256
+----------------------+
| Reserved (8B) | Reserved for future use
+----------------------+
| Zstd Dict Data | Raw Zstd dictionary
+----------------------+
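The field widths above add up to 132 bytes, so the header maps directly onto a single struct format. The sketch below is illustrative only and makes several assumptions that the layout does not state: integers are big-endian and unsigned, the dictionary ID is the standard CRC32 as computed by zlib.crc32, and the SHA256 prefix is taken over the dictionary data. The names DICT_HEADER and parse_dict_header are hypothetical.

```python
from __future__ import annotations

import hashlib
import struct
import zlib

DICT_MAGIC = b"PZSTDICT"
# Assumed big-endian: magic, version, schema name, dict ID, sample count,
# created, level, dict size, min size, max size, SHA256 prefix, reserved.
DICT_HEADER = struct.Struct(">8s12s64sIIQIIII8s8s")  # 132 bytes


def parse_dict_header(blob: bytes) -> tuple[dict, bytes]:
    """Return (metadata, raw dictionary data) or raise ValueError."""
    if len(blob) < DICT_HEADER.size:
        raise ValueError("file shorter than the 132-byte header")
    (magic, version, schema_name, dict_id, sample_count, created, level,
     dict_size, min_size, max_size, sha_prefix, _reserved) = DICT_HEADER.unpack_from(blob)
    if magic != DICT_MAGIC:
        raise ValueError(f"bad magic: {magic!r}")
    data = blob[DICT_HEADER.size:]
    if len(data) != dict_size:
        raise ValueError("dictionary size does not match header")
    calculated = zlib.crc32(data) & 0xFFFFFFFF  # assumption: plain CRC32
    if calculated != dict_id:
        raise ValueError(
            f"Dictionary ID mismatch: header says 0x{dict_id:08x}, "
            f"calculated 0x{calculated:08x}"
        )
    metadata = {
        "version": version.rstrip(b"\0").decode("ascii"),
        "schema_name": schema_name.rstrip(b"\0").decode("ascii"),
        "dictionary_id": dict_id,
        "sample_count": sample_count,
        "created": created,
        "compression_level": level,
        "dict_size": dict_size,
        "min_size": min_size,
        "max_size": max_size,
        "sha256_prefix": sha_prefix.hex(),
    }
    return metadata, data


# Synthetic file for demonstration; real dictionaries come from the trainer.
raw = b"example dictionary bytes"
blob = DICT_HEADER.pack(
    DICT_MAGIC, b"1.0.0", b"ecommerce.v1.Order", zlib.crc32(raw) & 0xFFFFFFFF,
    100, 1_700_000_000, 3, len(raw), 10, 500,
    hashlib.sha256(raw).digest()[:8], bytes(8),
) + raw
meta, data = parse_dict_header(blob)
```

struct pads the short version and schema name values with null bytes when packing, which matches the null-padded fields in the layout.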
Troubleshooting
Import Error
ModuleNotFoundError: No module named 'zeroc'
Solution:
cd implementations/python
pip install -e .
Dictionary Not Found
FileNotFoundError: Dictionary not found: Order-1.0.0.zdict
Solution:
# Train dictionaries first
cd tools/dict-trainer
python train_dictionary.py --schema ecommerce.v1.Order --version 1.0.0
Dictionary ID Mismatch
ValueError: Dictionary ID mismatch: header says 0x12345678, calculated 0x87654321
Cause: The dictionary file is corrupted or has been modified.
Solution: Retrain the dictionary or download it again from the CDN.
Checksum Mismatch
ValueError: Checksum mismatch: expected 0x12345678, got 0x87654321
Cause: The frame data was corrupted in transit.
Solution: Check the network path and retry the request.
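To see how a trailing CRC32C catches this kind of corruption, the check can be reproduced in pure Python. This is a sketch under stated assumptions, not the package's implementation: it assumes the 4-byte checksum is big-endian and covers every byte that precedes it, and the helper names append_checksum and verify_checksum are hypothetical. The bitwise crc32c function is slow and meant only for illustration; the crc32c package listed under Dependencies is the practical choice.

```python
import struct


def crc32c(data: bytes) -> int:
    """Bitwise CRC32C (Castagnoli), reflected polynomial 0x82F63B78."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


def append_checksum(body: bytes) -> bytes:
    """Append a big-endian CRC32C of body (assumed coverage: all prior bytes)."""
    return body + struct.pack(">I", crc32c(body))


def verify_checksum(frame: bytes) -> bytes:
    """Return the body if the trailing CRC32C matches, else raise ValueError."""
    if len(frame) < 4:
        raise ValueError("frame too short to hold a checksum")
    body, trailer = frame[:-4], frame[-4:]
    (expected,) = struct.unpack(">I", trailer)
    actual = crc32c(body)
    if actual != expected:
        raise ValueError(
            f"Checksum mismatch: expected 0x{expected:08x}, got 0x{actual:08x}"
        )
    return body


protected = append_checksum(b"header-and-payload bytes")
body = verify_checksum(protected)

# Flip one bit to simulate corruption in transit.
damaged = bytes([protected[0] ^ 0x01]) + protected[1:]
```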
Contributing
See main repository CONTRIBUTING.md.
License
See main repository LICENSE.