Zeroc: High-Performance API Compression Protocol

Zeroc Python Implementation

Python reference implementation of the Zeroc compression protocol.

Features

  • ✅ Complete wire format implementation (12-byte header + varint + payload)
  • ✅ Dictionary loader with 132-byte header validation
  • ✅ CRC32C checksum support
  • ✅ Dictionary caching for performance
  • ✅ Comprehensive test suite
  • ✅ Type annotations for IDE support

Installation

From Source

cd implementations/python
pip install -e .

Dependencies

pip install zstandard crc32c protobuf

Quick Start

Basic Compression (No Dictionary)

from zeroc import encode_frame, decode_frame, decompress_payload

# Encode protobuf to Zeroc frame
proto_bytes = b"Your protobuf serialized data"
frame = encode_frame(proto_bytes, compress=True, checksum=True)

# Decode frame
compressed, metadata = decode_frame(frame)

# Decompress
decompressed = decompress_payload(compressed, 0)

With Trained Dictionary

from zeroc import DictionaryLoader, encode_frame, decode_frame

proto_bytes = b"Your protobuf serialized data"

# Load dictionary (cached by the loader for later calls)
loader = DictionaryLoader()
metadata, dict_obj = loader.load("dictionaries/formats/Order-1.0.0.zdict")

# Get compressor/decompressor
compressor = loader.get_compressor("dictionaries/formats/Order-1.0.0.zdict")
decompressor = loader.get_decompressor("dictionaries/formats/Order-1.0.0.zdict")

# Encode with dictionary
frame = encode_frame(
    proto_bytes,
    dictionary_id=metadata['dictionary_id'],
    compress=True,
    checksum=True,
    compressor=compressor
)

# Decode and decompress
compressed, frame_meta = decode_frame(frame)
decompressed = decompressor.decompress(compressed)

API Reference

Wire Format

encode_frame(proto_bytes, dictionary_id=0, schema_hash=0, compress=True, checksum=False, compressor=None)

Encode protobuf bytes to Zeroc frame.

Parameters:

  • proto_bytes (bytes): Protobuf binary data
  • dictionary_id (int): Dictionary ID (CRC32), 0 for no dictionary
  • schema_hash (int): Schema hash (CRC32)
  • compress (bool): Whether to compress (True) or identity (False)
  • checksum (bool): Whether to include CRC32C checksum
  • compressor (ZstdCompressor): Required if dictionary_id > 0

Returns:

  • bytes: Complete Zeroc frame

Example:

frame = encode_frame(proto_bytes, compress=True, checksum=True)

decode_frame(frame)

Decode Zeroc frame to payload and metadata.

Parameters:

  • frame (bytes): Complete Zeroc frame

Returns:

  • (bytes, dict): Tuple of (payload, metadata)

Metadata Fields:

  • version: Protocol version byte
  • major_version: Major version (1)
  • minor_version: Minor version (0)
  • flags: Flags byte
  • dictionary_id: Dictionary ID
  • schema_hash: Schema hash
  • compressed_size: Payload size in bytes
  • compression_enabled: Boolean
  • dictionary_used: Boolean
  • checksum_included: Boolean

Raises:

  • ValueError: If frame is invalid or malformed

Example:

compressed, metadata = decode_frame(frame)
print(f"Dictionary ID: 0x{metadata['dictionary_id']:08x}")

decompress_payload(compressed, dictionary_id, decompressor=None)

Decompress payload with optional dictionary.

Parameters:

  • compressed (bytes): Compressed bytes
  • dictionary_id (int): Dictionary ID (0 for no dictionary)
  • decompressor (ZstdDecompressor): Required if dictionary_id > 0

Returns:

  • bytes: Decompressed protobuf bytes

Example:

proto_bytes = decompress_payload(compressed, 0)  # No dictionary
proto_bytes = decompress_payload(compressed, dict_id, decompressor)  # With dictionary

Dictionary Loader

load_dictionary(filepath)

Load Zeroc dictionary from .zdict file.

Parameters:

  • filepath (str): Path to .zdict file

Returns:

  • (dict, ZstdCompressionDict): Tuple of (metadata, dictionary object)

Metadata Fields:

  • version: SemVer string (e.g., "1.0.0")
  • schema_name: Schema name (e.g., "ecommerce.v1.Order")
  • dictionary_id: Dictionary ID (CRC32)
  • sample_count: Number of training samples
  • created: Unix timestamp
  • compression_level: Zstd compression level
  • dict_size: Dictionary size in bytes
  • min_size: Minimum protobuf size
  • max_size: Maximum protobuf size
  • sha256_prefix: First 8 bytes of SHA256

Raises:

  • FileNotFoundError: If file doesn't exist
  • ValueError: If dictionary format is invalid

Example:

metadata, dict_obj = load_dictionary("Order-1.0.0.zdict")
print(f"Dictionary ID: 0x{metadata['dictionary_id']:08x}")

DictionaryLoader

Dictionary loader with caching.

Methods:

load(filepath)

Load dictionary with caching.

loader = DictionaryLoader()
metadata, dict_obj = loader.load("Order-1.0.0.zdict")

get_compressor(filepath, level=3)

Get ZstdCompressor for dictionary.

compressor = loader.get_compressor("Order-1.0.0.zdict", level=3)
compressed = compressor.compress(proto_bytes)

get_decompressor(filepath)

Get ZstdDecompressor for dictionary.

decompressor = loader.get_decompressor("Order-1.0.0.zdict")
decompressed = decompressor.decompress(compressed)

clear_cache()

Clear dictionary cache.

loader.clear_cache()
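
The caching behaviour described above can be pictured with a small path-keyed cache. This is only a sketch of the general pattern, not the code inside DictionaryLoader: the class name PathCache and the injected load_fn are hypothetical, and keying the cache by absolute path is an assumption.

```python
import os
from typing import Callable, Dict, Generic, TypeVar

T = TypeVar("T")


class PathCache(Generic[T]):
    """Hypothetical path-keyed cache; not the actual DictionaryLoader."""

    def __init__(self, load_fn: Callable[[str], T]) -> None:
        self._load_fn = load_fn
        self._cache: Dict[str, T] = {}

    def load(self, filepath: str) -> T:
        # Assumption: normalise to an absolute path so that two spellings
        # of the same file share one cache entry.
        key = os.path.abspath(filepath)
        if key not in self._cache:
            self._cache[key] = self._load_fn(key)
        return self._cache[key]

    def clear_cache(self) -> None:
        self._cache.clear()


# Usage: a stand-in loader that records how often it is really called.
calls = []


def fake_load(path: str) -> bytes:
    calls.append(path)
    return b"dictionary bytes"


cache = PathCache(fake_load)
first = cache.load("Order-1.0.0.zdict")
second = cache.load("Order-1.0.0.zdict")  # served from the cache
cache.clear_cache()
third = cache.load("Order-1.0.0.zdict")   # loaded again after clearing
```

With this pattern the expensive parse happens once per file until clear_cache() is called, which is the benefit a long-lived loader instance gives when many frames share one dictionary.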

Constants

from zeroc import (
    MAGIC_BYTES,           # b'PZ'
    PROTOCOL_VERSION,      # 0x10 (v1.0)
    FLAG_COMPRESSION_ENABLED,
    FLAG_DICTIONARY_USED,
    FLAG_CHECKSUM_INCLUDED,
)
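
The comment on PROTOCOL_VERSION (0x10 for v1.0), together with the major_version and minor_version metadata fields, suggests that the version byte carries the major version in its high nibble and the minor version in its low nibble. The sketch below works under that assumption; split_version and pack_version are hypothetical helpers and are not part of the zeroc package.

```python
from typing import Tuple


def split_version(version_byte: int) -> Tuple[int, int]:
    """Return (major, minor), assuming high nibble = major, low nibble = minor."""
    if not 0 <= version_byte <= 0xFF:
        raise ValueError("version must fit in one byte")
    return version_byte >> 4, version_byte & 0x0F


def pack_version(major: int, minor: int) -> int:
    """Inverse of split_version under the same nibble-packing assumption."""
    if not (0 <= major <= 0x0F and 0 <= minor <= 0x0F):
        raise ValueError("major and minor must each fit in 4 bits")
    return (major << 4) | minor


major, minor = split_version(0x10)
print(f"v{major}.{minor}")  # v1.0
```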

Examples

See examples/basic_usage.py for comprehensive examples:

cd implementations/python
python examples/basic_usage.py

Examples include:

  1. Compression without dictionary
  2. Compression with trained dictionary
  3. Identity frame (no compression)
  4. Batch compression (100 messages)

Testing

Run Tests

# Install test dependencies
pip install pytest

# Run all tests
cd implementations/python
pytest tests/ -v

# Run specific test
pytest tests/test_wire_format.py::TestFrameEncoding::test_identity_frame -v

# Run with coverage
pip install pytest-cov
pytest tests/ --cov=zeroc --cov-report=html

Test Coverage

  • Wire format: varint encoding/decoding, frame encoding/decoding, error handling
  • Dictionary loader: format validation, caching, real dictionaries
  • Round-trip tests: encode → decode → decompress

Performance

Approximate per-message figures on modern hardware (actual results depend on message size, dictionary, and compression level):

Operation            Latency     Throughput
Encode (no dict)     ~0.001 ms   ~1M ops/sec
Encode (with dict)   ~0.002 ms   ~500K ops/sec
Decode (no dict)     <0.001 ms   ~2M ops/sec
Decode (with dict)   <0.001 ms   ~2M ops/sec
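
To check figures like these on your own hardware, a simple timing loop is usually enough. The sketch below is not the benchmark used to produce the table: benchmark is a hypothetical helper, and zlib.compress stands in for the function under test so the snippet runs without zeroc installed. In practice you would pass encode_frame or decode_frame with a representative message.

```python
import time
import zlib
from typing import Callable, Dict


def benchmark(fn: Callable[[bytes], object], arg: bytes,
              iterations: int = 10_000) -> Dict[str, float]:
    """Call fn(arg) repeatedly and report mean latency and throughput."""
    if iterations < 1:
        raise ValueError("iterations must be at least 1")
    start = time.perf_counter()
    for _ in range(iterations):
        fn(arg)
    elapsed = time.perf_counter() - start
    return {
        "latency_ms": elapsed / iterations * 1000.0,
        "ops_per_sec": iterations / elapsed,
    }


# Stand-in workload: zlib.compress on a small repetitive payload.
sample = b"order_id=12345;status=PAID;" * 8
stats = benchmark(zlib.compress, sample, iterations=2_000)
print(f"{stats['latency_ms']:.4f} ms per call, {stats['ops_per_sec']:.0f} ops/sec")
```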

Wire Format Specification

Zeroc frame structure:

+-------------------+
| Magic "PZ" (2B)   |  Fixed header
+-------------------+
| Version (1B)      |  0x10 = v1.0
+-------------------+
| Flags (1B)        |  Compression/Dict/Checksum
+-------------------+
| Dict ID (4B)      |  CRC32 of dictionary (big-endian)
+-------------------+
| Schema Hash (4B)  |  CRC32 of schema (big-endian)
+-------------------+
| Payload Len       |  LEB128 varint
+-------------------+
| Compressed Data   |  Zstd compressed protobuf
+-------------------+
| CRC32C (4B)       |  Optional checksum (if flag set)
+-------------------+
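
The layout above can be walked with the standard library alone. The following is a minimal sketch, not the package's own parser: parse_frame_layout, encode_varint and decode_varint are hypothetical names, the varint is assumed to be unsigned LEB128, and the sketch neither verifies the checksum nor decompresses the payload.

```python
import struct
from typing import Tuple

MAGIC = b"PZ"
# Magic (2s), version (B), flags (B), dictionary ID (I), schema hash (I),
# big-endian as stated in the diagram: 12 bytes in total.
HEADER = struct.Struct(">2sBBII")


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as an unsigned LEB128 varint."""
    if value < 0:
        raise ValueError("varint must be non-negative")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(data: bytes, offset: int = 0) -> Tuple[int, int]:
    """Decode an unsigned LEB128 varint; return (value, next_offset)."""
    result = 0
    shift = 0
    while True:
        if offset >= len(data):
            raise ValueError("truncated varint")
        byte = data[offset]
        offset += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, offset
        shift += 7


def parse_frame_layout(frame: bytes) -> dict:
    """Split a frame into header fields, payload, and trailing bytes."""
    if len(frame) < HEADER.size:
        raise ValueError("frame shorter than the 12-byte header")
    magic, version, flags, dict_id, schema_hash = HEADER.unpack_from(frame, 0)
    if magic != MAGIC:
        raise ValueError("bad magic bytes")
    payload_len, offset = decode_varint(frame, HEADER.size)
    payload = frame[offset:offset + payload_len]
    if len(payload) != payload_len:
        raise ValueError("truncated payload")
    return {
        "version": version,
        "flags": flags,
        "dictionary_id": dict_id,
        "schema_hash": schema_hash,
        "payload": payload,
        # Holds the optional 4-byte CRC32C when the checksum flag is set.
        "trailer": frame[offset + payload_len:],
    }


# Hand-built frame with no flags set, used only to exercise the parser.
demo = HEADER.pack(MAGIC, 0x10, 0x00, 0, 0xDEADBEEF) + encode_varint(5) + b"hello"
fields = parse_frame_layout(demo)
```

For real frames, prefer decode_frame, which also validates the frame and reports the flag booleans.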

Dictionary Format Specification

Zeroc dictionary file (.zdict):

+----------------------+
| Magic "PZSTDICT" (8B)|  132-byte header
+----------------------+
| Version (12B)        |  SemVer, null-padded
+----------------------+
| Schema Name (64B)    |  Schema name, null-padded
+----------------------+
| Dict ID (4B)         |  CRC32 of dictionary data
+----------------------+
| Sample Count (4B)    |  Training sample count
+----------------------+
| Created (8B)         |  Unix timestamp
+----------------------+
| Compression Lvl (4B) |  Zstd level used
+----------------------+
| Dict Size (4B)       |  Dictionary data size
+----------------------+
| Min Protobuf (4B)    |  Min protobuf size
+----------------------+
| Max Protobuf (4B)    |  Max protobuf size
+----------------------+
| SHA256 Prefix (8B)   |  First 8 bytes of SHA256
+----------------------+
| Reserved (8B)        |  Reserved for future use
+----------------------+
| Zstd Dict Data       |  Raw Zstd dictionary
+----------------------+
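
The 132-byte header maps onto a single struct format string. The sketch below is illustrative rather than the loader's actual code, and it rests on several assumptions the diagram does not state: big-endian integers (as in the wire format), a signed compression level, an unsigned 64-bit timestamp, and a dictionary ID equal to the standard CRC-32 that zlib.crc32 computes over the dictionary data. parse_zdict is a hypothetical name.

```python
import struct
import zlib
from typing import Tuple

ZDICT_MAGIC = b"PZSTDICT"
# magic, version, schema name, dict ID, sample count, created,
# compression level, dict size, min size, max size, SHA256 prefix, reserved.
# Assumption: big-endian, with the level stored as a signed 32-bit integer.
ZDICT_HEADER = struct.Struct(">8s12s64sIIQiIII8s8s")


def parse_zdict(blob: bytes) -> Tuple[dict, bytes]:
    """Return (metadata, raw dictionary data) from .zdict file contents."""
    if len(blob) < ZDICT_HEADER.size:
        raise ValueError("file shorter than the 132-byte header")
    (magic, version, schema_name, dict_id, sample_count, created, level,
     dict_size, min_size, max_size, sha_prefix, _reserved) = (
        ZDICT_HEADER.unpack_from(blob, 0))
    if magic != ZDICT_MAGIC:
        raise ValueError("bad magic bytes")
    dict_data = blob[ZDICT_HEADER.size:]
    if len(dict_data) != dict_size:
        raise ValueError("dictionary size does not match header")
    # Assumption: the ID is the plain CRC-32 of the dictionary data.
    calculated = zlib.crc32(dict_data)
    if calculated != dict_id:
        raise ValueError(
            f"Dictionary ID mismatch: header says 0x{dict_id:08x}, "
            f"calculated 0x{calculated:08x}")
    metadata = {
        "version": version.rstrip(b"\x00").decode("ascii"),
        "schema_name": schema_name.rstrip(b"\x00").decode("utf-8"),
        "dictionary_id": dict_id,
        "sample_count": sample_count,
        "created": created,
        "compression_level": level,
        "dict_size": dict_size,
        "min_size": min_size,
        "max_size": max_size,
        "sha256_prefix": sha_prefix,
    }
    return metadata, dict_data


# Synthetic file contents for demonstration; the data is not a real Zstd dictionary.
data = b"placeholder dictionary bytes"
blob = ZDICT_HEADER.pack(
    ZDICT_MAGIC, b"1.0.0", b"ecommerce.v1.Order", zlib.crc32(data),
    100, 0, 3, len(data), 10, 500, bytes(8), bytes(8)) + data
meta, raw = parse_zdict(blob)
```

For real files, load_dictionary or DictionaryLoader remains the supported entry point; this sketch only shows how the fixed-width fields line up.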

Troubleshooting

Import Error

ModuleNotFoundError: No module named 'zeroc'

Solution:

cd implementations/python
pip install -e .

Dictionary Not Found

FileNotFoundError: Dictionary not found: Order-1.0.0.zdict

Solution:

# Train dictionaries first
cd tools/dict-trainer
python train_dictionary.py --schema ecommerce.v1.Order --version 1.0.0

Dictionary ID Mismatch

ValueError: Dictionary ID mismatch: header says 0x12345678, calculated 0x87654321

Cause: The dictionary file is corrupted or has been modified.

Solution: Retrain the dictionary or download a fresh copy from the CDN.

Checksum Mismatch

ValueError: Checksum mismatch: expected 0x12345678, got 0x87654321

Cause: The frame data was corrupted during transmission.

Solution: Check the network path and retry the request.
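
Since decode_frame raises ValueError for invalid frames, a retry can be wrapped around the fetch-and-decode step. The sketch below shows one way to do that; decode_with_retry, fetch_frame and the stand-in functions are hypothetical, and in real use you would pass your own transport call and zeroc's decode_frame.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def decode_with_retry(fetch_frame: Callable[[], bytes],
                      decode: Callable[[bytes], T],
                      attempts: int = 3) -> T:
    """Fetch and decode a frame, refetching when decoding raises ValueError."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for _ in range(attempts):
        frame = fetch_frame()
        try:
            return decode(frame)
        except ValueError as exc:  # malformed frame or checksum mismatch
            last_error = exc
    raise last_error


# Stand-ins for demonstration: the first fetch is "corrupted", the second is fine.
responses = [b"corrupted", b"good"]


def fake_fetch() -> bytes:
    return responses.pop(0)


def fake_decode(frame: bytes) -> bytes:
    if frame != b"good":
        raise ValueError("Checksum mismatch")
    return frame


result = decode_with_retry(fake_fetch, fake_decode)
```

A bounded attempt count keeps a persistently corrupt source from looping forever; the last ValueError is re-raised so the caller still sees the original message.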

Contributing

See main repository CONTRIBUTING.md.

License

See main repository LICENSE.
