Fixed-chunk dedup + entropy compression utility (MSH1 binary format)

Project description

mshzip (Python)

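Fixed-chunk dedup + entropy compression utility (Python/uv version)

Uses the MSH1 binary format and is 100% cross-compatible with the Node.js implementation. It relies only on the Python standard library (no external dependencies).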
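
To make the idea of "fixed-chunk dedup + entropy compression" concrete, here is a minimal sketch using only the standard library. It is not the MSH1 format: the names `dedup_pack` and `dedup_unpack`, the use of SHA-256 to detect duplicate chunks, and `zlib` standing in for the gzip codec are all assumptions made for illustration.

```python
import hashlib
import zlib


def dedup_pack(data: bytes, chunk_size: int = 128):
    """Split data into fixed-size chunks and keep each distinct chunk once.

    Returns (unique_chunks, indices). unique_chunks holds one copy of every
    distinct chunk in first-seen order; indices gives, for each input chunk,
    its position in unique_chunks.
    """
    seen = {}             # chunk digest -> index into unique_chunks
    unique_chunks = []
    indices = []
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        digest = hashlib.sha256(chunk).digest()
        if digest not in seen:
            seen[digest] = len(unique_chunks)
            unique_chunks.append(chunk)
        indices.append(seen[digest])
    return unique_chunks, indices


def dedup_unpack(unique_chunks, indices) -> bytes:
    """Rebuild the original bytes by following the index list."""
    return b"".join(unique_chunks[i] for i in indices)


data = b"hello world" * 100                 # 1100 bytes, highly repetitive
chunks, indices = dedup_pack(data, chunk_size=11)

# Entropy stage (assumption): compress only the unique chunks with DEFLATE.
payload = zlib.compress(b"".join(chunks))

restored = dedup_unpack(chunks, indices)
```

Because the chunk size here matches the 11-byte repeat, all 100 chunks collapse to a single stored chunk. With a chunk size that does not line up with the repetition, fewer duplicates are found, which is why the chunk size is exposed as an option.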

Installation

# uv (recommended)
uv pip install mshzip

# pip
pip install mshzip

CLI usage

# Compress
mshzip pack -i data.bin -o data.msh
mshzip pack -i data.bin -o data.msh --chunk 1024 --crc --verbose

# Decompress
mshzip unpack -i data.msh -o data.bin

# File info
mshzip info -i data.msh

# Parallel processing
mshzip multi pack file1.bin file2.bin file3.bin --out-dir ./compressed --workers 4
mshzip multi unpack compressed/*.msh --out-dir ./restored

# stdin/stdout pipes
cat data.bin | mshzip pack -i - -o - > data.msh
mshzip unpack -i data.msh -o - | sha256sum

Python API

Simple API

import mshzip

data = b'hello world' * 100  # sample input

# Compress
compressed = mshzip.pack(data)

# Decompress
original = mshzip.unpack(compressed)

# Options
compressed = mshzip.pack(data, chunk_size=1024, codec='gzip', crc=True)

Packer / Unpacker classes

from mshzip import Packer, Unpacker

data = b'hello world' * 100  # sample input
packer = Packer(chunk_size=256, codec='gzip', crc=True)
compressed = packer.pack(data)

unpacker = Unpacker()
restored = unpacker.unpack(compressed)
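
The `crc=True` option adds a CRC32 checksum. The sketch below shows the general kind of integrity check a CRC32 enables, using `zlib.crc32` from the standard library. Where MSH1 actually stores the checksum, and in which byte order, is not stated here, so the trailing 4-byte little-endian layout and the helper names `with_crc` and `check_crc` are assumptions.

```python
import zlib


def with_crc(payload: bytes) -> bytes:
    """Append a CRC32 of the payload (assumed layout: 4 bytes, little-endian)."""
    return payload + zlib.crc32(payload).to_bytes(4, "little")


def check_crc(blob: bytes) -> bytes:
    """Verify the trailing CRC32 and return the payload, or raise ValueError."""
    payload, stored = blob[:-4], int.from_bytes(blob[-4:], "little")
    if zlib.crc32(payload) != stored:
        raise ValueError("CRC32 mismatch: data is corrupted")
    return payload


blob = with_crc(b"hello world" * 100)
payload = check_crc(blob)           # passes: data is intact

# Flip one bit in the first byte to simulate corruption.
corrupted = bytes([blob[0] ^ 0x01]) + blob[1:]
```

Calling `check_crc(corrupted)` raises `ValueError`, which is the failure mode a checksum is meant to surface instead of silently returning bad data.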

Streaming API

from mshzip import PackStream, UnpackStream, pack_stream, unpack_stream

# Generator-based streaming
# (data is a bytes object; output is any writable binary file object)
ps = PackStream(chunk_size=128)
for frame in ps.feed(data):
    output.write(frame)
for frame in ps.flush():
    output.write(frame)

# File I/O convenience functions
with open('input.bin', 'rb') as inp, open('output.msh', 'wb') as out:
    stats = pack_stream(inp, out, chunk_size=256)

with open('output.msh', 'rb') as inp, open('restored.bin', 'wb') as out:
    stats = unpack_stream(inp, out)
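
The feed/flush pattern used by `PackStream` exists because input rarely arrives in multiples of the chunk size. The following sketch shows that pattern in isolation with a hypothetical `ChunkBuffer` class: it emits raw fixed-size chunks rather than MSH1 frames and is not how mshzip is implemented internally.

```python
from typing import Iterator


class ChunkBuffer:
    """Hypothetical helper: re-slices arbitrary-size input into fixed chunks."""

    def __init__(self, chunk_size: int = 128) -> None:
        self.chunk_size = chunk_size
        self._pending = bytearray()   # bytes not yet emitted

    def feed(self, data: bytes) -> Iterator[bytes]:
        """Buffer data and yield every complete chunk now available."""
        self._pending.extend(data)
        while len(self._pending) >= self.chunk_size:
            chunk = bytes(self._pending[:self.chunk_size])
            del self._pending[:self.chunk_size]
            yield chunk

    def flush(self) -> Iterator[bytes]:
        """Yield the final short chunk, if any bytes are left over."""
        if self._pending:
            chunk = bytes(self._pending)
            self._pending.clear()
            yield chunk


buf = ChunkBuffer(chunk_size=4)
out = []
for piece in (b"abc", b"defgh", b"i"):
    out.extend(buf.feed(piece))       # complete chunks only
out.extend(buf.flush())               # trailing partial chunk
```

Skipping the `flush()` call would drop the last byte in this example, which mirrors why the streaming usage above calls `ps.flush()` after the final `ps.feed()`.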

Parallel processing

from mshzip.parallel import WorkerPool, Task

pool = WorkerPool(4)
results = pool.run_all([
    Task(type='pack', input_path='a.bin', output_path='a.msh'),
    Task(type='pack', input_path='b.bin', output_path='b.msh'),
])
pool.shutdown()

for r in results:
    print(f'{r.success}: {r.input_size} -> {r.output_size} ({r.elapsed_ms}ms)')
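
For readers who want to see the shape of this fan-out without installing the package, here is a rough standard-library analogue. It is a sketch under assumptions: it uses `concurrent.futures.ThreadPoolExecutor` and `zlib` on in-memory buffers instead of mshzip's `WorkerPool` and files, and the `Result` dataclass only imitates the fields printed above.

```python
import time
import zlib
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class Result:
    success: bool
    input_size: int
    output_size: int
    elapsed_ms: float


def compress_one(data: bytes) -> Result:
    """Compress one buffer and report sizes and timing."""
    start = time.perf_counter()
    try:
        out = zlib.compress(data)
    except zlib.error:
        elapsed = (time.perf_counter() - start) * 1000
        return Result(False, len(data), 0, elapsed)
    elapsed = (time.perf_counter() - start) * 1000
    return Result(True, len(data), len(out), elapsed)


inputs = [b"a" * 10_000, b"b" * 20_000, b"c" * 30_000]

# The context manager shuts the pool down when the block exits.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(compress_one, inputs))   # map keeps input order

for r in results:
    print(f"{r.success}: {r.input_size} -> {r.output_size} ({r.elapsed_ms:.2f}ms)")
```

Each task reports its own success flag instead of raising, so one failed input does not abort the whole batch.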

CLI options

Option           Default          Description
--chunk <N>      128              Chunk size in bytes (8 to 16,777,216)
--frame <N>      67108864         Maximum bytes per frame (64 MiB)
--codec <type>   gzip             gzip or none
--crc            off              Add a CRC32 checksum
--verbose        off              Verbose output
--workers <N>    CPU core count   Number of parallel workers (multi command)
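
The defaults and limits in this table can be restated as an `argparse` sketch. This is not mshzip's actual CLI code: the parser name, the `chunk_size` validator, and the choice to reject out-of-range values at parse time are assumptions. Only the flag names, defaults, and the 8 to 16,777,216 range come from the table.

```python
import argparse
import os

MIN_CHUNK = 8
MAX_CHUNK = 16_777_216                 # upper bound from the table
DEFAULT_FRAME = 64 * 1024 * 1024       # 67108864 bytes


def chunk_size(value: str) -> int:
    """argparse type: accept only chunk sizes inside the documented range."""
    n = int(value)
    if not MIN_CHUNK <= n <= MAX_CHUNK:
        raise argparse.ArgumentTypeError(
            f"chunk size must be between {MIN_CHUNK} and {MAX_CHUNK} bytes")
    return n


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="mshzip-options-sketch")
    parser.add_argument("--chunk", type=chunk_size, default=128)
    parser.add_argument("--frame", type=int, default=DEFAULT_FRAME)
    parser.add_argument("--codec", choices=["gzip", "none"], default="gzip")
    parser.add_argument("--crc", action="store_true")
    parser.add_argument("--verbose", action="store_true")
    parser.add_argument("--workers", type=int, default=os.cpu_count())
    return parser


args = build_parser().parse_args(["--chunk", "1024", "--crc"])
```

Passing a value such as `--chunk 4` to this sketch makes `argparse` print an error and exit, which is one reasonable way to enforce the documented range.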

Tests

# uv
uv run pytest

# pytest directly
pytest tests/ -v

253 tests: varint(30) + packer(16) + unpacker(14) + roundtrip(131) + compat(32) + stream(16) + cli(7) + parallel(7)

Requirements

  • Python 3.10+
  • No external dependencies (standard library only)

Download files

Source Distribution

mshzip-1.1.0.tar.gz (26.2 kB)

Built Distribution

mshzip-1.1.0-py3-none-any.whl (21.8 kB)

File details

Details for the file mshzip-1.1.0.tar.gz.

File metadata

  • Download URL: mshzip-1.1.0.tar.gz
  • Size: 26.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for mshzip-1.1.0.tar.gz
Algorithm Hash digest
SHA256 f04135f8d005d55a67549f15674a1e0b476e112be00b8cc069d89d0fed0152d3
MD5 f80e14ae2c6922bab0afcae9850485fc
BLAKE2b-256 7dc462bdb076db1f168b220eb2e754b1827c3fbf2f5edbd3c6845a7906578546

File details

Details for the file mshzip-1.1.0-py3-none-any.whl.

File metadata

  • Download URL: mshzip-1.1.0-py3-none-any.whl
  • Size: 21.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for mshzip-1.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 20dbb2ca12a7b7265affcd2a07c6aed877ff554d083b83194fa02355310b9fb4
MD5 b59915d8cb0d7ea1aaba18210f587728
BLAKE2b-256 1e4ca838f0445b21ab578a99b7a53b65baecf9c98da0c10d485e2ead444403f5
