
Fixed-chunk dedup + entropy compression utility (MSH1 binary format)


mshzip (Python)

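Fixed-chunk dedup + entropy compression utility (Python/uv version)

Uses the MSH1 binary format and is 100% cross-compatible with the Node.js implementation. It relies only on the standard library (no external dependencies).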
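
To make the idea of "fixed-chunk dedup + entropy compression" concrete, here is a minimal standard-library sketch. It is not the MSH1 format and not mshzip's implementation: the helper names `dedup_chunks` and `rebuild` are hypothetical, SHA-256 is an assumed way to detect identical chunks, and `zlib.compress` stands in for the entropy stage.

```python
import hashlib
import zlib


def dedup_chunks(data: bytes, chunk_size: int = 128):
    """Split data into fixed-size chunks and keep one copy of each distinct chunk.

    Returns (unique_chunks, refs), where refs[i] is the index into
    unique_chunks of the i-th chunk of the input.
    """
    index_by_digest = {}
    unique_chunks = []
    refs = []
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        digest = hashlib.sha256(chunk).digest()
        if digest not in index_by_digest:
            index_by_digest[digest] = len(unique_chunks)
            unique_chunks.append(chunk)
        refs.append(index_by_digest[digest])
    return unique_chunks, refs


def rebuild(unique_chunks, refs) -> bytes:
    """Reassemble the original bytes from the chunk table and the references."""
    return b"".join(unique_chunks[i] for i in refs)


data = b"hello world" * 100  # 1100 bytes, 100 identical 11-byte chunks
unique_chunks, refs = dedup_chunks(data, chunk_size=11)

# Entropy stage (stand-in): compress only the deduplicated chunk table.
dictionary = zlib.compress(b"".join(unique_chunks))
```

In this toy input every chunk is identical, so the chunk table holds a single entry and the references carry the rest of the structure. Real data dedups well only when repeats align with the chunk boundaries, which is why the chunk size is tunable.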

Installation

# uv (recommended)
uv pip install mshzip

# pip
pip install mshzip

CLI usage

# Compress
mshzip pack -i data.bin -o data.msh
mshzip pack -i data.bin -o data.msh --chunk 1024 --crc --verbose

# Decompress
mshzip unpack -i data.msh -o data.bin

# File info
mshzip info -i data.msh

# Parallel processing
mshzip multi pack file1.bin file2.bin file3.bin --out-dir ./compressed --workers 4
mshzip multi unpack compressed/*.msh --out-dir ./restored

# stdin/stdout pipes
cat data.bin | mshzip pack -i - -o - > data.msh
mshzip unpack -i data.msh -o - | sha256sum
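
The last pipeline hashes the unpacked output with sha256sum so it can be compared against the digest of the original file. The same check can be sketched in Python with `hashlib`. The helper name `sha256_of_stream` is hypothetical and not part of mshzip, and the in-memory stream below only stands in for a real file opened in binary mode.

```python
import hashlib
import io


def sha256_of_stream(stream, block_size: int = 65536) -> str:
    """Hash a binary stream block by block so large files never sit fully in memory."""
    h = hashlib.sha256()
    for block in iter(lambda: stream.read(block_size), b""):
        h.update(block)
    return h.hexdigest()


# Stand-in for open('data.bin', 'rb'); a restored file would be hashed the same way.
original_digest = sha256_of_stream(io.BytesIO(b"hello world" * 100))
```

Comparing this digest for the original and the restored file gives the same assurance as the shell pipeline.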

Python API

Simple API

import mshzip

data = b'hello world' * 100

# Compress
compressed = mshzip.pack(data)

# Decompress
original = mshzip.unpack(compressed)

# Options
compressed = mshzip.pack(data, chunk_size=1024, codec='gzip', crc=True)

Packer / Unpacker classes

from mshzip import Packer, Unpacker

packer = Packer(chunk_size=256, codec='gzip', crc=True)
compressed = packer.pack(data)

unpacker = Unpacker()
restored = unpacker.unpack(compressed)
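
A quick way to gain confidence in any pack/unpack pair is a round-trip check over a few edge-case payloads. The sketch below is an illustration only: `check_roundtrip` is a hypothetical helper, and `gzip.compress` / `gzip.decompress` are stand-ins so the example runs without mshzip installed.

```python
import gzip
import os


def check_roundtrip(pack, unpack, payloads):
    """Return the payloads that do not survive pack -> unpack unchanged."""
    return [p for p in payloads if unpack(pack(p)) != p]


payloads = [
    b"",                    # empty input
    b"a",                   # shorter than any chunk
    b"hello world" * 100,   # highly repetitive
    os.urandom(4096),       # incompressible
]
failures = check_roundtrip(gzip.compress, gzip.decompress, payloads)
```

With mshzip installed, passing `mshzip.pack` and `mshzip.unpack` in place of the gzip stand-ins should exercise the real codec the same way.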

Streaming API

from mshzip import PackStream, UnpackStream, pack_stream, unpack_stream

# Generator-based streaming
data = b'hello world' * 100
ps = PackStream(chunk_size=128)
with open('output.msh', 'wb') as output:
    for frame in ps.feed(data):
        output.write(frame)
    for frame in ps.flush():
        output.write(frame)

# File I/O convenience functions
with open('input.bin', 'rb') as inp, open('output.msh', 'wb') as out:
    stats = pack_stream(inp, out, chunk_size=256)

with open('output.msh', 'rb') as inp, open('restored.bin', 'wb') as out:
    stats = unpack_stream(inp, out)
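
The feed/flush pattern used by the generator API can be illustrated with a small buffer that turns arbitrarily sized input pieces into fixed-size chunks. This is a sketch of the pattern only, under the assumption that a streaming packer buffers input this way. `FixedChunker` is a hypothetical class, it emits raw chunks rather than MSH1 frames, and it says nothing about how `PackStream` is actually implemented.

```python
class FixedChunker:
    """Buffers arbitrary-sized input and yields fixed-size chunks."""

    def __init__(self, chunk_size: int = 128):
        self.chunk_size = chunk_size
        self._buffer = bytearray()

    def feed(self, data: bytes):
        """Accept more input and yield every complete chunk now available."""
        self._buffer.extend(data)
        while len(self._buffer) >= self.chunk_size:
            chunk = bytes(self._buffer[:self.chunk_size])
            del self._buffer[:self.chunk_size]
            yield chunk

    def flush(self):
        """Yield the final partial chunk, if any, at end of input."""
        if self._buffer:
            chunk = bytes(self._buffer)
            self._buffer.clear()
            yield chunk


chunker = FixedChunker(chunk_size=4)
out = []
for piece in (b"abc", b"defg", b"hij"):
    out.extend(chunker.feed(piece))
out.extend(chunker.flush())
# out == [b"abcd", b"efgh", b"ij"]
```

Because `feed` is a generator, nothing is consumed from the buffer until the caller iterates over it. That is why the API examples loop over `feed(...)` and `flush()` rather than calling them for side effects.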

Parallel processing

from mshzip.parallel import WorkerPool, Task

pool = WorkerPool(4)
results = pool.run_all([
    Task(type='pack', input_path='a.bin', output_path='a.msh'),
    Task(type='pack', input_path='b.bin', output_path='b.msh'),
])
pool.shutdown()

for r in results:
    print(f'{r.success}: {r.input_size} -> {r.output_size} ({r.elapsed_ms}ms)')
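
For readers unfamiliar with the worker-pool pattern, the following standard-library sketch shows the same shape: independent per-file jobs submitted to a pool, each returning a small result record. Everything here is an assumption made for illustration. `Result` and `compress_file` are hypothetical names, `gzip.compress` stands in for packing, and a thread pool is used for simplicity; none of this describes how `WorkerPool` works internally.

```python
import gzip
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Result:
    success: bool
    input_size: int
    output_size: int
    elapsed_ms: float


def compress_file(input_path: Path, output_path: Path) -> Result:
    """Compress one file and report sizes and timing; I/O errors become success=False."""
    start = time.perf_counter()
    try:
        raw = input_path.read_bytes()
        packed = gzip.compress(raw)
        output_path.write_bytes(packed)
        ok, in_size, out_size = True, len(raw), len(packed)
    except OSError:
        ok, in_size, out_size = False, 0, 0
    elapsed_ms = (time.perf_counter() - start) * 1000
    return Result(ok, in_size, out_size, elapsed_ms)


with tempfile.TemporaryDirectory() as tmp:
    jobs = []
    for name in ("a.bin", "b.bin"):
        src = Path(tmp) / name
        src.write_bytes(b"hello world" * 100)
        jobs.append((src, src.with_suffix(".gz")))
    # pool.map preserves job order, so results line up with jobs.
    with ThreadPoolExecutor(max_workers=4) as pool:
        results = list(pool.map(lambda job: compress_file(*job), jobs))
```

Returning a result record per job, instead of raising, lets one failed file be reported without aborting the rest of the batch.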

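CLI options

Option | Default | Description
--chunk <N> | 128 | Chunk size in bytes (8 to 16,777,216)
--frame <N> | 67108864 | Maximum bytes per frame (64 MiB)
--codec <type> | gzip | gzip or none
--crc | off | Add a CRC32 checksum
--verbose | off | Verbose output
--workers <N> | number of CPU cores | Number of parallel workers (multi command)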
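
The defaults and the allowed chunk-size range in the table can be expressed as an `argparse` sketch. This is not mshzip's actual parser: the program name `pack-sketch` and the validator `chunk_size` are hypothetical, and only the flags, defaults, and limits documented above are taken from the source.

```python
import argparse

MIN_CHUNK = 8
MAX_CHUNK = 16_777_216  # upper bound from the options table


def chunk_size(text: str) -> int:
    """argparse type function that enforces the documented chunk-size range."""
    value = int(text)
    if not MIN_CHUNK <= value <= MAX_CHUNK:
        raise argparse.ArgumentTypeError(
            f"chunk size must be between {MIN_CHUNK} and {MAX_CHUNK}, got {value}"
        )
    return value


parser = argparse.ArgumentParser(prog="pack-sketch")
parser.add_argument("--chunk", type=chunk_size, default=128)
parser.add_argument("--frame", type=int, default=64 * 1024 * 1024)
parser.add_argument("--codec", choices=["gzip", "none"], default="gzip")
parser.add_argument("--crc", action="store_true")
parser.add_argument("--verbose", action="store_true")

# Equivalent to passing: --chunk 1024 --crc
args = parser.parse_args(["--chunk", "1024", "--crc"])
```

Validating the range in the type function means an out-of-range value is rejected at argument-parsing time with a usage error, before any input is read.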

Tests

# uv
uv run pytest

# pytest directly
pytest tests/ -v

253 tests: varint(30) + packer(16) + unpacker(14) + roundtrip(131) + compat(32) + stream(16) + cli(7) + parallel(7)
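
The test list mentions a varint suite, but the exact encoding MSH1 uses is not described here. As background only, this is a sketch of the common unsigned LEB128-style varint (7 payload bits per byte, high bit set on all but the last byte). Whether MSH1 uses exactly this layout is an assumption, and `encode_varint` / `decode_varint` are hypothetical names.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer, least-significant 7-bit group first."""
    if value < 0:
        raise ValueError("varint must be non-negative")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more groups follow
        else:
            out.append(byte)         # final group, high bit clear
            return bytes(out)


def decode_varint(buf: bytes, offset: int = 0):
    """Decode one varint starting at offset; return (value, next_offset)."""
    result = 0
    shift = 0
    while True:
        byte = buf[offset]  # IndexError here means the input was truncated
        offset += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, offset
        shift += 7


encoded = encode_varint(300)          # b"\xac\x02"
value, consumed = decode_varint(encoded)
```

Small values such as the default chunk size of 128 take two bytes in this scheme, while values below 128 take one.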

Requirements

  • Python 3.10+
  • No external dependencies (standard library only)
