Fixed-chunk dedup + entropy compression utility (MSH1 binary format)

mshzip (Python)

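Fixed-chunk dedup + entropy compression utility, Python/UV version

Uses the MSH1 binary format and is 100% cross-compatible with the Node.js implementation. Uses only the standard library (no external dependencies).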
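
To make the idea of fixed-chunk dedup followed by entropy compression concrete, here is a minimal sketch using only the standard library. It is not the MSH1 format and not mshzip's implementation: the function names `dedup_pack` and `dedup_unpack`, the in-memory `(blob, refs)` result, and the use of `zlib` are all assumptions made for illustration.

```python
import zlib


def dedup_pack(data: bytes, chunk_size: int = 128):
    """Illustrative only: store each distinct fixed-size chunk once,
    then compress the unique chunks with DEFLATE."""
    index = {}   # chunk bytes -> position in `unique`
    unique = []  # distinct chunks in first-seen order
    refs = []    # one entry per input chunk, pointing into `unique`
    for off in range(0, len(data), chunk_size):
        chunk = data[off:off + chunk_size]
        if chunk not in index:
            index[chunk] = len(unique)
            unique.append(chunk)
        refs.append(index[chunk])
    # Only the final input chunk can be shorter than chunk_size, and a
    # short chunk is always new, so it always ends up last in `unique`.
    blob = zlib.compress(b"".join(unique))
    return blob, refs


def dedup_unpack(blob: bytes, refs, chunk_size: int = 128) -> bytes:
    """Inverse of dedup_pack: expand the references back into the input."""
    raw = zlib.decompress(blob)
    return b"".join(raw[i * chunk_size:(i + 1) * chunk_size] for i in refs)
```

Highly repetitive input collapses to a handful of unique chunks plus a list of references, which is where the savings come from before the compressor runs.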

Installation

# uv (recommended)
uv pip install mshzip

# pip
pip install mshzip

CLI usage

# Compress
mshzip pack -i data.bin -o data.msh
mshzip pack -i data.bin -o data.msh --chunk 1024 --crc --verbose

# Decompress
mshzip unpack -i data.msh -o data.bin

# File info
mshzip info -i data.msh

# Parallel processing
mshzip multi pack file1.bin file2.bin file3.bin --out-dir ./compressed --workers 4
mshzip multi unpack compressed/*.msh --out-dir ./restored

# stdin/stdout pipes
cat data.bin | mshzip pack -i - -o - > data.msh
mshzip unpack -i data.msh -o - | sha256sum
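
The `sha256sum` pipe above is a round-trip check: the digest of the unpacked stream should match the digest of the original file. The same check can be sketched in Python with `hashlib`. The helper name `sha256_of_stream` is hypothetical and not part of mshzip.

```python
import hashlib
import io


def sha256_of_stream(stream, block_size: int = 1 << 16) -> str:
    """Hash a binary stream block by block, without loading it all at once."""
    h = hashlib.sha256()
    # iter() with a sentinel keeps calling read() until it returns b"".
    for block in iter(lambda: stream.read(block_size), b""):
        h.update(block)
    return h.hexdigest()


# Usage: compare an original payload with a restored copy.
original = io.BytesIO(b"hello world" * 100)
restored = io.BytesIO(b"hello world" * 100)
same = sha256_of_stream(original) == sha256_of_stream(restored)
```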

Python API

Simple API

import mshzip

# Compress
compressed = mshzip.pack(b'hello world' * 100)

# Decompress
original = mshzip.unpack(compressed)

# Options (data is any bytes object)
compressed = mshzip.pack(data, chunk_size=1024, codec='gzip', crc=True)

Packer / Unpacker classes

from mshzip import Packer, Unpacker

packer = Packer(chunk_size=256, codec='gzip', crc=True)
compressed = packer.pack(data)

unpacker = Unpacker()
restored = unpacker.unpack(compressed)

Streaming API

from mshzip import PackStream, UnpackStream, pack_stream, unpack_stream

# Generator-based streaming (data is a bytes object, output a writable binary file)
ps = PackStream(chunk_size=128)
for frame in ps.feed(data):
    output.write(frame)
for frame in ps.flush():
    output.write(frame)

# File I/O convenience functions
with open('input.bin', 'rb') as inp, open('output.msh', 'wb') as out:
    stats = pack_stream(inp, out, chunk_size=256)

with open('output.msh', 'rb') as inp, open('restored.bin', 'wb') as out:
    stats = unpack_stream(inp, out)
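
The `feed()` / `flush()` pattern shown for `PackStream` can be illustrated with a small generator-based buffer. This is a sketch of the general pattern only: the class `ChunkBuffer` is hypothetical, and nothing here is taken from mshzip's actual `PackStream` internals.

```python
from typing import Iterator


class ChunkBuffer:
    """Hypothetical helper: buffer arbitrary writes, emit fixed-size chunks."""

    def __init__(self, chunk_size: int = 128) -> None:
        if chunk_size <= 0:
            raise ValueError("chunk_size must be positive")
        self.chunk_size = chunk_size
        self._buf = bytearray()

    def feed(self, data: bytes) -> Iterator[bytes]:
        """Yield every complete chunk now available.

        This is a generator, so the data is only buffered once the
        caller starts iterating over the result.
        """
        self._buf += data
        while len(self._buf) >= self.chunk_size:
            chunk = bytes(self._buf[:self.chunk_size])
            del self._buf[:self.chunk_size]
            yield chunk

    def flush(self) -> Iterator[bytes]:
        """Yield the remaining partial chunk, if any, and reset the buffer."""
        if self._buf:
            chunk = bytes(self._buf)
            self._buf.clear()
            yield chunk
```

Separating `feed()` from `flush()` lets the caller push input of any size while the final short chunk is only emitted once the input is known to be complete.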

Parallel processing

from mshzip.parallel import WorkerPool, Task

pool = WorkerPool(4)
results = pool.run_all([
    Task(type='pack', input_path='a.bin', output_path='a.msh'),
    Task(type='pack', input_path='b.bin', output_path='b.msh'),
])
pool.shutdown()

for r in results:
    print(f'{r.success}: {r.input_size} -> {r.output_size} ({r.elapsed_ms}ms)')
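
For comparison, the same fan-out idea can be sketched with only `concurrent.futures` and `gzip` from the standard library. This is an assumption-laden stand-in, not how `WorkerPool` is implemented: whether mshzip uses threads or processes is not stated here, and `compress_all` is a hypothetical name.

```python
import gzip
from concurrent.futures import ThreadPoolExecutor


def compress_all(buffers, workers: int = 4):
    """Compress several independent byte buffers concurrently.

    Executor.map returns results in the same order as the inputs,
    regardless of which job finishes first.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(gzip.compress, buffers))
```

Each buffer is an independent job, so no coordination between workers is needed beyond collecting the results.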

CLI options

Option            Default      Description
--chunk <N>       128          Chunk size in bytes (8 to 16,777,216)
--frame <N>       67108864     Maximum bytes per frame (64 MiB)
--codec <type>    gzip         gzip or none
--crc             off          Add a CRC32 checksum
--verbose         off          Verbose output
--workers <N>     CPU cores    Number of parallel workers (multi command)
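
The `--crc` option adds a CRC32 checksum. As a rough illustration of what such a checksum buys, the sketch below appends a CRC32 to a payload and verifies it on read. The trailing 4-byte little-endian layout and the names `add_crc` / `check_crc` are assumptions for this example. Where MSH1 actually stores the checksum is not described here.

```python
import struct
import zlib


def add_crc(payload: bytes) -> bytes:
    """Append the CRC32 of the payload as 4 little-endian bytes (assumed layout)."""
    return payload + struct.pack("<I", zlib.crc32(payload))


def check_crc(framed: bytes) -> bytes:
    """Return the payload if its CRC32 matches, otherwise raise ValueError."""
    if len(framed) < 4:
        raise ValueError("input too short to contain a CRC32")
    payload = framed[:-4]
    (stored,) = struct.unpack("<I", framed[-4:])
    if zlib.crc32(payload) != stored:
        raise ValueError("CRC32 mismatch: data is corrupted")
    return payload
```

A CRC32 detects accidental corruption such as a flipped bit. It is not a cryptographic integrity check.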

Tests

# uv
uv run pytest

# pytest directly
pytest tests/ -v

253 tests: varint(30) + packer(16) + unpacker(14) + roundtrip(131) + compat(32) + stream(16) + cli(7) + parallel(7)
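
The test list names a varint group, which suggests the format uses variable-length integers. The exact encoding MSH1 uses is not specified here, so the sketch below assumes the common unsigned LEB128 scheme (7 data bits per byte, high bit set on every byte except the last). `encode_varint` and `decode_varint` are hypothetical names, not mshzip's API.

```python
def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as unsigned LEB128 (assumed scheme)."""
    if n < 0:
        raise ValueError("varint must be non-negative")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)         # last byte: high bit clear
            return bytes(out)


def decode_varint(buf: bytes, pos: int = 0) -> tuple[int, int]:
    """Decode one varint starting at pos; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7
```

Small values such as chunk indices fit in a single byte under this scheme, while larger ones grow by one byte per 7 bits.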

Requirements

  • Python 3.10+
  • No external dependencies (standard library only)
