Fixed-chunk dedup + entropy compression utility (MSH1 binary format)
mshzip (Python)
Fixed-chunk dedup + entropy compression utility, Python/uv version
Uses the MSH1 binary format and is 100% cross-compatible with the Node.js implementation. Standard library only (no external dependencies).
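To make the idea of fixed-chunk dedup followed by entropy coding concrete, here is a minimal sketch that uses only the standard library. It is not the MSH1 format and not mshzip's implementation: the names `dedup_pack` and `dedup_unpack`, the zero-padding of the last chunk, and the in-memory return value are all assumptions made for illustration.

```python
import gzip


def dedup_pack(data: bytes, chunk_size: int = 128):
    """Split data into fixed-size chunks and store each unique chunk once.

    Returns (compressed_dictionary, indices, original_length).
    Illustrative only: this is NOT the MSH1 on-disk layout.
    """
    seen = {}      # chunk bytes -> index in the dictionary
    unique = []    # unique chunks in first-seen order
    indices = []   # one dictionary index per input chunk
    for off in range(0, len(data), chunk_size):
        # Zero-pad the last chunk so every dictionary entry has equal size.
        chunk = data[off:off + chunk_size].ljust(chunk_size, b"\x00")
        if chunk not in seen:
            seen[chunk] = len(unique)
            unique.append(chunk)
        indices.append(seen[chunk])
    # Entropy-code the dictionary of unique chunks with gzip.
    return gzip.compress(b"".join(unique)), indices, len(data)


def dedup_unpack(blob: bytes, indices, orig_len: int, chunk_size: int = 128) -> bytes:
    """Rebuild the original bytes from the dictionary and the index list."""
    raw = gzip.decompress(blob)
    chunks = [raw[i:i + chunk_size] for i in range(0, len(raw), chunk_size)]
    # Drop the zero padding that was added to the final chunk.
    return b"".join(chunks[i] for i in indices)[:orig_len]
```

Repeated chunks cost only one index each, so highly repetitive input shrinks before gzip ever runs; smaller chunk sizes find more duplicates but produce a longer index list.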
Installation
# uv (recommended)
uv pip install mshzip
# pip
pip install mshzip
CLI usage
# Compress
mshzip pack -i data.bin -o data.msh
mshzip pack -i data.bin -o data.msh --chunk 1024 --crc --verbose
# Decompress
mshzip unpack -i data.msh -o data.bin
# File info
mshzip info -i data.msh
# Parallel processing
mshzip multi pack file1.bin file2.bin file3.bin --out-dir ./compressed --workers 4
mshzip multi unpack compressed/*.msh --out-dir ./restored
# stdin/stdout pipes
cat data.bin | mshzip pack -i - -o - > data.msh
mshzip unpack -i data.msh -o - | sha256sum
Python API
Simple API
import mshzip
# Compress
compressed = mshzip.pack(b'hello world' * 100)
# Decompress
original = mshzip.unpack(compressed)
# Options
compressed = mshzip.pack(data, chunk_size=1024, codec='gzip', crc=True)
Packer / Unpacker classes
from mshzip import Packer, Unpacker
packer = Packer(chunk_size=256, codec='gzip', crc=True)
compressed = packer.pack(data)
unpacker = Unpacker()
restored = unpacker.unpack(compressed)
Streaming API
from mshzip import PackStream, UnpackStream, pack_stream, unpack_stream
# Generator-based streaming
ps = PackStream(chunk_size=128)
for frame in ps.feed(data):
    output.write(frame)
for frame in ps.flush():
    output.write(frame)
# File I/O convenience functions
with open('input.bin', 'rb') as inp, open('output.msh', 'wb') as out:
    stats = pack_stream(inp, out, chunk_size=256)
with open('output.msh', 'rb') as inp, open('restored.bin', 'wb') as out:
    stats = unpack_stream(inp, out)
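The feed/flush pattern above can be sketched with a small buffering class. This is a hypothetical stand-in, not mshzip's `PackStream`: the name `FrameBuffer` and its behaviour (emit a frame whenever `frame_size` bytes are buffered, emit the remainder on flush) are assumptions for illustration, and no compression is applied.

```python
class FrameBuffer:
    """Buffers incoming bytes and yields fixed-size frames (illustrative)."""

    def __init__(self, frame_size: int):
        if frame_size <= 0:
            raise ValueError("frame_size must be positive")
        self.frame_size = frame_size
        self._buf = bytearray()

    def feed(self, data: bytes):
        """Accept more input and yield every complete frame now available."""
        self._buf += data
        while len(self._buf) >= self.frame_size:
            frame = bytes(self._buf[:self.frame_size])
            del self._buf[:self.frame_size]
            yield frame

    def flush(self):
        """Yield whatever is left as a final, possibly short, frame."""
        if self._buf:
            frame = bytes(self._buf)
            self._buf.clear()
            yield frame
```

Because `feed` is a generator, the input is only consumed once the caller iterates over the result, which is why the usage pattern always wraps `feed` and `flush` in a `for` loop.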
Parallel processing
from mshzip.parallel import WorkerPool, Task
pool = WorkerPool(4)
results = pool.run_all([
    Task(type='pack', input_path='a.bin', output_path='a.msh'),
    Task(type='pack', input_path='b.bin', output_path='b.msh'),
])
pool.shutdown()
for r in results:
    print(f'{r.success}: {r.input_size} -> {r.output_size} ({r.elapsed_ms}ms)')
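For readers curious how a pool that returns per-task results could be structured, here is a rough standard-library sketch. It does not reflect how `WorkerPool` is implemented: `JobResult`, `compress_job`, and `run_all` are hypothetical names, plain gzip stands in for the real packer, and a thread pool is used only to keep the example short.

```python
import gzip
import time
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class JobResult:
    name: str
    success: bool
    input_size: int
    output_size: int
    elapsed_ms: float


def compress_job(name: str, payload: bytes) -> JobResult:
    """Compress one payload and record its sizes and timing."""
    start = time.perf_counter()
    try:
        out_size = len(gzip.compress(payload))
        ok = True
    except Exception:
        # Report the failure in the result instead of aborting the batch.
        ok, out_size = False, 0
    elapsed = (time.perf_counter() - start) * 1000
    return JobResult(name, ok, len(payload), out_size, elapsed)


def run_all(jobs: dict, workers: int = 4):
    """Run every job on a thread pool; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda kv: compress_job(*kv), jobs.items()))
```

Catching errors inside each job means one bad input shows up as `success=False` rather than cancelling the remaining work.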
CLI options
| Option | Default | Description |
|---|---|---|
| `--chunk <N>` | 128 | Chunk size (8 to 16,777,216 B) |
| `--frame <N>` | 67108864 | Maximum bytes per frame (64 MB) |
| `--codec <type>` | gzip | gzip or none |
| `--crc` | off | Add a CRC32 checksum |
| `--verbose` | off | Verbose output |
| `--workers <N>` | Number of CPU cores | Number of parallel workers (multi command) |
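As background for the `--crc` option, the sketch below shows how a CRC32 checksum can guard a payload using `zlib.crc32` from the standard library. Where MSH1 actually stores its checksum and in which byte order is not stated here, so the trailing 4-byte little-endian layout and the helper names `append_crc32` / `verify_crc32` are assumptions.

```python
import struct
import zlib


def append_crc32(payload: bytes) -> bytes:
    """Return payload followed by its CRC32 as 4 little-endian bytes."""
    return payload + struct.pack("<I", zlib.crc32(payload) & 0xFFFFFFFF)


def verify_crc32(framed: bytes) -> bytes:
    """Check the trailing CRC32 and return the payload, or raise ValueError."""
    if len(framed) < 4:
        raise ValueError("input too short to contain a CRC32")
    payload = framed[:-4]
    stored = struct.unpack("<I", framed[-4:])[0]
    if zlib.crc32(payload) & 0xFFFFFFFF != stored:
        raise ValueError("CRC32 mismatch")
    return payload
```

A checksum like this detects accidental corruption at the cost of four extra bytes per protected unit; it offers no protection against deliberate tampering.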
Tests
# uv
uv run pytest
# pytest directly
pytest tests/ -v
253 tests: varint(30) + packer(16) + unpacker(14) + roundtrip(131) + compat(32) + stream(16) + cli(7) + parallel(7)
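The test list mentions a varint suite. For readers unfamiliar with the term, here is a minimal sketch of one common variable-length integer scheme (unsigned LEB128, 7 payload bits per byte with a continuation flag). Whether MSH1 uses exactly this encoding is an assumption, and `encode_varint` / `decode_varint` are hypothetical names rather than mshzip's API.

```python
def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as unsigned LEB128."""
    if n < 0:
        raise ValueError("varint cannot encode negative numbers")
    out = bytearray()
    while True:
        byte = n & 0x7F          # low 7 bits
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)         # last byte, high bit clear
            return bytes(out)


def decode_varint(buf: bytes, pos: int = 0):
    """Decode one varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7
```

Small values take a single byte while larger ones grow in 7-bit steps, which keeps headers compact when most lengths and indices are small.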
Requirements
- Python 3.10+
- No external dependencies (standard library only)