zlib-state

Low-level interface to the zlib library that enables capturing the decoding state.

Install

From PyPI:

pip install zlib-state

From source:

pip install .

Tested on Ubuntu, macOS, and Windows with Python 3.7 to 3.12.

GzipStateFile

Wraps Decompressor as a buffered reader.

In my benchmarking, this is somewhat slower than Python's built-in gzip module.
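To check the speed comparison on your own files, a small timing harness along these lines may help. This is a sketch, not the author's benchmark: the helper names `time_line_iteration` and `compare` are made up for illustration, and it only assumes that `GzipStateFile(path)` can be iterated line by line as in the usage pattern in this section.

```python
import gzip
import time


def time_line_iteration(open_file, path):
    """Return (line_count, seconds) for one line-by-line pass over a gzip file."""
    start = time.perf_counter()
    count = 0
    with open_file(path) as f:
        for _ in f:
            count += 1
    return count, time.perf_counter() - start


def compare(path):
    """Time the standard library reader and, if installed, GzipStateFile."""
    results = {"gzip.open": time_line_iteration(lambda p: gzip.open(p, "rb"), path)}
    try:
        import zlib_state  # optional: skipped when the package is not installed
    except ImportError:
        return results
    results["GzipStateFile"] = time_line_iteration(zlib_state.GzipStateFile, path)
    return results
```

Calling `compare('testdata/frankenstein.txt.gz')` returns a dict of `(line_count, seconds)` pairs. A single pass is noisy, so repeat the call a few times and compare the best runs.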

A typical usage pattern looks like:

import zlib_state

TARGET_LINE = 5000 # pick back up after around the 5,000th line
# keep_last_state=True tells the object to record the state and position after each block
with zlib_state.GzipStateFile('testdata/frankenstein.txt.gz', keep_last_state=True) as f:
    for i, line in enumerate(f):
        if i == TARGET_LINE:
            state, pos = f.last_state, f.last_state_pos
            break # state captured; no need to read the rest of the file

with zlib_state.GzipStateFile('testdata/frankenstein.txt.gz') as f:
    f.zseek(pos, state)
    remainder = f.read()
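Why zseek needs a state and not just a position: deflate data refers back to up to 32 KiB of earlier output, so a decoder dropped into the middle of a stream has nothing to resolve those references against. The sketch below illustrates the idea with the standard library's zlib module only. It is not how zlib-state works internally. It also sidesteps the harder case by forcing a byte-aligned boundary with Z_SYNC_FLUSH, whereas ordinary block boundaries can fall mid-byte, which is presumably why the library's state also carries leftover bits.

```python
import zlib

# The second chunk reuses phrases from the first, so the compressor emits
# back-references that reach across the flush point.
first = b"the quick brown fox jumps over the lazy dog. " * 1000
second = b"the lazy dog jumps over the quick brown fox. " * 1000

# Raw deflate stream (wbits=-15: no zlib or gzip header or trailer).
comp = zlib.compressobj(wbits=-15)
part1 = comp.compress(first) + comp.flush(zlib.Z_SYNC_FLUSH)  # ends on a byte boundary
part2 = comp.compress(second) + comp.flush()                  # finishes the stream
stream = part1 + part2

# Resume point: compressed offset plus the last 32 KiB of output before it.
pos = len(part1)
window = first[-32768:]

# A fresh raw decompressor primed with that window can start at pos.
resumed = zlib.decompressobj(wbits=-15, zdict=window)
tail = resumed.decompress(stream[pos:])
assert tail == second
```

The pair `(window, pos)` plays the role that `(state, pos)` plays in the GzipStateFile example, under the simplifying assumption of a byte-aligned resume point.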

Decompressor

Very basic decompression object that's picky and unforgiving.

In my benchmarking, this can iterate over gzip files faster than Python's built-in gzip module.

A typical usage pattern looks like:

import zlib_state

decomp = zlib_state.Decompressor(32 + 15) # from zlib: 15 is the window bits (32 KiB window); adding 32 auto-detects a zlib or gzip header
block_count = 0
with open('testdata/frankenstein.txt.gz', 'rb') as f:
    while not decomp.eof():
        needed_input = decomp.needs_input()
        if needed_input > 0:
            # decomp needs more input, and it tells you how much.
            decomp.feed_input(f.read(needed_input))
        # next_chunk may be empty (e.g., if finished with gzip headers) or may contain data.
        # It sends as much as it has left in its output buffer, or asks zlib to continue.
        next_chunk = decomp.read() # you can also pass a maximum size to take and/or a buffer to write to
        if decomp.block_boundary():
            block_count += 1
            # When it reaches the end of a deflate block, it always stops. At these times, you can grab the state
            # if you wish.
            if block_count == 4: # resume after the 4th block
                state = decomp.get_state() # includes zdict, bits, byte -- everything it needs to resume from pos
                pos = decomp.total_in() # the current position in the binary file to resume from
    print(f'{block_count} blocks processed')
    # Resume partway through the file. The only valid resume points are block boundaries,
    # using the state saved at that boundary.
    f.seek(pos)
    decomp = zlib_state.Decompressor(-15) # from zlib: 15 is the window bits; negative means raw deflate with no header
    decomp.set_state(*state)
    while not decomp.eof():
        needed_input = decomp.needs_input()
        if needed_input > 0:
            # decomp needs more input, and it tells you how much.
            decomp.feed_input(f.read(needed_input))
        next_chunk = decomp.read()
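The two window-bits values used above (32 + 15 for the first pass, -15 when resuming) follow zlib's own conventions. The sketch below shows the same values with the standard library's zlib module, on the assumption that `Decompressor` passes its argument through to zlib unchanged. The helper names `inflate_auto` and `inflate_raw` are illustrative only.

```python
import gzip
import zlib

payload = b"It was on a dreary night of November. " * 50

gz_bytes = gzip.compress(payload)   # gzip header + deflate data + 8-byte trailer
zl_bytes = zlib.compress(payload)   # zlib header + deflate data + Adler-32 trailer


def inflate_auto(data):
    """32 + 15: detect a zlib or gzip header automatically."""
    return zlib.decompressobj(wbits=32 + 15).decompress(data)


def inflate_raw(data):
    """-15: raw deflate, no header or trailer expected."""
    return zlib.decompressobj(wbits=-15).decompress(data)


assert inflate_auto(gz_bytes) == payload
assert inflate_auto(zl_bytes) == payload

# With no optional header fields (FLG byte is 0), the gzip header is 10 bytes
# and everything between it and the 8-byte trailer is raw deflate data.
assert gz_bytes[3] == 0
assert inflate_raw(gz_bytes[10:-8]) == payload
```

This is why a resumed decompressor uses -15: once past the gzip header, the rest of the file body is plain deflate data, and there is no header left to parse at a mid-file block boundary.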

Download files

Download the file for your platform.

Source Distribution

zlib-state-0.1.6.tar.gz (9.5 kB): source

Built Distributions

zlib_state-0.1.6-pp37-pypy37_pp73-manylinux2010_x86_64.whl (57.0 kB): PyPy, manylinux (glibc 2.12+), x86-64
zlib_state-0.1.6-pp36-pypy36_pp73-manylinux2010_x86_64.whl (57.0 kB): PyPy, manylinux (glibc 2.12+), x86-64
zlib_state-0.1.6-cp312-cp312-win_amd64.whl (12.5 kB): CPython 3.12, Windows x86-64
zlib_state-0.1.6-cp311-cp311-win_amd64.whl (12.5 kB): CPython 3.11, Windows x86-64
zlib_state-0.1.6-cp310-cp310-win_amd64.whl (12.5 kB): CPython 3.10, Windows x86-64
zlib_state-0.1.6-cp39-cp39-win_amd64.whl (12.5 kB): CPython 3.9, Windows x86-64
zlib_state-0.1.6-cp39-cp39-manylinux2010_x86_64.whl (72.0 kB): CPython 3.9, manylinux (glibc 2.12+), x86-64
zlib_state-0.1.6-cp39-cp39-manylinux2010_i686.whl (68.4 kB): CPython 3.9, manylinux (glibc 2.12+), i686
zlib_state-0.1.6-cp39-cp39-manylinux1_x86_64.whl (72.0 kB): CPython 3.9, manylinux1, x86-64
zlib_state-0.1.6-cp39-cp39-manylinux1_i686.whl (68.4 kB): CPython 3.9, manylinux1, i686
zlib_state-0.1.6-cp38-cp38-win_amd64.whl (12.5 kB): CPython 3.8, Windows x86-64
zlib_state-0.1.6-cp38-cp38-manylinux2010_x86_64.whl (72.6 kB): CPython 3.8, manylinux (glibc 2.12+), x86-64
zlib_state-0.1.6-cp38-cp38-manylinux2010_i686.whl (69.0 kB): CPython 3.8, manylinux (glibc 2.12+), i686
zlib_state-0.1.6-cp38-cp38-manylinux1_x86_64.whl (72.6 kB): CPython 3.8, manylinux1, x86-64
zlib_state-0.1.6-cp38-cp38-manylinux1_i686.whl (69.0 kB): CPython 3.8, manylinux1, i686
zlib_state-0.1.6-cp37-cp37m-win_amd64.whl (12.4 kB): CPython 3.7m, Windows x86-64
zlib_state-0.1.6-cp37-cp37m-manylinux2010_x86_64.whl (73.0 kB): CPython 3.7m, manylinux (glibc 2.12+), x86-64
zlib_state-0.1.6-cp37-cp37m-manylinux2010_i686.whl (69.3 kB): CPython 3.7m, manylinux (glibc 2.12+), i686
zlib_state-0.1.6-cp37-cp37m-manylinux1_x86_64.whl (73.0 kB): CPython 3.7m, manylinux1, x86-64
zlib_state-0.1.6-cp37-cp37m-manylinux1_i686.whl (69.3 kB): CPython 3.7m, manylinux1, i686
zlib_state-0.1.6-cp36-cp36m-manylinux2010_x86_64.whl (72.0 kB): CPython 3.6m, manylinux (glibc 2.12+), x86-64
zlib_state-0.1.6-cp36-cp36m-manylinux2010_i686.whl (68.4 kB): CPython 3.6m, manylinux (glibc 2.12+), i686
zlib_state-0.1.6-cp36-cp36m-manylinux1_x86_64.whl (72.0 kB): CPython 3.6m, manylinux1, x86-64
zlib_state-0.1.6-cp36-cp36m-manylinux1_i686.whl (68.4 kB): CPython 3.6m, manylinux1, i686
