
Utility for randomly accessing DEFLATE-compressed data


ZRAN

Random read access for ZLIB, GZIP and DEFLATE file formats

Description

zran is a Python extension that wraps the zran library, which was created by Mark Adler (a co-author of zlib). It builds an index that lets you begin decompressing DEFLATE-compressed data (ZLIB, GZIP, or DEFLATE format) from compression block boundaries on subsequent reads. Once the index is created, you can effectively access the DEFLATE-compressed data at random.
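To make the idea of an index of access points concrete, here is a minimal, standard-library-only sketch. It is not how zran works internally: it swaps in a simpler technique, inserting `Z_FULL_FLUSH` points while compressing, so that each access point is byte-aligned and needs no earlier history. zran.c instead indexes existing streams by saving a window of preceding uncompressed data at each point. The names `compress_with_index`, `read_range`, and `CHUNK` are hypothetical and exist only for this illustration.

```python
import bisect
import zlib

CHUNK = 64 * 1024  # uncompressed bytes between access points (arbitrary choice)


def compress_with_index(data: bytes, chunk: int = CHUNK):
    """Raw-DEFLATE compress `data`, recording an access point before each chunk.

    Returns (compressed, points); each point is a pair
    (uncompressed_offset, compressed_offset).
    """
    comp = zlib.compressobj(level=6, wbits=-15)  # wbits=-15: raw DEFLATE, no header
    out = bytearray()
    points = []
    for pos in range(0, len(data), chunk):
        points.append((pos, len(out)))
        out += comp.compress(data[pos:pos + chunk])
        # Z_FULL_FLUSH byte-aligns the output and resets the history window,
        # so a fresh decompressor can start at the next access point.
        out += comp.flush(zlib.Z_FULL_FLUSH)
    out += comp.flush()  # Z_FINISH: write the final block
    return bytes(out), points


def read_range(compressed: bytes, points, start: int, length: int) -> bytes:
    """Return uncompressed bytes [start, start + length) using the index."""
    if start < 0:
        raise ValueError("start must be non-negative")
    if length <= 0 or not points:
        return b""
    # Find the last access point at or before `start`.
    i = bisect.bisect_right(points, (start, float("inf"))) - 1
    uncompressed_off, compressed_off = points[i]
    skip = start - uncompressed_off
    decomp = zlib.decompressobj(wbits=-15)
    # Decompress only what is needed: the bytes to skip plus the request.
    chunk = decomp.decompress(compressed[compressed_off:], skip + length)
    return chunk[skip:]


data = bytes(range(256)) * 4096  # 1 MiB of sample data
compressed, points = compress_with_index(data)
assert read_range(compressed, points, 300_000, 2000) == data[300_000:302_000]
```

The trade-off in this sketch is that the flush points must be inserted at compression time and cost a little compression ratio, whereas an index built after the fact works on files you did not compress yourself.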

Installation

zran can be installed in your preferred Python environment via pip:

python -m pip install zran

Currently, only macOS/Linux x86_64 and ARM64 architectures are supported. Please open an issue or submit a PR if you would like to see support for other platforms!

Usage

To use zran, you need to:

  1. Create an index for a compressed file
  2. Save this index
  3. Use this index to access the data on subsequent reads

To create the index:

import zran

# Read the whole compressed file into memory
with open('compressed.gz', 'rb') as f:
    compressed_file = f.read()

# Build the index from the compressed bytes
index = zran.Index.create_index(compressed_file)

This Index can be written to a file (index.to_file('index.dflidx')), or passed directly to zran.decompress:

start = 1000
length = 2000
data = zran.decompress(compressed_file, index, start, length)

That's it!
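For comparison, this is roughly what reading the same range looks like without an index, using only Python's standard gzip module. The sketch assumes that start and length above refer to offsets in the uncompressed data, which this README does not state explicitly; `read_range_without_index` is a hypothetical helper name used only for illustration.

```python
import gzip


def read_range_without_index(path: str, start: int, length: int) -> bytes:
    """Read `length` uncompressed bytes at offset `start` from a gzip file.

    GzipFile.seek() has no index to consult, so it decompresses and discards
    everything before `start`; the cost grows with the offset.
    """
    with gzip.open(path, "rb") as f:
        f.seek(start)
        return f.read(length)
```

A helper like this can also serve as a cross-check while experimenting: under the assumption above, its result for a gzip file should match the bytes returned by the indexed read.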

Contributing

We use the standard GitHub flow to manage contributions to this project. Check out this documentation if you are unfamiliar with this process.

You can install a development version of zran via pip as well:

git clone https://github.com/forrestfwilliams/zran.git
cd zran
python -m pip install .

Then, run pytest to ensure that all tests pass. We use black with a line length of 120 for formatting and ruff for linting. Please ensure that your code is correctly formatted and linted before submitting a PR. As far as I can tell, installing with pip's --editable flag does not work when the code needs to be compiled, so you will need to re-install the package after making any changes.

Similar Projects

If you prefer to work in the C programming language, you may want to work directly with the zran C source code in the zlib library. Paul McCarthy's indexed_gzip library was a huge inspiration for this project, and was particularly helpful while creating our setup.py file. If you plan to work exclusively with gzip files, you may be better served by the indexed_gzip library. However, this project has some unique functionality that sets it apart:

  • Use of the most up-to-date version of the zran C library
  • Support for ZLIB, GZIP, and DEFLATE formatted data
  • Greater visibility into the contents of indexes
  • Compression of the indexes when written to a file, leading to smaller index file sizes
  • The ability to modify the points contained within an index via the Index.create_modified_index() method
