Skip to main content

Fast random access of gzip files in Python

Project description

https://travis-ci.org/pauldmccarthy/indexed_gzip.svg?branch=master

The indexed_gzip project is a Python extension which aims to provide a drop-in replacement for the built-in Python gzip.GzipFile class, the IndexedGzipFile.

indexed_gzip was written to allow fast random access of compressed NIFTI image files (for which GZIP is the de-facto compression standard), but will work with any GZIP file. indexed_gzip is easy to use with nibabel (http://nipy.org/nibabel/).

The standard gzip.GzipFile class exposes a random access-like interface (via its seek and read methods), but every time you seek to a new point in the uncompressed data stream, the GzipFile instance has to start decompressing from the beginning of the file, until it reaches the requested location.

An IndexedGzipFile instance gets around this performance limitation by building an index, which contains seek points, mappings between corresponding locations in the compressed and uncompressed data streams. Each seek point is accompanied by a chunk (32KB) of uncompressed data which is used to initialise the decompression algorithm, allowing us to start reading from any seek point. If the index is built with a seek point spacing of 1MB, we only have to decompress (on average) 512KB of data to read from any location in the file.

See https://github.com/pauldmccarthy/indexed_gzip for more details.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

indexed_gzip-0.3.1.tar.gz (222.2 kB view hashes)

Uploaded Source

Built Distributions

indexed_gzip-0.3.1-cp35-cp35m-macosx_10_6_intel.whl (411.3 kB view hashes)

Uploaded CPython 3.5m macOS 10.6+ intel

indexed_gzip-0.3.1-cp27-cp27m-macosx_10_6_intel.whl (428.1 kB view hashes)

Uploaded CPython 2.7m macOS 10.6+ intel

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page