Open compressed files transparently

xopen

This Python module provides an xopen function that works like the built-in open function but also transparently deals with compressed files. Supported compression formats are currently gzip, bzip2 and xz.

xopen selects the most efficient method for reading or writing a compressed file. This often means opening a pipe to an external tool, such as pigz, which is a parallel version of gzip, or igzip, which is a highly optimized version of gzip.

If threads=0 is passed to xopen(), no external process is used. For gzip files, xopen then uses python-isal (which binds isa-l) if it is installed; because python-isal is a dependency of xopen, this should always be the case. Neither igzip nor python-isal supports compression levels greater than 3, so for higher levels, if no external tool is available or threads has been set to 0, Python’s built-in gzip.open is used.
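The selection rules for gzip described above can be summarized as a small decision function. This is only a sketch of one reading of those rules, not xopen's actual code: the function name, its parameters and the choice to try igzip before pigz are assumptions made for illustration.

```python
def pick_gzip_backend(threads, compresslevel, igzip_available, pigz_available):
    """Return the name of the gzip backend this sketch would choose.

    Hypothetical helper: the names and the igzip-before-pigz order
    are assumptions, not part of xopen's API.
    """
    # igzip and python-isal only support compression levels up to 3.
    fast_level = compresslevel <= 3
    if threads != 0:
        # External processes are allowed.
        if fast_level and igzip_available:
            return "igzip"
        if pigz_available:
            return "pigz"
    # No external process is allowed, or no suitable one is available.
    if fast_level:
        return "python-isal"
    return "gzip.open"
```

With threads=0, the function never returns an external tool, and any level above 3 ends up at gzip.open.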

For xz files, a pipe to the xz program is used because it has built-in support for multithreaded compression.

For bz2 files, pbzip2 (parallel bzip2) is used.

xopen falls back to Python’s built-in functions (gzip.open, lzma.open, bz2.open) if none of the other methods can be used.
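The general idea of preferring an external tool and falling back to the built-in module can be sketched with the standard library alone. The helper name read_gzip_bytes and its program parameter are hypothetical, and the sketch reads the whole file at once, whereas xopen returns a file-like object.

```python
import gzip
import shutil
import subprocess


def read_gzip_bytes(path, program="gzip"):
    """Return the decompressed content of a gzip file as bytes.

    Hypothetical helper: pipes through an external program if it is
    found on PATH, otherwise falls back to Python's gzip module.
    """
    executable = shutil.which(program)
    if executable is not None:
        # -c writes to standard output, -d decompresses.
        result = subprocess.run(
            [executable, "-cd", str(path)],
            stdout=subprocess.PIPE,
            check=True,
        )
        return result.stdout
    # Fallback: no external tool found, use the built-in module.
    with gzip.open(path, "rb") as f:
        return f.read()
```

Passing program="pigz" would try pigz first in the same way, assuming it accepts the same -c and -d options as gzip.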

The file format to use is determined from the file name if the extension is recognized (.gz, .bz2 or .xz). When reading a file without a recognized file extension, xopen attempts to detect the format by reading the first couple of bytes from the file.
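Content-based detection of this kind typically compares the first bytes of the file against the well-known magic numbers of each format. The following is a minimal sketch under that assumption; detect_format and MAGIC are hypothetical names, and xopen's own detection may differ in detail.

```python
# Magic numbers that start every gzip, bzip2 and xz file.
MAGIC = {
    b"\x1f\x8b": "gz",
    b"BZh": "bz2",
    b"\xfd7zXZ\x00": "xz",
}


def detect_format(path):
    """Return "gz", "bz2", "xz", or None if no known magic number matches."""
    with open(path, "rb") as f:
        # Six bytes are enough for the longest magic number (xz).
        head = f.read(6)
    for magic, name in MAGIC.items():
        if head.startswith(magic):
            return name
    return None
```

A file that matches none of the signatures yields None, in which case a caller would treat it as uncompressed.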

xopen is compatible with Python versions 3.7 and later.

Usage

Open a file for reading:

from xopen import xopen

with xopen("file.txt.gz") as f:
    content = f.read()

Write to a file in binary mode, set the compression level and avoid using an external process:

from xopen import xopen

with xopen("file.txt.xz", mode="wb", threads=0, compresslevel=3) as f:
    f.write(b"Hello")

Changes

v1.5.0 (2022-03-23)

  • #100: Dropped Python 3.6 support

  • #101: Added support for piping into and from an external xz process. Contributed by @fanninpm.

  • #102: Support setting the xz compression level. Contributed by @tsibley.

v1.4.0 (2022-01-14)

  • Add seek() and tell() to the PipedCompressionReader classes (for Windows compatibility)

v1.3.0 (2022-01-10)

  • xopen is now available on Windows (in addition to Linux and macOS).

  • For greater compatibility with the built-in open() function, xopen() has gained the parameters encoding, errors and newline with the same meaning as in open(). Unlike built-in open(), though, encoding is UTF-8 by default.

  • A format parameter has been added that allows forcing the compression file format.

v1.2.0 (2021-09-21)

  • pbzip2 is now used to open .bz2 files if threads is greater than zero (contributed by @DriesSchaumont).

v1.1.0 (2021-01-20)

  • Python 3.5 support is dropped.

  • On Linux systems, python-isal is now added as a requirement. This will speed up the reading of gzip files significantly when no external processes are used.

v1.0.0 (2020-11-05)

  • If installed, the igzip program (part of Intel ISA-L) is now used for reading and writing gzip-compressed files at compression levels 1-3, which results in a significant speedup.

v0.9.0 (2020-04-02)

  • #80: When the file name extension of a file to be opened for reading is not available, the content is inspected (if possible) and used to determine which compression format applies (contributed by @bvaisvil).

  • This release drops Python 2.7 and 3.4 support. Python 3.5 or later is now required.

v0.8.4 (2019-10-24)

  • When reading gzipped files, force pigz to use only a single process. pigz cannot use multiple cores anyway when decompressing. By default, it would use extra I/O processes, which slightly reduces wall-clock time, but increases CPU time. Single-core decompression with pigz is still about twice as fast as regular gzip.

  • Allow threads=0 for specifying that no external pigz/gzip process should be used (then regular gzip.open() is used instead).

v0.8.3 (2019-10-18)

  • #20: When reading gzipped files, let pigz use at most four threads by default. This limit previously only applied when writing to a file. Contributed by @bernt-matthias.

  • Support Python 3.8

v0.8.0 (2019-08-14)

  • #14: Speed improvements when iterating over gzipped files.

v0.6.0 (2019-05-23)

  • For reading from gzipped files, xopen will now use a pigz subprocess. This is faster than using gzip.open.

  • Python 2 support will be dropped in one of the next releases.

v0.5.0 (2019-01-30)

  • By default, pigz is now only allowed to use at most four threads. This hopefully reduces problems some users had with too many threads when opening many files at the same time.

  • xopen now accepts pathlib.Path objects.

v0.4.0 (2019-01-07)

  • Drop Python 3.3 support

  • Add a threads parameter (passed on to pigz)

v0.3.2 (2017-11-22)

  • #6: Make multi-block bz2 work on Python 2 by using external bz2file library.

v0.3.1 (2017-11-22)

  • Drop Python 2.6 support

  • #5: Fix PipedGzipReader.read() not returning anything

v0.3.0 (2017-11-15)

  • Add gzip compression parameter

v0.2.1 (2017-05-31)

  • #3: Allow appending to bz2 and lzma files where possible

v0.1.1 (2016-12-02)

  • Fix a deadlock

v0.1.0 (2016-09-09)

  • Initial release

Credits

The name xopen was taken from the C function of the same name in the utils.h file which is part of BWA.

Some ideas were taken from the canopener project. If you also want to open S3 files, you may want to use that module instead.

@kyleabeauchamp contributed support for appending to files before this repository was created.
