
Open compressed files transparently

xopen

This small Python module provides an xopen function that works like the built-in open function, but can also deal with compressed files. Supported compression formats are gzip, bzip2 and xz. They are automatically recognized by their file extensions .gz, .bz2 or .xz.
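To illustrate the idea of recognizing formats by file extension, here is a minimal sketch that uses only the standard library. It is not xopen's actual implementation: the helper name `open_by_extension` is hypothetical, and the sketch leaves out the external-process speedups described below.

```python
import bz2
import gzip
import lzma

# Hypothetical lookup table for illustration; not part of xopen's API.
_OPENERS = {".gz": gzip.open, ".bz2": bz2.open, ".xz": lzma.open}


def open_by_extension(path, mode="rt"):
    """Pick an opener from the file extension, falling back to open()."""
    path = str(path)
    for extension, opener in _OPENERS.items():
        if path.endswith(extension):
            return opener(path, mode)
    # No known compression extension: treat as a regular file.
    return open(path, mode)
```

The returned object can be used as a context manager in the same way as the result of the built-in open function.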

The focus is on being as efficient as possible on all supported Python versions. For example, xopen uses pigz, which is a parallel version of gzip, to open .gz files, which is faster than using the built-in gzip.open function. pigz can use multiple threads when compressing, but is also faster when reading .gz files, so it is used both for reading and writing if it is available. For gzip compression levels 1 to 3, igzip is used for an even greater speedup.

For use cases where only the main thread should be used, xopen can be called with threads=0. This uses python-isal (which binds isa-l) if python-isal is installed (it is installed automatically on Linux systems, where it is a requirement). For installation instructions for python-isal, please check out the python-isal homepage. If python-isal is not available, gzip.open is used.
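As a small sketch of the threads=0 case, the following round-trips some bytes through a compressed file without starting external processes. The function name `write_and_read_in_process` is made up for this example, and xopen must be installed for it to run.

```python
def write_and_read_in_process(path, data):
    """Round-trip bytes through a compressed file using threads=0."""
    # Imported inside the function so that the sketch only requires
    # xopen at the moment it is actually called.
    from xopen import xopen

    # threads=0 asks xopen to do the compression in the main thread
    # instead of handing it to an external program.
    with xopen(path, mode="wb", threads=0) as f:
        f.write(data)
    with xopen(path, mode="rb", threads=0) as f:
        return f.read()
```

Calling `write_and_read_in_process("file.txt.gz", b"Hello")` should return the bytes that were written.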

This module was originally developed as part of the Cutadapt tool, which is used in bioinformatics to manipulate sequencing data. It has been in successful use within that software for a few years.

xopen is compatible with Python versions 3.6 and later.

Usage

Open a file for reading:

from xopen import xopen

with xopen('file.txt.xz') as f:
    content = f.read()

Or without context manager:

from xopen import xopen

f = xopen('file.txt.xz')
content = f.read()
f.close()

Open a file in binary mode for writing:

from xopen import xopen

with xopen('file.txt.gz', mode='wb') as f:
    f.write(b'Hello')

Credits

The name xopen was taken from the C function of the same name in the utils.h file which is part of BWA.

Kyle Beauchamp <https://github.com/kyleabeauchamp/> contributed support for appending to files.

Ruben Vorderman <https://github.com/rhpvorderman/> contributed improvements to make reading and writing gzipped files faster.

Benjamin Vaisvil <https://github.com/bvaisvil> contributed support for format detection from content.

Dries Schaumont <https://github.com/DriesSchaumont> contributed support for faster bz2 reading and writing using pbzip2.

Some ideas were taken from the canopener project. If you also want to open S3 files, you may want to use that module instead.

Changes

v1.4.0

  • Add seek() and tell() to the PipedCompressionReader classes (for Windows compatibility)

v1.3.0

  • xopen is now available on Windows (in addition to Linux and macOS).

  • For greater compatibility with the built-in open() function, xopen() has gained the parameters encoding, errors and newline with the same meaning as in open(). Unlike built-in open(), though, encoding is UTF-8 by default.

  • A parameter format has been added that allows forcing the compression file format.
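The parameters introduced in v1.3.0 could be combined as in the following sketch. It assumes that format accepts the string "gz" to force gzip; the function name `roundtrip_forced_gzip` is hypothetical, and xopen must be installed for it to run.

```python
def roundtrip_forced_gzip(path, text):
    """Write and read text as gzip even if `path` has no .gz extension."""
    from xopen import xopen

    # format="gz" (assumed value) overrides detection by file extension;
    # encoding is passed explicitly, although UTF-8 is already the default.
    with xopen(path, mode="wt", encoding="utf-8", format="gz", threads=0) as f:
        f.write(text)
    with xopen(path, mode="rt", encoding="utf-8", format="gz", threads=0) as f:
        return f.read()
```

For example, `roundtrip_forced_gzip("data.bin", "Hello")` would write gzip-compressed text to a file whose name gives no hint about the compression.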

v1.2.0

  • pbzip2 is now used to open .bz2 files if threads is greater than zero.

v1.1.0

  • Python 3.5 support is dropped.

  • On Linux systems, python-isal is now added as a requirement. This will speed up the reading of gzip files significantly when no external processes are used.

v1.0.0

  • If installed, the igzip program (part of Intel ISA-L) is now used for reading and writing gzip-compressed files at compression levels 1-3, which results in a significant speedup.

v0.9.0

  • When the file name extension of a file to be opened for reading is not available, the content is inspected (if possible) and used to determine which compression format applies.

  • This release drops Python 2.7 and 3.4 support. Python 3.5 or later is now required.
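Content-based detection of this kind typically compares the first bytes of a file against the well-known signatures of each format. The following is a sketch of that idea using the standard library only; it is not taken from xopen's source, and the name `detect_format` is hypothetical.

```python
import bz2
import gzip
import lzma

# Leading "magic" bytes that identify each compression format.
_MAGIC = {
    b"\x1f\x8b": "gz",
    b"BZh": "bz2",
    b"\xfd7zXZ\x00": "xz",
}


def detect_format(first_bytes):
    """Return "gz", "bz2" or "xz", or None if no signature matches."""
    for magic, name in _MAGIC.items():
        if first_bytes.startswith(magic):
            return name
    return None


payload = b"example"
print(detect_format(gzip.compress(payload)))  # gz
print(detect_format(bz2.compress(payload)))   # bz2
print(detect_format(lzma.compress(payload)))  # xz
print(detect_format(payload))                 # None
```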

v0.8.4

  • When reading gzipped files, force pigz to use only a single process. pigz cannot use multiple cores anyway when decompressing. By default, it would use extra I/O processes, which slightly reduces wall-clock time, but increases CPU time. Single-core decompression with pigz is still about twice as fast as regular gzip.

  • Allow threads=0 for specifying that no external pigz/gzip process should be used (then regular gzip.open() is used instead).

v0.8.3

  • When reading gzipped files, let pigz use at most four threads by default. This limit previously only applied when writing to a file.

  • Support Python 3.8

v0.8.0

  • Speed improvements when iterating over gzipped files.

v0.6.0

  • For reading from gzipped files, xopen will now use a pigz subprocess. This is faster than using gzip.open.

  • Python 2 support will be dropped in one of the next releases.

v0.5.0

  • By default, pigz is now only allowed to use at most four threads. This hopefully reduces problems some users had with too many threads when opening many files at the same time.

  • xopen now accepts pathlib.Path objects.
