Open compressed files transparently

xopen

This small Python module provides an xopen function that works like the built-in open function, but can also deal with compressed files. Supported compression formats are gzip, bzip2 and xz. They are automatically recognized by their file extensions .gz, .bz2 or .xz.
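
Extension-based recognition can be illustrated with the standard library alone. The following is a minimal sketch, not xopen's actual implementation: the helper name `open_by_extension` and the `_OPENERS` table are hypothetical, and the sketch assumes Python 3 and an explicit text or binary mode such as `"rt"` or `"wb"`.

```python
import bz2
import gzip
import lzma

# Hypothetical lookup table: file extension -> standard-library opener.
_OPENERS = {".gz": gzip.open, ".bz2": bz2.open, ".xz": lzma.open}


def open_by_extension(path, mode="rt"):
    """Open path with the opener matching its extension (illustrative only)."""
    name = str(path)  # also accepts pathlib.Path objects
    for ext, opener in _OPENERS.items():
        if name.endswith(ext):
            return opener(name, mode)
    # No known compression extension: fall back to the built-in open.
    return open(name, mode)
```

A file written with `open_by_extension('file.txt.xz', 'wt')` can be read back with `open_by_extension('file.txt.xz')`; the caller never names the compression format.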

The focus is on being as efficient as possible on all supported Python versions. For example, xopen uses pigz, a parallel version of gzip, to open .gz files, which is faster than using the built-in gzip.open function. pigz can use multiple threads when compressing and is also faster when reading .gz files, so xopen uses it for both reading and writing if it is available.
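
The idea of preferring pigz and falling back to the gzip module can be sketched as follows. This is an illustration under stated assumptions, not xopen's code: the function name `read_gz_text` and its `threads` parameter are hypothetical, the sketch assumes Python 3 and UTF-8 content, and it reads the whole file into memory for brevity instead of streaming from the subprocess.

```python
import gzip
import shutil
import subprocess


def read_gz_text(path, threads=4):
    """Return the decompressed text of a .gz file (illustrative only).

    Runs `pigz -cd` when pigz is found on PATH; otherwise uses gzip.open.
    """
    pigz = shutil.which("pigz")
    if pigz is None:
        # pigz is not installed: fall back to the standard library.
        with gzip.open(str(path), "rb") as f:
            return f.read().decode("utf-8")
    # -c writes to stdout, -d decompresses, -p limits the number of threads.
    result = subprocess.run(
        [pigz, "-cd", "-p", str(threads), str(path)],
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout.decode("utf-8")
```

Both branches return the same text, so calling code does not need to know whether pigz was available.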

This module was originally developed as part of the cutadapt tool, which is used in bioinformatics to manipulate sequencing data. It has been in successful use within that software for a few years.

xopen is compatible with Python versions 2.7 and 3.4 to 3.7.

Usage

Open a file for reading:

from xopen import xopen

with xopen('file.txt.xz') as f:
    content = f.read()

Or without a context manager:

from xopen import xopen

f = xopen('file.txt.xz')
content = f.read()
f.close()

Open a file in binary mode for writing:

from xopen import xopen

with xopen('file.txt.gz', mode='wb') as f:
    f.write(b'Hello')

Credits

The name xopen was taken from the C function of the same name in the utils.h file which is part of BWA.

Kyle Beauchamp <https://github.com/kyleabeauchamp/> contributed support for appending to files.

Ruben Vorderman <https://github.com/rhpvorderman/> contributed improvements to make reading gzipped files faster.

Some ideas were taken from the canopener project. If you also want to open S3 files, you may want to use that module instead.

Changes

v0.8.0

  • Speed improvements when iterating over gzipped files.

v0.6.0

  • For reading from gzipped files, xopen will now use a pigz subprocess. This is faster than using gzip.open.

  • Python 2 support will be dropped in one of the next releases.

v0.5.0

  • By default, pigz is now only allowed to use at most four threads. This hopefully reduces problems some users had with too many threads when opening many files at the same time.

  • xopen now accepts pathlib.Path objects.

Author

Marcel Martin <mail@marcelm.net> (@marcelm_ on Twitter)

