Pure Python implementation of the XZ file format with random access support

Project description

python-xz

Leveraging the lzma module for fast (de)compression


📖 Documentation   |   📃 Changelog


An XZ file can be composed of several streams and blocks. This allows for fast random access when reading, but Python's built-in lzma module does not take advantage of it: to reach a given position, it decompresses all the preceding blocks for nothing.
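To make the limitation concrete, here is a small sketch that uses only the standard lzma module. It builds a two-stream file in memory (the payloads are made up for illustration) and shows that lzma reads it correctly, while seeking into the second stream is emulated by decompressing from the start of the file.

```python
import io
import lzma

# Two independently compressed streams, concatenated into one valid XZ file.
first = lzma.compress(b"first stream\n")
second = lzma.compress(b"second stream\n")
multi_stream = first + second

# lzma decompresses every stream and concatenates the results.
everything = lzma.decompress(multi_stream)

# Seeking works, but LZMAFile emulates it by decompressing from the
# beginning of the file up to the requested position.
with lzma.open(io.BytesIO(multi_stream)) as fin:
    fin.seek(len(b"first stream\n"))
    tail = fin.read()

print(everything)
print(tail)
```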

                   lzma              lzmaffi             python-xz
module type        builtin           cffi (C extension)  pure Python

📄 read
random access      ❌ no [1]         ✔️ yes [2]          ✔️ yes [2]
several blocks     ✔️ yes            ✔️✔️ yes [3]        ✔️✔️ yes [3]
several streams    ✔️ yes            ✔️ yes              ✔️✔️ yes [4]
stream padding     ❌ no [5]         ✔️ yes              ✔️ yes

📝 write
w mode             ✔️ yes            ✔️ yes              ✔️ yes
x mode             ✔️ yes            ❌ no               ✔️ yes
a mode             ✔️ new stream     ✔️ new stream       ❌ no
r+/w+/… modes      ❌ no             ❌ no               ✔️ yes
several blocks     ❌ no             ❌ no               ✔️ yes
several streams    ❌ no [6]         ❌ no [6]           ✔️ yes
stream padding     ❌ no             ❌ no               ❌ no

Notes
  1. Reading from a position will read the file from the very beginning
  2. Reading from a position will read the file from the beginning of the block
  3. Block positions available with the block_boundaries attribute
  4. Stream positions available with the stream_boundaries attribute
  5. Related issue
  6. Possible by manually closing and re-opening in append mode

Install

Install python-xz with pip:

$ python -m pip install python-xz

An unofficial package for conda is also available; see issue #5 for more information.

Usage

The API is similar to lzma: you can use either xz.open or xz.XZFile.

Read mode

>>> import xz
>>> with xz.open('example.xz') as fin:
...     fin.read(18)
...     fin.stream_boundaries  # 2 streams
...     fin.block_boundaries   # 4 blocks in first stream, 2 blocks in second stream
...     fin.seek(1000)
...     fin.read(31)
...
b'Hello, world! \xf0\x9f\x91\x8b'
[0, 2000]
[0, 500, 1000, 1500, 2000, 3000]
1000
b'\xe2\x9c\xa8 Random access is fast! \xf0\x9f\x9a\x80'
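The comments in the session above ("4 blocks in first stream, 2 blocks in second stream") can be derived from the two boundary lists alone. The helper below is a sketch, not part of python-xz: blocks_per_stream is a hypothetical name, and it assumes both lists are sorted start offsets in uncompressed bytes, as in the example output.

```python
from bisect import bisect_right


def blocks_per_stream(stream_boundaries, block_boundaries):
    """Count how many blocks start inside each stream.

    Both arguments are sorted lists of start offsets (uncompressed bytes).
    """
    counts = [0] * len(stream_boundaries)
    for block_start in block_boundaries:
        # Index of the last stream starting at or before this block.
        stream_index = bisect_right(stream_boundaries, block_start) - 1
        counts[stream_index] += 1
    return counts


# Values copied from the example output above.
print(blocks_per_stream([0, 2000], [0, 500, 1000, 1500, 2000, 3000]))  # [4, 2]
```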

Opening in text mode works as well, but note that seek arguments and boundaries are still expressed in bytes, not characters (just like with lzma.open).

>>> with xz.open('example.xz', 'rt') as fin:
...     fin.read(15)
...     fin.stream_boundaries
...     fin.block_boundaries
...     fin.seek(1000)
...     fin.read(26)
...
'Hello, world! 👋'
[0, 2000]
[0, 500, 1000, 1500, 2000, 3000]
1000
'✨ Random access is fast! 🚀'
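Because offsets stay in bytes even in text mode, a character position has to be converted before it can be used with seek. The sketch below assumes UTF-8 text and that the preceding text is already known; byte_offset is a hypothetical helper, not part of python-xz or lzma.

```python
def byte_offset(text, char_index, encoding="utf-8"):
    """Return the byte offset of character number char_index in text."""
    return len(text[:char_index].encode(encoding))


greeting = "Hello, world! 👋"

# The emoji is one character but four bytes in UTF-8, which is why the
# binary example reads 18 bytes while the text example reads 15 characters.
print(len(greeting))                          # 15
print(byte_offset(greeting, len(greeting)))   # 18
```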

Write mode

Writing is only supported from the end of the file. It is, however, possible to truncate the file first. Note that truncating is only supported on block boundaries.

>>> with xz.open('test.xz', 'w') as fout:
...     fout.write(b'Hello, world!\n')
...     fout.write(b'This sentence is still in the previous block\n')
...     fout.change_block()
...     fout.write(b'But this one is in its own!\n')
...
14
45
28

Advanced usage:

  • Modes like r+/w+/x+ allow opening a file for both reading and writing at the same time; however, in the current implementation, a block with writing in progress is automatically closed when data is read from it.
  • The check, preset and filters arguments to xz.open and xz.XZFile set the default values for new streams and blocks.
  • Change block with the change_block method (the preset and filters attributes can be changed beforehand to apply to the new block).
  • Change stream with the change_stream method (the check attribute can be changed beforehand to apply to the new stream).

FAQ

How does random access work?

XZ files are made of a number of streams, and each stream is composed of a number of blocks. This can be seen with xz --list:

$ xz --list file.xz
Strms  Blocks   Compressed Uncompressed  Ratio  Check   Filename
    1      13     16.8 MiB    297.9 MiB  0.056  CRC64   file.xz

To read data from the middle of the 10th block, we decompress the 10th block from its start until we reach the middle (dropping that decompressed data), then return the decompressed data from that point on.
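The idea can be simulated with the standard lzma module alone. The sketch below is not python-xz's implementation and makes simplifying assumptions: each chunk is compressed as its own stream with lzma.compress instead of being a real XZ block, and the boundaries are kept in a Python list instead of being parsed from the XZ index. The names compress_in_chunks and read_at are hypothetical.

```python
import lzma
from bisect import bisect_right


def compress_in_chunks(data, chunk_size):
    """Compress each chunk independently; return (chunks, boundaries)."""
    chunks = []
    boundaries = []  # uncompressed start offset of each chunk
    for start in range(0, len(data), chunk_size):
        boundaries.append(start)
        chunks.append(lzma.compress(data[start:start + chunk_size]))
    return chunks, boundaries


def read_at(chunks, boundaries, offset, size):
    """Return up to size bytes starting at uncompressed position offset."""
    if not chunks:
        return b""
    # Locate the chunk containing offset; earlier chunks are never touched.
    index = bisect_right(boundaries, offset) - 1
    result = b""
    while len(result) < size and index < len(chunks):
        decompressed = lzma.decompress(chunks[index])
        # In the first chunk, drop the data that precedes offset.
        skip = max(offset - boundaries[index], 0)
        result += decompressed[skip:skip + size - len(result)]
        index += 1
    return result


data = bytes(range(256)) * 20
chunks, boundaries = compress_in_chunks(data, 500)
print(read_at(chunks, boundaries, 1250, 400) == data[1250:1650])
```

Only the chunks that overlap the requested range are decompressed, which is why the cost of a seek depends on the block size and not on the position in the file.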

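Choosing the right block size is a tradeoff between seeking time during random access and compression ratio: smaller blocks mean less data to decompress and discard on each seek, but they compress less well.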
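The compression side of this tradeoff can be estimated with a rough experiment. The sketch below uses the standard lzma module and, as a stand-in for real XZ blocks, compresses each chunk as a separate stream, so the per-chunk overhead is somewhat larger than for actual blocks. The sample data and the compressed_size helper are made up for illustration.

```python
import lzma


def compressed_size(data, chunk_size):
    """Total size when every chunk_size bytes are compressed independently."""
    return sum(
        len(lzma.compress(data[start:start + chunk_size]))
        for start in range(0, len(data), chunk_size)
    )


# Synthetic, fairly redundant sample data.
data = b"".join(
    b"line %d: the quick brown fox jumps over the lazy dog\n" % i
    for i in range(5000)
)

# Compare small chunks, medium chunks, and a single chunk for the whole input.
for chunk_size in (4096, 65536, len(data)):
    print(chunk_size, compressed_size(data, chunk_size))
```

On data like this, the smallest chunk size should give the largest total, since each chunk starts compressing without any knowledge of the previous ones.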

How can I create XZ files optimized for random-access?

You can open the file for writing and use the change_block method to create several blocks.
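A minimal sketch of that approach follows. It relies only on the write and change_block methods shown earlier; write_in_blocks is a hypothetical helper, and the choice of a fixed block size is an assumption made for the example.

```python
def write_in_blocks(fout, data, block_size):
    """Write data to fout, starting a new block every block_size bytes.

    fout is expected to provide write() and change_block(), like the file
    objects returned by xz.open in write mode.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    for start in range(0, len(data), block_size):
        if start:
            # Close the current block before writing the next slice.
            fout.change_block()
        fout.write(data[start:start + block_size])
```

With python-xz installed, this could be called as write_in_blocks(fout, payload, 16 * 1024 * 1024) inside a with xz.open('out.xz', 'w') as fout: block, where payload is the bytes object to compress.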

Other tools can create XZ files with several blocks as well:

  • XZ Utils needs to be called with flags:
$ xz -T0 file                          # threading mode
$ xz --block-size 16M file             # same size for all blocks
$ xz --block-list 16M,32M,8M,42M file  # specific size for each block
  • PIXZ creates files with several blocks by default:
$ pixz file

Python version support

As a general rule, python-xz supports and is tested against every Python version that has been released and is still officially supported (both CPython and PyPy implementations).

If you have other use cases or find issues with some Python versions, feel free to open a ticket!
