Pure Python implementation of the XZ file format with random access support
python-xz
An XZ file can be composed of several streams and blocks. This allows for random access when reading, but Python's builtin lzma module does not support it: seeking forces it to decompress all the previous blocks for nothing.
feature | lzma | lzmaffi | python-xz
---|---|---|---
module type | builtin | cffi (C extension) | pure Python
📄 read | | |
random access | ❌ no¹ | ✔️ yes² | ✔️ yes²
several blocks | ✔️ yes | ✔️✔️ yes³ | ✔️✔️ yes³
several streams | ✔️ yes | ✔️ yes | ✔️✔️ yes⁴
stream padding | ❌ no | ✔️ yes | ✔️ yes
📝 write | | |
w mode | ✔️ yes | ✔️ yes | ⏳ planned
x mode | ✔️ yes | ❌ no | ⏳ planned
a mode | ✔️ new stream | ✔️ new stream | ⏳ planned
r+w mode | ❌ no | ❌ no | ⏳ planned
several blocks | ❌ no | ❌ no | ⏳ planned
several streams | ❌ no⁵ | ❌ no⁵ | ⏳ planned
stream padding | ❌ no⁶ | ✔️ yes | ⏳ planned

1. Reading from a position will read the file from the very beginning.
2. Reading from a position will read the file from the beginning of the block.
3. Block positions are available with the block_boundaries attribute.
4. Stream positions are available with the stream_boundaries attribute.
5. Possible by manually closing and re-opening in append mode.
6. Related issue.
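The table notes that writing several streams with the builtin lzma module is possible by manually closing and re-opening the file in append mode. A minimal sketch of that workaround, assuming a temporary file path and arbitrary sample data: each `lzma.open(path, "ab")` starts a new, independent XZ stream, and lzma reads the concatenated streams back as one byte sequence.

```python
import lzma
import os
import tempfile

# Hypothetical sample data: one chunk per XZ stream.
chunks = [b"first stream\n", b"second stream\n"]

path = os.path.join(tempfile.mkdtemp(), "multi.xz")

# The first chunk creates the file; every later chunk re-opens it in
# append mode, which writes a new XZ stream after the existing one.
with lzma.open(path, "wb") as f:
    f.write(chunks[0])
for chunk in chunks[1:]:
    with lzma.open(path, "ab") as f:
        f.write(chunk)

# lzma decodes concatenated streams transparently.
with lzma.open(path, "rb") as f:
    data = f.read()

print(data)
```

Each stream carries its own header, index and footer, so this adds some size overhead compared to a single stream.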
Usage
Read mode
The API is similar to lzma: you can use either xz.open or xz.XZFile.
>>> import xz
>>> with xz.open('example.xz') as fin:
... fin.read(18)
... fin.stream_boundaries # 2 streams
... fin.block_boundaries # 4 blocks in first stream, 2 blocks in second stream
... fin.seek(1000)
... fin.read(31)
...
b'Hello, world! \xf0\x9f\x91\x8b'
[0, 2000]
[0, 500, 1000, 1500, 2000, 3000]
1000
b'\xe2\x9c\xa8 Random access is fast! \xf0\x9f\x9a\x80'
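In the example above, seek(1000) lands exactly on a block boundary. A minimal sketch of how any uncompressed position can be mapped to the block that contains it using only a block_boundaries list; `block_index` is a hypothetical helper, not part of python-xz, and it assumes positions past the last boundary belong to the last block.

```python
from bisect import bisect_right


def block_index(boundaries, pos):
    """Return the index of the block containing uncompressed offset `pos`.

    `boundaries` is a sorted list of block start offsets, like the
    block_boundaries list in the example (hypothetical helper).
    """
    if pos < boundaries[0]:
        raise ValueError("position before the first block")
    # bisect_right finds the first boundary strictly greater than pos;
    # the block containing pos starts at the boundary just before it.
    return bisect_right(boundaries, pos) - 1


boundaries = [0, 500, 1000, 1500, 2000, 3000]
idx = block_index(boundaries, 1000)
print(idx, boundaries[idx])    # 2 1000: the block starting at offset 1000
print(1000 - boundaries[idx])  # 0: bytes to decompress and discard first
```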
Opening in text mode works as well, but notice that seek arguments as well as boundaries are still in bytes (just like with lzma.open).
>>> with xz.open('example.xz', 'rt') as fin:
... fin.read(15)
... fin.stream_boundaries
... fin.block_boundaries
... fin.seek(1000)
... fin.read(26)
...
'Hello, world! 👋'
[0, 2000]
[0, 500, 1000, 1500, 2000, 3000]
1000
'✨ Random access is fast! 🚀'
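Because seek arguments and boundaries are byte offsets while read() in text mode counts characters, converting between the two means encoding the text. A minimal sketch, assuming UTF-8 content as in the example; `byte_offset` is a hypothetical helper:

```python
def byte_offset(text, n_chars, encoding="utf-8"):
    """Number of bytes taken by the first `n_chars` characters of `text`."""
    return len(text[:n_chars].encode(encoding))


greeting = "Hello, world! 👋"
print(len(greeting))                          # 15 characters, as read(15)
print(byte_offset(greeting, len(greeting)))   # 18 bytes, as read(18)
```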
Write mode
This mode is not available yet.
FAQ
How does random access work?
XZ files are made of a number of streams, and each stream is composed of a number of blocks. This can be seen with xz --list:
$ xz --list file.xz
Strms Blocks Compressed Uncompressed Ratio Check Filename
1 13 16.8 MiB 297.9 MiB 0.056 CRC64 file.xz
To read data from the middle of the 10th block, the first nine blocks are skipped entirely: we decompress the 10th block from its start until we reach the middle (discarding that decompressed data), then return the decompressed data from that point.
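The same idea can be sketched with only the standard library, under stated assumptions: Python's lzma cannot write several blocks, so this sketch compresses each chunk as its own XZ stream and keeps its own in-memory index of offsets rather than reading the index stored in the file. `compress_chunks` and `read_at` are hypothetical names, not python-xz's API.

```python
import lzma
from bisect import bisect_right


def compress_chunks(data, chunk_size):
    """Compress `data` as independent XZ streams of `chunk_size` bytes.

    Returns the concatenated compressed bytes and an index of
    (uncompressed_start, compressed_start) pairs, one per chunk.
    """
    out = bytearray()
    index = []
    for start in range(0, len(data), chunk_size):
        index.append((start, len(out)))
        out += lzma.compress(data[start:start + chunk_size])
    return bytes(out), index


def read_at(compressed, index, pos, size):
    """Read `size` uncompressed bytes starting at offset `pos`."""
    result = bytearray()
    # Find the chunk containing pos; earlier chunks are never decompressed.
    i = bisect_right([u for u, _ in index], pos) - 1
    while len(result) < size and i < len(index):
        u_start, c_start = index[i]
        c_end = index[i + 1][1] if i + 1 < len(index) else len(compressed)
        # Decompress this chunk from its start...
        chunk = lzma.decompress(compressed[c_start:c_end])
        # ...and drop the bytes that come before pos.
        result += chunk[max(pos - u_start, 0):]
        i += 1
    return bytes(result[:size])


data = bytes(range(256)) * 40
compressed, index = compress_chunks(data, 1024)
print(read_at(compressed, index, 1000, 50) == data[1000:1050])  # True
```

The read at offset 1000 crosses the chunk boundary at 1024, so two chunks are decompressed and the first 1000 bytes of the first one are discarded.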
Choosing the right block size is a tradeoff between seek time during random access (smaller blocks mean less data to decompress and discard) and compression ratio (larger blocks usually compress better).
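To see the compression-ratio side of this tradeoff, a minimal sketch, assuming that compressing fixed-size chunks as separate XZ streams is a fair stand-in for XZ blocks (each chunk starts with an empty dictionary and carries its own headers). The sample data is hypothetical and highly repetitive, which exaggerates the effect.

```python
import lzma


def compressed_size(data, chunk_size):
    """Total size when `data` is compressed as independent chunks."""
    return sum(
        len(lzma.compress(data[i:i + chunk_size]))
        for i in range(0, len(data), chunk_size)
    )


# Hypothetical, highly repetitive sample data (about 160 kB).
data = b"python-xz random access example line\n" * 4400

for chunk_size in (1024, 16 * 1024, len(data)):
    print(chunk_size, compressed_size(data, chunk_size))
```

The exact sizes depend on the data and the liblzma version, but the smallest chunks should produce the largest total.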
How can I create XZ files optimized for random-access?
XZ Utils can create XZ files with several blocks:
$ xz -T0 file # threading mode
$ xz --block-size 16M file # same size for all blocks
$ xz --block-list 16M,32M,8M,42M file # specific size for each block
PIXZ creates files with several blocks by default:
$ pixz file