Library for indexing LZO compressed files
Python library for indexing block offsets within LZO compressed files. The implementation is largely based on that of the Hadoop Library. Index files are used to allow Hadoop to split a single file compressed with LZO into several chunks for parallel processing.
Since LZO is a block-based compression algorithm, the file can be split along block boundaries and each block decompressed on its own. The index is a file containing the byte offset of each block in the original LZO file.
This library is a Python 3 fork of python-lzo-indexer.
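To make the idea of an offset index concrete, here is a minimal sketch of decoding one and turning it into per-block byte ranges. It assumes the index follows the hadoop-lzo layout of consecutive 8-byte big-endian integers, one per block. The helper names `read_block_offsets` and `block_ranges` are hypothetical and not part of this library.

```python
import struct


def read_block_offsets(index_bytes):
    """Decode raw index bytes into a list of block offsets.

    Assumption: the index is a flat sequence of 8-byte big-endian
    signed integers, as written by hadoop-lzo.
    """
    if len(index_bytes) % 8 != 0:
        raise ValueError("index length is not a multiple of 8 bytes")
    count = len(index_bytes) // 8
    return list(struct.unpack(">%dq" % count, index_bytes))


def block_ranges(offsets, file_size):
    """Pair each block's start offset with the offset where it ends.

    The last block is taken to end at file_size.
    """
    ends = offsets[1:] + [file_size]
    return list(zip(offsets, ends))
```

Each `(start, end)` pair is a span of the compressed file that a worker could seek to and process independently, which is what makes the file splittable.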
The Python code below shows how to index an LZO file. The library can also index a string, and provides a method that returns the individual block offsets should you need to write an index in your own format.
    import lzo_indexer

    # Open both files in binary mode: the LZO data and the index are raw bytes.
    with open("my-file.lzo", "rb") as f, open("my-file.lzo.index", "wb") as index:
        lzo_indexer.index_lzo_file(f, index)
This library also includes a command line utility for indexing multiple LZO files using the Python indexer. It is a much faster alternative to the command line utility built into the hadoop-lzo library, as it avoids starting a JVM.
    $ lzo_indexer --help
    Usage: lzo_indexer [OPTIONS] <files to index>

      Tool for indexing LZO compressed files

    Options:
      -t, --threads INTEGER  Processing threads count
      -e, --extension TEXT   Index file extension
      -f, --force            Force re-creation of an index even if it exists
      -h, --help             Show this message and exit.
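The same behaviour can be approximated from Python. The sketch below is not the utility's actual implementation. It is a hypothetical helper, `index_many`, whose parameters mirror the options above and which takes the indexing function as an argument. The default `".index"` extension is an assumption taken from the file name used in the earlier example.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def index_many(paths, index_fn, threads=4, extension=".index", force=False):
    """Index every file in paths and return the index paths that were written.

    index_fn is called as index_fn(lzo_file, index_file) with both files
    opened in binary mode. Existing indexes are kept unless force is True.
    """
    def index_one(path):
        index_path = path + extension
        if os.path.exists(index_path) and not force:
            return None  # an index already exists; leave it untouched
        with open(path, "rb") as lzo_file, open(index_path, "wb") as index_file:
            index_fn(lzo_file, index_file)
        return index_path

    # pool.map preserves input order, so results line up with paths.
    with ThreadPoolExecutor(max_workers=threads) as pool:
        results = list(pool.map(index_one, paths))
    return [p for p in results if p is not None]
```

With this library installed, a call such as `index_many(["a.lzo", "b.lzo"], lzo_indexer.index_lzo_file, threads=2)` would write `a.lzo.index` and `b.lzo.index`.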
I welcome any contributions, though I request that any pull requests come with test coverage.