
Library for indexing LZO compressed files

python-lzo-indexer
==================

![](https://travis-ci.org/duedil-ltd/python-lzo-indexer.png)

Python library for indexing block offsets within LZO compressed files. The implementation is largely based on that of the [hadoop-lzo library](https://github.com/twitter/hadoop-lzo). Index files allow Hadoop to split a single LZO-compressed file into several chunks for parallel processing.

Since LZO is a block-based compression algorithm, we can split the file along block boundaries and decompress each block on its own. The index is a file containing the byte offset of each block in the original LZO file.
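
To make the idea of an offset index concrete, here is a minimal sketch of reading one back. It assumes the index follows the hadoop-lzo layout of consecutive 8-byte big-endian signed integers, one per block; that layout is an assumption here, not something this README specifies, and `read_block_offsets` is a hypothetical helper rather than part of this library.

```python
import struct


def read_block_offsets(index_stream):
    """Return the block offsets stored in a binary index stream.

    Assumes (not confirmed by this README) that the index is a flat
    sequence of 8-byte big-endian signed integers, as in hadoop-lzo.
    """
    data = index_stream.read()
    if len(data) % 8 != 0:
        raise ValueError("index length is not a multiple of 8 bytes")
    # ">q" is one big-endian signed 64-bit integer per record.
    return [offset for (offset,) in struct.iter_unpack(">q", data)]
```

Under that assumption, each offset marks a position where a reader can seek into the LZO file and start decompressing a block independently.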


Example
-------

The Python code below shows how to index an LZO file. The library also supports indexing a string, and provides a method that returns the individual block offsets should you need to write an index in a format of your own.

```python
import lzo_indexer

# Open both files in binary mode: the LZO data and the index are raw bytes.
with open("my-file.lzo", "rb") as f:
    with open("my-file.lzo.index", "wb") as index:
        lzo_indexer.index_lzo_file(f, index)
```


Command-line Utility
--------------------

This library also includes a command-line utility for indexing multiple LZO files with the Python indexer. It is a much faster alternative to the command-line utility built into the hadoop-lzo library because it avoids starting a JVM.

```
$ bin/lzo-indexer --help

usage: lzo-indexer [-h] [--verbose] [--force] lzo_files [lzo_files ...]

positional arguments:
  lzo_files      List of LZO files to index

optional arguments:
  -h, --help     show this help message and exit
  --verbose, -v  Enable verbose logging
  --force, -f    Force re-creation of an index even if it exists
```
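
The same batch behaviour can be approximated from Python. The sketch below is not the utility's actual implementation: `index_files` is a hypothetical helper, and the `.index` suffix is assumed from the example above. It takes the indexing callable as a parameter (for instance `lzo_indexer.index_lzo_file`) and skips files that already have an index unless `force` is set, mirroring the `--force` flag.

```python
import os


def index_files(lzo_paths, index_func, force=False):
    """Index each LZO file and return the index paths that were written.

    index_func is called as index_func(lzo_file, index_file), matching
    the way index_lzo_file is used in the example above.
    """
    written = []
    for lzo_path in lzo_paths:
        index_path = lzo_path + ".index"  # assumed naming convention
        if os.path.exists(index_path) and not force:
            continue  # keep the existing index unless forced
        with open(lzo_path, "rb") as lzo_file, open(index_path, "wb") as index_file:
            index_func(lzo_file, index_file)
        written.append(index_path)
    return written
```

A call might look like `index_files(["a.lzo", "b.lzo"], lzo_indexer.index_lzo_file, force=True)`.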


Contributions
-------------

I welcome any contributions, though I request that any pull requests come with test coverage.

