
Library for indexing LZO compressed files


Python library for indexing block offsets within LZO compressed files. The implementation is largely based on that of the Hadoop Library. Index files are used to allow Hadoop to split a single file compressed with LZO into several chunks for parallel processing.

Since LZO is a block-based compression algorithm, the file can be split along block boundaries and each block decompressed on its own. The index is a file containing the byte offset of each block in the original LZO file.
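To make the idea of an offset index concrete, here is a minimal sketch of reading one back. It assumes the index is a flat sequence of 8-byte big-endian integers, one per block, as the Hadoop indexer is understood to write; that layout is not spelled out above, so treat it as an assumption. The helper names `read_block_offsets` and `block_ranges` are hypothetical and are not part of this library.

```python
import io
import struct

OFFSET_SIZE = 8  # assumption: one 8-byte big-endian integer per block


def read_block_offsets(index_stream):
    """Return the block offsets stored in a binary index stream."""
    data = index_stream.read()
    if len(data) % OFFSET_SIZE:
        raise ValueError("index length is not a multiple of 8 bytes")
    count = len(data) // OFFSET_SIZE
    return list(struct.unpack(">%dq" % count, data))


def block_ranges(offsets, file_size):
    """Pair each block offset with the offset where the next block starts.

    The final range simply runs to file_size, so it also covers whatever
    trailing bytes follow the last block.
    """
    ends = offsets[1:] + [file_size]
    return list(zip(offsets, ends))


# Hypothetical index holding three block offsets.
sample = io.BytesIO(struct.pack(">3q", 38, 1000, 2500))
offsets = read_block_offsets(sample)
ranges = block_ranges(offsets, file_size=4000)
```

A splitter can hand each `(start, end)` range, or a run of adjacent ranges, to a separate worker, since every range begins at a block boundary.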

This library is a Python 3 fork of python-lzo-indexer.

Example

The Python code below indexes an LZO file. The library can also index a string, and it provides a method that returns the individual block offsets, should you need to write an index file in your own format.

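import lzo_indexer

# Both files are binary: read the LZO data in "rb" mode and write the
# index in "wb" mode ("rw" is not a valid mode for open()).
with open("my-file.lzo", "rb") as f, open("my-file.lzo.index", "wb") as index:
    lzo_indexer.index_lzo_file(f, index)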

Command-line Utility

The library also ships a command-line utility, lzo_indexer, for indexing multiple LZO files with the Python indexer. Because it avoids starting a JVM, it is a much faster alternative to the command-line utility built into the hadoop-lzo library.

$ lzo_indexer --help

Usage: lzo_indexer [OPTIONS] <files to index>

  Tool for indexing LZO compressed files

Options:
  -t, --threads INTEGER  Processing threads count
  -e, --extension TEXT   Index file extension
  -f, --force            Force re-creation of an index even if it exists
  -h, --help             Show this message and exit.
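
The options above map naturally onto a small batch routine. The sketch below shows one way such a tool could behave. It is not the utility's actual implementation. `index_many` is a hypothetical helper, the `.index` default extension is assumed from the example file name earlier, and `index_one` stands in for any callable taking a source file and an index file, such as `lzo_indexer.index_lzo_file`.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def index_many(paths, index_one, threads=1, extension=".index", force=False):
    """Index each path with index_one(src, dst); return the index paths written."""

    def work(path):
        index_path = path + extension
        if os.path.exists(index_path) and not force:
            return None  # an index already exists and force is off: skip it
        with open(path, "rb") as src, open(index_path, "wb") as dst:
            index_one(src, dst)
        return index_path

    # Mirror --threads: process the files on a pool of worker threads.
    with ThreadPoolExecutor(max_workers=threads) as pool:
        results = list(pool.map(work, paths))
    return [r for r in results if r is not None]
```

Called as `index_many(["a.lzo", "b.lzo"], lzo_indexer.index_lzo_file, threads=4)`, this would write `a.lzo.index` and `b.lzo.index` and skip any file whose index already exists.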

Contributions

I welcome any contributions, though I request that any pull requests come with test coverage.

