pyDAC

pyDAC (python Directly Addressable Codes) offers a variable-length encoding scheme for (unsigned) integers with random access to any element of the encoded sequence.

In terms of compression, a DAC structure is likely to outperform standard base-128 compression schemes (also known as VByte, Varint, VInt, or EncInt).
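
For context, a base-128 code stores seven payload bits per byte and uses the eighth bit as a continuation flag. The following is a minimal sketch of one common variant (low-order bits first, flag set when more bytes follow); the function names are hypothetical and are not part of pyDAC.

```python
def vbyte_encode(values):
    """Encode unsigned integers with a base-128 (VByte-style) code."""
    out = bytearray()
    for v in values:
        while v >= 0x80:
            out.append((v & 0x7F) | 0x80)  # high bit set: more bytes follow
            v >>= 7
        out.append(v)  # final byte of this value, high bit clear
    return bytes(out)


def vbyte_get(data, index):
    """Return element `index` by scanning the byte stream from the start."""
    value, shift, seen = 0, 0, 0
    for byte in data:
        value |= (byte & 0x7F) << shift
        if byte & 0x80:  # value continues in the next byte
            shift += 7
            continue
        if seen == index:
            return value
        seen += 1
        value, shift = 0, 0
    raise IndexError(index)


data = vbyte_encode([1, 1024, 262189])
print(len(data))           # 6 (1 + 2 + 3 bytes)
print(vbyte_get(data, 2))  # 262189
```

Because code lengths vary, `vbyte_get` has to walk over every earlier element before it reaches the requested one.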

As a bonus, a DAC structure gives random access to every element of the sequence without decoding the elements that precede it.
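
The idea behind directly addressable codes (see the papers under Attributions) can be sketched in a few lines. This toy class is an illustration only and is not pyDAC's implementation: the name `TinyDAC`, the fixed `chunk_bits` parameter, and the plain Python lists are assumptions made for readability. Each value is split into fixed-size chunks, chunk k of every value is stored at level k, and one flag per chunk records whether the value continues on the next level.

```python
class TinyDAC:
    """Toy directly addressable code (illustrative, not pyDAC itself)."""

    def __init__(self, values, chunk_bits=4):
        self.chunk_bits = chunk_bits
        self.levels = []  # one (chunks, continues) pair per level
        current = list(values)
        self.length = len(current)
        mask = (1 << chunk_bits) - 1
        while current:
            chunks = [v & mask for v in current]       # low bits stay here
            rest = [v >> chunk_bits for v in current]  # high bits move on
            continues = [1 if r > 0 else 0 for r in rest]
            self.levels.append((chunks, continues))
            current = [r for r in rest if r > 0]

    def __len__(self):
        return self.length

    def __getitem__(self, i):
        if not 0 <= i < self.length:
            raise IndexError(i)
        value, shift = 0, 0
        for chunks, continues in self.levels:
            value |= chunks[i] << shift
            if not continues[i]:
                break
            # Position on the next level = number of continuing elements
            # before i (a "rank" query). Real DACs answer this in constant
            # time with a rank structure; sum() keeps the sketch short.
            i = sum(continues[:i])
            shift += self.chunk_bits
        return value


values = [1, 2, 1, 8, 3, 4, 5, 9, 13, 1024, 262189]
dac = TinyDAC(values)
assert all(dac[i] == values[i] for i in range(len(values)))
```

A lookup touches at most one chunk per level, so its cost depends on the size of the requested value rather than on its position in the sequence.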

Installation

Install from PyPI using

pip install pyDAC

Usage

from pyDAC import DAC

imports the module.

import random 
from pyDAC import DAC

values = random.sample(range(2**32), 10**7)
encoded_values = DAC(iter(values))

creates a DAC structure encoded_values for the values sequence.

Access

The ith element from the original values sequence can be retrieved from a DAC structure encoded_values using the subscript operator

for i in range(len(values)):
    assert values[i] == encoded_values[i]

A DAC structure encoded_values is also iterable.

You can easily loop through the stored elements

for val in encoded_values:
    print(val)

or return all stored elements at once

assert values == list(iter(encoded_values))

Miscellaneous

A DAC structure can report its compression ratios (compression_ratios) and space savings (space_savings) in comparison to the minimal fixed-width representation and to the variable-byte representation of the original values sequence.

For example,

values = [1, 2, 1, 8, 3, 4, 5, 9, 13, 1024, 262189]
encoded_values = DAC(iter(values))

print(encoded_values.space_savings)
# {'vbyte': 0.08214285714285718, 'fixed_width': 0.508133971291866}

print(encoded_values.compression_ratios)
# {'vbyte': 1.0894941634241246, 'fixed_width': 2.0330739299610894}
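
The two baselines for this example, and the link between the two metrics, can be checked by hand. The sketch below assumes the usual definitions (compression ratio = baseline size / encoded size, space savings = 1 - encoded size / baseline size, hence savings = 1 - 1 / ratio); the helper names are hypothetical and not part of pyDAC. The figures reported above are consistent with that relation.

```python
def fixed_width_bits(values):
    """Bits used when every value gets the width of the largest value."""
    width = max(1, max(values).bit_length())
    return width * len(values)


def vbyte_bits(values):
    """Bits used by a base-128 code: 7 payload bits per 8-bit byte."""
    total_bytes = sum(max(1, (v.bit_length() + 6) // 7) for v in values)
    return 8 * total_bytes


def savings_from_ratio(ratio):
    """Space savings implied by a compression ratio (assumed definition)."""
    return 1.0 - 1.0 / ratio


values = [1, 2, 1, 8, 3, 4, 5, 9, 13, 1024, 262189]
print(fixed_width_bits(values))  # 209 (11 values * 19 bits)
print(vbyte_bits(values))        # 112 (14 bytes)

# Ratios copied from the output above.
print(round(savings_from_ratio(1.0894941634241246), 6))  # 0.082143
print(round(savings_from_ratio(2.0330739299610894), 6))  # 0.508134
```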

Attributions

@phdthesis{Ladra2011,
    title = {{Algorithms and Compressed Data Structures for Information Retrieval}},
    author = {Ladra, Susana},
    type = {PhD Thesis},
    school = {Universidade da Coru{\~{n}}a},
    pages = {272},
    year = {2011},
    isbn = {5626895531}
}
@inproceedings{Brisaboa2009,
    title = {{Directly addressable variable-length codes}},
    author = {Brisaboa, Nieves R. and Ladra, Susana and Navarro, Gonzalo},
    booktitle = {Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)},
    volume = {5721 LNCS},
    doi = {10.1007/978-3-642-03784-9_12},
    isbn = {3642037836},
    issn = {03029743},
    pages = {122--130},
    publisher = {Springer, Berlin, Heidelberg},
    year = {2009}
}

License

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
