pyDAC (python Directly Addressable Codes) offers a variable-length encoding scheme for (unsigned) integers with random access to any element of the encoded sequence.
Project description
pyDAC
pyDAC (python Directly Addressable Codes) offers a variable-length encoding scheme for (unsigned) integers with random access to any element of the encoded sequence.
In terms of compression, a DAC structure is very likely to outperform standard base-128 compression schemes aka VByte, Varint, VInt, EncInt etc..
As a bonus, a DAC structure gives to random access to each and every sequence element without any decoding.
Installation
Install from PyPi using
pip install pyDAC
Usage
from pyDAC import DAC
imports the module.
import random
from pyDAC import DAC
values = random.sample(range(2**32), 10**7)
encoded_values = DAC(iter(values))
creates a DAC structure encoded_values
for the values
sequence.
Access
The i
th element from the original values
sequence can be retrieved from a DAC structure encoded_values
using the subscript operator
for i in range(len(values)):
assert values[i] == encoded_values[i]
A DAC structure encoded_values
is also iterable.
You can easily loop through the stored elements stored
dac_iter = iter(encoded_values)
while True:
try:
val = next(dac_iter)
except StopIteration:
break # Iterator exhausted: stop the loop
else:
print(val)
or return all stored elements at once
assert values == list(iter(encoded_values))
Miscellaneous
A DAC structure can provide compression ratios
and space_savings
in comparision to the minimal fixed width representation and to the variable byte representation of the original values
sequence.
For example,
values = [1, 2, 1, 8, 3, 4, 5, 9, 13, 1024, 262189]
encoded_values = DAC(iter(values))
print(encoded_values.space_savings)
>>> {'vbyte': 0.08214285714285718, 'fixed_width': 0.508133971291866}
print(encoded_values.compression_ratios)
>>> {'vbyte': 1.0894941634241246, 'fixed_width': 2.0330739299610894}
Attributions
@article{
title = {{Algorithms and Compressed Data Structures for Information Retrieval}},
author = {Ladra, Susana},
type = {Phd Thesis},
institution = {Universidade da Coru{\~{n}}a},
pages = {272},
year = {2011},
isbn = {5626895531}
}
@inproceedings{
title = {{Directly addressable variable-length codes}},
author = {Brisaboa, Nieves R. and Ladra, Susana and Navarro, Gonzalo},
booktitle = {Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)},
volume = {5721 LNCS},
doi = {10.1007/978-3-642-03784-9_12},
isbn = {3642037836},
issn = {03029743},
pages = {122--130},
publisher = {Springer, Berlin, Heidelberg},
year = {2009}
}
License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.