pyDAC (python Directly Addressable Codes) offers a variable-length encoding scheme for (unsigned) integers with random access to any element of the encoded sequence.
pyDAC
In terms of compression, a DAC structure is very likely to outperform standard base-128 compression schemes (also known as VByte, Varint, VInt, EncInt, etc.).
As a bonus, a DAC structure gives you random access to each and every sequence element without having to decode the elements that precede it.
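To make the random-access property concrete, here is a minimal sketch of the general DAC idea: each value is cut into fixed-size chunks, the k-th chunks of all values are stored together on level k, and a per-level bitmap records which values continue on the next level. This is not pyDAC's implementation; the class name `TinyDAC` and the parameter `chunk_bits` are hypothetical and exist only for illustration. It also uses plain Python lists and a linear-time rank, whereas a practical structure would use packed bit arrays with a constant-time rank index.

```python
class TinyDAC:
    """Illustrative DAC-style container (hypothetical, not the pyDAC API)."""

    def __init__(self, values, chunk_bits=4):
        self.chunk_bits = chunk_bits
        self.chunks = []  # chunks[k][j]: k-th chunk of the j-th value that reaches level k
        self.more = []    # more[k][j]: True if that value continues on level k + 1
        mask = (1 << chunk_bits) - 1
        current = list(values)
        if any(v < 0 for v in current):
            raise ValueError("only unsigned integers are supported")
        self._len = len(current)
        while current:
            rest = [v >> chunk_bits for v in current]
            self.chunks.append([v & mask for v in current])
            self.more.append([r > 0 for r in rest])
            current = [r for r in rest if r > 0]

    def __len__(self):
        return self._len

    def __getitem__(self, i):
        if not 0 <= i < self._len:
            raise IndexError(i)
        value, shift, level = 0, 0, 0
        while True:
            value |= self.chunks[level][i] << shift
            if not self.more[level][i]:
                return value
            # rank: how many earlier values also continue on the next level.
            # Linear time here; a real DAC answers this in constant time.
            i = sum(self.more[level][:i])
            shift += self.chunk_bits
            level += 1


values = [1, 2, 1, 8, 3, 4, 5, 9, 13, 1024, 262189]
tiny = TinyDAC(values)
assert all(tiny[i] == values[i] for i in range(len(values)))
```

Small values occupy a single chunk on the first level, and only large values pay for additional levels, which is where the space advantage over byte-aligned schemes comes from.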
Installation
Install from PyPI using
pip install pyDAC
Usage
from pyDAC import DAC
imports the module.
import random
from pyDAC import DAC
values = random.sample(range(2**32), 10**7)
encoded_values = DAC(iter(values))
creates a DAC structure encoded_values for the values sequence.
Access
The i-th element of the original values sequence can be retrieved from the DAC structure encoded_values using the subscript operator:
for i in range(len(values)):
    assert values[i] == encoded_values[i]
A DAC structure encoded_values is also iterable.
You can loop through the stored elements one at a time
dac_iter = iter(encoded_values)
while True:
    try:
        val = next(dac_iter)
    except StopIteration:
        break  # Iterator exhausted: stop the loop
    else:
        print(val)
or return all stored elements at once
assert values == list(iter(encoded_values))
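Because the structure is described as iterable, the explicit next() loop above should be interchangeable with an ordinary for loop. The sketch below demonstrates that equivalence on a plain list used as a stand-in, so it runs without pyDAC installed; the helper names `collect_explicit` and `collect_for` are hypothetical and only serve the comparison.

```python
def collect_explicit(iterable):
    """Drain an iterable with explicit next() calls and StopIteration handling."""
    out = []
    it = iter(iterable)
    while True:
        try:
            val = next(it)
        except StopIteration:
            break  # iterator exhausted
        else:
            out.append(val)
    return out


def collect_for(iterable):
    """Drain the same iterable with a for loop, which handles StopIteration itself."""
    out = []
    for val in iterable:
        out.append(val)
    return out


# Stand-in for a DAC object: any object that supports iter() behaves the same way.
stand_in = [1, 2, 1, 8, 3]
assert collect_explicit(stand_in) == collect_for(stand_in) == list(stand_in)
```

Assuming a DAC object follows the standard iterator protocol, `for val in encoded_values: print(val)` would be the shorter way to write the same loop.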
Miscellaneous
A DAC structure can report compression ratios and space savings in comparison to the minimal fixed-width representation and to the variable-byte representation of the original values sequence.
For example,
values = [1, 2, 1, 8, 3, 4, 5, 9, 13, 1024, 262189]
encoded_values = DAC(iter(values))
print(encoded_values.space_savings)
# {'vbyte': 0.08214285714285718, 'fixed_width': 0.508133971291866}
print(encoded_values.compression_ratios)
# {'vbyte': 1.0894941634241246, 'fixed_width': 2.0330739299610894}
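The two reported metrics are related: in the output shown above, each space saving equals 1 - 1/ratio for the corresponding compression ratio. The sketch below checks that relation and also computes the sizes of the two baselines for the example sequence using the usual textbook definitions. The helper names `vbyte_bits`, `fixed_width_bits`, and `space_savings_from_ratio` are hypothetical, and pyDAC's internal size accounting may differ from these definitions.

```python
import math


def vbyte_bits(values):
    """Size in bits of a base-128 (VByte) encoding: 7 payload bits per byte."""
    total_bytes = 0
    for v in values:
        payload_bits = max(v.bit_length(), 1)  # zero still occupies one byte
        total_bytes += (payload_bits + 6) // 7  # ceiling division by 7
    return 8 * total_bytes


def fixed_width_bits(values):
    """Size in bits when every value uses the width of the largest value."""
    width = max(max(values).bit_length(), 1)
    return width * len(values)


def space_savings_from_ratio(ratio):
    """Space savings implied by a compression ratio (baseline size / compressed size)."""
    return 1 - 1 / ratio


values = [1, 2, 1, 8, 3, 4, 5, 9, 13, 1024, 262189]
print(vbyte_bits(values))        # 112
print(fixed_width_bits(values))  # 209

# Ratios and savings copied from the example output above.
reported_ratios = {'vbyte': 1.0894941634241246, 'fixed_width': 2.0330739299610894}
reported_savings = {'vbyte': 0.08214285714285718, 'fixed_width': 0.508133971291866}
for name, ratio in reported_ratios.items():
    assert math.isclose(space_savings_from_ratio(ratio), reported_savings[name])
```

A ratio above 1 (equivalently, a positive space saving) means the DAC structure is smaller than the baseline it is compared against.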
Attributions
@article{
title = {{Algorithms and Compressed Data Structures for Information Retrieval}},
author = {Ladra, Susana},
type = {Phd Thesis},
institution = {Universidade da Coru{\~{n}}a},
pages = {272},
year = {2011},
isbn = {5626895531}
}
@inproceedings{
title = {{Directly addressable variable-length codes}},
author = {Brisaboa, Nieves R. and Ladra, Susana and Navarro, Gonzalo},
booktitle = {Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)},
volume = {5721 LNCS},
doi = {10.1007/978-3-642-03784-9_12},
isbn = {3642037836},
issn = {03029743},
pages = {122--130},
publisher = {Springer, Berlin, Heidelberg},
year = {2009}
}
License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.