pyDAC

pyDAC (Python Directly Addressable Codes) offers a variable-length encoding scheme for unsigned integers with random access to any element of the encoded sequence.

In terms of compression, a DAC structure is very likely to outperform standard base-128 compression schemes, also known as VByte, Varint, VInt, or EncInt.
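For context, base-128 (VByte) coding stores each integer in 7-bit groups, one group per byte, and uses the eighth bit of each byte as a continuation flag. The sketch below is not part of pyDAC: the function names `vbyte_encode` and `vbyte_decode_all` are hypothetical, and the byte order and flag polarity shown are one common convention among several.

```python
def vbyte_encode(n):
    """Encode one unsigned integer in base-128, least significant group first."""
    if n < 0:
        raise ValueError("VByte encodes unsigned integers only")
    out = bytearray()
    while True:
        group = n & 0x7F              # lowest 7 payload bits
        n >>= 7
        if n:
            out.append(group | 0x80)  # flag set: more bytes follow
        else:
            out.append(group)         # flag clear: last byte of this value
            return bytes(out)


def vbyte_decode_all(data):
    """Decode a whole VByte stream by scanning it from the first byte."""
    values, current, shift = [], 0, 0
    for byte in data:
        current |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7                # value continues in the next byte
        else:
            values.append(current)    # value complete
            current, shift = 0, 0
    return values


values = [1, 2, 1, 8, 3, 4, 5, 9, 13, 1024, 262189]
stream = b"".join(vbyte_encode(v) for v in values)
assert vbyte_decode_all(stream) == values
```

Because the codewords have different lengths, locating the i-th value in such a stream means scanning from the start. A DAC structure is designed to avoid that scan.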

As a bonus, a DAC structure gives random access to every sequence element without decoding the elements that precede it.
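The random-access idea can be illustrated with a small educational sketch in the spirit of the Brisaboa, Ladra and Navarro paper cited under Attributions. This is not pyDAC's implementation: the class name `TinyDAC`, the fixed `chunk_bits=4`, and the use of plain Python lists with prefix sums for rank queries are all assumptions chosen for readability. A real DAC packs the chunks into bit arrays and uses succinct rank structures.

```python
from itertools import accumulate


class TinyDAC:
    """Educational DAC sketch: chunks are stored level by level."""

    def __init__(self, values, chunk_bits=4):
        current = list(values)
        if any(v < 0 for v in current):
            raise ValueError("only unsigned integers are supported")
        self.chunk_bits = chunk_bits
        self.length = len(current)
        self.chunks = []  # chunks[k][j]: k-th chunk of the j-th value present at level k
        self.flags = []   # flags[k][j]: 1 if that value continues at level k + 1
        self.ranks = []   # ranks[k][j]: number of 1 flags in flags[k][:j]
        mask = (1 << chunk_bits) - 1
        while current:
            flags = [1 if v >> chunk_bits else 0 for v in current]
            self.chunks.append([v & mask for v in current])
            self.flags.append(flags)
            self.ranks.append([0] + list(accumulate(flags)))
            # Only values with remaining high bits move on to the next level.
            current = [v >> chunk_bits for v in current if v >> chunk_bits]

    def __len__(self):
        return self.length

    def __getitem__(self, i):
        if not 0 <= i < self.length:
            raise IndexError("TinyDAC index out of range")
        value, pos = 0, i
        for level in range(len(self.chunks)):
            value |= self.chunks[level][pos] << (level * self.chunk_bits)
            if not self.flags[level][pos]:
                break                      # no further chunks for this value
            pos = self.ranks[level][pos]   # rank gives the position in the next level
        return value


values = [1, 2, 1, 8, 3, 4, 5, 9, 13, 1024, 262189]
dac = TinyDAC(values)
assert all(dac[i] == values[i] for i in range(len(values)))
```

Each lookup touches only the chunks of the requested element, one per level, so its cost depends on the size of that value rather than on its position in the sequence.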

Installation

Install from PyPI using:

pip install pyDAC

Usage

from pyDAC import DAC

imports the DAC class.

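import random
from pyDAC import DAC

# 10 million distinct random 32-bit unsigned integers
values = random.sample(range(2**32), 10**7)
# Build the DAC structure from an iterator over the values
encoded_values = DAC(iter(values))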

creates a DAC structure encoded_values for the values sequence.

Access

The i-th element of the original values sequence can be retrieved from the DAC structure encoded_values using the subscript operator:

for i in range(len(values)):
    assert values[i] == encoded_values[i]

A DAC structure encoded_values is also iterable.

You can loop through the stored elements

for val in encoded_values:
    print(val)

or retrieve all stored elements at once:

assert values == list(iter(encoded_values))

Miscellaneous

A DAC structure reports its compression ratios and space savings (attributes compression_ratios and space_savings) in comparison to the minimal fixed-width representation and to the variable-byte representation of the original values sequence.

For example,

values = [1, 2, 1, 8, 3, 4, 5, 9, 13, 1024, 262189]
encoded_values = DAC(iter(values))

print(encoded_values.space_savings)
# {'vbyte': 0.08214285714285718, 'fixed_width': 0.508133971291866}

print(encoded_values.compression_ratios)
# {'vbyte': 1.0894941634241246, 'fixed_width': 2.0330739299610894}
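The two dictionaries in this example appear to be related in the usual way: a compression ratio is the baseline size divided by the DAC size, and the space saving is one minus the inverse of that ratio. The check below uses only the figures printed in the example. The definitions and the baseline sizes (19 bits per value for fixed width, 7 payload bits per byte for VByte) are assumptions rather than statements about pyDAC's source code, although the printed numbers are consistent with them.

```python
import math

values = [1, 2, 1, 8, 3, 4, 5, 9, 13, 1024, 262189]

# Figures copied verbatim from the example output.
space_savings = {'vbyte': 0.08214285714285718, 'fixed_width': 0.508133971291866}
compression_ratios = {'vbyte': 1.0894941634241246, 'fixed_width': 2.0330739299610894}

# Assumed definitions:
#   compression ratio = baseline size / DAC size
#   space savings     = 1 - DAC size / baseline size = 1 - 1 / ratio
for name, ratio in compression_ratios.items():
    assert math.isclose(space_savings[name], 1 - 1 / ratio, rel_tol=1e-9)

# Baseline sizes in bits under the assumed conventions.
fixed_width_bits = len(values) * max(values).bit_length()
vbyte_bits = 8 * sum(max(1, -(-v.bit_length() // 7)) for v in values)

# The DAC size cancels in the quotient of the two ratios, so the quotient
# should equal the quotient of the two baseline sizes.
assert math.isclose(
    compression_ratios['fixed_width'] / compression_ratios['vbyte'],
    fixed_width_bits / vbyte_bits,
    rel_tol=1e-9,
)
```

Read this way, a space saving of about 0.51 against the fixed-width baseline means the DAC structure needs roughly half the bits of that representation for this input.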

Attributions

@phdthesis{Ladra2011,
    title = {{Algorithms and Compressed Data Structures for Information Retrieval}},
    author = {Ladra, Susana},
    school = {Universidade da Coru{\~{n}}a},
    pages = {272},
    year = {2011},
    isbn = {5626895531}
}
@inproceedings{Brisaboa2009,
    title = {{Directly addressable variable-length codes}},
    author = {Brisaboa, Nieves R. and Ladra, Susana and Navarro, Gonzalo},
    booktitle = {Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)},
    volume = {5721 LNCS},
    doi = {10.1007/978-3-642-03784-9_12},
    isbn = {3642037836},
    issn = {03029743},
    pages = {122--130},
    publisher = {Springer, Berlin, Heidelberg},
    year = {2009}
}

License

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
