Skip to main content

Smahat Time Series Encoding

Project description

Smahat Time Series Encoding

Smahat allows to encode a sequence of integer values using a fixed (for all values) number of bits but minimal with regards to the data range. For example: for a series of boolean values only one bit is needed, for a series of integer percentages 7 bits are needed, etc.

Smahat is useful when:

  • Time series is integer-valued. (It doesn't work with floats :))
  • The range of the data is known in advance (if not streaming, this is not necessary).
  • The data range is relatively small.
  • The data does not have properties that would make other compression algorithms useful, or these other algorithms have an unacceptable cost for the use case.

Smahat can also be used as a baseline to calculate the true compression ratio of a compression algorithm on data of a certain nature.

Installation

To install the latest release:

$ pip install smahat

You can also build a local package and install it:

$ make build
$ pip install dist/*.whl

Usage

Import smahat module.

>>> import smahat

Data to encode.

>>> values = [12, 0, 17, 15, 78, 10]

Encoding

You can use encode_next to encode one value by one:

>>> encoder = smahat.Encoder(min_value=0, max_value=100, strategy='saturate')
>>> for v in values:
...     encoder.encode_next(v)
>>> content = encoder.get_encoded()
>>> content
{'encoded': b'\x18\x00\x88\xf9\xc2\x80', 'shift': 0, 'n_bits_per_value': 7, 'n_padding_bits': 6}

Or you can use Encoder.encode_all to encode all values (range min and max will be inferred from values if not provided):

>>> content = smahat.Encoder.encode_all(values, min_value=0, max_value=100, strategy='saturate')
>>> content
{'encoded': b'\x18\x00\x88\xf9\xc2\x80', 'shift': 0, 'n_bits_per_value': 7, 'n_padding_bits': 6}

Decoding

To decode use Decoder.decode_all.

>>> smahat.Decoder.decode_all(content)
[12, 0, 17, 15, 78, 10]

Encoding result

The result of the encoding of a sequence of values using Smahat is a SmahatContent dictionary containing the encoded data, plus three fields : shift is used to bring the data range to start from zero (values are shifted and encoded in pure binary), n_bits_per_value indicates the number of bits used to encode each value, n_padding_bits (between 0 and 7) indicates the number of unused padding bits within the last byte.

class SmahatContent(TypedDict):
    encoded: bytes
    shift: int
    n_bits_per_value: int
    n_padding_bits: int

If you want to use this library for message exchanges, you can serialize the result of the encoding as you like (JSON, protobuf, etc.)

Contribute

$ git clone https://github.com/ghilesmeddour/smahat-time-series-encoding.git
$ cd smahat-time-series-compression
make format
make dead-code-check
make test
make type-check
make coverage
make build

TODOs

  • Add unit tests.
  • Improve doc.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

smahat-0.0.1.tar.gz (5.4 kB view details)

Uploaded Source

Built Distribution

smahat-0.0.1-py3-none-any.whl (5.9 kB view details)

Uploaded Python 3

File details

Details for the file smahat-0.0.1.tar.gz.

File metadata

  • Download URL: smahat-0.0.1.tar.gz
  • Upload date:
  • Size: 5.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/3.10.0 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.61.2 CPython/3.8.10

File hashes

Hashes for smahat-0.0.1.tar.gz
Algorithm Hash digest
SHA256 ef7382309e1692d8bb91bfcf373f54be989c6cd722ef63efc2a41c8b10e1f92e
MD5 1227515d497baab257af58952b04729d
BLAKE2b-256 85a290ebe682dfd3cef0fc69b01f49324eca996325f55f198e7c80851692fbb4

See more details on using hashes here.

File details

Details for the file smahat-0.0.1-py3-none-any.whl.

File metadata

  • Download URL: smahat-0.0.1-py3-none-any.whl
  • Upload date:
  • Size: 5.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/3.10.0 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.61.2 CPython/3.8.10

File hashes

Hashes for smahat-0.0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 64c30db6a1fd969d5b9097cc56748add3d931f95df38e08e67e25216985d5109
MD5 feb44c264432bf9be73fb81e30a04eae
BLAKE2b-256 1ebf9ae5ee8eb12b020cc3db31a4a2f56e5bfda888ebb26b1abae8fd886de7f1

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page