
Encodes and decodes sequences of unsigned integers with known widths (and sequences of symbols from finite sets).


Sub_Byte

Bit packer and unpacker. Encodes and decodes sequences of integers with known bit widths (and sequences of symbols that map to integers under some mapping).

Overview

Sub_Byte stores data without wasting bits and preserves its structure, with no compression or decompression step. It uses simple bit packing: fields of 7 bits or fewer take up less than a byte of storage, and fields may cross byte boundaries where necessary.
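The bit packing described above can be sketched as follows. This is written from scratch rather than taken from the sub_byte API: the function name `pack`, the most-significant-bit-first order and the zero padding of the final byte are assumptions. Three u3 values need 9 bits, so the third value straddles the boundary between the first and second byte.

```python
def pack(values: list[int], widths: list[int]) -> bytes:
    """Pack unsigned integers into bytes, most significant bit first.

    `widths` is cycled if it is shorter than `values` (an assumption
    made for this sketch). The final byte is padded with zero bits.
    """
    acc = 0      # bits waiting to be written out
    n_bits = 0   # how many bits are currently held in acc
    out = bytearray()
    for i, value in enumerate(values):
        width = widths[i % len(widths)]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{value} does not fit in {width} bits")
        acc = (acc << width) | value
        n_bits += width
        # Emit every complete byte; a value may be split across two bytes.
        while n_bits >= 8:
            n_bits -= 8
            out.append((acc >> n_bits) & 0xFF)
        acc &= (1 << n_bits) - 1  # keep only the bits not yet emitted
    if n_bits:
        out.append(acc << (8 - n_bits))  # zero-pad the final partial byte
    return bytes(out)


print(pack([5, 3, 7], [3]).hex())  # af80 (101 011 111, then 7 zero padding bits)
```

Nothing is spent on per-field markers or alignment: the only overhead is the padding in the last byte.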

A bit width is required for each symbol. The bit-width sequence (a simple codec) can be stored alongside the encoded data as metadata. The decoder can also be passed the total number of symbols to decode, which resolves ambiguities: for example, a null byte (0b00000000) could be eight 1-bit zeros, four 2-bit zeros, two u4 zeros or a single u8 zero.
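To see why the decoder needs the symbol count, here is a minimal, self-contained unpacking sketch. It is not the sub_byte decoder: the name `unpack`, its parameters and the most-significant-bit-first order are assumptions. The same null byte decodes to a different number of zeros depending on the widths and count supplied.

```python
from itertools import cycle, islice


def unpack(data: bytes, widths: list[int], num_symbols: int) -> list[int]:
    """Read `num_symbols` unsigned ints, cycling through `widths`, MSB first."""
    bits = int.from_bytes(data, "big")  # treat the whole buffer as one integer
    total = len(data) * 8
    pos = 0  # number of bits consumed so far
    out = []
    for width in islice(cycle(widths), num_symbols):
        if pos + width > total:
            raise ValueError("data ended before all symbols were read")
        shift = total - pos - width  # distance of this field from the low end
        out.append((bits >> shift) & ((1 << width) - 1))
        pos += width
    return out


null_byte = b"\x00"
print(unpack(null_byte, [1], 8))  # [0, 0, 0, 0, 0, 0, 0, 0]
print(unpack(null_byte, [2], 4))  # [0, 0, 0, 0]
print(unpack(null_byte, [4], 2))  # [0, 0]
print(unpack(null_byte, [8], 1))  # [0]
```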

Alternatively, more dynamic codecs are supported by passing null (None in Python) as the number of symbols to the decoder. The user must then write extra code to decide when iteration stops. This can be used, for example, to encode the actual bit widths first (in some other fixed bit widths), to encode the number of symbols or cycles, or to implement any other codec in which the user's code determines the bit widths and when iteration ends.
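A sketch of one such dynamic codec, assuming a hypothetical layout in which a u8 header gives the number of symbols and a u4 header gives their bit width, followed by the symbols themselves. `BitReader` and `decode_self_describing` are illustrative names, not part of sub_byte; the point is that the user's code, not a fixed count, decides how many fields to read and when to stop.

```python
class BitReader:
    """Reads unsigned ints of any width from bytes, most significant bit first."""

    def __init__(self, data: bytes) -> None:
        self._bits = int.from_bytes(data, "big")
        self._total = len(data) * 8
        self._pos = 0

    def read(self, width: int) -> int:
        if self._pos + width > self._total:
            raise EOFError("ran out of bits")
        shift = self._total - self._pos - width
        self._pos += width
        return (self._bits >> shift) & ((1 << width) - 1)


def decode_self_describing(data: bytes) -> list[int]:
    """Hypothetical codec: u8 symbol count, u4 bit width, then the symbols."""
    reader = BitReader(data)
    count = reader.read(8)  # how many symbols follow
    width = reader.read(4)  # width of each following symbol (0 to 15 bits)
    return [reader.read(width) for _ in range(count)]


# 00000011 | 0100 | 0001 0010 0011  ->  count 3, width 4, symbols 1, 2, 3
print(decode_self_describing(bytes([0x03, 0x41, 0x23])))  # [1, 2, 3]
```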

Data validation (e.g. checksums or hashes) is left to the user, but an extra field, such as a checksum, can easily be appended to a bit-width cycle.
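For example, a checksum can travel as one more field in each cycle of bit widths. The sketch below uses a sum modulo 2^4 as a hypothetical u4 checksum, so a cycle with widths [3, 3, 5] becomes [3, 3, 5, 4]; the helper names are illustrative. A modular sum is weak (it misses reordered fields, for instance), so a CRC or hash is a better fit for real data.

```python
CHECKSUM_WIDTH = 4  # bits reserved for the checksum field (assumed)


def with_checksum(cycle_values: list[int]) -> list[int]:
    """Append a modular-sum checksum as the last field of one cycle."""
    return cycle_values + [sum(cycle_values) % (1 << CHECKSUM_WIDTH)]


def is_valid(cycle_with_checksum: list[int]) -> bool:
    """Recompute the checksum from the data fields and compare."""
    *values, check = cycle_with_checksum
    return sum(values) % (1 << CHECKSUM_WIDTH) == check


fields = with_checksum([5, 3, 17])  # widths [3, 3, 5] plus a u4 checksum
print(fields)  # [5, 3, 17, 9]
print(is_valid(fields), is_valid([5, 2, 17, 9]))  # True False
```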

Implementations

Python

Calculate and encode a cache of data in Python.

uv pip install sub_byte

TypeScript/JavaScript

Decode a cache of data in JavaScript, including in the browser.

npm i sub_byte

Alternatives

Sub 4kB datasets

For small datasets, this library is not needed for data storage: neither Sub_Byte nor anything else will reduce the disk space used. If the un-encoded dataset is smaller than 4 kB, for example (or, more generally, smaller than the block size of the file system it will be stored on, e.g. ext4, NTFS or APFS), it is already below the minimum file size for that file system.
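The arithmetic behind this can be sketched as follows, assuming a 4096-byte block size and that every non-empty file occupies whole blocks. Real file systems complicate this (NTFS, for example, can keep very small files inside the MFT record), so treat it as an approximation. Packing 3,000 u3 symbols saves bytes but not blocks; packing 20,000 of them does save blocks.

```python
import math

BLOCK_SIZE = 4096  # assumed file system block size in bytes


def blocks_used(n_bytes: int, block_size: int = BLOCK_SIZE) -> int:
    """Whole blocks needed to hold n_bytes (ignores metadata and inline storage)."""
    return math.ceil(n_bytes / block_size)


def bit_packed_bytes(n_symbols: int, width: int) -> int:
    """Bytes needed for n_symbols fields of `width` bits, packed with no gaps."""
    return math.ceil(n_symbols * width / 8)


for n in (3_000, 20_000):
    one_per_byte = n  # each u3 stored in its own byte
    packed = bit_packed_bytes(n, 3)
    print(n, one_per_byte, packed, blocks_used(one_per_byte), blocks_used(packed))
# 3000 3000 1125 1 1
# 20000 20000 7500 5 2
```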

A bespoke protocol using custom width integer types

Fits up to 8 u1s (bits), 4 u2s, or 2 u3s or u4s into each byte, without crossing byte boundaries. Each developer must write and test their own implementation, and interoperability between separate private implementations cannot be tested.
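The cost of never crossing byte boundaries can be estimated with a little arithmetic. This sketch assumes a byte-aligned scheme that fits floor(8 / w) symbols of width w into each byte (so it only handles widths from 1 to 8) and compares it with gap-free bit packing; the function names are illustrative.

```python
import math


def byte_aligned_bytes(n_symbols: int, width: int) -> int:
    """Bytes used when whole symbols must fit inside a single byte."""
    if not 1 <= width <= 8:
        raise ValueError("this sketch only handles widths of 1 to 8 bits")
    per_byte = 8 // width  # 8 u1s, 4 u2s, 2 u3s or u4s, 1 of anything wider
    return math.ceil(n_symbols / per_byte)


def bit_packed_bytes(n_symbols: int, width: int) -> int:
    """Bytes used when symbols may straddle byte boundaries."""
    return math.ceil(n_symbols * width / 8)


for width in (1, 3, 4, 5):
    print(width, byte_aligned_bytes(100, width), bit_packed_bytes(100, width))
# 1 13 13
# 3 50 38
# 4 50 50
# 5 100 63
```

The byte-aligned layout only wastes space for widths that do not divide 8 (3, 5, 6 and 7 bits).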

Protocol buffers

Encodes at most one symbol per byte, using variable-length (varint) encoding: each byte carries 7 payload bits plus a continuation bit.
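For comparison, a minimal sketch of base-128 varint encoding, the scheme Protocol Buffers documents for its integer fields. This is a standalone illustration, not the protobuf library's API, and it omits the field tag that protobuf also writes. Because each byte carries only 7 payload bits, even a 1-bit value costs a whole byte.

```python
def encode_varint(value: int) -> bytes:
    """Base-128 varint: 7 payload bits per byte, low-order group first."""
    if value < 0:
        raise ValueError("this sketch encodes unsigned integers only")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)  # high bit clear: last byte
            return bytes(out)


def decode_varint(data: bytes) -> tuple[int, int]:
    """Return (value, number of bytes consumed)."""
    result = 0
    for i, byte in enumerate(data):
        result |= (byte & 0x7F) << (7 * i)
        if not byte & 0x80:
            return result, i + 1
    raise ValueError("truncated varint")


print(encode_varint(1).hex())    # 01
print(encode_varint(300).hex())  # ac02
```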

Zipping (data compression)

  • Exploits statistical distributions (e.g. "E" being more common in English text than "Q") and patterns.
  • Unstructured until the end user unzips the archive.
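Both points can be seen with Python's standard-library `zlib` module: skewed, repetitive input shrinks well, but the compressed stream has no fixed layout, so reading any one symbol means decompressing first. The sample text is made up for illustration.

```python
import zlib

# Made-up, highly skewed sample: far more "E"s than "Q"s.
text = ("E" * 900 + "Q" * 100).encode("ascii")
blob = zlib.compress(text)

print(len(text), len(blob))  # the compressed blob is much smaller

# There is no offset to jump to inside `blob`; decompress, then index.
print(chr(zlib.decompress(blob)[950]))  # Q
```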

Changelog

v0.05

Configured the npm module for TypeScript.

v0.04

Supported dynamic codecs (passing null/None as the number of elements to decode).

Development

Type checking and linting:

Python

Mypy
mypy --python-executable=path/to/venv/where/deps/installed/python.exe src/sub_byte

Pyright

Activate the venv where the dependencies are installed, then run:

pyright src/sub_byte/factories.py

TypeScript

TypeScript compiler
npm run typecheck
ESLint
npm run eslint
Prettier (check)
npm run prettier
Prettier (auto-fix)
npm run prettier:write

Publishing

Bump version in package.json to x.y.z

NPM

npm run prepublish
npm pack

Double-check the contents of sub_byte-x.y.z.tgz.

npm publish

Sign in to npm (publishing currently requires the author's account).

PyPI



Download files


Source Distribution

sub_byte-0.0.7.tar.gz (26.1 kB)

Uploaded Source

Built Distribution


sub_byte-0.0.7-py3-none-any.whl (7.3 kB)

Uploaded Python 3

File details

Details for the file sub_byte-0.0.7.tar.gz.

File metadata

  • Download URL: sub_byte-0.0.7.tar.gz
  • Upload date:
  • Size: 26.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.4.20

File hashes

Hashes for sub_byte-0.0.7.tar.gz
Algorithm Hash digest
SHA256 fcf4c691496f78f5a71f5b532d865ef03bcd8cc9870f797ed2aa2d12ce34b382
MD5 2435489b6acbd2f0f39cf600d77d4fd0
BLAKE2b-256 e6c60db9daaa50c31692e4cf7765042cdf4a244d509f862f3f319a5bf92d110f


File details

Details for the file sub_byte-0.0.7-py3-none-any.whl.

File metadata

  • Download URL: sub_byte-0.0.7-py3-none-any.whl
  • Upload date:
  • Size: 7.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.4.20

File hashes

Hashes for sub_byte-0.0.7-py3-none-any.whl
Algorithm Hash digest
SHA256 b226ae089c650d652cb52f2b87e69c33cf5400129dc29fe109fa6a99c0f97679
MD5 45787e8ee381beb923b0c11f9625ad08
BLAKE2b-256 c406737565c9a775c9be648984baeb0af93d8f11e61f441a4f324f5086d8efa8
