tamp

Tamp is a low-memory, micropython-optimized, DEFLATE-inspired lossless compression library.

Features

Pure python implementation.
High compression ratios and low memory use.
Small compression and decompression implementations.
Mid-stream flushing.
Customizable dictionary for greater compression of small messages.
Convenient CLI interface.

Installation

Tamp contains 2 implementations: a desktop CPython implementation that is optimized for readability, and a MicroPython implementation that is optimized for runtime performance.

Desktop Python

The Tamp library and CLI require Python >=3.8 and can be installed via:

pip install tamp

MicroPython

For micropython use, there are 3 main files:

tamp/__init__.py - Always required.
tamp/decompressor_viper.py - Required for on-device decompression.
tamp/compressor_viper.py - Required for on-device compression.

For example, if on-device decompression isn’t used, then do not include decompressor_viper.py. If manually installing, just copy these files to your microcontroller’s /lib/tamp folder.

If using Belay, tamp can be installed by adding the following to pyproject.toml.

[tool.belay.dependencies]
tamp = [
   "https://github.com/BrianPugh/tamp/blob/main/tamp/__init__.py",
   "https://github.com/BrianPugh/tamp/blob/main/tamp/compressor_viper.py",
   "https://github.com/BrianPugh/tamp/blob/main/tamp/decompressor_viper.py",
]

Usage

Tamp works on desktop python and micropython. On desktop, Tamp is bundled with the tamp command line tool for compressing and decompressing tamp files.

CLI

Compression

Use tamp compress to compress a file or stream. If no input file is specified, data from stdin will be read. If no output is specified, the compressed output stream will be written to stdout.

$ tamp compress --help

 Usage: tamp compress [OPTIONS] [INPUT_PATH]

 Compress an input file or stream.

╭─ Arguments ────────────────────────────────────────────────────────────────────────╮
│   input_path      [INPUT_PATH]  Input file to compress or decompress. Defaults to  │
│                                 stdin.                                             │
╰────────────────────────────────────────────────────────────────────────────────────╯
╭─ Options ──────────────────────────────────────────────────────────────────────────╮
│ --output   -o      PATH                      Output file. Defaults to stdout.      │
│ --window   -w      INTEGER RANGE [8<=x<=15]  Number of bits used to represent the  │
│                                              dictionary window.                    │
│                                              [default: 10]                         │
│ --literal  -l      INTEGER RANGE [5<=x<=8]   Number of bits used to represent a    │
│                                              literal.                              │
│                                              [default: 8]                          │
│ --help                                       Show this message and exit.           │
╰────────────────────────────────────────────────────────────────────────────────────╯

Example usage:

tamp compress enwik8 -o enwik8.tamp  # Compress a file
echo "hello world" | tamp compress | wc -c  # Compress a stream and print the compressed size.

The following options can impact compression ratios and memory usage:

window - 2^window plaintext bytes to look back to try and find a pattern. A larger window size will increase the chance of finding a longer pattern match, but will use more memory, increase compression time, and cause each pattern-token to take up more space. Try smaller window values if compressing highly repetitive data, or short messages.
literal - Number of bits used in each plaintext byte. For example, if all input data is 7-bit ASCII, then setting this to 7 will improve literal compression ratios by 11.1%. The default, 8-bits, can encode any binary data.
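The arithmetic behind these two options can be sketched in a few lines of plain Python. This is only an illustration, not part of Tamp: the helper names are hypothetical, and the savings calculation assumes that each literal is stored as one flag bit plus `literal` payload bits, which is the assumption under which the 11.1% figure above works out (9 bits down to 8 bits).

```python
def window_bytes(window: int) -> int:
    """Size in bytes of the look-back buffer for a given --window value."""
    return 1 << window


def min_literal_bits(data: bytes) -> int:
    """Smallest --literal value (clamped to the CLI range 5..8) able to hold every byte."""
    needed = max(data, default=0).bit_length()
    return min(max(needed, 5), 8)


def literal_token_savings(literal: int) -> float:
    """Fractional size reduction of one literal token versus the 8-bit default.

    Assumption (not stated in the text above): a literal token costs
    1 flag bit + `literal` payload bits.
    """
    default_cost = 1 + 8
    return (default_cost - (1 + literal)) / default_cost


print(window_bytes(10))                          # bytes of window at the default setting
print(min_literal_bits(b"hello world"))          # pure ASCII input
print(f"{literal_token_savings(7):.1%}")         # savings when literal=7
```

If `min_literal_bits` reports a value below 8 for all of your data, passing that value to --literal is worth trying.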

Decompression

Use tamp decompress to decompress a file or stream. If no input file is specified, data from stdin will be read. If no output is specified, the decompressed output stream will be written to stdout.

$ tamp decompress --help

 Usage: tamp decompress [OPTIONS] [INPUT_PATH]

 Decompress an input file or stream.

╭─ Arguments ────────────────────────────────────────────────────────────────────────╮
│   input_path      [INPUT_PATH]  Input file. If not provided, reads from stdin.     │
╰────────────────────────────────────────────────────────────────────────────────────╯
╭─ Options ──────────────────────────────────────────────────────────────────────────╮
│ --output  -o      PATH  Output file. Defaults to stdout.                           │
│ --help                  Show this message and exit.                                │
╰────────────────────────────────────────────────────────────────────────────────────╯

Example usage:

tamp decompress enwik8.tamp -o enwik8
echo "hello world" | tamp compress | tamp decompress

Python

The python library can perform one-shot compression, as well as operate on files/streams.

import tamp

# One-shot compression
string = b"I scream, you scream, we all scream for ice cream."
compressed_data = tamp.compress(string)
reconstructed = tamp.decompress(compressed_data)
assert reconstructed == string

# Streaming compression
with tamp.open("output.tamp", "wb") as f:
    for _ in range(10):
        f.write(string)

# Streaming decompression
with tamp.open("output.tamp", "rb") as f:
    reconstructed = f.read()

Benchmark

In the following section, we compare Tamp against:

zlib, a python builtin gzip-compatible DEFLATE compression library.
heatshrink, a data compression library for embedded/real-time systems. Heatshrink has similar goals to Tamp.

All of these are LZ-based compression algorithms, and tests were performed using a 1KB (10 bit) window. Since zlib uses significantly more memory by default, its lowest memory level (memLevel=1) was used in these benchmarks. Note that higher zlib memory levels will achieve greater compression ratios than Tamp. Currently, there is no micropython-compatible zlib or heatshrink compression implementation, so these numbers are provided simply as a reference.

Compression Ratio

The following table shows compression algorithm performance over a variety of input data sourced from the Silesia Corpus and Enwik8. This should give a general idea of how these algorithms perform over a variety of input data types.

dataset	raw	tamp	zlib	heatshrink
enwik8	100,000,000	51,635,633	56,205,166	56,110,394
build/silesia/dickens	10,192,446	5,546,761	6,049,169	6,155,768
build/silesia/mozilla	51,220,480	25,121,385	25,104,966	25,435,908
build/silesia/mr	9,970,564	5,027,032	4,864,734	5,442,180
build/silesia/nci	33,553,445	8,643,610	5,765,521	8,247,487
build/silesia/ooffice	6,152,192	3,814,938	4,077,277	3,994,589
build/silesia/osdb	10,085,684	8,520,835	8,625,159	8,747,527
build/silesia/reymont	6,627,202	2,847,981	2,897,661	2,910,251
build/silesia/samba	21,606,400	9,102,594	8,862,423	9,223,827
build/silesia/sao	7,251,944	6,137,755	6,506,417	6,400,926
build/silesia/webster	41,458,703	18,694,172	20,212,235	19,942,817
build/silesia/x-ray	8,474,240	7,510,606	7,351,750	8,059,723
build/silesia/xml	5,345,280	1,681,687	1,586,985	1,665,179

Tamp usually outperforms heatshrink, and is generally very competitive with zlib. Although this comparison tries to be apples-to-apples, zlib still uses significantly more memory during both compression and decompression (see next section). Tamp achieves competitive compression while using around 10x less memory.
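To make the table above easier to interpret, the following sketch recomputes compression ratios and counts how often Tamp produces the smallest output. The byte counts are copied from the table (with the build/silesia/ prefix dropped); the variable and function names are illustrative only.

```python
# dataset: (raw, tamp, zlib, heatshrink), all sizes in bytes
RESULTS = {
    "enwik8":  (100_000_000, 51_635_633, 56_205_166, 56_110_394),
    "dickens": (10_192_446, 5_546_761, 6_049_169, 6_155_768),
    "mozilla": (51_220_480, 25_121_385, 25_104_966, 25_435_908),
    "mr":      (9_970_564, 5_027_032, 4_864_734, 5_442_180),
    "nci":     (33_553_445, 8_643_610, 5_765_521, 8_247_487),
    "ooffice": (6_152_192, 3_814_938, 4_077_277, 3_994_589),
    "osdb":    (10_085_684, 8_520_835, 8_625_159, 8_747_527),
    "reymont": (6_627_202, 2_847_981, 2_897_661, 2_910_251),
    "samba":   (21_606_400, 9_102_594, 8_862_423, 9_223_827),
    "sao":     (7_251_944, 6_137_755, 6_506_417, 6_400_926),
    "webster": (41_458_703, 18_694_172, 20_212_235, 19_942_817),
    "x-ray":   (8_474_240, 7_510_606, 7_351_750, 8_059_723),
    "xml":     (5_345_280, 1_681_687, 1_586_985, 1_665_179),
}


def ratio(raw: int, compressed: int) -> float:
    """Compression ratio: larger is better."""
    return raw / compressed


# Count the datasets where Tamp's output is strictly smaller.
tamp_beats_zlib = sum(t < z for _raw, t, z, _h in RESULTS.values())
tamp_beats_heatshrink = sum(t < h for _raw, t, _z, h in RESULTS.values())

for name, (raw, t, z, h) in RESULTS.items():
    print(f"{name:8s} tamp={ratio(raw, t):.3f} zlib={ratio(raw, z):.3f} heatshrink={ratio(raw, h):.3f}")
print(f"tamp smaller than zlib on {tamp_beats_zlib}/{len(RESULTS)} datasets")
print(f"tamp smaller than heatshrink on {tamp_beats_heatshrink}/{len(RESULTS)} datasets")
```

On these figures Tamp is smaller than heatshrink on 11 of the 13 datasets and smaller than zlib on 7 of the 13.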

Memory Usage

The following table shows approximately how much memory each algorithm uses with a window size of 1KB (10 bit).

Action	tamp	zlib	heatshrink
Compression	(1 << windowBits)	(1 << (windowBits+2)) + 7 KB	(1 << windowBits)
Decompression	(1 << windowBits)	(1 << windowBits) + 7 KB	(1 << windowBits)

Both tamp and heatshrink have a few dozen bytes of overhead in addition to the primary window buffer, but this overhead is implementation-specific and is ignored here for clarity.
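As a worked example, the formulas in the table can be evaluated for the 10-bit window used in the benchmarks. This sketch assumes 1 KB means 1024 bytes and, as in the table, ignores the few dozen bytes of per-implementation overhead; the function name is hypothetical.

```python
def approx_memory_bytes(window_bits: int = 10) -> dict:
    """Evaluate the approximate memory formulas from the table, in bytes."""
    window = 1 << window_bits
    zlib_fixed = 7 * 1024  # the "+ 7 KB" term, assuming 1 KB = 1024 bytes
    return {
        "tamp": {"compression": window, "decompression": window},
        "zlib": {
            "compression": (1 << (window_bits + 2)) + zlib_fixed,
            "decompression": window + zlib_fixed,
        },
        "heatshrink": {"compression": window, "decompression": window},
    }


usage = approx_memory_bytes(10)
for algorithm, actions in usage.items():
    print(algorithm, actions)

# How many times more memory zlib needs than tamp at this window size.
print(usage["zlib"]["compression"] / usage["tamp"]["compression"])
print(usage["zlib"]["decompression"] / usage["tamp"]["decompression"])
```

With a 10-bit window, zlib needs 11x the memory of tamp for compression and 8x for decompression, which is where the "around 10x" figure comes from.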

Runtime

The desktop implementation is quite slow and could be significantly sped up (probably 1000x) with a C or Rust implementation. However, Tamp inherently targets smaller files due to its operating requirements, on the order of tens of megabytes or less. Therefore, the slow runtime is still quite insignificant for small files. If the desktop runtime performance becomes an issue for your use case, please open a GitHub issue so that an optimized implementation can be prioritized.

When to use Tamp

On a Pi Pico (rp2040), the viper implementation of Tamp can compress data at around 4,300 bytes/s when using a 10-bit window. The data can then be decompressed at around 42,700 bytes/s. Tamp is good for compressing data on-device. If purely decompressing data on-device, it will nearly always be better to use the micropython-builtin zlib.decompress, when available.
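For a rough feel of what these throughput figures mean in practice, the sketch below estimates how long a payload would take on-device. It simply divides payload size by the rates quoted above for a 10-bit window; real timings will vary with the data, and the constant and function names are illustrative.

```python
# Approximate Pi Pico (rp2040) throughput quoted above, 10-bit window.
COMPRESS_BYTES_PER_S = 4_300
DECOMPRESS_BYTES_PER_S = 42_700


def estimated_seconds(num_bytes: int, bytes_per_second: int) -> float:
    """Rough time estimate assuming constant throughput."""
    return num_bytes / bytes_per_second


payload = 64 * 1024  # a hypothetical 64 KiB log file
print(f"compress:   {estimated_seconds(payload, COMPRESS_BYTES_PER_S):.1f} s")
print(f"decompress: {estimated_seconds(payload, DECOMPRESS_BYTES_PER_S):.1f} s")
```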

Project details

Release history

1.6.0

Jun 3, 2024

1.5.0

May 9, 2024

1.4.1

Apr 25, 2024

1.4.0

Apr 25, 2024

1.3.1

Jan 15, 2024

1.3.0

Dec 5, 2023

1.2.0

Nov 29, 2023

1.1.6

Aug 16, 2023

1.1.5

Jul 22, 2023

1.1.4

Jun 29, 2023

1.1.3

Jun 18, 2023

1.1.2

Jun 15, 2023

1.1.1

Jun 9, 2023

1.1.0

Jun 3, 2023

1.0.3

May 19, 2023

1.0.2

Apr 29, 2023

1.0.1

Apr 28, 2023

This version

1.0.0

Apr 28, 2023

0.0.0

Apr 14, 2023

Download files

Source Distribution

tamp-1.0.0.tar.gz (22.1 kB)

Uploaded Apr 28, 2023 Source

Built Distribution

tamp-1.0.0-py3-none-any.whl (21.2 kB)

Uploaded Apr 28, 2023 Python 3

File details

Details for the file tamp-1.0.0.tar.gz.

File metadata

Download URL: tamp-1.0.0.tar.gz
Upload date: Apr 28, 2023
Size: 22.1 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: poetry/1.4.2 CPython/3.10.2 Linux/5.15.0-1036-azure

File hashes

Hashes for tamp-1.0.0.tar.gz
Algorithm	Hash digest
SHA256	`50d9b2aaed92942c9322a254e8d7b2c1e0e993f8f002665d57d3466eaa3d9e8d`
MD5	`436991f8c1495c0cf71a5bc62702ce9e`
BLAKE2b-256	`9e13c76f8235e86f204f536271512c04c7396d98bb92f60e688877394498b972`

File details

Details for the file tamp-1.0.0-py3-none-any.whl.

File metadata

Download URL: tamp-1.0.0-py3-none-any.whl
Upload date: Apr 28, 2023
Size: 21.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: poetry/1.4.2 CPython/3.10.2 Linux/5.15.0-1036-azure

File hashes

Hashes for tamp-1.0.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`770a49c5ce2cb3ac44e63db840bf845a9e146d09c337f7840bea493fc85f748b`
MD5	`dd2810d61bb87ac79088d911b0c1e02d`
BLAKE2b-256	`ef95128192bb1cdbb7d5f821da29e31ee3b78d91a993644d7b25c42513a6337e`