Composable entropy coding primitives for research and production (Python and Rust).

Composable Entropy Coding Primitives for Research and Production

The constriction library provides a set of composable implementations of entropy coding algorithms. It has APIs for both the Python and Rust languages and it focuses on correctness, versatility, ease of use, compression performance, and computational efficiency. The goals of constriction are three-fold:

  1. to facilitate research on novel lossless and lossy compression methods by providing a composable set of entropy coding primitives rather than a rigid implementation of a single preconfigured method; in compression research, different applications put different requirements on the entropy coding method. For example, you may prefer a Range Coder for an autoregressive entropy model (because it preserves the order of encoded symbols), but you may prefer an ANS Coder for a hierarchical entropy model (because it supports bits-back coding). With many other libraries, swapping out a Range Coder for an ANS Coder would mean that you not only have to find and learn how to use a new library, but you would also have to port the part of your code that represents probabilistic entropy models so that it adheres to the rules of the new library. By contrast, the composable architecture of constriction lets you seamlessly swap out individual components of your compression pipeline (such as the core entropy coding algorithm) independently from other components (such as the fixed-point representation of entropy models or the strategy for dealing with zero probability symbols).
  2. to simplify the transition from research code to reliable software products by exposing the exact same functionality via both a Python API (for rapid prototyping on research code) and a Rust API (for turning successful prototypes into production); this approach bridges the gap between two communities that have vastly different requirements on their software development tools: while data scientists and machine learning researchers need the quick iteration cycles that scripting languages like Python provide, real-world compression codecs that are to be used outside of laboratory conditions have to be implemented in a compiled language that runs fast and that doesn't require setting up a complex runtime environment with lots of dependencies. With constriction, you can seamlessly turn your Python research code into a high-performance standalone binary, library, or WebAssembly module. By default, the Python and Rust APIs are binary compatible, so you can gradually port one component at a time without breaking things. On top of this, the Rust API provides optional fine-grained control over issues relevant to real-world deployments such as the trade-off between compression effectiveness, memory usage, and run-time efficiency, as well as hooks into the backing data sources and sinks, while preventing accidental misuse through Rust's powerful type system.
  3. to serve as a teaching resource by providing a collection of several complementary entropy coding algorithms within a single consistent framework, thus making the various algorithms easily discoverable and comparable on practical examples; additional teaching material is being made publicly available as a by-product of an ongoing university course on data compression with deep probabilistic models.
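
The remark in the first goal about symbol order can be made concrete with a toy example. The following is a minimal sketch of an ANS-style coder that keeps its entire state in one unbounded Python integer, so it needs no renormalization. It is a teaching illustration only: the names `push`, `pop`, `FREQS`, and `PRECISION` are made up for this sketch and are not part of constriction's API, and constriction's real coders work on fixed-width words instead.

```python
# Toy ANS coder with an unbounded integer state (no renormalization).
# Hypothetical teaching sketch; not constriction's implementation or API.

PRECISION = 4                      # probabilities are multiples of 1/16
FREQS = {"a": 8, "b": 5, "c": 3}   # fixed-point frequencies, sum to 2**PRECISION
assert sum(FREQS.values()) == 1 << PRECISION

# Left-sided cumulative frequencies: a -> 0, b -> 8, c -> 13.
CDF = {}
total = 0
for sym, freq in FREQS.items():
    CDF[sym] = total
    total += freq


def push(state, sym):
    """Encode `sym` onto the integer `state` (like pushing onto a stack)."""
    freq, start = FREQS[sym], CDF[sym]
    return ((state // freq) << PRECISION) + start + (state % freq)


def pop(state):
    """Decode the most recently encoded symbol and restore the previous state."""
    slot = state & ((1 << PRECISION) - 1)
    sym = next(s for s in FREQS if CDF[s] <= slot < CDF[s] + FREQS[s])
    freq, start = FREQS[sym], CDF[sym]
    return sym, freq * (state >> PRECISION) + slot - start


message = ["a", "c", "b", "a", "a"]

# Encoding in reverse order makes the decoder emit symbols in forward order.
state = 1
for sym in reversed(message):
    state = push(state, sym)

decoded = []
for _ in message:
    sym, state = pop(state)
    decoded.append(sym)

assert decoded == message
assert state == 1  # every pop exactly undid one push
```

Because each `pop` exactly inverts the latest `push`, symbols come back in "last in first out" order. A Range Coder instead returns symbols in the order in which they were encoded, which is why it pairs more naturally with autoregressive models.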

For an example of a compression codec that started as research code in Python and was then deployed as a fast and dependency-free WebAssembly module using constriction's Rust API, have a look at The Linguistic Flux Capacitor.

Project Status

We currently provide implementations of the following entropy coding algorithms:

  • Asymmetric Numeral Systems (ANS): a fast modern entropy coder with near-optimal compression effectiveness that supports advanced use cases like bits-back coding.
  • Range Coding: a computationally efficient variant of Arithmetic Coding that has essentially the same compression effectiveness as ANS Coding but operates as a queue ("first in first out"), which makes it preferable for autoregressive models.
  • Chain Coding: an experimental new entropy coder that combines the (net) effectiveness of stream codes with the locality of symbol codes; it is meant for experimental new compression approaches that perform joint inference, quantization, and bits-back coding in an end-to-end optimization. This experimental coder is mainly provided to prove to ourselves that the API for encoding and decoding, which is shared across all stream coders, is flexible enough to express complex novel tasks.
  • Huffman Coding: a well-known symbol code, mainly provided here for teaching purposes; you'll usually want to use a stream code like ANS or Range Coding instead since symbol codes can have a considerable overhead on the bitrate, especially in the regime of low entropy per symbol, which is common in machine-learning-based compression methods.
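
To see why symbol codes struggle in the low-entropy regime, it helps to compare the average Huffman code word length with the entropy of a skewed model. The sketch below uses only the Python standard library; the function `huffman_code_lengths` and the example probabilities are made up for illustration and do not use constriction's Huffman Coder.

```python
import heapq
import math


def huffman_code_lengths(probs):
    """Return the Huffman code word length (in bits) for each symbol."""
    # Heap entries: (probability, tie_breaker, symbols in this subtree).
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    next_id = len(probs)
    while len(heap) > 1:
        p1, _, syms1 = heapq.heappop(heap)
        p2, _, syms2 = heapq.heappop(heap)
        for s in syms1 + syms2:
            lengths[s] += 1  # each merge adds one bit to every symbol below it
        heapq.heappush(heap, (p1 + p2, next_id, syms1 + syms2))
        next_id += 1
    return lengths


probs = [0.9, 0.05, 0.03, 0.02]  # hypothetical low-entropy model
lengths = huffman_code_lengths(probs)
entropy = -sum(p * math.log2(p) for p in probs)
expected_length = sum(p * l for p, l in zip(probs, lengths))

print(f"entropy:         {entropy:.3f} bits per symbol")
print(f"Huffman average: {expected_length:.3f} bits per symbol")
print(f"overhead:        {expected_length / entropy - 1:.0%}")
```

A symbol code spends at least one whole bit on every symbol, so its average length cannot drop below 1 bit per symbol even though the entropy of this model is well below that. Stream codes like ANS and Range Coding amortize fractional bits over many symbols and therefore avoid this floor.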

Further, constriction provides implementations of common probability distributions in fixed-point arithmetic, which can be used as entropy models in any of the above stream codes. The library also provides adapters for turning custom probability distributions into exactly invertible fixed-point arithmetic.
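
As a rough illustration of what a fixed-point entropy model looks like, the following sketch turns a Gaussian into integer frequencies that sum to exactly a power of two while leaving every symbol in the supported range with nonzero probability. The function `leaky_quantized_gaussian` and the value of `PRECISION` are assumptions made for this example; the algorithm used inside constriction may differ in its details.

```python
import math

PRECISION = 12  # hypothetical choice: probabilities are multiples of 2**-12


def leaky_quantized_gaussian(mean, std, min_symbol, max_symbol):
    """Return integer frequencies for min_symbol..=max_symbol summing to 2**PRECISION."""
    n = max_symbol - min_symbol + 1
    total = 1 << PRECISION
    free = total - n  # mass left after reserving one unit per symbol
    assert free >= 0, "need at least one unit of probability per symbol"

    def cdf(x):
        return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))

    # boundaries[j] is the fixed-point cumulative mass strictly below symbol j.
    boundaries = [0]
    for j in range(1, n):
        boundaries.append(j + math.floor(free * cdf(min_symbol + j - 0.5)))
    boundaries.append(total)  # tails are absorbed by the two outermost symbols
    return [hi - lo for lo, hi in zip(boundaries, boundaries[1:])]


freqs = leaky_quantized_gaussian(mean=35.2, std=10.1, min_symbol=-100, max_symbol=100)
assert sum(freqs) == 1 << PRECISION   # exactly normalized in fixed point
assert min(freqs) >= 1                # no symbol has zero probability

# Information content of symbol 23 under the fixed-point model.
symbol = 23
bits_fixed_point = PRECISION - math.log2(freqs[symbol - (-100)])
print(f"{bits_fixed_point:.2f} bits for symbol {symbol}")
```

Working with integers means the encoder and decoder both rely on integer frequencies with no rounding ambiguity in the coding step itself, which is what makes the model exactly invertible. Reserving one unit per symbol costs a small amount of bitrate but guarantees that any symbol in the supported range can be encoded.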

The provided implementations of entropy coding algorithms and probability distributions are extensively tested and should be considered reliable (except for the still experimental Chain Coder). However, their APIs may change in future versions of constriction if more user experience reveals any shortcomings of the current APIs in terms of ergonomics. Please file an issue if you run into a scenario where the current APIs are suboptimal.

Quick Start Guides And Examples in Python and Rust

Python

The easiest way to install constriction for Python is via pip (the following command also installs numpy, which the example below uses, and scipy, which is not required but useful if you want to use constriction with custom probability distributions):

pip install constriction numpy scipy

Then go ahead and use it:

import constriction
import numpy as np

# Let's use a Range Coder in this example. Constriction also provides an ANS 
# Coder, a Huffman Coder, and an experimental new "Chain Coder".
encoder = constriction.stream.queue.RangeEncoder()

# Define some data and a sequence of entropy models. We use quantized Gaussians
# here, but you could also use other models or even provide your own.
min_supported_symbol, max_supported_symbol = -100, 100
symbols = np.array([23, -15, 78, 43, -69], dtype=np.int32)
means = np.array([35.2, -1.7, 30.1, 71.2, -75.1], dtype=np.float64)
stds = np.array([10.1, 25.3, 23.8, 35.4, 3.9], dtype=np.float64)

# Encode the symbols and get the compressed data.
encoder.encode_leaky_gaussian_symbols(
    symbols, min_supported_symbol, max_supported_symbol, means, stds)
compressed = encoder.get_compressed()
print(compressed)

# Create a decoder and recover the original symbols.
decoder = constriction.stream.queue.RangeDecoder(compressed)
reconstructed = decoder.decode_leaky_gaussian_symbols(
    min_supported_symbol, max_supported_symbol, means, stds)
assert np.all(reconstructed == symbols)

There's a lot more you can do with constriction's Python API. Please check out the Python API Documentation.

Rust

Add this line to your Cargo.toml:

[dependencies]
constriction = "0.1.2"
probability = "0.17" # Not strictly required but used in many code examples.

If you compile in no_std mode then you have to deactivate constriction's default features (and you can't use the probability crate):

[dependencies]
constriction = {version = "0.1.2", default-features = false} # for `no_std` mode

Then go ahead and use it:

use constriction::stream::{model::DefaultLeakyQuantizer, stack::DefaultAnsCoder, Decode};

// Let's use an ANS Coder in this example. Constriction also provides a Range
// Coder, a Huffman Coder, and an experimental new "Chain Coder".
let mut coder = DefaultAnsCoder::new();
 
// Define some data and a sequence of entropy models. We use quantized Gaussians here,
// but `constriction` also provides other models and allows you to implement your own.
let symbols = vec![23i32, -15, 78, 43, -69];
let quantizer = DefaultLeakyQuantizer::new(-100..=100);
let means = vec![35.2f64, -1.7, 30.1, 71.2, -75.1];
let stds = vec![10.1f64, 25.3, 23.8, 35.4, 3.9];
let models = means.iter().zip(&stds).map(
    |(&mean, &std)| quantizer.quantize(probability::distribution::Gaussian::new(mean, std))
);

// Encode symbols (in *reverse* order, because ANS Coding operates as a stack).
coder.encode_symbols_reverse(symbols.iter().zip(models.clone())).unwrap();

// Obtain temporary shared access to the compressed bit string. If you want ownership of the
// compressed bit string, call `.into_compressed()` instead of `.get_compressed()`.
println!("Encoded into {} bits: {:?}", coder.num_bits(), &*coder.get_compressed().unwrap());

// Decode the symbols and verify correctness.
let reconstructed = coder.decode_symbols(models).collect::<Result<Vec<_>, _>>().unwrap();
assert_eq!(reconstructed, symbols);

There's a lot more you can do with constriction's Rust API. Please check out the Rust API Documentation.

Compiling From Source

Users of constriction typically don't need to manually compile the library from source. Just install constriction via pip or cargo as described in the above quick start guides.

Contributors can compile constriction manually as follows:

  1. Prepare your system:
    • If you don't have a Rust toolchain, install one as described on https://rustup.rs
    • If you already have a Rust toolchain, make sure it's on version 1.51 or later. Run rustc --version to find out and rustup update stable if you need to update.
  2. git clone the repository and cd into it.
  3. To compile the Rust library:
    • compile in development mode and execute all tests: cargo test
    • compile in release mode (i.e., with optimizations) and run the benchmarks: cargo bench
  4. If you want to compile the Python module:
    • install poetry.
    • install Python dependencies: cd into the repository and run poetry install
    • build the Python module: poetry run maturin develop '--cargo-extra-args=--features pybindings'
    • run Python unit tests: poetry run pytest tests/python
    • start a Python REPL that sees the compiled Python module: poetry run ipython

Contributing

Pull requests and issue reports are welcome. Unless contributors explicitly state otherwise at the time of contributing, all contributions will be assumed to be licensed under either one of MIT license, Apache License Version 2.0, or Boost Software License Version 1.0, at the choice of each licensee.

There's no official guide for contributions since nobody reads those anyway. Just be nice to other people and act like a grown-up (i.e., it's OK to make mistakes as long as you strive for improvement and are open to respectfully phrased opinions of other people).

License

This work is licensed under the terms of the MIT license, Apache License Version 2.0, or Boost Software License Version 1.0. You can choose between one of them if you use this work. See the files whose names start with LICENSE in this directory. The compiled Python extension module is linked with a number of third party libraries. Binary distributions of the constriction Python extension module contain a file LICENSE.html that includes all licenses of all dependencies (the file is also available online).

What's With the Name?

Constriction is a library of compression primitives with bindings for Rust and Python. Pythons are a family of nonvenomous snakes that subdue their prey by "compressing" it, a method known as constriction.
