
Simple ANS (Asymmetric Numeral Systems) implementation


simple_ans

A Python package that provides lossless compression of integer datasets through Asymmetric Numeral Systems (ANS), implemented in C++ with pybind11 bindings.

I used the following to guide the implementation:

Many ANS implementations exist as parts of other packages. This one strives to be as simple as possible: the C++ implementation is a small amount of code in a single file, and the Python interface is small and easy to use. At the same time, it aims to be efficient in both compression ratio and encoding/decoding speed.

Important: This implementation is designed for data with approximately 2 to 5000 distinct values. Performance may degrade significantly with datasets containing more unique values.
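As a quick pre-check, you can count the distinct values in a dataset before encoding. The sketch below uses only the standard library; the helper names `count_distinct` and `in_recommended_range` are hypothetical and are not part of simple_ans, and the bounds simply mirror the approximate range quoted above.

```python
def count_distinct(values):
    """Return the number of distinct values in an iterable of integers."""
    return len(set(values))


def in_recommended_range(values, low=2, high=5000):
    """True if the distinct-value count lies in the approximate range
    this README recommends (the bounds are taken from the note above)."""
    return low <= count_distinct(values) <= high


samples = [0, 1, 1, 2, -1, 0, 3]
print(count_distinct(samples), in_recommended_range(samples))
```

For a NumPy array, `len(np.unique(signal))` gives the same count.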

Technical overview of ANS and Streaming ANS
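To make the core idea concrete, here is a toy range-ANS sketch in pure Python. It is not the algorithm used by this package's C++ code: it keeps the whole message in one arbitrary-precision integer and omits the renormalization that a streaming coder needs. All function names are illustrative.

```python
from collections import Counter


def build_tables(symbols):
    """Return (freqs, cum, total); cum[s] is the first slot owned by symbol s."""
    freqs = Counter(symbols)
    cum, running = {}, 0
    for s in sorted(freqs):
        cum[s] = running
        running += freqs[s]
    return freqs, cum, running


def toy_encode(symbols, freqs, cum, total):
    """Fold all symbols into a single integer state (no streaming)."""
    state = 1
    # Encode in reverse so that decoding yields the symbols in forward order.
    for s in reversed(symbols):
        f = freqs[s]
        state = (state // f) * total + cum[s] + (state % f)
    return state


def toy_decode(state, n, freqs, cum, total):
    """Recover n symbols from the integer state."""
    # Map each slot in [0, total) to the symbol that owns it.
    slot_to_symbol = []
    for s in sorted(freqs):
        slot_to_symbol.extend([s] * freqs[s])
    out = []
    for _ in range(n):
        slot = state % total
        s = slot_to_symbol[slot]
        out.append(s)
        state = freqs[s] * (state // total) + slot - cum[s]
    return out


message = [0, 0, 0, 1, 0, 2, 0, 0, 1, 0, 0, 0]
freqs, cum, total = build_tables(message)
state = toy_encode(message, freqs, cum, total)
assert toy_decode(state, len(message), freqs, cum, total) == message
print(f"{len(message)} symbols packed into {state.bit_length()} bits")
```

Frequent symbols grow the state by fewer bits than rare ones, which is why the final bit length tracks the entropy of the message rather than its raw size.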

Installation

simple_ans is available on PyPI:

pip install simple-ans

Developers may want to clone the repository and do an editable install:

git clone https://github.com/flatironinstitute/simple_ans.git
cd simple_ans
pip install -e .

For developers who want automatic rebuilds of the compiled extension:

pip install "scikit-build-core>=0.5.0" "pybind11>=2.11.1" "pip>=24" ninja
pip install -e . -Ceditable.rebuild=true --no-build-isolation

Usage

This package is designed for compressing quantized numerical data.

import numpy as np
from simple_ans import ans_encode, ans_decode

# Example: Compressing quantized Gaussian data
# Generate sample data following normal distribution
n_samples = 10000
# Generate Gaussian data, scale by 4, and quantize to integers
signal = np.round(np.random.normal(0, 1, n_samples) * 4).astype(np.int32)

# Encode (automatically determines optimal symbol counts)
encoded = ans_encode(signal)

# Decode
decoded = ans_decode(encoded)

# Verify
assert np.all(decoded == signal)

# Get compression stats
original_size = signal.nbytes
compressed_size = encoded.size()  # in bytes
compression_ratio = original_size / compressed_size
print(f"Compression ratio: {compression_ratio:.2f}x")
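The example above quantizes by scaling by 4 and rounding, which corresponds to a step size of 0.25. As a minimal sketch of that step on its own (the helper names `quantize` and `dequantize` are hypothetical, not part of simple_ans), the following shows that the reconstruction error stays within half a step:

```python
def quantize(values, step):
    """Map floats to integer codes using a uniform step size."""
    return [round(v / step) for v in values]


def dequantize(codes, step):
    """Map integer codes back to the nearest representable floats."""
    return [c * step for c in codes]


values = [0.12, -1.37, 2.5, 0.0, 0.49]
step = 0.25  # same as multiplying by 4 before rounding

codes = quantize(values, step)
restored = dequantize(codes, step)
max_error = max(abs(v - r) for v, r in zip(values, restored))

print(codes)
print(f"max reconstruction error: {max_error:.3f} (bound: {step / 2})")
```

Only the quantization is lossy; the integer codes themselves are what the lossless encoder compresses.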

Tests

To run the tests, install with the test extra and run pytest:

pip install "simple-ans[test]"
pytest tests/

Simple benchmark

You can run a very simple benchmark that compares simple_ans with zlib, zstandard, lzma, and blosc2 at various compression levels for a toy dataset of quantized Gaussian noise. See devel/benchmark.py and devel/benchmark_ans_only.py.
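For a rough idea of what such a comparison involves, here is a minimal baseline sketch that times only the standard-library codecs (zlib and lzma) on quantized Gaussian noise. It is not the code in devel/benchmark.py; the dataset size, seed, and helper names are assumptions chosen for illustration.

```python
import lzma
import random
import struct
import time
import zlib


def make_quantized_gaussian(n, scale=4.0, seed=0):
    """Gaussian noise scaled and rounded to integers (illustrative toy data)."""
    rng = random.Random(seed)
    return [round(rng.gauss(0.0, 1.0) * scale) for _ in range(n)]


def time_codec(name, compress, decompress, raw):
    """Measure compression ratio and encode/decode throughput for one codec."""
    t0 = time.perf_counter()
    packed = compress(raw)
    t1 = time.perf_counter()
    restored = decompress(packed)
    t2 = time.perf_counter()
    assert restored == raw  # lossless round trip
    megabytes = len(raw) / 1e6
    return {
        "name": name,
        "ratio": len(raw) / len(packed),
        "encode_mb_s": megabytes / (t1 - t0),
        "decode_mb_s": megabytes / (t2 - t1),
    }


data = make_quantized_gaussian(50_000)
raw = struct.pack(f"<{len(data)}i", *data)  # little-endian int32 buffer

results = [
    time_codec("zlib-9", lambda b: zlib.compress(b, 9), zlib.decompress, raw),
    time_codec("lzma", lzma.compress, lzma.decompress, raw),
]
for r in results:
    print(f"{r['name']:8s} ratio={r['ratio']:.2f}x "
          f"encode={r['encode_mb_s']:.1f} MB/s decode={r['decode_mb_s']:.1f} MB/s")
```

Timings from a single run like this are noisy; repeating each measurement and taking the best or median value gives more stable numbers.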

The benchmark.py script also runs in a CI environment and produces the following graph:

[Figure: Benchmark]

For this example, simple_ans achieves a higher compression ratio than the other methods, almost reaching the theoretical ideal. Its encode rate (in MB/s) is higher than all but blosc. Its decode rate is higher than zlib and lzma but lower than zstandard and blosc. In principle, it should be possible to speed up the decoding. Let me know if you have ideas for this.
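The theoretical ideal referred to here can be estimated from the empirical Shannon entropy of the data. The sketch below is a minimal illustration under two assumptions: the symbols are treated as independent, and the cost of storing the symbol table is ignored. The function names are hypothetical.

```python
import math
from collections import Counter


def entropy_bits_per_symbol(values):
    """Empirical Shannon entropy of a sequence, in bits per symbol."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def ideal_compression_ratio(values, bits_per_value=32):
    """Best achievable ratio for an order-0 coder, relative to fixed-width storage."""
    h = entropy_bits_per_symbol(values)
    return math.inf if h == 0 else bits_per_value / h


coin_flips = [0, 1] * 50
print(entropy_bits_per_symbol(coin_flips))   # 1 bit per symbol
print(ideal_compression_ratio(coin_flips))   # 32 bits stored vs 1 bit needed
```

Comparing a measured compression ratio against this value shows how close an encoder gets to the order-0 limit for a given dataset.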

To install the benchmark dependencies, use:

pip install ".[benchmark]"

Extended benchmarks

A more comprehensive benchmark (devel/benchmark2.py) tests the compression performance across different types of distributions:

  • Bernoulli distributions with varying probabilities (p = 0.1 to 0.5)
  • Quantized Gaussian distributions with different quantization steps
  • Poisson distributions with various lambda parameters
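As a rough sketch of how test data of these three kinds can be generated, the following uses only the standard library. It is not the code in devel/benchmark2.py; the parameter values, sample sizes, and function names are assumptions chosen for illustration.

```python
import math
import random


def bernoulli(n, p, rng):
    """n draws that are 1 with probability p and 0 otherwise."""
    return [1 if rng.random() < p else 0 for _ in range(n)]


def quantized_gaussian(n, step, rng):
    """Standard normal samples rounded to integer multiples of step."""
    return [round(rng.gauss(0.0, 1.0) / step) for _ in range(n)]


def poisson(n, lam, rng):
    """Poisson samples via Knuth's multiplication method (suited to small lam)."""
    limit = math.exp(-lam)
    out = []
    for _ in range(n):
        k, p = 0, rng.random()
        while p > limit:
            k += 1
            p *= rng.random()
        out.append(k)
    return out


rng = random.Random(0)
datasets = {
    "bernoulli(p=0.1)": bernoulli(10_000, 0.1, rng),
    "gaussian(step=0.25)": quantized_gaussian(10_000, 0.25, rng),
    "poisson(lambda=3)": poisson(10_000, 3.0, rng),
}
for name, values in datasets.items():
    print(f"{name:22s} distinct values: {len(set(values))}")
```

With NumPy available, `numpy.random.Generator` offers `binomial`, `normal`, and `poisson` methods that produce the same kinds of samples much faster.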

The benchmark compares simple_ans against zstd-22, zlib-9, and blosc (using bitshuffle, zstd-1, and 2 MiB blocks), measuring compression ratios and processing speeds:

[Figure: Compression Ratios]

[Figure: Encode Speeds]

[Figure: Decode Speeds]

The results show that simple_ans achieves the highest compression ratios overall, close to the theoretical ideal across all distributions. Its encode speed is higher than all but blosc, which typically achieves the highest encode and decode speeds and the second-highest compression ratios.

Authors

Jeremy Magland, Center for Computational Mathematics, Flatiron Institute

Robert Blackwell, Scientific Computing Core, Flatiron Institute
