
Python package to convert numerical series & numpy arrays into compressed strings

Project description


numcompress

Simple way to compress and decompress numerical series & numpy arrays.

  • Often reaches compression ratios above 80% (see the Compression Ratio figures below)
  • Lets you choose how many decimal places of floating-point precision to keep (up to 10)
  • Useful for storing or transmitting stock prices, monitoring data and other time series as compressed strings

The compression algorithm is based on Google's Encoded Polyline Algorithm Format. I modified it to preserve arbitrary precision and to work on any numerical series. The work was motivated by the usefulness of the time-aware polyline built by Arjun Attam at HyperTrack. After building this, I came across arrays, which have a much smaller memory footprint than lists. You might consider using them instead of numcompress if you don't need to convert your data to a string for transmission or storage.
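The idea can be sketched as a small polyline-style encoder. This is a minimal re-implementation written from the description above, not the numcompress source. It assumes that the first character stores the precision (offset by 63, as in Google's polyline format), that each value is scaled by 10**precision and rounded, and that only the difference from the previous value is written out in 5-bit chunks. The names `encode_series` and `decode_series` are hypothetical. For the three examples in the Usage section it produces the same strings shown there.

```python
# Minimal sketch of a polyline-style numeric encoder/decoder.
# Assumptions (not taken from the numcompress source): the first character
# stores the precision, each value is scaled by 10**precision and rounded,
# and consecutive scaled values are delta-encoded in 5-bit chunks.

OFFSET = 63  # polyline encoding shifts each 5-bit chunk into printable ASCII


def encode_series(series, precision=3):
    """Encode a list of numbers into a compact ASCII string."""
    if not 0 <= precision <= 10:
        raise ValueError("precision must be between 0 and 10")
    out = [chr(precision + OFFSET)]  # header character: the precision
    factor = 10 ** precision
    previous = 0
    for num in series:
        scaled = int(round(num * factor))
        delta = scaled - previous
        previous = scaled
        # Fold the sign into the lowest bit: non-negative -> even, negative -> odd.
        value = ~(delta << 1) if delta < 0 else delta << 1
        # Emit 5-bit chunks, least significant first; 0x20 means "more follows".
        while value >= 0x20:
            out.append(chr((0x20 | (value & 0x1F)) + OFFSET))
            value >>= 5
        out.append(chr(value + OFFSET))
    return "".join(out)


def decode_series(text):
    """Invert encode_series, returning floats rounded to the stored precision."""
    precision = ord(text[0]) - OFFSET
    factor = 10 ** precision
    result, total, i = [], 0, 1
    while i < len(text):
        value, shift = 0, 0
        while True:
            chunk = ord(text[i]) - OFFSET
            i += 1
            value |= (chunk & 0x1F) << shift
            shift += 5
            if chunk < 0x20:  # no continuation bit: last chunk of this value
                break
        delta = ~(value >> 1) if value & 1 else value >> 1
        total += delta
        result.append(round(total / factor, precision))
    return result


encoded = encode_series([14578, 12759, 13525])
print(encoded)                 # B_twxZnv_nB_bwm@
print(decode_series(encoded))  # [14578.0, 12759.0, 13525.0]
```

This sketch rounds each value before taking the difference, so in lossy mode the rounding error of one value cannot carry over into the values that follow it.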

Installation

pip install numcompress

Usage

>>> from numcompress import compress, decompress

# Integers
>>> compress([14578, 12759, 13525])
'B_twxZnv_nB_bwm@'

>>> decompress('B_twxZnv_nB_bwm@')
[14578.0, 12759.0, 13525.0]
# Floats - lossless compression (precision covers every decimal place in the input)
# the precision argument sets how many decimal places to preserve (default: 3)
>>> compress([145.7834, 127.5989, 135.2569], precision=4)
'Csi~wAhdbJgqtC'

>>> decompress('Csi~wAhdbJgqtC')
[145.7834, 127.5989, 135.2569]
# Floats - lossy compression
>>> compress([145.7834, 127.5989, 135.2569], precision=2)
'Acn[rpB{n@'

>>> decompress('Acn[rpB{n@')
[145.78, 127.6, 135.26]
# compressing and decompressing numpy arrays
>>> from numcompress import compress_ndarray, decompress_ndarray
>>> import numpy as np

>>> series = np.random.randint(1, 100, 25).reshape(5, 5)
>>> compressed_series = compress_ndarray(series)
>>> decompressed_series = decompress_ndarray(compressed_series)

>>> series
array([[29, 95, 10, 48, 20],
       [60, 98, 73, 96, 71],
       [95, 59,  8,  6, 17],
       [ 5, 12, 69, 65, 52],
       [84,  6, 83, 20, 50]])

>>> compressed_series
'5*5,Bosw@_|_Cn_eD_fiA~tu@_cmA_fiAnyo@o|k@nyo@_{m@~heAnrbB~{BonT~lVotLoinB~xFnkX_o}@~iwCokuCn`zB_ry@'

>>> decompressed_series
array([[29., 95., 10., 48., 20.],
       [60., 98., 73., 96., 71.],
       [95., 59.,  8.,  6., 17.],
       [ 5., 12., 69., 65., 52.],
       [84.,  6., 83., 20., 50.]])

>>> (series == decompressed_series).all()
True
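The string returned by `compress_ndarray` in the example above appears to consist of the array's shape (`5*5`), a comma, and the compressed values of the flattened array. The sketch below splits such a string apart. The layout is inferred from that single example output rather than from the numcompress source, and `parse_ndarray_string` and `count_encoded_values` are hypothetical helpers, not part of the package.

```python
# Sketch: split a compress_ndarray-style string into its shape header and
# payload. The layout "<dim>*<dim>,<payload>" is inferred from one example
# output, not from the numcompress source.


def parse_ndarray_string(text):
    """Return (shape, payload) for a string like '5*5,B...'."""
    header, sep, payload = text.partition(",")
    if not sep:
        raise ValueError("missing ',' between shape and payload")
    shape = tuple(int(dim) for dim in header.split("*"))
    return shape, payload


def count_encoded_values(payload):
    """Count values in a polyline-style payload whose first char is the precision.

    Assumes each value ends with exactly one chunk lacking the 0x20
    continuation bit, i.e. one character below chr(95) ('_').
    """
    return sum(1 for ch in payload[1:] if ord(ch) - 63 < 0x20)


example = ("5*5,Bosw@_|_Cn_eD_fiA~tu@_cmA_fiAnyo@o|k@nyo@_{m@~heAnrbB~{BonT"
           "~lVotLoinB~xFnkX_o}@~iwCokuCn`zB_ry@")
shape, payload = parse_ndarray_string(example)
print(shape)                          # (5, 5)
print(ord(payload[0]) - 63)           # 3 (the default precision)
print(count_encoded_values(payload))  # 25, one per array element
```

Splitting on the first comma is safe under this assumption, because polyline-style payload characters range from chr(63) to chr(126), which excludes ','.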

Compression Ratio

Test        Numbers   Compression ratio
Integers    10k       91.14%
Floats      10k       81.35%

Run the test suite with pytest's -s switch (which disables output capturing) to see the compression ratios printed. You can also modify the tests to measure the ratio you get for your own input.

pytest -s

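Here's a quick example that measures the compression ratio. The input is random, and sys.getsizeof reports object sizes specific to the Python version and platform, so your exact byte counts will differ:

>>> import random, sys
>>> from numcompress import compress
>>> series = random.sample(range(1, 100000), 50000)  # 50k distinct random integers from 1 to 99,999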
>>> text = compress(series)  # apply compression

>>> original_size = sum(sys.getsizeof(i) for i in series)
>>> original_size
1200000

>>> compressed_size = sys.getsizeof(text)
>>> compressed_size
284092

>>> compression_ratio = ((original_size - compressed_size) * 100.0) / original_size
>>> compression_ratio
76.32566666666666

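We get roughly 76% compression for 50k random integers between 1 and 100k. Real-world numerical series usually compress better: consecutive values tend to be closer together, and because the encoding stores the difference between each value and the previous one, smaller differences need fewer characters. Think of stock prices, monitoring data and other time series.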
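The effect of small differences can be made concrete by counting characters instead of producing them. The sketch below assumes a polyline-style scheme in which each value costs one character per 5-bit chunk of its sign-folded, scaled difference from the previous value, plus one header character for the precision. `chars_for_delta` and `encoded_length` are hypothetical helpers, and the two series are made-up illustrations, not benchmark data.

```python
# Sketch: estimate the encoded length of a series under a polyline-style
# scheme (assumed, not taken from the numcompress source): one header
# character, then one character per 5-bit chunk of each scaled delta.


def chars_for_delta(delta):
    """Characters needed to encode one scaled difference."""
    value = ~(delta << 1) if delta < 0 else delta << 1  # fold sign into low bit
    count = 1
    while value >= 0x20:  # every extra 5-bit chunk costs one more character
        value >>= 5
        count += 1
    return count


def encoded_length(series, precision=3):
    """Total characters, including 1 header character for the precision."""
    factor = 10 ** precision
    total, previous = 1, 0
    for num in series:
        scaled = round(num * factor)
        total += chars_for_delta(scaled - previous)
        previous = scaled
    return total


smooth = [1000 + (i % 3) for i in range(100)]         # steps of 1 or 2
jumpy = [1000 + (i % 2) * 50000 for i in range(100)]  # jumps of 50,000
print(encoded_length(smooth), encoded_length(jumpy))  # 303 600
```

Under these assumptions the slowly changing series needs about half as many characters as the one that alternates by 50,000, even though both contain 100 values.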

Contribute

If you find a problem, open an issue or send a pull request. You can also write to me at amit.juschill@gmail.com.


Download files


Source Distribution

numcompress-0.1.2.tar.gz (5.3 kB)

Uploaded Source

Built Distribution

numcompress-0.1.2-py3-none-any.whl (5.4 kB)

Uploaded Python 3

File details

Details for the file numcompress-0.1.2.tar.gz.

File metadata

  • Download URL: numcompress-0.1.2.tar.gz
  • Size: 5.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/50.3.0 requests-toolbelt/0.9.1 tqdm/4.48.2 CPython/3.6.8

File hashes

Hashes for numcompress-0.1.2.tar.gz
Algorithm Hash digest
SHA256 e7e4a2062943a1534a22021fd7901be66532b2c89ba04fbbf6d5ac47fe8f73a4
MD5 8332835a9cfb0a9e2c6ada9c453ddbe7
BLAKE2b-256 5c764fec4fc7534dc96fdb220a1aeccbc034c733eddd4dd6f0445b52f21c65d1


File details

Details for the file numcompress-0.1.2-py3-none-any.whl.

File metadata

  • Download URL: numcompress-0.1.2-py3-none-any.whl
  • Size: 5.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/50.3.0 requests-toolbelt/0.9.1 tqdm/4.48.2 CPython/3.6.8

File hashes

Hashes for numcompress-0.1.2-py3-none-any.whl
Algorithm Hash digest
SHA256 a169693f69f4679e0a0a5c95876621346c72d8c3ccc5c4438e682958bb2cf81c
MD5 b6d436044611eb30ecf29f1a4ddcdba2
BLAKE2b-256 3fe2b2af4b85fb5e2dd1aeedfaa35a72739f2084f2aaea58483622cbae295a50

