Python package to convert numerical series & numpy arrays into compressed strings
numcompress
Simple way to compress and decompress numerical series & numpy arrays.
- Easily achieves a compression ratio above 80%
- You can specify the precision you need for floating-point values (up to 10 decimal places)
- Useful for storing or transmitting stock prices, monitoring data and other time series data in a compressed string format
The compression algorithm is based on Google's Encoded Polyline Algorithm Format. I modified it to preserve a user-specified precision and to apply it to any numerical series. The work was motivated by the usefulness of the time-aware polyline built by Arjun Attam at HyperTrack. After building this, I came across arrays, which are much more efficient than lists in terms of memory footprint. You might consider using them instead of numcompress if you don't need to convert your data to a string for transmission or storage.
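A minimal sketch of the idea follows. It is not numcompress's source code, and `encode_series`, `decode_series` and `_encode_int` are hypothetical names. It assumes each value is multiplied by 10**precision and rounded to an integer, that only the difference from the previous value is stored, and that each difference goes through the standard Encoded Polyline steps: shift left one bit, invert negatives, split into 5-bit chunks with 0x20 as the continuation flag, and add 63 to get a printable character. The precision is assumed to be stored as a leading character `chr(precision + 63)`, which is consistent with the `B`, `C` and `A` prefixes in the Usage examples.

```python
def _encode_int(value: int) -> str:
    """Encode one signed integer with the Encoded Polyline chunking scheme."""
    # Fold the sign into the lowest bit: non-negative values become even,
    # negative values become odd after bit inversion.
    value = ~(value << 1) if value < 0 else value << 1
    chars = []
    # Emit 5-bit chunks, least significant first; 0x20 means "more chunks follow".
    while value >= 0x20:
        chars.append(chr((0x20 | (value & 0x1F)) + 63))
        value >>= 5
    chars.append(chr(value + 63))
    return "".join(chars)


def encode_series(series, precision=3):
    """Hypothetical encoder: precision header + delta-encoded scaled integers."""
    if not 0 <= precision <= 10:
        raise ValueError("precision must be between 0 and 10")
    out = [chr(precision + 63)]  # header character, e.g. 'B' for precision 3
    previous = 0
    for number in series:
        scaled = round(number * 10 ** precision)  # quantize to `precision` decimals
        out.append(_encode_int(scaled - previous))  # store only the difference
        previous = scaled
    return "".join(out)


def decode_series(text):
    """Hypothetical decoder: the inverse of encode_series."""
    precision = ord(text[0]) - 63
    result, current, index = [], 0, 1
    while index < len(text):
        shift, value = 0, 0
        while True:
            chunk = ord(text[index]) - 63
            index += 1
            value |= (chunk & 0x1F) << shift
            shift += 5
            if chunk < 0x20:  # no continuation flag: this integer is complete
                break
        # Undo the sign folding done in _encode_int.
        delta = ~(value >> 1) if value & 1 else value >> 1
        current += delta
        result.append(current / 10 ** precision)
    return result
```

Under these assumptions the sketch reproduces the Usage strings: `encode_series([14578, 12759, 13525])` returns `'B_twxZnv_nB_bwm@'` and `encode_series([145.7834, 127.5989, 135.2569], precision=2)` returns `'Acn[rpB{n@'`. Delta encoding is what makes slowly changing series cheap: small differences fit in fewer 5-bit chunks and therefore need fewer characters.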
Installation
```shell
pip install numcompress
```
Usage
```python
>>> from numcompress import compress, decompress

# Integers
>>> compress([14578, 12759, 13525])
'B_twxZnv_nB_bwm@'

>>> decompress('B_twxZnv_nB_bwm@')
[14578.0, 12759.0, 13525.0]
```
```python
# Floats - lossless compression
# precision sets how many decimal places to preserve (defaults to 3)
>>> compress([145.7834, 127.5989, 135.2569], precision=4)
'Csi~wAhdbJgqtC'

>>> decompress('Csi~wAhdbJgqtC')
[145.7834, 127.5989, 135.2569]
```
```python
# Floats - lossy compression
>>> compress([145.7834, 127.5989, 135.2569], precision=2)
'Acn[rpB{n@'

>>> decompress('Acn[rpB{n@')
[145.78, 127.6, 135.26]
```
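Whether compression is lossless depends only on whether `precision` covers every decimal place in the data. The helpers below are a hypothetical sketch (`quantize` and `min_lossless_precision` are not part of numcompress). They assume values are rounded as `round(v * 10**p) / 10**p`, which matches the lossy output above but may differ from the library in edge cases such as exact ties, where Python's `round` rounds half to even.

```python
def quantize(values, precision):
    """Round each value to `precision` decimal places via a scaled integer."""
    scale = 10 ** precision
    return [round(v * scale) / scale for v in values]


def min_lossless_precision(values, max_precision=10):
    """Return the smallest precision that reproduces every value exactly, or None."""
    for precision in range(max_precision + 1):
        if quantize(values, precision) == list(values):
            return precision
    return None  # the data needs more than max_precision decimal places


prices = [145.7834, 127.5989, 135.2569]
print(quantize(prices, 2))             # [145.78, 127.6, 135.26], as in the lossy example
print(min_lossless_precision(prices))  # 4, the precision used in the lossless example
```

Picking the smallest lossless precision keeps the scaled integers, and therefore the encoded differences, as small as possible.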
```python
# compressing and decompressing numpy arrays
>>> from numcompress import compress_ndarray, decompress_ndarray
>>> import numpy as np

>>> series = np.random.randint(1, 100, 25).reshape(5, 5)
>>> compressed_series = compress_ndarray(series)
>>> decompressed_series = decompress_ndarray(compressed_series)

>>> series
array([[29, 95, 10, 48, 20],
       [60, 98, 73, 96, 71],
       [95, 59,  8,  6, 17],
       [ 5, 12, 69, 65, 52],
       [84,  6, 83, 20, 50]])

>>> compressed_series
'5*5,Bosw@_|_Cn_eD_fiA~tu@_cmA_fiAnyo@o|k@nyo@_{m@~heAnrbB~{BonT~lVotLoinB~xFnkX_o}@~iwCokuCn`zB_ry@'

>>> decompressed_series
array([[29., 95., 10., 48., 20.],
       [60., 98., 73., 96., 71.],
       [95., 59.,  8.,  6., 17.],
       [ 5., 12., 69., 65., 52.],
       [84.,  6., 83., 20., 50.]])

>>> (series == decompressed_series).all()
True
```
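The compressed array string appears to be the shape (`5*5`), a comma, and then the flattened values compressed like an ordinary series. This is an inference from the example output, not a documented format. Since polyline-style characters all lie between ASCII 63 and 126, the `*` and `,` separators cannot collide with the payload. The pure-Python sketch below (hypothetical helper names, no numpy required) builds and splits such a header and nests a flat row-major list back into the original shape.

```python
from math import prod


def build_shape_header(shape):
    """Turn a shape tuple such as (5, 5) into the header text '5*5,'."""
    return "*".join(str(dim) for dim in shape) + ","


def split_shape_header(text):
    """Split 'd1*d2*...,payload' into ((d1, d2, ...), payload)."""
    header, payload = text.split(",", 1)  # ',' never occurs in the payload
    shape = tuple(int(dim) for dim in header.split("*"))
    return shape, payload


def reshape(flat, shape):
    """Nest a flat row-major list into `shape` (a stand-in for numpy's reshape)."""
    if len(flat) != prod(shape):
        raise ValueError("number of values does not match the shape")
    if len(shape) == 1:
        return list(flat)
    step = prod(shape[1:])  # number of values in each sub-array
    return [reshape(flat[i:i + step], shape[1:]) for i in range(0, len(flat), step)]


header = build_shape_header((5, 5))
shape, payload = split_shape_header(header + "Bosw@")
print(shape, payload)                   # (5, 5) Bosw@
print(reshape(list(range(6)), (2, 3)))  # [[0, 1, 2], [3, 4, 5]]
```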
Compression Ratio
| Test | # of Numbers | Compression ratio |
|---|---|---|
| Integers | 10k | 91.14% |
| Floats | 10k | 81.35% |
You can run the test suite with the `-s` switch to see the compression ratio. You can also modify the tests to see what compression ratio you get for your own input.

```shell
pytest -s
```
Here's a quick example showing compression ratio:
```python
>>> import random, sys
>>> from numcompress import compress
>>> series = random.sample(range(1, 100000), 50000)  # generate 50k random numbers between 1 and 100k
>>> text = compress(series)  # apply compression
>>> original_size = sum(sys.getsizeof(i) for i in series)  # byte counts vary by Python version/platform
>>> original_size
1200000
>>> compressed_size = sys.getsizeof(text)
>>> compressed_size
284092
>>> compression_ratio = ((original_size - compressed_size) * 100.0) / original_size
>>> compression_ratio
76.32566666666666
```
We get ~76% compression for 50k random numbers between 1 and 100k. The ratio is higher for real-world numerical series, where consecutive values tend to differ less: think of stock prices, monitoring data and other time series.
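A rough way to check this claim without installing anything is to count how many characters a polyline-style delta encoding would need. The sketch assumes such a scheme (one header character, values scaled by 10**precision and rounded, differences sign-folded and written as one printable character per 5-bit chunk); `encoded_length` is a hypothetical helper that estimates size rather than calling numcompress. It compares 50k random numbers with a random walk whose consecutive values differ by at most 5.

```python
import random


def encoded_length(series, precision=3):
    """Estimate the length of a polyline-style delta encoding of `series`.

    Assumes one header character for the precision, values scaled by
    10**precision and rounded, and one character per 5-bit chunk of each
    sign-folded difference.
    """
    length, previous = 1, 0  # start at 1 for the precision header character
    for number in series:
        scaled = round(number * 10 ** precision)
        delta, previous = scaled - previous, scaled
        folded = ~(delta << 1) if delta < 0 else delta << 1
        chunks = 1  # the final chunk is always written
        while folded >= 0x20:
            folded >>= 5
            chunks += 1
        length += chunks
    return length


rng = random.Random(0)  # fixed seed so the comparison is repeatable
n = 50_000
random_series = rng.sample(range(1, 100_000), n)

# Random walk: each value differs from the previous one by at most 5.
walk = [50_000]
for _ in range(n - 1):
    walk.append(walk[-1] + rng.randint(-5, 5))

print("random numbers:", encoded_length(random_series), "chars")
print("random walk:   ", encoded_length(walk), "chars")
```

Small differences fit in fewer chunks, so the walk comes out noticeably shorter; how much a real series gains depends on how smooth it is.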
Contribute
If you see any problem, open an issue or send a pull request. You can write to me at amit.juschill@gmail.com
Download files
| Filename | Size | File type | Python version |
|---|---|---|---|
| numcompress-0.1.2-py3-none-any.whl | 5.4 kB | Wheel | py3 |
| numcompress-0.1.2.tar.gz | 5.3 kB | Source | None |
Hashes for numcompress-0.1.2-py3-none-any.whl

| Algorithm | Hash digest |
|---|---|
| SHA256 | a169693f69f4679e0a0a5c95876621346c72d8c3ccc5c4438e682958bb2cf81c |
| MD5 | b6d436044611eb30ecf29f1a4ddcdba2 |
| BLAKE2-256 | 3fe2b2af4b85fb5e2dd1aeedfaa35a72739f2084f2aaea58483622cbae295a50 |