Using Machine Learning to learn how to Compress

You can read the introductory blog post or try it live at https://shrynk.ai

Features

  • ✓ Compresses your data smartly, based on Machine Learning
  • ✓ Takes user requirements in the form of weights for size, write_time and read_time
  • ✓ Trains and caches a model based on the compression methods available on the system, using packaged data
  • ✓ CLI for compressing and decompressing
  • ✓ Works with CSV, JSON and bytes in general
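
The weights can be read as a statement of how much each cost matters to you. As a rough sketch of that idea only (shrynk itself trains a model on benchmark data, and its internals are not shown here), the snippet below ranks benchmark rows by a weighted sum of normalized size, write_time and read_time. The function name `pick_compression` and all sample numbers are made up for illustration.

```python
def pick_compression(results, size=1.0, write=1.0, read=1.0):
    """Return the name of the row with the lowest weighted, normalized cost.

    results maps a compression name to a dict with the keys
    "size", "write_time" and "read_time" (lower is better for all three).
    """
    weights = {"size": size, "write_time": write, "read_time": read}

    def normalized(metric, value):
        # Scale each metric to [0, 1] so that bytes and seconds are comparable.
        values = [row[metric] for row in results.values()]
        low, high = min(values), max(values)
        return 0.0 if high == low else (value - low) / (high - low)

    def cost(row):
        return sum(w * normalized(m, row[m]) for m, w in weights.items())

    return min(results, key=lambda name: cost(results[name]))


# Illustrative numbers only: size in bytes, times in seconds.
benchmarks = {
    "gzip": {"size": 400, "write_time": 0.020, "read_time": 0.010},
    "bz2": {"size": 300, "write_time": 0.050, "read_time": 0.030},
    "none": {"size": 1000, "write_time": 0.005, "read_time": 0.004},
}

smallest = pick_compression(benchmarks, size=1, write=0, read=0)
fastest_write = pick_compression(benchmarks, size=0, write=1, read=0)
balanced = pick_compression(benchmarks, size=3, write=1, read=1)
```

With these made-up numbers, weighting only size favours the smallest output, weighting only write time favours no compression, and a mixed weighting lands on a middle option.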

CLI

shrynk compress myfile.json       # will yield e.g. myfile.json.gz or myfile.json.bz2
shrynk decompress myfile.json.gz  # will yield myfile.json

shrynk compress myfile.csv --size 0 --write 1 --read 0  # weigh only write time

shrynk benchmark myfile.csv                  # shows benchmark results
shrynk benchmark --predict myfile.csv        # will also show the current prediction
shrynk benchmark --save --predict myfile.csv # will add the result to the training data too
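
To give a feel for what a benchmark of this kind measures, here is a minimal standard-library sketch that records compressed size, compression time and decompression time for a few codecs. It is not shrynk's benchmark: the `benchmark` function, the codec selection and the sample data are assumptions for illustration, and in-memory compression time stands in for the time to write to disk.

```python
import bz2
import gzip
import lzma
import time

# Codec name -> (compress, decompress), all from the standard library.
CODECS = {
    "gz": (gzip.compress, gzip.decompress),
    "bz2": (bz2.compress, bz2.decompress),
    "xz": (lzma.compress, lzma.decompress),
}


def benchmark(data: bytes) -> dict:
    """Measure compressed size and (de)compression time for each codec."""
    results = {}
    for name, (compress, decompress) in CODECS.items():
        start = time.perf_counter()
        packed = compress(data)
        write_time = time.perf_counter() - start

        start = time.perf_counter()
        unpacked = decompress(packed)
        read_time = time.perf_counter() - start

        assert unpacked == data  # the round trip must be lossless
        results[name] = {
            "size": len(packed),
            "write_time": write_time,
            "read_time": read_time,
        }
    return results


# A small, highly repetitive CSV-like payload.
sample = b"a,b\n" + b"1,2\n" * 10_000
results = benchmark(sample)

# Print the codecs from smallest to largest output.
for name, row in sorted(results.items(), key=lambda item: item[1]["size"]):
    print(f"{name:>4}  {row['size']:>6} bytes  "
          f"write {row['write_time']:.4f}s  read {row['read_time']:.4f}s")
```

Timings vary between machines and runs, which is one reason to benchmark on the system where the files will actually be written and read.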

Usage

Installation:

pip install shrynk

Then in Python:

import pandas as pd
from shrynk import save, load

# save dataframe compressed
my_df = pd.DataFrame({"a": [1]})
file_path = save(my_df, "mypath.csv")
# e.g. mypath.csv.bz2

# load compressed file
loaded_df = load(file_path)
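
The returned path carries the chosen compression as its extension (e.g. `.bz2`), which is enough information to pick a matching reader later. The following standard-library sketch shows that idea in isolation. It does not use shrynk, and `open_by_suffix` is a hypothetical helper, not part of the shrynk API.

```python
import bz2
import gzip
import lzma
import os
import tempfile

# File extension -> opener; anything else falls back to the built-in open.
OPENERS = {".gz": gzip.open, ".bz2": bz2.open, ".xz": lzma.open}


def open_by_suffix(path, mode="rt"):
    """Open path with the opener matching its extension."""
    extension = os.path.splitext(path)[1]
    return OPENERS.get(extension, open)(path, mode)


with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "mypath.csv.bz2")

    # Write a tiny CSV through the bz2 opener chosen from the extension.
    with open_by_suffix(path, "wt") as handle:
        handle.write("a\n1\n")

    # Read it back the same way.
    with open_by_suffix(path) as handle:
        text = handle.read()

    # Peek at the raw bytes to confirm the file really is bz2-compressed.
    with open(path, "rb") as handle:
        magic = handle.read(3)
```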

If you just want the prediction, without saving anything, you can call infer:

import pandas as pd
from shrynk import infer

infer(pd.DataFrame({"a": [1]}))
# {"engine": "csv", "compression": "bz2"}

Add your own data

For more control, you can run the benchmarks on your own data, retrain the model with your own weights, and then predict:

import pandas as pd
from shrynk import PandasCompressor

df = pd.DataFrame({"a": [1, 2, 3]})

pdc = PandasCompressor("default")
pdc.run_benchmarks(df)  # adds the benchmark results for df to the default training data

pdc.train_model(size=3, write=1, read=1)

pdc.predict(df)
