Using Machine Learning to learn how to Compress
You can read the introductory blog post or try it live at https://shrynk.ai
Features
- ✓ Compress your data smartly based on Machine Learning
- ✓ Takes user requirements in the form of weights for size, write_time and read_time
- ✓ Trains & caches a model based on compression methods available in the system, using packaged data
- ✓ CLI for compressing and decompressing
- ✓ Works with CSV, JSON and Bytes in general
CLI
shrynk compress myfile.json # will yield e.g. myfile.json.gz or myfile.json.bz2
shrynk decompress myfile.json.gz # will yield myfile.json
shrynk compress myfile.csv --size 0 --write 1 --read 0
shrynk benchmark myfile.csv # shows benchmark results
shrynk benchmark --predict myfile.csv # will also show the current prediction
shrynk benchmark --save --predict myfile.csv # will add the result to the training data too
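To make the three benchmarked quantities concrete, here is a minimal sketch of how size, write time and read time could be measured for a payload of bytes. It uses only the Python standard library and is not shrynk's own code: the function name benchmark_bytes and the choice of codecs are assumptions made for illustration.

```python
import bz2
import gzip
import lzma
import time

# Standard-library codecs chosen for illustration; shrynk may benchmark a different set.
CODECS = {
    "gz": (gzip.compress, gzip.decompress),
    "bz2": (bz2.compress, bz2.decompress),
    "xz": (lzma.compress, lzma.decompress),
}


def benchmark_bytes(data: bytes) -> dict:
    """Return size, write_time and read_time per codec for one payload (hypothetical helper)."""
    results = {}
    for name, (compress, decompress) in CODECS.items():
        start = time.perf_counter()
        packed = compress(data)
        write_time = time.perf_counter() - start

        start = time.perf_counter()
        restored = decompress(packed)
        read_time = time.perf_counter() - start

        # A codec that does not round-trip is unusable, so fail loudly.
        assert restored == data
        results[name] = {
            "size": len(packed),
            "write_time": write_time,
            "read_time": read_time,
        }
    return results


sample = b'{"a": 1, "b": "hello"}\n' * 2000
for name, metrics in benchmark_bytes(sample).items():
    print(name, metrics)
```

Timings from a single run are noisy, so a real benchmark would likely repeat each measurement several times.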
Usage
Installation:
pip install shrynk
Then in Python:
import pandas as pd
from shrynk import save, load
# save dataframe compressed
my_df = pd.DataFrame({"a": [1]})
file_path = save(my_df, "mypath.csv")
# e.g. mypath.csv.bz2
# load compressed file
loaded_df = load(file_path)
If you only want the prediction, without writing a file, call infer:
import pandas as pd
from shrynk import infer
infer(pd.DataFrame({"a": [1]}))
# {"engine": "csv", "compression": "bz2"}
Add your own data
For more control, you can benchmark your own data, retrain the model with your own weights, and then predict:
import pandas as pd
from shrynk import PandasCompressor
df = pd.DataFrame({"a": [1, 2, 3]})
pdc = PandasCompressor("default")
pdc.run_benchmarks(df) # adds the benchmark results for df to the "default" data
pdc.train_model(size=3, write=1, read=1) # weights for size, write time and read time
pdc.predict(df)
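The size, write and read weights express how much each measurement matters to you. As a rough sketch of the idea, and not shrynk's actual implementation, the snippet below min-max scales each metric across the candidates, multiplies by the weight, and picks the candidate with the lowest total. The helper pick_best and the numbers in example are hypothetical and made up for illustration.

```python
def pick_best(benchmarks: dict, size: float, write: float, read: float) -> str:
    """Return the candidate with the lowest weighted, normalised cost (hypothetical helper)."""
    weights = {"size": size, "write_time": write, "read_time": read}
    scores = {name: 0.0 for name in benchmarks}
    for metric, weight in weights.items():
        values = [result[metric] for result in benchmarks.values()]
        low, high = min(values), max(values)
        spread = (high - low) or 1.0  # avoid dividing by zero when all values tie
        for name, result in benchmarks.items():
            # Min-max scale so that bytes and seconds become comparable.
            scores[name] += weight * (result[metric] - low) / spread
    return min(scores, key=scores.get)


# Made-up numbers for illustration only, not measured results.
example = {
    "gz": {"size": 400, "write_time": 0.010, "read_time": 0.004},
    "bz2": {"size": 300, "write_time": 0.030, "read_time": 0.012},
    "none": {"size": 1000, "write_time": 0.001, "read_time": 0.001},
}

print(pick_best(example, size=3, write=1, read=1))
print(pick_best(example, size=0, write=1, read=0))
```

With these illustrative numbers, weighting size heavily favours a compressed format, while weighting only write time favours leaving the data uncompressed.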