A python library for data compression using TAC (Tiny Anomaly Compression)
Project description
TACpy
TACpy is a Python software package for data compression using TAC (Tiny Anomaly Compression). The TAC algorithm is based on the concept the data eccentricity and does not require previously established mathematical models or any assumptions about the underlying data distribution. Additionally, it uses recursive equations, which enables an efficient computation with low computational cost, using little memory and processing power.
Dependencies
Python 3, Pandas, Numpy, Matplotlib, Seaborn, Scikit-learn, Ipython
Installation
In progress...
pip install tac
Example of Use
To begin you can import TACpy using
# FULL PACKAGE
import tac
Or try each of our implemented functionalities
# MODEL FUNCTIONS
from tac.models.TAC import TAC
from tac.models.AutoTAC import AutoTAC
# RUN FUNCTIONS
from tac.run.single import (print_run_details)
from tac.run.multiple import (run_multiple_instances, get_optimal_params, display_multirun_optimal_values, run_optimal_combination)
# UTILS FUNCTIONS
from tac.utils.format_save import (create_param_combinations, create_compressor_list, create_eval_df)
from tac.utils.metrics import (get_compression_report, print_compression_report, calc_statistics)
from tac.utils.plots import (plot_curve_comparison, plot_dist_comparison, plot_multirun_metric_results)
Running Multiple tests with TAC
- Setting up the initial variables
model_name = 'TAC_Voltage_data'
params = {
'window_size': np.arange(1, 41, 1),
'm': np.round(np.arange(0.1, 0.8, 0.1), 2),
}
param_combination = create_param_combinations(params)
compressor_list = create_compressor_list(param_combination)
- Once you created the list of compressors you can run
result_df = run_multiple_instances(compressor_list=compressor_list,
param_list=param_combination,
series_to_compress=df['voltage'].dropna(), # Example of sensor data
cf_score_beta=2
)
- This function returns a pandas Dataframe containing the results of all compression methods. You can expect something like:
- You can also check the optimal combination by running the following code:
display_multirun_optimal_values(result_df=result_df)
Parameter combinations for MAX CF_SCORE
param reduction_rate reduction_factor mse rmse nrmse mae 239 (35, 0.2) 0.96 22.86 37.25 6.10 0.16 1.35 240 (35, 0.3) 0.96 23.30 38.06 6.17 0.16 1.39 psnr ncc cf_score 239 31.56 0.99 0.98 240 31.47 0.99 0.98
Parameter combinations for NEAR MAX CF_SCORE
param reduction_rate reduction_factor mse rmse nrmse mae 99 (15, 0.2) 0.91 11.61 32.59 5.71 0.15 1.05 \ 215 (31, 0.6) 0.96 27.59 82.61 9.09 0.23 2.53 157 (23, 0.4) 0.94 17.06 60.59 7.78 0.20 1.59 105 (16, 0.1) 0.92 11.87 32.77 5.72 0.15 1.04 171 (25, 0.4) 0.95 18.24 62.56 7.91 0.20 1.69 psnr ncc cf_score 99 32.14 0.99 0.97 215 28.10 0.97 0.97 157 29.45 0.98 0.97 105 32.12 0.99 0.97 171 29.31 0.98 0.97
Visualize multirun results with a plot
- By default this plot returns a visualization for the metrics
reduction_rate
,ncc
andcf_score
.
plot_multirun_metric_results(result_df=result_df)
- The result should look like this;
Running a single complession with the optimal parameter found
- You don't need to run the visualization and the
display_multirun_optimal_values
in order to get the optimal compressor created, by running the following code it's possible to get the best result:
optimal_param_list = get_optimal_params(result_df=result_df)
print("Best compressor param combination: ", optimal_param_list)
- With the list of optimal parameter (There is a possibility that multiple compressors are considered the best) run the function below to get get the compression result.
points_to_keep, optimal_results_details = run_optimal_combination(optimal_list=optimal_param_list,
serie_to_compress=df['voltage'].dropna(),
model='TAC'
)
- If you want to see the result details use:
print_run_details(optimal_results_details)
POINTS:
- total checked: 50879
- total kept: 1114
- percentage discaded: 97.81 %
POINT EVALUATION TIMES (ms):
- mean: 0.0021414514134244587
- std: 0.046957627024743445
- median: 0.0
- max: 1.5192031860351562
- min: 0.0
- total: 108.95490646362305
RUN TIME (ms):
- total: 119.3452
Evaluating the Results
- Now, to finish the process of the compression, you should follow the next steps:
1. Step - Create the evaluation dataframe:
evaluation_df = create_eval_df(original=df['voltage'].dropna(), flag=points_to_keep)
evaluation_df.info()
2. Step - Evaluate the performance:
report = get_compression_report(
original=evaluation_df['original'],
compressed=evaluation_df['compressed'],
decompressed=evaluation_df['decompressed'],
cf_score_beta=2
)
print_compression_report(
report,
model_name=model_name,
cf_score_beta=2,
model_params=optimal_param_list
)
After that you expect to see something like the following informations:
RUN INFO
- Model: TAC_Voltage_data
- Optimal Params: [(35, 0.2), (35, 0.3)]
- CF-Score Beta: 2
RESULTS
SAMPLES NUMBER reduction
- Original length: 50879 samples
- Reduced length: 1114 samples
- Samples reduced by a factor of 45.67 times
- Sample reduction rate: 97.81%
FILE SIZE compression
- Original size: 544858 Bytes
- Compressed size: 14165 Bytes
- file compressed by a factor of 38.47 times
- file compression rate: 97.4%
METRICS
- MSE: 41.3406
- RMSE: 6.4297
- NRMSE: 0.164
- MAE: 1.4593
- PSNR: 31.1085
- NCC: 0.9865
- CF-Score: 0.984
3. Step - Create the model visualizations:
# plot the curves comparison (original vs decompressed)
plot_curve_comparison(
evaluation_df.original,
evaluation_df.decompressed,
show=True
)
And finally here is a example of the result:
Literature reference
-
Signoretti, G.; Silva, M.; Andrade, P.; Silva, I.; Sisinni, E.; Ferrari, P. "An Evolving TinyML Compression Algorithm for IoT Environments Based on Data Eccentricity". Sensors 2021, 21, 4153. https://doi.org/10.3390/s21124153
-
Medeiros, T.; Amaral, M.; Targino, M; Silva, M.; Silva, I.; Sisinni, E.; Ferrari, P.; "TinyML Custom AI Algorithms for Low-Power IoT Data Compression: A Bridge Monitoring Case Study" - 2023 IEEE International Workshop on Metrology for Industry 4.0 & IoT (MetroInd4.0&IoT), 2023. 10.1109/MetroInd4.0IoT57462.2023.10180152
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
File details
Details for the file conect2ai-0.1.0.tar.gz
.
File metadata
- Download URL: conect2ai-0.1.0.tar.gz
- Upload date:
- Size: 58.8 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.11.0
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 6cb489622dbe662c8f52c23c2b218b59d75456dcaed42be75e159637ab298b80 |
|
MD5 | c3a7ea5c94577197193c46debba7ebb9 |
|
BLAKE2b-256 | 10c44142d7b08dbdcf2ce06d31d267b2938e15c38a3a01bc1caf2d32272050e4 |
File details
Details for the file conect2ai-0.1.0-py3-none-any.whl
.
File metadata
- Download URL: conect2ai-0.1.0-py3-none-any.whl
- Upload date:
- Size: 16.8 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.11.0
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | a7080726c47c32dd5549cee2143333d4c17cd7177e414e88edd86a0779a1efbb |
|
MD5 | e7a06e0765d1efef46878742c0a6a0f8 |
|
BLAKE2b-256 | e99e148274dd6460fc87c59cde48b2a27629d59d03fd2f322c0569ffb0b8387d |