
A collection of utility functions that I often use in my own work.


msdlib






Introduction

This library was first designed for my personal use in data visualization, processing and machine learning. I then thought it might help others do the same things more easily, which motivated me to add new functions to the library. It's not yet rich, but I intend to make it much bigger, richer and more helpful for everyone.

Dependencies

This library works mostly on top of pandas, numpy, matplotlib and several other packages. The full list:

  • Numpy
  • Pandas
  • Matplotlib
  • Scipy
  • Seaborn
  • Pytorch (for mlutils package)

It's best to install the most recent versions of these dependencies.

    Installation

    pip install msdlib

    or, if you run into --user related issues during installation, use

    pip install --user msdlib
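
To confirm that the installation succeeded, the installed version can be queried from Python. This is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical and not part of msdlib.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Prints a version string if msdlib is installed, otherwise None.
print(installed_version("msdlib"))
```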

    License

    This library is released under the MIT open source license.

    Examples

    You can find easy examples of how to use the functions and classes from this library here. Please download the data used in the examples from here.

    Documentation

    Complete documentation of classes and functions can be found here. The list is alphabetically ordered.

    Overview

    The whole library can be divided into 4 main portions.

    1. Visualization tools
    2. Data processing tools
    3. Machine learning tools
    4. Miscellaneous

    Some of the frequently used programs are shown below.

    Visualization Tools

    data_gridplot:

    This function draws scatter plots between every pair of features along with their distributions (similar to scatter_matrix in pandas). It also lets you save the image and change the figure size, titles and so on, and it has one special feature for highlighting clusters in the data, if any.


    plot_time_series:

    Of all the functions and classes in this library, this is the one I find most useful. It plots time series data with a lot of flexibility. Please check the example scripts for illustrations and guidance on using it.

    plot_heatmap:

    Flexible heatmap plotting function with an option to remove one of the symmetric triangular halves, among many other options.

    get_color_from_cmap:

    This function creates colors from a specified matplotlib colormap.

    Data Processing Tools

    Filters:

    This class applies low-pass, high-pass, band-pass and band-stop filters. It can also visualize the frequency domain of the signal, the designed filter, and the filtered signal once a filter has been applied.

    get_spectrogram:

    This function calculates the spectrogram of any time series signal and plots it as a heatmap with proper frequency bins and a time axis.

    grouped_mode:

    This function calculates the mode for grouped data. It iterates over different numbers of groupings and tries to find the most accurate mode value for the grouped data. It also supports ignoring one or more values when calculating the mode.
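
As an illustration of the idea, the textbook formula for the mode of grouped data can be sketched in plain Python. This is not msdlib's implementation: the function below handles a single bin count (the library is described as trying several), and its name, parameters and tie-breaking rule are assumptions made for this example.

```python
def grouped_mode_sketch(values, n_bins, ignore=()):
    """Estimate the mode from a histogram with `n_bins` equal-width bins.

    Uses the grouped-data formula  L + (f1 - f0) / (2*f1 - f0 - f2) * h,
    where L is the lower edge of the modal bin, h the bin width, f1 the
    modal count and f0, f2 the counts of the neighbouring bins.
    """
    data = [v for v in values if v not in ignore]
    lo, hi = min(data), max(data)
    if lo == hi:
        return float(lo)
    width = (hi - lo) / n_bins

    # Histogram counts; the maximum value is placed in the last bin.
    counts = [0] * n_bins
    for v in data:
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1

    m = counts.index(max(counts))  # modal bin (first one on ties)
    f1 = counts[m]
    f0 = counts[m - 1] if m > 0 else 0
    f2 = counts[m + 1] if m < n_bins - 1 else 0
    lower_edge = lo + m * width
    denom = 2 * f1 - f0 - f2
    if denom == 0:
        # Flat neighbourhood: fall back to the centre of the modal bin.
        return lower_edge + width / 2
    return lower_edge + (f1 - f0) / denom * width


samples = [-1, -1, -1, -1, -1, 0, 1, 1, 2, 2, 2, 2, 3, 3, 4]
# -1 is treated as a placeholder value and excluded before binning.
print(grouped_mode_sketch(samples, n_bins=4, ignore=(-1,)))
```

Excluding placeholder values first matters here: without `ignore`, the five `-1` entries would dominate the histogram and pull the estimate toward them.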

    get_edges_from_ts:

    This function finds the edges of a time series signal when a threshold is applied to remove parts of the signal. It also provides the duration and interval of the edges.
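
The edge-finding idea can be sketched in plain Python as follows. This is an illustration under stated assumptions, not msdlib's code: "edges" are taken to be the start and end indices of runs above the threshold, "duration" the length of each run, and "interval" the gap between consecutive runs. The function names are hypothetical.

```python
def find_edges(signal, threshold):
    """Return (start, end) index pairs of runs where signal > threshold.

    `end` is exclusive, so the run covers signal[start:end].
    """
    edges = []
    start = None
    for i, value in enumerate(signal):
        above = value > threshold
        if above and start is None:
            start = i                      # rising edge
        elif not above and start is not None:
            edges.append((start, i))       # falling edge
            start = None
    if start is not None:                  # run continues to the end of the signal
        edges.append((start, len(signal)))
    return edges


def durations_and_intervals(edges):
    """Run lengths, and gaps between the end of one run and the start of the next."""
    durations = [end - start for start, end in edges]
    intervals = [nxt[0] - cur[1] for cur, nxt in zip(edges, edges[1:])]
    return durations, intervals


signal = [0, 0, 5, 6, 0, 0, 0, 7, 8, 9, 0]
edges = find_edges(signal, threshold=1)
print(edges)                           # [(2, 4), (7, 10)]
print(durations_and_intervals(edges))  # ([2, 3], [3])
```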

    moving_slope:

    This function calculates the moving (rolling) slope using linear regression on chunks of the signal. It handles NaN values and missing data, so you do not need to clean them up beforehand.
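
A rolling least-squares slope that tolerates NaN values can be sketched as below. The window handling, the use of the sample index as the x-axis, and the rule that a window needs at least two valid points are assumptions for this example; msdlib's own function may differ.

```python
import math


def moving_slope_sketch(values, window):
    """Least-squares slope of `values` against sample index over a sliding window.

    NaN samples are skipped. A window with fewer than two valid samples
    yields NaN. Returns one slope per full window.
    """
    slopes = []
    for end in range(window, len(values) + 1):
        points = [(i, values[i]) for i in range(end - window, end)
                  if not math.isnan(values[i])]
        if len(points) < 2:
            slopes.append(math.nan)
            continue
        n = len(points)
        mean_x = sum(x for x, _ in points) / n
        mean_y = sum(y for _, y in points) / n
        # slope = covariance(x, y) / variance(x)
        sxx = sum((x - mean_x) ** 2 for x, _ in points)
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
        slopes.append(sxy / sxx)
    return slopes


# A line with slope 2 and one missing sample: every window still recovers 2.0.
print(moving_slope_sketch([0.0, 2.0, 4.0, math.nan, 8.0, 10.0], window=3))
```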

    standardize:

    Standardization function.

    normalize:

    Normalization function.
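
The README does not define these two operations, so the sketch below assumes the common conventions: standardization rescales to zero mean and unit variance (z-scores), and normalization rescales to the range 0 to 1 (min-max scaling). The function names and the population standard deviation are choices made for this example only.

```python
import statistics


def standardize_sketch(values):
    """Rescale to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        raise ValueError("cannot standardize constant data")
    return [(v - mean) / std for v in values]


def normalize_sketch(values):
    """Rescale linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if lo == hi:
        raise ValueError("cannot normalize constant data")
    return [(v - lo) / (hi - lo) for v in values]


print(normalize_sketch([2.0, 4.0, 6.0]))    # [0.0, 0.5, 1.0]
print(standardize_sketch([2.0, 4.0, 6.0]))
```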



    Machine Learning Tools

    feature_evaluator:

    This function is one of the most useful tools. It calculates feature importance from a statistical point of view using any tree-based approach. It can show the results as a bar plot and handles both classification and regression labels.

    class_result:

    This function calculates classification evaluation metrics such as precision, recall, accuracy and F1 score, and can also show the confusion matrix as a pandas DataFrame.
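
For reference, the metrics named above can be computed from a confusion matrix as in the following sketch. It is not msdlib's implementation: the function name is hypothetical, and a nested dict stands in for the pandas DataFrame so that the example needs only the standard library.

```python
def classification_summary(y_true, y_pred):
    """Return (accuracy, per-class scores, confusion matrix).

    confusion[t][p] counts samples with true label t and predicted label p.
    """
    labels = sorted(set(y_true) | set(y_pred))
    confusion = {t: {p: 0 for p in labels} for t in labels}
    for t, p in zip(y_true, y_pred):
        confusion[t][p] += 1

    accuracy = sum(confusion[c][c] for c in labels) / len(y_true)

    scores = {}
    for c in labels:
        tp = confusion[c][c]
        predicted = sum(confusion[t][c] for t in labels)  # column sum
        actual = sum(confusion[c].values())               # row sum
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        scores[c] = {"precision": precision, "recall": recall, "f1": f1}
    return accuracy, scores, confusion


acc, scores, confusion = classification_summary([0, 0, 1, 1, 1, 0],
                                                [0, 1, 1, 1, 0, 0])
print(acc)
print(scores[1])
print(confusion)
```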

    rsquare_rmse:

    This function calculates the R-squared value and the root mean square error.
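
Both quantities follow directly from the residuals. The sketch below assumes R-squared means the coefficient of determination, 1 - SS_res / SS_tot; the function name and argument order are hypothetical and may not match msdlib.

```python
import math


def r2_and_rmse(y_true, y_pred):
    """Return (R-squared, RMSE) for paired true and predicted values."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)          # total sum of squares
    r2 = 1 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / n)
    return r2, rmse


print(r2_and_rmse([1, 2, 3, 4], [1, 2, 3, 5]))  # (0.8, 0.5)
```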

    one_hot_encoding:

    This function converts classification labels into one-hot encoded format.
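
One-hot encoding maps each label to a vector with a single 1 at the position of its class. A minimal sketch, in which the function name and the choice to sort the classes are assumptions for this example:

```python
def one_hot_sketch(labels):
    """Return (encoded rows, class order) for a list of hashable, sortable labels."""
    classes = sorted(set(labels))
    index = {c: i for i, c in enumerate(classes)}
    encoded = [[1 if index[label] == j else 0 for j in range(len(classes))]
               for label in labels]
    return encoded, classes


encoded, classes = one_hot_sketch(["cat", "dog", "cat", "bird"])
print(classes)  # ['bird', 'cat', 'dog']
print(encoded)  # [[0, 1, 0], [0, 0, 1], [0, 1, 0], [1, 0, 0]]
```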

    SplitDataset:

    This is one of the most useful classes in this library. It splits a data set into train, validation and test sets, with three splitting options:

  • random_split
  • cross_validation_split
  • sequence_split (especially necessary for RNNs)
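
Two of the splitting ideas can be sketched in plain Python. These are illustrations only and do not reproduce the SplitDataset class: the function names, the ratio parameters, and the interpretation of a sequence split as overlapping windows paired with the next value are all assumptions. Cross-validation is not covered here.

```python
import random


def random_split_sketch(n_samples, val_ratio, test_ratio, seed=0):
    """Shuffle sample indices and cut them into train, validation and test parts."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = int(n_samples * test_ratio)
    n_val = int(n_samples * val_ratio)
    test = indices[:n_test]
    val = indices[n_test:n_test + n_val]
    train = indices[n_test + n_val:]
    return train, val, test


def sequence_windows_sketch(series, seq_len):
    """Overlapping windows of length `seq_len`, each paired with the value that follows."""
    return [(series[i:i + seq_len], series[i + seq_len])
            for i in range(len(series) - seq_len)]


train, val, test = random_split_sketch(10, val_ratio=0.2, test_ratio=0.2)
print(len(train), len(val), len(test))              # 6 2 2
print(sequence_windows_sketch([1, 2, 3, 4, 5], 3))  # [([1, 2, 3], 4), ([2, 3, 4], 5)]
```

Windowing keeps the time order inside each sample, which is why a sequence-aware split matters for recurrent models, while a plain random split would scatter neighbouring time steps across the sets.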


    Miscellaneous

    ProgressBar

    This is a custom progress bar which shows loop progress with a visual bar along with other information such as elapsed and remaining time, loop count, total count and percentage of completion. (You should use it only if you don't print anything else inside your loop.)
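
A minimal sketch of how such a bar can work is shown below. It is not the ProgressBar class itself; `format_progress` and `track` are hypothetical names, and the remaining time is a simple linear extrapolation from the elapsed time.

```python
import time


def format_progress(done, total, elapsed, width=20):
    """Build one status line: bar, counts, percentage, elapsed and remaining time."""
    fraction = done / total
    filled = int(width * fraction)
    bar = "#" * filled + "-" * (width - filled)
    # Assume the remaining iterations take as long on average as the finished ones.
    remaining = elapsed / done * (total - done) if done else float("nan")
    return (f"[{bar}] {done}/{total} {fraction:6.1%} "
            f"elapsed {elapsed:.1f}s remaining {remaining:.1f}s")


def track(iterable, total):
    """Yield items from `iterable`, redrawing the status line after each one."""
    start = time.perf_counter()
    for count, item in enumerate(iterable, start=1):
        yield item
        line = format_progress(count, total, time.perf_counter() - start)
        # "\r" returns to the start of the line so the bar overwrites itself.
        print("\r" + line, end="", flush=True)
    print()


for _ in track(range(5), total=5):
    time.sleep(0.01)
```

Because the bar redraws itself on a single line with a carriage return, any other output printed inside the loop would be interleaved with it, which is the likely reason for the advice against printing inside the loop.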

    name_separation

    This function inserts a new line when the number of characters exceeds the maximum length allowed on one line. It's very useful when plotting in matplotlib with long names in axis labels or titles.
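
The same effect can be approximated with the standard library's `textwrap` module, as sketched below. Note the assumption: `textwrap` breaks at word boundaries where possible, whereas name_separation may break lines differently. The helper name `wrap_label` is hypothetical.

```python
import textwrap


def wrap_label(name, max_len):
    """Insert newlines so that no line of `name` is longer than `max_len` characters."""
    return "\n".join(textwrap.wrap(name, width=max_len))


label = wrap_label("average daily temperature in degrees", max_len=15)
print(label)
```

The returned string can be passed directly as an axis label or title, since matplotlib renders embedded newlines as line breaks.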

