My utility functions, which I use often in my own work, are stored here.
msdlib
Introduction
This library was first designed for my personal use in data visualization, data processing and machine learning. Later I realized it might help others do the same things more easily, and that motivated me to add new functions to the library. It is not yet rich, but I am strongly committed to making it bigger, richer and more helpful for everyone.
Dependencies
This library mostly works on top of pandas, numpy, matplotlib and several other packages. The full list is provided here:
It's best to install the latest versions of these dependencies.
Installation
pip install msdlib
or, if the installation fails because you lack permission to install packages system-wide, install it for your user account only:
pip install --user msdlib
License
This library is released under the MIT open source license.
Examples
You can find simple examples of how to use the functions and classes in this library here.
The data used in the examples can be downloaded from here.
Documentation
Complete documentation of classes and functions can be found here. The list is alphabetically ordered.
Overview
The library can be divided into four main parts:
- Visualization tools
- Data processing tools
- Machine learning tools
- Miscellaneous
Some of the most frequently used functions and classes are described below.
Visualization Tools
data_gridplot:
It is a function that draws scatter plots between every pair of features, along with their distributions (similar to scatter_matrix in pandas). In addition, it lets you save the image and change the figure size, titles, etc., and it has a special feature for visualizing clusters in the data, if there are any.
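The idea behind a pairwise grid like this can be sketched in plain matplotlib. This is not msdlib's implementation. The function `pairwise_grid` and its parameters are hypothetical, and cluster highlighting is approximated here by coloring points with an optional integer `labels` array.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt
import numpy as np


def pairwise_grid(data, names, labels=None, figsize=(8, 8), savepath=None):
    """Scatter every pair of columns and draw a histogram on the diagonal.

    `labels` (hypothetical, optional) colors points by cluster id.
    """
    n = data.shape[1]
    # squeeze=False keeps `axes` a 2-D array even when n == 1
    fig, axes = plt.subplots(n, n, figsize=figsize, squeeze=False)
    for i in range(n):
        for j in range(n):
            ax = axes[i, j]
            if i == j:
                ax.hist(data[:, i], bins=20)  # distribution of feature i
            else:
                ax.scatter(data[:, j], data[:, i], c=labels, s=5)
            if i == n - 1:
                ax.set_xlabel(names[j])  # label only the bottom row
            if j == 0:
                ax.set_ylabel(names[i])  # label only the left column
    fig.tight_layout()
    if savepath is not None:
        fig.savefig(savepath)
    return fig, axes
```

For example, with `data` shaped `(n_samples, 3)` and `labels = (data[:, 0] > 0).astype(int)`, the call `pairwise_grid(data, ["a", "b", "c"], labels=labels, savepath="grid.png")` draws a 3 x 3 grid and saves it.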
plot_time_series:
Of all the functions and classes in this library, this is the one I find most useful. It plots time series data with a great deal of flexibility. Please check out the example scripts for illustrations and guidance on how to use it.
plot_heatmap:
A flexible heatmap plotting function with an option to hide one triangular half of a symmetric matrix (for example, a correlation matrix), among many other options.
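As a rough illustration of hiding one triangle of a symmetric matrix, the sketch below masks the upper triangle with numpy before drawing it with matplotlib's `imshow`. The function name `lower_triangle_heatmap` is hypothetical. The fixed color range of -1 to 1 assumes a correlation matrix.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt
import numpy as np


def lower_triangle_heatmap(matrix, names, keep_diagonal=True):
    """Plot a symmetric matrix with its redundant upper triangle hidden."""
    matrix = np.asarray(matrix, dtype=float)
    # k=1 masks cells strictly above the diagonal; k=0 also hides the diagonal
    mask = np.triu(np.ones_like(matrix, dtype=bool), k=1 if keep_diagonal else 0)
    masked = np.ma.masked_array(matrix, mask=mask)  # masked cells are left blank
    fig, ax = plt.subplots()
    im = ax.imshow(masked, cmap="coolwarm", vmin=-1, vmax=1)
    ax.set_xticks(range(len(names)))
    ax.set_xticklabels(names, rotation=90)
    ax.set_yticks(range(len(names)))
    ax.set_yticklabels(names)
    fig.colorbar(im, ax=ax)
    return fig, masked
```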
get_color_from_cmap:
This function generates a set of colors from a specified matplotlib colormap.
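Here is a minimal sketch of sampling evenly spaced colors from a colormap. It assumes matplotlib 3.5 or newer, which provides the `matplotlib.colormaps` registry. `colors_from_cmap` is a hypothetical name, not msdlib's function signature.

```python
import matplotlib
import numpy as np


def colors_from_cmap(n, cmap_name="viridis"):
    """Return an (n, 4) array of RGBA colors evenly spaced along a colormap."""
    cmap = matplotlib.colormaps[cmap_name]  # registry lookup (matplotlib >= 3.5)
    return cmap(np.linspace(0, 1, n))  # calling a colormap on floats in [0, 1] gives RGBA
```

Each row can be passed directly as the `color` argument of a matplotlib plotting call, for example one color per line in a multi-line plot.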
Data Processing Tools
Filters:
This class applies low-pass, high-pass, band-pass and band-stop filters. It also lets us visualize the frequency domain of the signal and the designed filter, and, once a filter is applied, the filtered signal.
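The simplest way to illustrate the four filter types is an ideal (brick-wall) filter that zeroes the FFT bins outside the pass band. This technique is swapped in for illustration only. The Filters class may well design proper IIR/FIR filters instead, and an ideal mask causes ringing on signals that are not periodic within the record. `fft_filter` is a hypothetical name.

```python
import numpy as np


def fft_filter(signal, fs, kind, cutoff):
    """Ideal frequency-domain filter (illustrative, not a designed IIR/FIR filter).

    kind:   'lowpass', 'highpass', 'bandpass' or 'bandstop'
    cutoff: one frequency in Hz for low/high pass, a (low, high) pair for band filters
    """
    signal = np.asarray(signal, dtype=float)
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(signal.size, d=1 / fs)  # bin centre frequencies in Hz
    if kind == "lowpass":
        keep = freqs <= cutoff
    elif kind == "highpass":
        keep = freqs >= cutoff
    elif kind == "bandpass":
        keep = (freqs >= cutoff[0]) & (freqs <= cutoff[1])
    elif kind == "bandstop":
        keep = (freqs < cutoff[0]) | (freqs > cutoff[1])
    else:
        raise ValueError(f"unknown filter kind: {kind}")
    # Zero the rejected bins and transform back to the time domain
    return np.fft.irfft(spectrum * keep, n=signal.size)
```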
get_spectrogram:
This function calculates the spectrogram of any time series signal and plots it as a heatmap with proper frequency bins and a time axis.
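A bare-bones spectrogram can be computed as a short-time Fourier transform: slide a Hann window along the signal and take the power of each window's FFT. The sketch below assumes a fixed window length and step. `simple_spectrogram` and its defaults are hypothetical.

```python
import numpy as np


def simple_spectrogram(signal, fs, window_len=256, step=128):
    """Power spectrogram via a Hann-windowed short-time Fourier transform.

    Returns (freqs, times, power); power has shape (n_freqs, n_windows).
    """
    signal = np.asarray(signal, dtype=float)
    if signal.size < window_len:
        raise ValueError("signal is shorter than one window")
    window = np.hanning(window_len)
    starts = np.arange(0, signal.size - window_len + 1, step)  # window start indices
    frames = np.stack([signal[s:s + window_len] * window for s in starts])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    freqs = np.fft.rfftfreq(window_len, d=1 / fs)
    times = (starts + window_len / 2) / fs  # centre of each window, in seconds
    return freqs, times, power.T
```

The returned `power` array can then be drawn as a heatmap, for example with matplotlib's `pcolormesh(times, freqs, power)`.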
grouped_mode:
This function calculates the mode of grouped data. It iterates over different numbers of groups and tries to find the most accurate mode value. It also supports ignoring one or more values when calculating the mode.
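For one fixed number of groups, the mode of grouped data is commonly estimated with the interpolation formula L + (f1 - f0) / ((f1 - f0) + (f1 - f2)) * h. Here L is the lower boundary of the modal class, f1 its frequency, f0 and f2 the frequencies of the neighboring classes, and h the class width. The sketch below applies that formula to a numpy histogram. The name `estimate_grouped_mode` is hypothetical. This description does not say how msdlib picks the "most accurate" result across groupings, so the loop in the usage note is only an illustration.

```python
import numpy as np


def estimate_grouped_mode(values, n_bins=10, ignore=()):
    """Mode of continuous data grouped into n_bins equal-width classes.

    Uses L + (f1 - f0) / ((f1 - f0) + (f1 - f2)) * h. NaNs and any values
    listed in `ignore` are dropped before grouping.
    """
    values = np.asarray(values, dtype=float)
    values = values[~np.isnan(values) & ~np.isin(values, ignore)]
    counts, edges = np.histogram(values, bins=n_bins)
    k = int(np.argmax(counts))  # index of the modal class
    f1 = counts[k]
    f0 = counts[k - 1] if k > 0 else 0
    f2 = counts[k + 1] if k < len(counts) - 1 else 0
    h = edges[1] - edges[0]
    denom = (f1 - f0) + (f1 - f2)
    if denom == 0:  # flat peak: fall back to the class midpoint
        return (edges[k] + edges[k + 1]) / 2
    return edges[k] + (f1 - f0) / denom * h
```

For example, `[estimate_grouped_mode(data, n_bins=n) for n in (5, 10, 20)]` compares the estimates for several groupings.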
get_edges_from_ts:
This function finds the edges of a time series signal after a threshold is applied to remove parts of the signal. It also provides the duration of the resulting segments and the intervals between them.
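Here is a possible sketch of threshold-based edge detection. It assumes that "duration" means the length of each above-threshold segment and "interval" the gap between consecutive segments. The function `edges_above_threshold` and its return layout are hypothetical.

```python
import numpy as np


def edges_above_threshold(signal, threshold, fs=1.0):
    """Find where `signal` rises above and falls back below `threshold`.

    Returns (rising, falling, durations, intervals): rising holds the first sample
    index above the threshold, falling the first index back below it; durations
    and intervals are in seconds for sampling rate fs.
    """
    above = np.asarray(signal) > threshold
    change = np.diff(above.astype(int))
    rising = np.flatnonzero(change == 1) + 1
    falling = np.flatnonzero(change == -1) + 1
    # A segment already above the threshold at either end gets an implicit edge
    if above[0]:
        rising = np.insert(rising, 0, 0)
    if above[-1]:
        falling = np.append(falling, above.size)
    durations = (falling - rising) / fs
    intervals = (rising[1:] - falling[:-1]) / fs
    return rising, falling, durations, intervals
```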
moving_slope:
This function calculates a moving (rolling) slope by fitting a linear regression to successive chunks of the signal. It handles NaN values and missing data, so you don't need to clean them out first.
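Here is a minimal sketch of a NaN-tolerant rolling slope. For each position, it fits an ordinary least-squares line to the valid points in a trailing window and keeps that line's slope. The trailing window, the `min_points` rule and the name `rolling_slope` are assumptions made for this illustration. They do not describe msdlib's internals.

```python
import numpy as np


def rolling_slope(y, x=None, window=5, min_points=2):
    """Least-squares slope over a trailing window, skipping NaN samples.

    The slope at index i uses samples i-window+1 .. i; positions with fewer than
    `min_points` valid samples (or not enough history) stay NaN.
    """
    y = np.asarray(y, dtype=float)
    x = np.arange(y.size, dtype=float) if x is None else np.asarray(x, dtype=float)
    slopes = np.full(y.size, np.nan)
    for i in range(window - 1, y.size):
        xs = x[i - window + 1:i + 1]
        ys = y[i - window + 1:i + 1]
        valid = ~(np.isnan(xs) | np.isnan(ys))  # drop missing points in this chunk
        if valid.sum() < min_points:
            continue
        xv, yv = xs[valid], ys[valid]
        xc = xv - xv.mean()
        denom = (xc ** 2).sum()
        if denom > 0:
            # OLS slope: sum((x - x_mean) * (y - y_mean)) / sum((x - x_mean)^2)
            slopes[i] = (xc * (yv - yv.mean())).sum() / denom
    return slopes
```

A centred window would avoid the lag that a trailing window introduces, at the cost of needing future samples.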
standardize:
Standardization function.
normalize:
Normalization function.
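In their most common forms, standardization means z-scoring (zero mean, unit standard deviation) and normalization means min-max scaling to [0, 1]. The sketch below assumes those definitions and ignores NaNs. `zscore_standardize` and `minmax_normalize` are hypothetical names. A constant column would cause a division by zero.

```python
import numpy as np


def zscore_standardize(data, axis=0):
    """Subtract the mean and divide by the standard deviation along `axis`, ignoring NaNs."""
    data = np.asarray(data, dtype=float)
    mean = np.nanmean(data, axis=axis, keepdims=True)
    std = np.nanstd(data, axis=axis, keepdims=True)  # population std (ddof=0)
    return (data - mean) / std


def minmax_normalize(data, axis=0):
    """Rescale values along `axis` to the range [0, 1], ignoring NaNs."""
    data = np.asarray(data, dtype=float)
    lo = np.nanmin(data, axis=axis, keepdims=True)
    hi = np.nanmax(data, axis=axis, keepdims=True)
    return (data - lo) / (hi - lo)
```

With `axis=0`, each column (feature) is scaled independently, which is the usual choice for a samples-by-features table.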
Machine Learning Tools
feature_evaluator:
This function is one of the most useful tools in the library. It calculates feature importance from a statistical point of view by applying any tree-based approach, can display the results as a bar plot, and handles both classification and regression labels.
class_result:
This function calculates classification evaluation metrics such as precision, recall, accuracy and F1 score, and can also show the confusion matrix as a pandas DataFrame.
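All of the metrics listed above follow from the confusion matrix. Here is a numpy-only sketch. `classification_summary` is a hypothetical name, and it returns per-class precision and recall rather than any particular averaged value.

```python
import numpy as np


def classification_summary(y_true, y_pred):
    """Confusion matrix, per-class precision/recall/F1 and overall accuracy."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    classes = np.unique(np.concatenate([y_true, y_pred]))
    index = {c: i for i, c in enumerate(classes)}
    cm = np.zeros((classes.size, classes.size), dtype=int)  # rows: true, cols: predicted
    for t, p in zip(y_true, y_pred):
        cm[index[t], index[p]] += 1
    tp = np.diag(cm).astype(float)
    with np.errstate(divide="ignore", invalid="ignore"):  # empty classes give NaN
        precision = tp / cm.sum(axis=0)
        recall = tp / cm.sum(axis=1)
        f1 = 2 * precision * recall / (precision + recall)
    accuracy = tp.sum() / cm.sum()
    return classes, cm, precision, recall, f1, accuracy
```

To view the confusion matrix as a pandas DataFrame, wrap it as `pd.DataFrame(cm, index=classes, columns=classes)`.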
rsquare_rmse:
This function calculates the R-squared value and the root mean square error (RMSE).
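Here is a minimal sketch of both quantities. It assumes the coefficient-of-determination definition R-squared = 1 - SS_res / SS_tot. Some tools instead report the squared Pearson correlation, which can differ when predictions are biased. `rsquare_and_rmse` is a hypothetical name.

```python
import numpy as np


def rsquare_and_rmse(y_true, y_pred):
    """Return (R-squared, RMSE) with R-squared = 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residual = y_true - y_pred
    ss_res = np.sum(residual ** 2)  # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    r2 = 1 - ss_res / ss_tot
    rmse = np.sqrt(np.mean(residual ** 2))
    return r2, rmse
```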
one_hot_encoding:
This function converts classification labels into one-hot encoded format.
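One-hot encoding maps each distinct label to its own column. Here is a short numpy sketch with the hypothetical name `one_hot`. The classes are ordered as `np.unique` sorts them.

```python
import numpy as np


def one_hot(labels):
    """Return (classes, matrix) where matrix[i, k] == 1 iff labels[i] == classes[k]."""
    classes, codes = np.unique(np.ravel(labels), return_inverse=True)
    matrix = np.zeros((codes.size, classes.size), dtype=int)
    matrix[np.arange(codes.size), codes] = 1  # one 1 per row, in the label's column
    return classes, matrix
```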
SplitDataset:
This is one of the most useful classes in this library. It splits a data set into training, validation and test sets, and offers three options for how to perform the split.
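The three splitting options are not named here, so the sketch below shows two common strategies as an assumption: a random split and a sequential split, which is useful for time series. `split_indices` and its fraction parameters are hypothetical.

```python
import numpy as np


def split_indices(n_samples, val_frac=0.15, test_frac=0.15, shuffle=True, seed=None):
    """Return (train_idx, val_idx, test_idx) covering every sample exactly once.

    shuffle=False gives a sequential split (train first, then validation, then
    test); shuffle=True permutes the indices first.
    """
    idx = np.arange(n_samples)
    if shuffle:
        idx = np.random.default_rng(seed).permutation(idx)
    n_test = int(round(n_samples * test_frac))
    n_val = int(round(n_samples * val_frac))
    n_train = n_samples - n_val - n_test  # remainder goes to training
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]
```

The returned index arrays can then select rows from a numpy array (`X[train_idx]`) or a pandas DataFrame (`df.iloc[train_idx]`).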
Miscellaneous
ProgressBar:
This is a custom progress bar that shows loop progress as a visual bar, along with other information such as elapsed and remaining time, current loop count, total count and percentage of completion. (You should only use it if you don't print anything inside your loop.)
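A progress bar of this kind usually redraws a single console line using a carriage return (`\r`), so any other output printed inside the loop breaks the display. Here is a minimal sketch with the hypothetical helpers `format_progress` and `run_with_progress`. The remaining time is a simple linear extrapolation from the average time per iteration so far.

```python
import sys
import time


def format_progress(count, total, elapsed, width=30):
    """Build one line: bar, percentage, count and elapsed/remaining seconds."""
    fraction = count / total
    filled = int(width * fraction)
    bar = "#" * filled + "-" * (width - filled)
    # Linear extrapolation: average time per step times the steps left
    remaining = elapsed / count * (total - count) if count else float("nan")
    return (f"|{bar}| {100 * fraction:5.1f}% {count}/{total} "
            f"elapsed {elapsed:.1f}s remaining {remaining:.1f}s")


def run_with_progress(items, stream=sys.stdout):
    """Yield items one by one, redrawing the bar in place after each step."""
    total = len(items)
    start = time.perf_counter()
    for count, item in enumerate(items, start=1):
        yield item
        stream.write("\r" + format_progress(count, total, time.perf_counter() - start))
        stream.flush()
    stream.write("\n")
```

Usage: `for item in run_with_progress(my_list): process(item)`, where `my_list` and `process` stand in for your own data and loop body.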
name_separation:
This function inserts a line break when the number of characters exceeds a maximum line length. It's very useful when plotting in matplotlib with long names in axis labels or titles.
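Here is a minimal sketch using the standard library's `textwrap`, which breaks at word boundaries and splits words longer than the limit. `separate_name` is a hypothetical name, and msdlib may break at a fixed character count instead.

```python
import textwrap


def separate_name(name, max_len=20):
    """Insert line breaks so that no line of `name` exceeds max_len characters."""
    return "\n".join(textwrap.wrap(name, width=max_len))
```

For instance, `plt.xlabel(separate_name(long_label, max_len=20))` keeps a long axis label from running off the figure.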