msdlib aims to make life easier for data scientists, data analysts and ML engineers.
msdlib
Introduction
The main purpose of this library is to make data science work easier and simpler with less code by providing helper functions for plotting, ML training, evaluation, result summarization, etc. The focus is on simplifying common tasks so that beginner to mid-level developers can do their jobs easily and get their careers started at a good pace.
Dependencies
- Numpy
- Pandas
- Matplotlib
- Scipy
- Seaborn
- joblib
- Pytorch
- tensorboard
All of these packages except Pytorch are installed automatically when msdlib is installed. Python >= 3.8 is required.
Pytorch should be installed by following the installation procedure suggested at https://pytorch.org/.
Installation
pip install msdlib
or, if you run into --user related issues during installation, use
pip install --user msdlib
License
This library is released under the MIT open source license.
Examples
You can find easy examples on how to use the functions and classes from this library here. Necessary data is also provided in this directory.
Documentation
Complete documentation of classes and functions can be found at https://msdlib.readthedocs.io/.
Call for contributions
We seek active participation from enthusiastic developers around the world to enrich this library: adding functionality in different areas,
giving more flexibility, completing unfinished functionality and maintaining the library on a regular basis.
We would be grateful for your suggestions and participation.
Overview
The whole library can be divided into four main parts.
- Machine learning tools
- Visualization tools
- Data processing tools
- Fintech and Miscellaneous
Some of the frequently used tools are shown below.
1. Machine Learning Tools
mlutils:
This module provides functionality for easier implementation of Pytorch deep learning models. It offers several facilities, such as:
* Scikit-like easy implementation of Pytorch models using fit, predict and evaluate methods
* Constructing deep learning models in a few lines of code
* Producing automated results with tables of precision, recall, f1_score, accuracy and specificity for classification problems
* Producing automated true-vs-prediction graphs and result preparation for regression models
Examples are available for regression, binary and multi-class classification models here.
paramOptimizer:
This class makes hyper-parameter optimization easy.
Currently it supports grid search and random search for any model, function or mathematical entity.
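To illustrate the two search strategies named above, here is a minimal standard-library sketch. It does not use paramOptimizer and does not reflect its interface; the helper names grid_search and random_search, the toy objective and the parameter grid are all hypothetical.

```python
import itertools
import random


def grid_search(objective, param_grid):
    """Evaluate every parameter combination and keep the best-scoring one."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = objective(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def random_search(objective, param_grid, n_iter, seed=0):
    """Evaluate n_iter randomly drawn combinations and keep the best one."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_iter):
        params = {name: rng.choice(values) for name, values in param_grid.items()}
        score = objective(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical objective (higher is better); its maximum is at x=2, y=-1.
def objective(x, y):
    return -((x - 2) ** 2) - (y + 1) ** 2


grid = {"x": [0, 1, 2, 3], "y": [-2, -1, 0]}
best_grid = grid_search(objective, grid)
best_random = random_search(objective, grid, n_iter=5)
print(best_grid, best_random)
```

Grid search is exhaustive and always finds the best point on the grid, while random search trades that guarantee for a fixed evaluation budget.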
SplitDataset:
This is one of the most useful classes in this library. It splits a data set into train, validation and test sets. There are three splitting options:
* random_split
* cross_validation_split
* sequence_split (especially necessary for RNN/LSTM)
Examples are available here.
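The ideas behind a random split and a sequence-preserving split can be sketched with the standard library alone. This is not SplitDataset itself: the function names, ratios and windowing scheme below are assumptions chosen for illustration, and the class may organise its splits differently.

```python
import random


def random_split_indices(n_samples, val_ratio=0.2, test_ratio=0.2, seed=0):
    """Shuffle sample indices and cut them into train/validation/test lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = int(n_samples * test_ratio)
    n_val = int(n_samples * val_ratio)
    test = indices[:n_test]
    val = indices[n_test:n_test + n_val]
    train = indices[n_test + n_val:]
    return train, val, test


def sequence_windows(series, seq_len):
    """Build (window, next_value) pairs that keep the temporal order intact."""
    return [
        (series[i:i + seq_len], series[i + seq_len])
        for i in range(len(series) - seq_len)
    ]


train_idx, val_idx, test_idx = random_split_indices(10)
windows = sequence_windows([1, 2, 3, 4, 5], seq_len=3)
print(len(train_idx), len(val_idx), len(test_idx))
print(windows)
```

Sequence models consume ordered windows, so shuffling individual rows would destroy the structure they learn from; that is the usual reason a sequence-aware split is needed for RNN/LSTM data.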
one_hot_encoding:
This function converts classification labels into one-hot encoded format.
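As a reminder of what one-hot encoding produces, here is a small NumPy sketch. It is not the library's one_hot_encoding function; the helper name one_hot and the example labels are hypothetical.

```python
import numpy as np


def one_hot(labels):
    """Return a (n_samples, n_classes) 0/1 matrix and the sorted class names."""
    labels = np.asarray(labels)
    # classes is sorted; inverse maps each label to its column index
    classes, inverse = np.unique(labels, return_inverse=True)
    encoded = np.zeros((labels.size, classes.size), dtype=int)
    encoded[np.arange(labels.size), inverse] = 1
    return encoded, classes


encoded, classes = one_hot(["cat", "dog", "cat", "bird"])
print(classes)
print(encoded)
```

Each row contains exactly one 1, placed in the column of that sample's class.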
feature_evaluator:
This function is one of the most useful tools. It calculates feature importance from a statistical point of view, shows the results as a bar plot, and handles both classification and regression labels.
class_result:
This function calculates classification model evaluation metrics such as precision, recall, accuracy, f1 score and specificity, and can also show the confusion matrix as a pandas dataframe. Example can be found here.
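For reference, the metrics listed above follow directly from the four confusion-matrix counts. The sketch below covers the binary case in plain Python; it is not class_result, and the function name and sample labels are hypothetical.

```python
def binary_class_metrics(y_true, y_pred):
    """Compute common metrics for 0/1 labels from the confusion-matrix counts."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    def ratio(num, den):
        # Guard against empty denominators (e.g. no positive predictions)
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "precision": precision,
        "recall": recall,
        "specificity": ratio(tn, tn + fp),
        "accuracy": ratio(tp + tn, len(pairs)),
        "f1_score": ratio(2 * precision * recall, precision + recall),
        "confusion": {"tp": tp, "tn": tn, "fp": fp, "fn": fn},
    }


metrics = binary_class_metrics(
    y_true=[1, 0, 1, 1, 0, 0, 1, 0],
    y_pred=[1, 0, 0, 1, 0, 1, 1, 0],
)
print(metrics)
```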
rsquare_rmse:
This function calculates the R-squared value and the root mean square error.
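Both quantities can be written in a few lines of NumPy. The sketch below assumes R-squared means the coefficient of determination (1 minus residual sum of squares over total sum of squares); it is not the library's rsquare_rmse, and the helper name is hypothetical.

```python
import numpy as np


def r2_and_rmse(y_true, y_pred):
    """Return (coefficient of determination, root mean square error)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residual = y_true - y_pred
    ss_res = np.sum(residual ** 2)                     # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)     # total variation
    r2 = 1.0 - ss_res / ss_tot
    rmse = np.sqrt(np.mean(residual ** 2))
    return float(r2), float(rmse)


r2, rmse = r2_and_rmse([1, 2, 3, 4], [1.1, 1.9, 3.2, 3.8])
print(r2, rmse)
```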
2. Visualization Tools
data_gridplot:
It's a function for scatter plots between every pair of features along with their distributions (similar to scatter_matrix in pandas). In addition, it lets you save the image, change the figure size, titles, etc., and it has a special feature for highlighting clusters in the data, if any. Example can be found here.
plot_time_series:
This is one of the most useful functions in this library. It plots time series data with a lot of flexibility. Please check out the example scripts for illustrations and guidance on how to use it. Example can be found here.
plot_heatmap:
Flexible heatmap plotting function with an option to remove one triangular half of a symmetric matrix, along with several other options.
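Removing one triangular half of a symmetric matrix usually comes down to masking. The NumPy sketch below shows that idea on a correlation matrix of random data; it is an illustration only and says nothing about how plot_heatmap is implemented or what its parameters are.

```python
import numpy as np

# Hypothetical data: 50 samples of 4 features
rng = np.random.default_rng(0)
data = rng.normal(size=(50, 4))

# Correlation between columns gives a symmetric 4x4 matrix
corr = np.corrcoef(data, rowvar=False)

# True above the main diagonal, False on and below it
mask = np.triu(np.ones_like(corr, dtype=bool), k=1)

# Blank out the redundant upper triangle; NaN cells are typically left empty by plotters
lower_only = np.where(mask, np.nan, corr)
print(lower_only)
```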
3. Data Processing Tools
Filters:
This class applies low pass, high pass, band pass and band stop filters. It can also visualize the frequency domain of the signal, the designed filter, and the filtered signal once a filter has been applied. Example can be found here.
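To show what low pass filtering does to a signal, here is a short sketch using SciPy directly. It does not use the Filters class; the Butterworth design, the filter order, the cutoff and the test signal are all assumptions made for illustration.

```python
import numpy as np
from scipy.signal import butter, filtfilt

fs = 1000.0                                   # sampling rate in Hz
t = np.arange(0, 1.0, 1.0 / fs)
slow = np.sin(2 * np.pi * 5 * t)              # 5 Hz component we want to keep
fast = 0.5 * np.sin(2 * np.pi * 200 * t)      # 200 Hz component we want to remove
signal = slow + fast

# 4th-order Butterworth low pass filter with a 30 Hz cutoff
b, a = butter(4, 30.0, btype="low", fs=fs)

# filtfilt runs the filter forward and backward, so the output has no phase shift
filtered = filtfilt(b, a, signal)

# Away from the edges, the result should be close to the slow component alone
interior_error = np.max(np.abs(filtered[200:-200] - slow[200:-200]))
print(interior_error)
```

Changing btype to "high", "bandpass" or "bandstop" (with a pair of cutoff frequencies for the band types) gives the other three filter kinds in the same way.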
get_spectrogram:
This function calculates the spectrogram of any time series signal and plots it as a heatmap with proper frequency bins and time axis. Example can be found here.
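A spectrogram is a grid of signal power over frequency bins and time segments. The sketch below computes one with scipy.signal.spectrogram rather than get_spectrogram; the test tone and segment length are assumptions, and plotting is left out so the example runs without a display.

```python
import numpy as np
from scipy.signal import spectrogram

fs = 1000.0                                # sampling rate in Hz
t = np.arange(0, 2.0, 1.0 / fs)
x = np.sin(2 * np.pi * 100 * t)            # hypothetical 100 Hz test tone

# freqs: frequency bins (Hz), times: segment centres (s), power: (n_freqs, n_times)
freqs, times, power = spectrogram(x, fs=fs, nperseg=256)

# The bin with the highest average power should sit next to 100 Hz
dominant = freqs[np.argmax(power.mean(axis=1))]
print(power.shape, dominant)
```

The power array is what a heatmap would display, with times on the horizontal axis and freqs on the vertical axis.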
4. Fintech & Miscellaneous
msdbacktest (under development):
This module intends to provide helper functionality for trading automation, strategy implementation, back-testing,
and strategy evaluation using popular measures such as maximum drawdown, Calmar ratio and Sharpe ratio.
Currently only a few functionalities are available, and the module is still under development.
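Two of the measures mentioned above can be sketched in NumPy using their common textbook definitions. This is not msdbacktest code; the function names, the equity curve and the annualisation factor are assumptions.

```python
import numpy as np


def max_drawdown(equity):
    """Largest peak-to-trough drop, as a fraction of the running peak."""
    equity = np.asarray(equity, dtype=float)
    running_peak = np.maximum.accumulate(equity)
    return float(np.max((running_peak - equity) / running_peak))


def sharpe_ratio(returns, risk_free=0.0, periods_per_year=252):
    """Annualised mean excess return divided by its sample standard deviation."""
    excess = np.asarray(returns, dtype=float) - risk_free
    return float(np.sqrt(periods_per_year) * excess.mean() / excess.std(ddof=1))


# Hypothetical equity curve of a strategy
equity = np.array([100.0, 110.0, 99.0, 120.0, 90.0, 130.0])
returns = np.diff(equity) / equity[:-1]     # simple per-period returns

mdd = max_drawdown(equity)                  # worst drop here is 120 -> 90
sharpe = sharpe_ratio(returns)
print(mdd, sharpe)
```

The Calmar ratio is commonly defined as annualised return divided by maximum drawdown, so it can be built on top of max_drawdown in the same style.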
ProgressBar:
This is a custom progress bar which shows loop progress with a visual bar along with other information such as elapsed and remaining time, loop count, total count and percentage of completion. (You should only use it if you don't print anything inside your loop.)
Example can be found here.
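The general mechanism behind such a bar can be sketched with the standard library. This is not the ProgressBar class; the function name render_bar and the output format are hypothetical.

```python
import sys
import time


def render_bar(done, total, elapsed, width=20):
    """Format one progress line: bar, counts, percentage, elapsed and remaining time."""
    fraction = done / total
    filled = int(width * fraction)
    bar = "#" * filled + "-" * (width - filled)
    # Estimate remaining time from the average time per finished iteration
    remaining = elapsed / done * (total - done) if done else 0.0
    return (
        f"[{bar}] {done}/{total} {fraction:.0%} "
        f"elapsed {elapsed:.1f}s remaining {remaining:.1f}s"
    )


total = 5
start = time.perf_counter()
for i in range(1, total + 1):
    time.sleep(0.01)                        # stand-in for real work
    # "\r" moves the cursor back to the line start so the bar redraws in place
    sys.stdout.write("\r" + render_bar(i, total, time.perf_counter() - start))
    sys.stdout.flush()
sys.stdout.write("\n")
```

A bar drawn this way rewrites a single terminal line, so any other print inside the loop lands on that line and breaks the display, which is a plausible reason for the advice not to print inside the loop.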
Reference/citation
@manual{msdlib,
title="{msdlib}: A package for easier data science practices",
author="{Abdullah Al Masud and {msdlib Developers}}",
year=2020,
month=Jan,
url="{https://github.com/abdullah-al-masud/msdlib}"
}