Package that cleans the data, does basic EDA and provides insight into basic models (Logistic Regression and Ridge)
simplefit
A Python package that cleans the data, does basic EDA and returns scores for basic classification and regression models.
Overview
This package helps data scientists clean their data, perform basic EDA, produce graphical visualizations, and analyse the performance of a baseline model and of basic classification or regression models, namely Logistic Regression and Ridge, on their data.
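As a rough illustration of what comparing a baseline against Ridge can look like, the sketch below uses scikit-learn directly and collects mean cross-validation results in a dataframe. It is not simplefit's implementation: the helper name `score_models`, the synthetic data, and the choice of reported metrics are assumptions made for this example, and no preprocessing is performed.

```python
import numpy as np
import pandas as pd
from sklearn.dummy import DummyRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_validate


def score_models(X, y, cv=5):
    """Hypothetical helper: mean cross-validation results per model."""
    models = {"DummyRegressor": DummyRegressor(), "Ridge": Ridge()}
    columns = {}
    for name, model in models.items():
        # cross_validate returns fit_time, score_time, test_score, train_score
        results = cross_validate(model, X, y, cv=cv, return_train_score=True)
        columns[name] = pd.DataFrame(results).mean()
    # One column per model, one row per metric
    return pd.DataFrame(columns)


# Synthetic regression data (assumed for the example only)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

scores = score_models(X, y)
print(scores)
```

A baseline row next to the Ridge row makes it easy to see whether the linear model learns anything beyond predicting the mean.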
Functions
Function Name | Input | Output | Description |
---|---|---|---|
cleaner | dataframe | list of 3 dataframes | Loads and cleans the dataset, removes NA rows, strips extra white spaces, etc. and returns clean dataframe |
plot_distributions | dataframe, bins, dist_cols, class_label | Altair histogram plot object | Creates numerical distribution plots on either all the numeric columns or the ones provided to it |
plot_corr | dataframe, corr | Altair correlation plot object | Creates correlation plot for all the columns in the dataframe |
plot_splom | dataframe, pair_cols | Altair SPLOM plot object | Creates SPLOM plot for all the numeric columns in the dataframe or the ones passed by the user |
regressor | train_df, target_col, numeric_feats, categorical_feats, text_col, cv | dataframe | Preprocesses the data, fits baseline model (Dummy Regressor) and Ridge with default setup and returns model scores in the form of a dataframe |
classifier | train_df, target_col, numeric_feats, categorical_feats, text_col, cv | dataframe | Preprocesses the data, fits baseline model (Dummy Classifier) and Logistic Regression with default setup and returns model scores in the form of a dataframe |
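The cleaning behaviour listed for cleaner above (removing NA rows, stripping extra white space) can be pictured with plain pandas. This is a minimal sketch under stated assumptions, not the package's own code: `clean_frame` is a hypothetical helper, it returns a single dataframe, and it treats "extra white spaces" as leading and trailing whitespace in column names and string cells.

```python
import pandas as pd


def clean_frame(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper (not simplefit's cleaner): strip whitespace, drop NA rows."""
    out = df.copy()
    # Strip surrounding whitespace from column names
    out.columns = [str(col).strip() for col in out.columns]
    # Strip surrounding whitespace from string cells; leave other values untouched
    for col in out.columns:
        out[col] = out[col].map(lambda v: v.strip() if isinstance(v, str) else v)
    # Drop every row that has at least one missing value
    return out.dropna().reset_index(drop=True)


raw = pd.DataFrame(
    {" name ": [" a", "b ", None], "value": [1.0, None, 3.0]}
)
clean = clean_frame(raw)
print(clean)
```

Only the first row survives here, because the other two rows each contain a missing value.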
Our Package in the Python Ecosystem
Parts of this package's functionality exist as standalone packages, namely auto-eda, eda-report, quick-eda, and s11-classifier. However, these packages only do the EDA or only the classification using XGBoostClassifier. Our package aims to cover all the basic steps of an ML pipeline and save the data scientist's time and effort by cleaning and preprocessing the data, returning graphical visualisations from EDA, and providing insight into basic model performance, after which the user can decide which other models to use.
Installation
$ pip install git+https://github.com/UBC-MDS/simplefit
Usage
Contributing
Interested in contributing? Check out the contributing guidelines. Please note that this project is released with a Code of Conduct. By contributing to this project, you agree to abide by its terms.
Contributors
This Python package was developed by the following Master of Data Science program candidates at the University of British Columbia:
- Mohammadreza Mirzazadeh @rezam747
- Zihan Zhou @zzhzoe
- Navya Dahiya @nd265
- Sanchit Singh @Sanchit120496
License
simplefit was created by Reza, Zoe, Navya, and Sanchit. It is licensed under the terms of the MIT license.
Credits
simplefit was created with cookiecutter and the py-pkgs-cookiecutter template.