
This is a machine learning tool developed to help non-tech students and non-coders try ML on their data.


JustDataML: Simplified Machine Learning for Everyone

Welcome to JustDataML, a user-friendly tool designed to make machine learning accessible to all, regardless of technical background. JustDataML automates the process of model selection, data preprocessing, and prediction, allowing you to focus on insights rather than coding complexities.

Authors

  • Jaideepsinh Dabhi (@jaideepsinhdabhi)

Features

  • Automated Model Selection
  • Streamlined Preprocessing
  • Quick Predictions
  • Flexible Configuration
  • Hyperparameter Tuning
  • Accurate Predictions

Requirements

To run this tool, follow the installation steps below.

Installation

Suggestion: please use a virtual environment to run this tool successfully.

  1. Install the application with pip:
  pip install JustDataML

Now you can use it as:

   JDML --Config <Data Config file> --Train --Predict <test.csv> --Output <Output Name>
  2. Install by cloning this repository:
  git clone https://github.com/jaideepsinhdabhi/JustDataML.git

Go into the repo folder:

  cd JustDataML

Then install all the required packages:

  pip install -r requirements.txt

Usage/Examples: CLI Tool

python JDML/JDML.py --Config <Data Config File> --Train --Predict <test.csv> --Output <Output Name>

Arguments

  • -C, --Config: [Compulsory] You need to provide a configuration file to obtain all the necessary arguments.

    Note: make sure you provide the proper arguments as described in this document.

  • -T, --Train: [Optional] If provided, it initiates model training based on the specified data_config file.

    Note: ensure that the data files are available in the "Data" folder.

  • -P, --Predict: [Optional] If provided, performs prediction using the trained model. Make sure not to delete or modify the artifact folder to get the results.

  • -O, --Output: [Required with -P (--Predict)] Specifies the output name for the predicted dataframe generated from the test data. Note: it will also generate a Model_summary file, and if hyperparameter tuning is set to Yes in the config file, it will generate a stats file for that as well.
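For example, a full run using the sample Iris files shipped with the tool might look like this (the file names below are illustrative; adjust them to your own data):

  JDML --Config Data_Config.csv --Train --Predict Test.csv --Output Output.csv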

Usage/Examples: Python

We can also use this tool as a Python library.

from JDML.JDML import Just_Data_ML
import argparse

Import the Tool

arguments = argparse.Namespace(Config="Data_Config.csv", Train=True, Predict="Test.csv", Output="Output.csv")

Yes, we need to give arguments like this (this will be fixed in the next release).

jdml = Just_Data_ML(arguments)

Parse the arguments

jdml.Data_df

For checking the DataFrame you provided

jdml.Data_train()

For Model Training

jdml.Predict_test()

For predicting on the test data given

Voila! You will have the prediction file (column Target_Out) in the Output folder.
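Putting the steps above together, a minimal end-to-end sketch looks like this (it assumes Data_Config.csv and Test.csv are present as described in the configuration section below):

  from JDML.JDML import Just_Data_ML
  import argparse

  # Build the argument namespace expected by the tool
  arguments = argparse.Namespace(
      Config="Data_Config.csv",  # data configuration file
      Train=True,                # train models as per the config
      Predict="Test.csv",        # test file to predict on
      Output="Output.csv",       # name of the output prediction file
  )

  jdml = Just_Data_ML(arguments)  # parse the arguments
  print(jdml.Data_df)             # check the DataFrame you provided
  jdml.Data_train()               # train the models
  jdml.Predict_test()             # predict on the test data

  # The prediction file (column Target_Out) is written to the Output folder.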

Data Configuration File Example

This is an example of a data configuration file (Data_Config.csv) used with the JustDataML (JDML) tool. This file specifies the necessary information for training and predicting with machine learning models.

CSV Structure:

The CSV file contains the following fields:

  • Data_Name: Name of the dataset file, including its extension; it should be present in the Data folder (Iris.data in this example).

  • Features_Cols: Comma-separated list of feature columns in the dataset (sepal length, sepal width, petal length, petal width in this example).

  • Target_Col: Name of the target column in the dataset (class in this example).

  • Problem_Objective: Objective of the machine learning task (Classification or Regression).

  • Normalization_tech: Normalization technique to be applied (StandardScaler, MinMaxScaler, etc.).

  • Model_to_include: Models to include in the training process (ALL or specific models; see the list of available models below).

  • HyperParamter_Yes_or_No: Indicates whether hyperparameter tuning should be performed (Yes or No).

    A sample CSV file and some data are provided in the Data folder for a demo run.

    Note: it will also generate logs into a logs folder for every run; please check them after each run to learn more about what happened.
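Based on the field descriptions above, a Data_Config.csv for the Iris example might look roughly like the sketch below; the exact layout and quoting are assumptions, so check the sample file in the Data folder for the definitive format:

  Data_Name,Features_Cols,Target_Col,Problem_Objective,Normalization_tech,Model_to_include,HyperParamter_Yes_or_No
  Iris.data,"sepal length,sepal width,petal length,petal width",class,Classification,StandardScaler,ALL,No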

Available Models for Regression and Classification

Regression Models:

  1. Random Forest

    • Description: Random Forest is an ensemble learning method that operates by constructing multiple decision trees during training and outputting the mean prediction of the individual trees.
  2. Decision Tree

    • Description: Decision Tree is a non-parametric supervised learning method used for classification and regression. It works by partitioning the input space into regions and predicting the target variable based on the average of the training instances in the corresponding region.
  3. Gradient Boosting

    • Description: Gradient Boosting is a machine learning technique for regression and classification problems that builds models in a stage-wise manner and tries to fit new models to the residuals of the previous models.
  4. Linear Regression

    • Description: Linear Regression is a linear approach to modeling the relationship between a dependent variable and one or more independent variables.
  5. XGBRegressor

    • Description: XGBoost is an optimized distributed gradient boosting library designed to be highly efficient, flexible, and portable.
  6. Support Vector Reg

    • Description: Support Vector Regression (SVR) is a type of Support Vector Machine (SVM) algorithm that is used to predict a continuous variable.
  7. Linear Ridge

    • Description: Ridge Regression is a linear regression technique that is used to analyze multiple regression data that suffer from multicollinearity.
  8. Linear Lasso

    • Description: Lasso regression is a type of linear regression that uses shrinkage. It penalizes the absolute size of the regression coefficients.
  9. ElasticNet

    • Description: ElasticNet is a linear regression model that combines the properties of Ridge Regression and Lasso Regression.
  10. AdaBoost Regressor

    • Description: AdaBoost (Adaptive Boosting) is an ensemble learning method that combines multiple weak learners to create a strong learner.
  11. KNeighborsRegressor

    • Description: KNeighborsRegressor is a simple, non-parametric method used for regression tasks based on the k-nearest neighbors algorithm.

Classification Models:

  1. Logistic Regression

    • Description: Logistic Regression is a statistical method for analyzing a dataset in which there are one or more independent variables that determine an outcome.
  2. Ridge Classification

    • Description: Ridge Classifier is a classifier that uses Ridge Regression to classify data points.
  3. GaussianNB

    • Description: Gaussian Naive Bayes is a simple probabilistic classifier based on applying Bayes' theorem with strong independence assumptions between the features.
  4. KNeighborsClassifier

    • Description: KNeighborsClassifier is a simple, instance-based learning algorithm used for classification tasks based on the k-nearest neighbors algorithm.
  5. Decision Tree Classifier

    • Description: Decision Tree Classifier is a non-parametric supervised learning method used for classification.
  6. Random Forest Classifier

    • Description: Random Forest Classifier is an ensemble learning method for classification that operates by constructing multiple decision trees during training and outputting the class that is the mode of the classes (classification) of the individual trees.
  7. Support Vector Classifier

    • Description: Support Vector Classifier (SVC) is a type of Support Vector Machine (SVM) algorithm that is used for classification tasks.
  8. AdaBoost Classifier

    • Description: AdaBoost Classifier is an ensemble learning method that combines multiple weak learners to create a strong learner.
  9. Gradient Boosting Classifier

    • Description: Gradient Boosting Classifier is a machine learning technique for classification problems that builds models in a stage-wise manner and tries to fit new models to the residuals of the previous models.
  10. XGBClassifier

    • Description: XGBoost Classifier is an optimized distributed gradient boosting library designed for classification problems.

These are the available regression and classification models supported by the JDML tool. You can use them for training and prediction based on your specific machine learning tasks.
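If you do not want to train ALL of them, the Model_to_include field in the config can list specific models instead. The entry below is only illustrative and assumes the names are spelled as in the lists above; check the sample config in the Data folder for the exact expected values:

  Model_to_include: Random Forest Classifier, Logistic Regression, XGBClassifier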

Feedback

If you have any feedback or suggestions, please reach out to us at jaideep.dabhi7603@gmail.com

Acknowledgements

Hi, I'm Jaideepsinh Dabhi (jD)! 👋

🚀 About Me

🚀 Data Scientist | Analytics Enthusiast | Python Aficionado 🐍
I'm a Data Scientist working in the biotech industry.
I am based out of India 🇮🇳.
I love to code in Python, Bash, and R.
I have a strong base in statistics and machine learning.
I am passionate about networking and fostering meaningful connections within the tech community. Feel free to reach out if you'd like to discuss Data Science 👨🏻‍💻, Machine Learning 🦾, Chess ♞ or Pens 🖋️

🔗 Links

GitHub | LinkedIn

License

This project is licensed under the Apache License 2.0.
