A machine learning package for preprocessing and model training.

Machine Learning Feature Engineering and Modeling Toolkit

Overview

This package provides tools for data preprocessing, feature engineering, and model training. It supports both regression and classification tasks, with hyperparameter tuning handled by Optuna.


Features

  1. Anomaly Detection: Detects outliers using the Interquartile Range (IQR) method and replaces them with NaN.
  2. Missing Value Handling: Imputes or drops missing values with customizable strategies (mean, median, etc.).
  3. Scaling and Transformation: Provides multiple scaling methods (StandardScaler, MinMaxScaler), power transformations (Yeo-Johnson, Box-Cox), and log transformations.
  4. Feature Engineering:
    • Removes low-variance features.
    • Eliminates highly correlated features based on thresholds.
  5. Regression Models: Supports Linear Regression, Random Forest, XGBoost, and SVR with hyperparameter tuning.
  6. Classification Models: Includes Logistic Regression, Random Forest, and XGBoost with hyperparameter optimization.
  7. Customizable Parameters: Allows easy configuration for preprocessing, modeling, and evaluation.

Usage

1. Anomaly Detection

Detect anomalies in the dataset using the Interquartile Range (IQR) method and replace them with NaN.
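
The IQR rule above can be sketched with plain pandas. This is a minimal illustration, not the package's own API: the function name `mask_iqr_outliers` and the `factor=1.5` default are assumptions. A value counts as an outlier when it falls outside `[Q1 - factor * IQR, Q3 + factor * IQR]`.

```python
import numpy as np
import pandas as pd


def mask_iqr_outliers(df: pd.DataFrame, factor: float = 1.5) -> pd.DataFrame:
    """Return a copy of df with IQR outliers in numeric columns set to NaN.

    Hypothetical helper for illustration; not part of the package's API.
    """
    result = df.copy()
    numeric_cols = result.select_dtypes(include="number").columns
    for col in numeric_cols:
        q1 = result[col].quantile(0.25)
        q3 = result[col].quantile(0.75)
        iqr = q3 - q1
        lower, upper = q1 - factor * iqr, q3 + factor * iqr
        # Keep values inside the fences; everything else becomes NaN.
        result[col] = result[col].where(result[col].between(lower, upper))
    return result


df = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0, 100.0], "label": list("abcde")})
cleaned = mask_iqr_outliers(df)
print(cleaned)
```

Non-numeric columns such as `label` are left untouched, and the input DataFrame is not modified.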

2. Handle Missing Values

Impute or drop missing values using the specified strategy (e.g., mean or median).
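
As a rough sketch of what impute-or-drop can look like, the snippet below uses pandas directly. The function name `handle_missing` and its `strategy` and `drop` parameters are hypothetical, chosen for illustration, and do not reflect the package's actual signature.

```python
import numpy as np
import pandas as pd


def handle_missing(df: pd.DataFrame, strategy: str = "mean", drop: bool = False) -> pd.DataFrame:
    """Drop rows with missing values, or impute numeric columns.

    Hypothetical helper for illustration; not part of the package's API.
    """
    if drop:
        # Remove every row that contains at least one missing value.
        return df.dropna()
    if strategy not in {"mean", "median"}:
        raise ValueError(f"unsupported strategy: {strategy!r}")
    result = df.copy()
    numeric_cols = result.select_dtypes(include="number").columns
    # Per-column fill value computed with the chosen statistic.
    fill_values = result[numeric_cols].agg(strategy)
    result[numeric_cols] = result[numeric_cols].fillna(fill_values)
    return result


df = pd.DataFrame(
    {"a": [1.0, np.nan, 3.0], "b": [10.0, 20.0, np.nan], "c": ["x", "y", None]}
)
imputed = handle_missing(df, strategy="mean")
dropped = handle_missing(df, drop=True)
print(imputed)
print(dropped)
```

Only numeric columns are imputed in this sketch; the missing entry in the string column `c` is left as is.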

3. Scaling and Transformation

Apply scaling methods and transformations (e.g., StandardScaler, MinMaxScaler, or PowerTransformer).
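
The scalers and transformations named above are available in scikit-learn and NumPy. The sketch below calls them directly on synthetic data (an assumption made for illustration) rather than through the package's own wrappers, whose signatures are not documented here.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, PowerTransformer, StandardScaler

# Synthetic, right-skewed, strictly positive data for illustration.
rng = np.random.default_rng(0)
X = rng.lognormal(mean=0.0, sigma=1.0, size=(200, 2))

# Zero mean and unit variance per column.
standardized = StandardScaler().fit_transform(X)

# Rescale each column to the [0, 1] range.
min_max = MinMaxScaler().fit_transform(X)

# Power transformations that reduce skew. Yeo-Johnson accepts any real values;
# Box-Cox requires strictly positive input.
yeo_johnson = PowerTransformer(method="yeo-johnson").fit_transform(X)
box_cox = PowerTransformer(method="box-cox").fit_transform(X)

# Log transformation; log1p computes log(1 + x) and is defined at x = 0.
log_scaled = np.log1p(X)
```

In a real workflow, fit each transformer on the training split only and reuse it to transform the test split, so that test-set statistics do not leak into training.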

4. Feature Engineering

Remove low-variance features and highly correlated features from the dataset.
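
One common way to implement both filters with pandas is sketched below. The function names and default thresholds are assumptions for illustration, not the package's API. For correlated pairs, this sketch keeps the first column and drops the later one.

```python
import numpy as np
import pandas as pd


def drop_low_variance(df: pd.DataFrame, threshold: float = 0.0) -> pd.DataFrame:
    """Drop numeric columns whose variance is at or below threshold (hypothetical helper)."""
    variances = df.var(numeric_only=True)
    to_drop = variances[variances <= threshold].index
    return df.drop(columns=to_drop)


def drop_highly_correlated(df: pd.DataFrame, threshold: float = 0.9) -> pd.DataFrame:
    """Drop one column from each pair with |correlation| above threshold (hypothetical helper)."""
    corr = df.corr(numeric_only=True).abs()
    # Keep only the upper triangle (k=1 excludes the diagonal) so each pair is checked once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df.drop(columns=to_drop)


df = pd.DataFrame(
    {
        "a": [1.0, 2.0, 3.0, 4.0, 5.0],
        "b": [2.0, 4.0, 6.0, 8.0, 10.5],  # nearly a multiple of "a"
        "c": [5.0, 3.0, 4.0, 1.0, 2.0],
        "const": [7.0, 7.0, 7.0, 7.0, 7.0],  # zero variance
    }
)
reduced = drop_highly_correlated(drop_low_variance(df), threshold=0.9)
print(list(reduced.columns))
```

Here `const` is removed by the variance filter and `b` by the correlation filter, leaving `a` and `c`.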

5. Regression Modeling

Train and evaluate multiple regression models with hyperparameter tuning using Optuna.

6. Classification Modeling

Train and evaluate multiple classification models with hyperparameter optimization using Optuna.

Dependencies

This project requires the following Python libraries:

  • pandas
  • numpy
  • scikit-learn
  • seaborn
  • xgboost
  • optuna

Distribution files

  • Source distribution: Rabbi-0.1.0.tar.gz (17.7 kB)
  • Built distribution (Python 3 wheel): Rabbi-0.1.0-py3-none-any.whl (18.0 kB)