PyPI: classifier-toolkit

pip install classifier-toolkit

Classifier Toolkit

Poetry | Linting - Ruff | Code style - Black

This is a new project.


Table of Contents

  1. Installation
  2. Usage
  3. Modules Overview
  4. Future Work

Installation

This library is published on PyPI. To install it, run: pip install classifier-toolkit

Usage

This library automates binary classification tasks in the finance domain, specifically for default and fraud labeling. It includes several packages designed to address the main steps in any machine learning/data science task:

  1. EDA: accessible through EDA_Toolkit. This package provides the EDA and feature engineering functionality, along with the accompanying visualizations.
  2. Feature Selection: embedded, wrapper, and meta selection methods, as described under Feature Selection in the Modules Overview.
  3. Model fitting and hyperparameter tuning: To be implemented.
  4. Evaluation and reporting: To be implemented.

The package architectures will be documented here in the future. For now, please consult the docstrings of the methods in the relevant modules.

Note: This library does not include data wrangling steps, although it does include feature engineering. Data wrangling is an intermediate step between EDA and feature engineering, in which users should fix any data quality issues. Conducting the EDA is therefore crucial for identifying those issues before moving on to feature engineering and the subsequent steps.
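As a rough illustration of the kind of data quality issues worth catching before feature engineering, the sketch below counts missing values, exact duplicate rows, and IQR-based outliers using only the Python standard library. The function name data_quality_report, the row format (a list of dicts with None for missing values), and the 1.5 x IQR rule are assumptions made for this example; they are not the classifier-toolkit API, whose actual interface is described in the module docstrings.

```python
import statistics
from collections import Counter


def data_quality_report(rows, numeric_columns):
    """Hypothetical helper: summarize common data quality issues.

    `rows` is a list of dicts; a value of None (or an absent key)
    is treated as missing.
    """
    columns = sorted({key for row in rows for key in row})

    # Missing values per column.
    missing = {
        col: sum(1 for row in rows if row.get(col) is None) for col in columns
    }

    # Rows that exactly repeat another row (every extra copy counts once).
    row_counts = Counter(tuple(row.get(col) for col in columns) for row in rows)
    duplicate_rows = sum(count - 1 for count in row_counts.values())

    # Outliers per numeric column, using the 1.5 * IQR fences.
    outliers = {}
    for col in numeric_columns:
        values = [row[col] for row in rows if row.get(col) is not None]
        if len(values) < 4:
            outliers[col] = 0  # too few points to estimate quartiles
            continue
        q1, _, q3 = statistics.quantiles(values, n=4)
        spread = 1.5 * (q3 - q1)
        outliers[col] = sum(
            1 for v in values if v < q1 - spread or v > q3 + spread
        )

    return {
        "missing": missing,
        "duplicate_rows": duplicate_rows,
        "outliers": outliers,
    }


# Made-up example data, for illustration only.
rows = [
    {"amount": 10, "segment": "a"},
    {"amount": 11, "segment": "b"},
    {"amount": 12, "segment": "a"},
    {"amount": 12, "segment": "a"},   # exact duplicate of the previous row
    {"amount": 13, "segment": None},  # missing segment
    {"amount": 500, "segment": "b"},  # suspiciously large amount
    {"amount": None, "segment": "a"}, # missing amount
]
report = data_quality_report(rows, numeric_columns=["amount"])
print(report)
```

A report like this only flags candidates for cleaning; deciding whether to drop, impute, or cap the flagged values remains the data wrangling step that the library leaves to the user.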

Modules Overview

  • EDA Toolkit: This module includes classes and methods for performing comprehensive exploratory data analysis. It provides automated warnings for data quality issues, univariate and bivariate analysis, and various data visualizations to help understand the dataset.

  • Univariate Analysis: This class focuses on the analysis of individual variables. It includes methods for calculating statistical measures, visualizing distributions, and assessing relationships between variables and a target through techniques like Cramer's V and Information Value. This helps in understanding the significance and distribution of each feature independently.

  • Bivariate Analysis: This class deals with the analysis of two variables to understand their relationship. It includes functionalities for generating correlation heatmaps, performing ANOVA tests between numerical and categorical variables, and computing pairwise Cramer's V for categorical features. This aids in identifying patterns and correlations between pairs of variables, which is crucial for feature selection and engineering.

  • Feature Engineering: This module assists in transforming features, handling missing values, encoding categorical variables, and more. It aims to enhance the dataset's quality for better model performance.

  • Visualizations: This module offers a wide range of plotting capabilities to visually analyze data distributions, relationships, and other crucial aspects of the dataset.

  • Automated Warnings: A utility to automatically check the dataset for common issues such as missing or duplicate values, outliers, and more, providing warnings to guide data cleaning efforts.

  • Feature Selection: This module provides various feature selection techniques:

    • Embedded Methods: Includes ElasticNet for regularization-based feature selection.
    • Wrapper Methods:
      • Recursive Feature Elimination (RFE) with support for various ensemble methods (Random Forest, XGBoost, LightGBM, CatBoost).
      • Sequential Feature Selection (forward, backward, floating, and bidirectional).
    • Meta Selector: Combines multiple feature selection methods to provide a robust selection.
    • Utility Functions: Includes scoring functions and plotting utilities for feature importance visualization.
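The forward variant of Sequential Feature Selection listed above can be sketched as a greedy loop: start from an empty set and repeatedly add the feature that most improves a score, stopping when no candidate helps. This is a minimal sketch of the general technique, not the library's implementation. The function forward_selection, its parameters, and the toy scoring function are hypothetical; in practice the score would typically be a cross-validated model metric.

```python
def forward_selection(candidates, score_fn, max_features=None, min_gain=0.0):
    """Greedy forward selection (hypothetical helper).

    `score_fn` takes a list of feature names and returns a number,
    where higher is better.
    """
    remaining = list(candidates)
    selected = []
    best_score = None
    limit = len(remaining) if max_features is None else max_features

    while remaining and len(selected) < limit:
        # Score every candidate when added to the current selection.
        scored = [(score_fn(selected + [name]), name) for name in remaining]
        top_score, top_name = max(scored, key=lambda pair: pair[0])

        # Stop once the best candidate no longer improves the score.
        if best_score is not None and top_score - best_score <= min_gain:
            break

        selected.append(top_name)
        remaining.remove(top_name)
        best_score = top_score

    return selected, best_score


# Toy score with made-up numbers: each feature contributes a fixed gain,
# and two partly redundant features are penalized when used together.
GAINS = {"income": 0.30, "age": 0.10, "utilization": 0.25, "noise": 0.0}


def toy_score(subset):
    score = sum(GAINS[name] for name in subset)
    if "income" in subset and "utilization" in subset:
        score -= 0.20
    return score


selected, score = forward_selection(GAINS, toy_score)
print(selected)  # ['income', 'age', 'utilization']
```

The backward variant works the same way in reverse, starting from all features and removing the least useful one at each step. Floating and bidirectional variants additionally allow revisiting earlier decisions.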
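Cramer's V and Information Value, mentioned in the Univariate and Bivariate Analysis entries above, can both be computed from simple category counts. The sketch below uses the textbook definitions with the Python standard library. The function names, the uncorrected form of Cramer's V, the good/bad convention (target 0 is "good", target 1 is "bad"), and the 0.5 smoothing for empty cells are assumptions of this example and may differ from what the library does.

```python
import math
from collections import Counter


def cramers_v(xs, ys):
    """Uncorrected Cramer's V between two categorical sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    x_totals = Counter(xs)
    y_totals = Counter(ys)

    # Pearson chi-squared statistic over the full contingency table.
    chi2 = 0.0
    for x, x_count in x_totals.items():
        for y, y_count in y_totals.items():
            expected = x_count * y_count / n
            observed = joint.get((x, y), 0)
            chi2 += (observed - expected) ** 2 / expected

    k = min(len(x_totals), len(y_totals)) - 1
    if k == 0:
        return 0.0  # a constant variable carries no association
    return math.sqrt(chi2 / (n * k))


def information_value(xs, target, smoothing=0.5):
    """Information Value of a categorical feature for a 0/1 target.

    Each category contributes (pct_good - pct_bad) * ln(pct_good / pct_bad).
    `smoothing` is added to every cell so empty cells do not produce log(0).
    """
    good = Counter(x for x, t in zip(xs, target) if t == 0)
    bad = Counter(x for x, t in zip(xs, target) if t == 1)
    categories = set(xs)

    total_good = sum(good.values()) + smoothing * len(categories)
    total_bad = sum(bad.values()) + smoothing * len(categories)

    iv = 0.0
    for category in categories:
        pct_good = (good[category] + smoothing) / total_good
        pct_bad = (bad[category] + smoothing) / total_bad
        weight_of_evidence = math.log(pct_good / pct_bad)
        iv += (pct_good - pct_bad) * weight_of_evidence
    return iv


# Made-up example: "segment" perfectly separates the target.
segment = ["a", "a", "b", "b"]
default = [0, 0, 1, 1]
print(cramers_v(segment, default))
print(information_value(segment, default))
```

Numerical features would need to be binned first before either measure applies; the binning strategy is a separate design choice not covered here.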

Future Work

The next planned improvements and additions to the library include:

  • Adding model fitting and hyperparameter tuning functionalities.
  • Developing comprehensive evaluation and reporting tools to assist with model assessment.
  • Expanding documentation to include architecture diagrams and detailed usage examples.
