Reason this release was yanked: outdated

Classifier Toolkit

Packaging: Poetry. Linting: Ruff. Code style: Black.

This is a new project.


Table of Contents

  1. Installation
  2. Usage
  3. Modules Overview
  4. Future Work

Installation

This library is published on PyPI. To install it, run: pip install classifier_toolkit

Usage

This library automates binary classification tasks in the finance domain, specifically for default and fraud labeling. It includes several packages designed to address the main steps in any machine learning/data science task:

  1. EDA: accessible through EDA_Toolkit. This package provides EDA and feature engineering functionality, along with the accompanying visualizations.
  2. Feature Selection: provided by the Feature Selection module (see Modules Overview).
  3. Model fitting and hyperparameter tuning: To be implemented.
  4. Evaluation and reporting: To be implemented.

The package architecture will be documented here in a future release. For now, please consult the docstrings of the methods in the relevant modules.

Note: this library does not perform data wrangling, although it does include feature engineering. Data wrangling is an intermediate step between EDA and feature engineering, in which users should fix any data quality issues themselves. Running the EDA first is therefore crucial for surfacing those issues before moving on to feature engineering and the subsequent steps.
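
To make the note above concrete, here is a minimal sketch of the kind of data quality checks a user might run between EDA and feature engineering. It uses plain pandas rather than the classifier_toolkit API; the function name quality_report, the column names, and the toy data are all hypothetical and chosen only for illustration.

```python
import pandas as pd


def quality_report(df: pd.DataFrame, target: str) -> dict:
    """Summarise common data quality issues (hypothetical helper, not library API)."""
    return {
        "n_rows": len(df),
        # Fully duplicated rows, counting every repeat after the first occurrence.
        "n_duplicate_rows": int(df.duplicated().sum()),
        # Share of missing values per column.
        "missing_share": df.isna().mean().to_dict(),
        # Columns with at most one distinct non-null value carry no signal.
        "constant_columns": [c for c in df.columns if df[c].nunique(dropna=True) <= 1],
        # Positive rate of the binary target (e.g. fraud or default rate).
        "target_rate": float(df[target].mean()),
    }


# Toy dataset, invented for this example.
df = pd.DataFrame(
    {
        "amount": [120.0, 80.5, None, 80.5, 300.0],
        "channel": ["web", "app", "web", "app", "web"],
        "currency": ["EUR"] * 5,
        "is_fraud": [0, 0, 1, 0, 1],
    }
)

report = quality_report(df, target="is_fraud")
for key, value in report.items():
    print(f"{key}: {value}")
```

Issues surfaced this way (duplicates, missing values, constant columns) would then be fixed by the user before any feature engineering step is applied.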

Modules Overview

  • EDA Toolkit: This module includes classes and methods for performing comprehensive exploratory data analysis. It provides automated warnings for data quality issues, univariate and bivariate analysis, and various data visualizations to help understand the dataset.

  • Univariate Analysis: This class focuses on the analysis of individual variables. It includes methods for calculating statistical measures, visualizing distributions, and assessing relationships between variables and a target through techniques like Cramer's V and Information Value. This helps in understanding the significance and distribution of each feature independently.

  • Bivariate Analysis: This class deals with the analysis of two variables to understand their relationship. It includes functionalities for generating correlation heatmaps, performing ANOVA tests between numerical and categorical variables, and computing pairwise Cramer's V for categorical features. This aids in identifying patterns and correlations between pairs of variables, which is crucial for feature selection and engineering.

  • Feature Engineering: This module assists in transforming features, handling missing values, encoding categorical variables, and more. It aims to enhance the dataset's quality for better model performance.

  • Visualizations: This module offers a wide range of plotting capabilities to visually analyze data distributions, relationships, and other crucial aspects of the dataset.

  • Automated Warnings: A utility to automatically check the dataset for common issues such as missing or duplicate values, outliers, and more, providing warnings to guide data cleaning efforts.

  • Feature Selection: This module provides various feature selection techniques:

    • Embedded Methods: Includes ElasticNet for regularization-based feature selection.
    • Wrapper Methods:
      • Recursive Feature Elimination (RFE) with support for various ensemble methods (Random Forest, XGBoost, LightGBM, CatBoost).
      • Sequential Feature Selection (forward, backward, floating, and bidirectional).
    • Meta Selector: Combines the results of multiple feature selection methods into a single, more robust selection.
    • Utility Functions: Includes scoring functions and plotting utilities for feature importance visualization.
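
The overview above mentions Cramér's V and Information Value as measures of association between a feature and the target. The following standard-library sketch shows how these two statistics are commonly computed. It is not the library's implementation: the function names, the eps smoothing parameter, and the toy data are assumptions made for illustration, and the Cramér's V shown is the plain version without bias correction.

```python
import math
from collections import Counter


def cramers_v(x, y):
    """Cramér's V (no bias correction) between two categorical sequences."""
    n = len(x)
    joint = Counter(zip(x, y))
    x_counts, y_counts = Counter(x), Counter(y)
    chi2 = 0.0
    for xv, xc in x_counts.items():
        for yv, yc in y_counts.items():
            expected = xc * yc / n  # count expected under independence
            observed = joint.get((xv, yv), 0)
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(x_counts), len(y_counts)) - 1
    return math.sqrt(chi2 / (n * k)) if k > 0 else 0.0


def information_value(feature, target, eps=0.5):
    """Information Value of a categorical (or pre-binned) feature vs. a 0/1 target.

    eps is an additive smoothing term (an assumption of this sketch) that
    avoids log(0) when a level contains only one class.
    """
    events = Counter(f for f, t in zip(feature, target) if t == 1)
    non_events = Counter(f for f, t in zip(feature, target) if t == 0)
    levels = set(feature)
    event_total = sum(events.values()) + eps * len(levels)
    non_event_total = sum(non_events.values()) + eps * len(levels)
    iv = 0.0
    for level in levels:
        p_event = (events[level] + eps) / event_total
        p_non_event = (non_events[level] + eps) / non_event_total
        woe = math.log(p_non_event / p_event)  # weight of evidence for this level
        iv += (p_non_event - p_event) * woe
    return iv


# Toy data, invented for this example.
channel = ["web", "web", "web", "web", "app", "app", "app", "app"]
is_fraud = [1, 1, 1, 0, 0, 0, 0, 1]

print("Cramér's V:", cramers_v(channel, is_fraud))  # 0.5 for this toy table
print("Information Value:", information_value(channel, is_fraud))
```

Both statistics are zero when the feature carries no information about the target and grow as the association strengthens; Cramér's V is bounded by 1, while Information Value is unbounded.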

Future Work

The next planned improvements and additions to the library include:

  • Adding model fitting and hyperparameter tuning functionalities.
  • Developing comprehensive evaluation and reporting tools to assist with model assessment.
  • Expanding documentation to include architecture diagrams and detailed usage examples.
