imputepy

Impute missing values using LightGBM.

Installation

pip install imputepy

Features

  • Automated Imputation: Utilizes LightGBM models to impute missing values, selecting between regression and classification models based on the column's data type.
  • Flexible Column Exclusion: Allows specific columns to be excluded from the imputation process.
  • Dynamic Filtering for Categorical Columns: Filters categorical columns based on a specified upper limit of unique values to enhance efficiency.
  • Customizable Thresholds for Categorical Detection: Enables setting custom thresholds for unique value counts to refine which columns are considered categorical.
  • Comprehensive Imputation Strategy: Combines missing-value identification, column-type determination, and per-column LightGBM model fitting in a single call.
  • Direct Imputation into Original DataFrame: Writes imputed values directly into the original DataFrame, preserving its structure so it fits into an existing preprocessing pipeline.

Usage

from imputepy import LGBMimputer
import pandas as pd
import numpy as np

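# Load a DataFrame that contains missing values (replace with your own file)
df = pd.read_csv('data/df.csv')

# filter=True, filter_upper_limit=50: filter categorical columns by their number of unique values
# exclude=None: no columns are excluded from imputation
# unique_count_limit=15: unique-value threshold for treating a column as categorical
df_imp = LGBMimputer(df, filter=True, exclude=None, filter_upper_limit=50, unique_count_limit=15)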
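Choosing `unique_count_limit` and `filter_upper_limit` is easier with the column statistics in view. The helper below is not part of imputepy; it is a plain-pandas sketch (the name `column_profile` is hypothetical) that reports each column's dtype, number of missing values, and number of distinct non-missing values.

```python
import pandas as pd


def column_profile(df: pd.DataFrame) -> pd.DataFrame:
    """Summarize dtype, missing-value count, and cardinality per column."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),          # e.g. 'float64', 'object'
        "n_missing": df.isna().sum(),            # values that need imputing
        "n_unique": df.nunique(dropna=True),     # distinct non-missing values
    })
```

For example, `print(column_profile(df).query('n_missing > 0'))` lists only the columns that would need imputing, alongside their cardinality.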

Contributing

Interested in contributing? Check out the contributing guidelines. Please note that this project is released with a Code of Conduct. By contributing to this project, you agree to abide by its terms.

License

imputepy was created by Sam Fo. It is licensed under the terms of the MIT license.

Credits

imputepy was created with cookiecutter and the py-pkgs-cookiecutter template.

Download files

Source distribution: imputepy-1.0.0.tar.gz (4.2 kB)

Built distribution: imputepy-1.0.0-py3-none-any.whl (4.4 kB, Python 3)