Python script for managing categorical data in machine learning, ensuring proper encoding and handling of unseen labels.

Project description

Categorical Data Encoding for Machine Learning:

This repository contains a Python script for managing categorical data in preparation for machine learning tasks. It includes functions for encoding categorical data using LabelEncoder, decoding the encoded data, and handling unseen labels in the data. These functions help ensure that categorical data is properly processed and encoded for machine learning models.

Key Features:

text_to_numbers: Encode categorical columns using LabelEncoder and save the encoders for later use.

re_encode: Re-encode categorical columns, handling unseen labels and maintaining consistency. Remove rows with unseen labels in a specific categorical column and re-encode the data.

numbers_to_text: Decode previously encoded categorical columns to their original values.
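The three operations above can be sketched with pandas and scikit-learn's LabelEncoder. This is only an illustration of the idea, not the package's actual code: the helper names (encode_columns, drop_unseen_and_encode, decode_columns), their signatures, and the in-memory dict of encoders are assumptions made for the example, so check the docstrings for the real interface.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder


def encode_columns(df, columns):
    """Hypothetical helper: fit one LabelEncoder per column and keep the encoders."""
    out = df.copy()
    encoders = {}
    for col in columns:
        enc = LabelEncoder()
        out[col] = enc.fit_transform(out[col])
        encoders[col] = enc
    return out, encoders


def drop_unseen_and_encode(df, columns, encoders):
    """Hypothetical helper: drop rows with labels unseen at fit time, then encode."""
    out = df.copy()
    for col in columns:
        # classes_ holds every label the encoder saw during fit
        out = out[out[col].isin(encoders[col].classes_)]
    out = out.copy()
    for col in columns:
        out[col] = encoders[col].transform(out[col])
    return out


def decode_columns(df, columns, encoders):
    """Hypothetical helper: map integer codes back to the original labels."""
    out = df.copy()
    for col in columns:
        out[col] = encoders[col].inverse_transform(out[col])
    return out


train = pd.DataFrame({"color": ["red", "green", "blue", "green"]})
new = pd.DataFrame({"color": ["blue", "purple", "red"]})

train_enc, encoders = encode_columns(train, ["color"])
new_enc = drop_unseen_and_encode(new, ["color"], encoders)  # the "purple" row is removed
restored = decode_columns(train_enc, ["color"], encoders)
```

Reusing the fitted encoders on the new data is what keeps the integer codes consistent between datasets. Calling LabelEncoder.transform directly on a label it has not seen raises a ValueError, which is why the unseen rows are filtered out first.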

Usage:

You can use these functions to preprocess and manage categorical data in your machine learning projects. They simplify the procedure of storing encoder files for individual datasets, allowing you to reuse them in the future. The script is designed to make it easier to work with categorical data, especially when dealing with unseen labels and encoding consistency.
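As a rough illustration of storing an encoder for later reuse, the sketch below saves a fitted LabelEncoder to disk and loads it back. The use of pickle, the file name, and the temporary directory are assumptions made for the example; the package may store its encoder files differently.

```python
import pickle
import tempfile
from pathlib import Path

from sklearn.preprocessing import LabelEncoder

# Fit an encoder on the labels of one dataset.
encoder = LabelEncoder().fit(["cat", "dog", "bird"])

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "animal_encoder.pkl"  # hypothetical file name
    # Save the fitted encoder so the same label-to-code mapping can be reused.
    with path.open("wb") as fh:
        pickle.dump(encoder, fh)
    # Later (for example, in another session): load it back.
    with path.open("rb") as fh:
        reloaded = pickle.load(fh)

# The reloaded encoder assigns the same codes as the original.
codes = reloaded.transform(["dog", "bird"])
```

Only unpickle files you created yourself or otherwise trust, since loading a pickle can execute arbitrary code.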

For details on each function, refer to its docstring.

Download files

Source Distribution

LeEncoderML-0.1.2.tar.gz (3.5 kB)