Python script for managing categorical data in machine learning, ensuring proper encoding and handling of unseen labels.

Project description

Categorical Data Encoding for Machine Learning:

This repository contains a Python script for preparing categorical data for machine learning. It provides functions that encode categorical columns with scikit-learn's LabelEncoder, decode the encoded columns back to their original values, and handle labels that were not seen when the encoders were first fitted. Together, these functions keep the encoding of categorical data consistent across datasets.

Key Features:

text_to_numbers: Encode categorical columns using LabelEncoder and save the encoders for later use.

re_encode: Re-encode categorical columns, handling unseen labels and maintaining consistency.

remove_row_fit: Remove rows with unseen labels in a specific categorical column and re-encode the data.

numbers_to_text: Decode previously encoded categorical columns to their original values.
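
The four operations listed above can be illustrated with a minimal, plain-Python sketch. It is not the package's code: the helper names (`fit_mapping`, `encode`, `extend_mapping`, `drop_unseen`, `decode`) are hypothetical, and a dictionary stands in for LabelEncoder, mirroring only its behaviour of assigning integer codes to the sorted distinct labels. Appending unseen labels after the existing codes is one possible way to keep the encoding consistent, assumed here for illustration.

```python
def fit_mapping(values):
    """Assign an integer code to each distinct label, in sorted order."""
    classes = sorted(set(values))
    return {label: code for code, label in enumerate(classes)}


def encode(values, mapping):
    """Replace each label with its integer code."""
    return [mapping[value] for value in values]


def extend_mapping(mapping, values):
    """Return a copy of the mapping with unseen labels appended.

    Existing codes are left unchanged, so data encoded earlier stays valid.
    """
    extended = dict(mapping)
    for label in sorted(set(values) - set(mapping)):
        extended[label] = len(extended)
    return extended


def drop_unseen(rows, column, mapping):
    """Keep only the rows whose label in `column` is already known."""
    return [row for row in rows if row[column] in mapping]


def decode(codes, mapping):
    """Turn integer codes back into the original labels."""
    inverse = {code: label for label, code in mapping.items()}
    return [inverse[code] for code in codes]


# Fit on training data and encode it.
train = ["red", "green", "blue", "green"]
mapping = fit_mapping(train)
train_codes = encode(train, mapping)

# New data contains the unseen label "yellow".
new = ["green", "yellow"]
extended = extend_mapping(mapping, new)   # option 1: add the new label
new_codes = encode(new, extended)

rows = [{"color": "green"}, {"color": "yellow"}]
kept = drop_unseen(rows, "color", mapping)  # option 2: drop the rows instead

# Decoding recovers the original labels.
restored = decode(train_codes, mapping)
```

Extending the mapping keeps every row at the cost of a growing label set, while dropping rows keeps the label set fixed at the cost of losing data; which trade-off is right depends on the model being fed.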

Usage:

You can use these functions to preprocess and manage categorical data in your machine learning projects. They simplify the procedure of storing encoder files for individual datasets, allowing you to reuse them in the future. The script is designed to make it easier to work with categorical data, especially when dealing with unseen labels and encoding consistency.

For details on each function, refer to its docstring.
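
As a rough illustration of storing encoders per dataset and reusing them later, the sketch below writes a set of label-to-code mappings to a file and loads them back. The on-disk format used by the package is not described here, so the JSON format, the file-naming scheme, and the helper names `save_encoders` and `load_encoders` are all assumptions made for this example.

```python
import json
import tempfile
from pathlib import Path


def save_encoders(encoders, directory, dataset_name):
    """Write the per-column mappings to one JSON file named after the dataset."""
    path = Path(directory) / f"{dataset_name}_encoders.json"
    path.write_text(json.dumps(encoders, indent=2), encoding="utf-8")
    return path


def load_encoders(path):
    """Read the per-column mappings back from disk."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


# One mapping per categorical column (hypothetical example data).
encoders = {
    "color": {"blue": 0, "green": 1, "red": 2},
    "size": {"L": 0, "S": 1},
}

# A temporary directory keeps the example self-contained.
with tempfile.TemporaryDirectory() as tmp:
    saved_path = save_encoders(encoders, tmp, "shirts")
    restored = load_encoders(saved_path)

# Later data is encoded with the stored mapping, so codes stay consistent.
reused_codes = [restored["color"][label] for label in ["red", "blue"]]
```

Keeping one encoder file per dataset means a model trained on that dataset can always be fed data encoded with exactly the same codes.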

Changelog

0.0.1 (30/10/2023)

  • First Release

0.0.2 (30/10/2023)

  • Second Release: Updated code.

0.0.3 (30/10/2023)

  • Third Release: Updated functions.

Download files

Source Distribution

LeEncoderML-0.0.3.tar.gz (3.7 kB view details)

Uploaded Source

File details

Details for the file LeEncoderML-0.0.3.tar.gz.

File metadata

  • Download URL: LeEncoderML-0.0.3.tar.gz
  • Upload date:
  • Size: 3.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.11

File hashes

Hashes for LeEncoderML-0.0.3.tar.gz
Algorithm Hash digest
SHA256 b943110e668e9fc00e1e06417bb0be3a321219216100c59e8082e3a66e0c89c3
MD5 bc08459d4b32f8d13f4d794df6082004
BLAKE2b-256 5c09ac6963138362a99adcadfea30e4bcf8b2cb1038c4c187c651719cce1a0ce
