Label encoder backed by pandas

Pandas-powered LabelEncoder

Performance benchmark

Results from the project's test, compared with scikit-learn's LabelEncoder:

Total rows: 24,123,464
Scikit-learn's LabelEncoder - 13.35 seconds
Pandas-powered LabelEncoder - 2.44 seconds

Usage

Installation

pip install pandas-label-encoder

Initiation and fitting

import pandas_label_encoder as ec
from pandas_label_encoder import EncoderCategoryError

categories = ['Cat', 'Dog', 'Bird']  # can be pd.Series, np.array, list

# Fit at initialization
animal_encoder = ec.Encoder(categories)

# Fit later
animal_encoder = ec.Encoder()
animal_encoder.fit(categories)

animal_encoder.categories # ['Cat', 'Dog', 'Bird'], read-only

# Calling transform or inverse_transform before appropriate categories are assigned raises EncoderCategoryError
ec.Encoder().transform() # Raises EncoderCategoryError
ec.Encoder().inverse_transform() # Raises EncoderCategoryError

Transform

  • Unknown categories are encoded as -1.
  • To raise an error instead, pass one of the 2 validation options.
    • validation='all' -- every value must be known: raises EncoderValidationError if any result is -1
    • validation='any' -- at least one value must be known: raises EncoderValidationError only if all results are -1

from pandas_label_encoder import EncoderValidationError

animal_encoder.transform(['Cat']) # [2]
animal_encoder.transform(['Fish']) # [-1]

animal_encoder.transform(['Fish'], validation='all') # Raises EncoderValidationError
animal_encoder.transform(['Fish'], validation='any') # Raises EncoderValidationError

try:
    animal_encoder.transform(['Fish', 'Cat'], validation='all') # Raises EncoderValidationError
except EncoderValidationError:
    print('There is an unknown animal.')

animal_encoder.transform(['Fish', 'Cat'], validation='any') # [-1, 2]
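
The -1 convention for unknown values matches how pandas itself encodes values that fall outside a categorical's categories. The sketch below illustrates that behavior and the two validation modes using only pandas. It is not the package's implementation: the `encode` helper and its use of `ValueError` are assumptions made for illustration, and the integer assigned to each category here follows the order of the list, which may differ from the codes the package produces.

```python
import pandas as pd


def encode(values, categories, validation=None):
    """Hypothetical helper: encode values as integer codes, -1 for unknowns."""
    # pd.Categorical assigns -1 to any value that is not in `categories`.
    codes = pd.Categorical(values, categories=categories).codes
    unknown = codes == -1
    if validation == 'all' and unknown.any():
        # 'all' means every value must be known.
        raise ValueError('at least one value is unknown')
    if validation == 'any' and unknown.all():
        # 'any' means at least one value must be known.
        raise ValueError('every value is unknown')
    return codes.tolist()


categories = ['Cat', 'Dog', 'Bird']
print(encode(['Fish', 'Cat'], categories))                    # [-1, 0]
print(encode(['Fish', 'Cat'], categories, validation='any'))  # [-1, 0]
```

With validation='all', the same input raises because 'Fish' is unknown.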

Inverse transform

  • Unknown codes are decoded as NaN.
  • To raise an error instead, pass one of the 2 validation options.
    • validation='all' -- every code must be known: raises EncoderValidationError if any result is NaN
    • validation='any' -- at least one code must be known: raises EncoderValidationError only if all results are NaN

from pandas_label_encoder import EncoderValidationError

animal_encoder.inverse_transform([2]) # ['Cat']
animal_encoder.inverse_transform([9]) # [NaN]

animal_encoder.inverse_transform([9], validation='all') # Raises EncoderValidationError
animal_encoder.inverse_transform([9], validation='any') # Raises EncoderValidationError

try:
    animal_encoder.inverse_transform([9, 2], validation='all') # Raises EncoderValidationError
except EncoderValidationError:
    print('There is an unknown animal.')

animal_encoder.inverse_transform([9, 2], validation='any') # [NaN, 'Cat']
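
One way to picture the NaN behavior is a lookup from integer position to category in which positions that do not exist yield a missing value. The sketch below does this with `Series.reindex` from pandas. It is an illustration under that assumption, not the package's code, and the `decode` helper is hypothetical. The position of each category follows the order of the list, which may differ from the codes the package uses.

```python
import pandas as pd


def decode(codes, categories):
    """Hypothetical helper: map integer codes back to categories, NaN for unknowns."""
    # The Series index is 0..len(categories)-1; reindex fills labels
    # that are absent from the index with a missing value.
    lookup = pd.Series(list(categories))
    return lookup.reindex(codes).tolist()


categories = ['Cat', 'Dog', 'Bird']
decoded = decode([9, 0], categories)
print(decoded)  # a missing value for code 9, then 'Cat'
```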

Save and load the encoder

The load_encoder function and the encoder.Encoder.load method both load an encoder and check its version.

Encoders saved with a different version may contain changes that cause errors.

To check the current encoder version, use encoder.Encoder.__version__.

from pandas_label_encoder import save_encoder, load_encoder

# Save or load directly from an encoder instance
animal_encoder.save(path) # save the current encoder
animal_encoder.load(path) # load another encoder and assign it to the current encoder

# Save or load using the module-level functions
animal_encoder = load_encoder(path)
save_encoder(path)
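
The idea of checking a version on load can be sketched as follows. This is only an illustration of the pattern: the file format (a pickled dict), the `ENCODER_VERSION` constant, the helper names, and the `RuntimeError` are all assumptions, and the package's actual storage format and error type are not documented here.

```python
import pickle
import tempfile
from pathlib import Path

ENCODER_VERSION = '1.0'  # hypothetical version tag stored alongside the data


def save_categories(categories, path):
    """Hypothetical helper: store the categories together with a version tag."""
    payload = {'version': ENCODER_VERSION, 'categories': list(categories)}
    with open(path, 'wb') as f:
        pickle.dump(payload, f)


def load_categories(path):
    """Hypothetical helper: load categories, refusing files from another version."""
    with open(path, 'rb') as f:
        payload = pickle.load(f)
    if payload['version'] != ENCODER_VERSION:
        raise RuntimeError(
            f"saved with version {payload['version']}, expected {ENCODER_VERSION}"
        )
    return payload['categories']


# Round trip through a temporary file.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / 'animals.pkl'
    save_categories(['Cat', 'Dog', 'Bird'], path)
    restored = load_categories(path)

print(restored)  # ['Cat', 'Dog', 'Bird']
```

As with any pickle-based format, such a file should only be loaded from a trusted source.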
