
categorical encoding for featuretools


categorical-encoding


categorical-encoding is a Python library for encoding categorical data, intended for use with Featuretools. It encodes a feature matrix and integrates with a Featuretools pipeline, so that encoding becomes one step of automated feature engineering within a machine learning pipeline.

Install

python -m pip install "featuretools[categorical_encoding]"

Description

For more general questions about using categorical encoding in a machine learning pipeline, consult the guides in the categorical-encoding GitHub repository.

import categorical_encoding as ce

encoder = ce.Encoder()
encoder.fit(feature_matrix, features)
fm_encoded_ordinal = encoder.transform(feature_matrix, features)
>>> feature_matrix
    product_id  purchased  value countrycode
id                                          
0    coke zero       True    0.0          US
1    coke zero       True    5.0          US
2    coke zero       True   10.0          US
3          car       True   15.0          US
4          car       True   20.0          US
5   toothpaste       True    0.0          AL
>>> fm_encoded_ordinal
    PRODUCT_ID_ordinal  purchased  value  COUNTRYCODE_ordinal
id                                                           
0                    1       True    0.0                    1
1                    1       True    5.0                    1
2                    1       True   10.0                    1
3                    2       True   15.0                    1
4                    2       True   20.0                    1
5                    3       True    0.0                    2
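
The ordinal columns in the output above can be reproduced by hand, which may help clarify what the encoding does. The following is a minimal sketch in plain Python, not the library's implementation: the function name ordinal_encode is hypothetical, and numbering categories from 1 in order of first appearance is an assumption chosen because it matches the table shown.

```python
def ordinal_encode(values):
    """Map each distinct value to an integer, numbered from 1 in order of first appearance.

    Hypothetical helper for illustration only; not part of categorical_encoding.
    """
    mapping = {}
    encoded = []
    for value in values:
        if value not in mapping:
            # First time this category is seen: give it the next unused integer.
            mapping[value] = len(mapping) + 1
        encoded.append(mapping[value])
    return encoded, mapping


# The two categorical columns from the feature matrix shown above.
product_id = ["coke zero", "coke zero", "coke zero", "car", "car", "toothpaste"]
countrycode = ["US", "US", "US", "US", "US", "AL"]

product_codes, product_mapping = ordinal_encode(product_id)
country_codes, country_mapping = ordinal_encode(countrycode)

print(product_codes)  # [1, 1, 1, 2, 2, 3]
print(country_codes)  # [1, 1, 1, 1, 1, 2]
```

The non-categorical columns (purchased, value) pass through unchanged, which is why only product_id and countrycode are renamed in the encoded matrix.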

categorical-encoding integrates with Featuretools through feature definitions. Fitting an encoder to data learns a set of features, and those features can then be used to generate new tables of encoded data.

>>> import featuretools as ft
>>> features = encoder.get_features()
>>> features
[<Feature: PRODUCT_ID_ordinal>,
 <Feature: purchased>,
 <Feature: value>,
 <Feature: COUNTRYCODE_ordinal>]
>>> feature_matrix_2 = ft.calculate_feature_matrix(features, es)  # es is a Featuretools EntitySet
>>> feature_matrix_2
    PRODUCT_ID_ordinal  purchased  value  COUNTRYCODE_ordinal
id                                                           
0                    1       True    0.0                    1
1                    1       True    5.0                    1
2                    1       True   10.0                    1
3                    2       True   15.0                    1
4                    2       True   20.0                    1
5                    3       True    0.0                    2
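
The reason for separating fitting from transforming is that the mapping learned once can be applied to data that arrives later. The sketch below illustrates that idea in plain Python under stated assumptions: SketchOrdinalEncoder is a hypothetical stand-in rather than the categorical_encoding Encoder, the new rows are made up for the example, and mapping unseen categories to -1 is a choice made for this sketch, not documented library behavior.

```python
class SketchOrdinalEncoder:
    """Hypothetical stand-in showing fit/transform separation.

    Not the categorical_encoding Encoder; for illustration only.
    """

    UNKNOWN = -1  # assumption of this sketch: code for categories not seen in fit

    def __init__(self):
        self.mappings = {}

    def fit(self, rows, columns):
        # Learn one value -> integer mapping per categorical column.
        for column in columns:
            mapping = {}
            for row in rows:
                mapping.setdefault(row[column], len(mapping) + 1)
            self.mappings[column] = mapping
        return self

    def transform(self, rows):
        # Apply the learned mappings; other columns are copied unchanged.
        encoded_rows = []
        for row in rows:
            new_row = dict(row)
            for column, mapping in self.mappings.items():
                new_row[column] = mapping.get(row[column], self.UNKNOWN)
            encoded_rows.append(new_row)
        return encoded_rows


# Rows taken from the feature matrix above (one per product).
train_rows = [
    {"product_id": "coke zero", "value": 0.0},
    {"product_id": "car", "value": 15.0},
    {"product_id": "toothpaste", "value": 0.0},
]
# Made-up rows standing in for data that arrives after fitting.
new_rows = [
    {"product_id": "car", "value": 7.5},
    {"product_id": "bicycle", "value": 3.0},
]

encoder = SketchOrdinalEncoder().fit(train_rows, ["product_id"])
encoded_new = encoder.transform(new_rows)
print(encoded_new)
# [{'product_id': 2, 'value': 7.5}, {'product_id': -1, 'value': 3.0}]
```

Because the mapping is stored on the encoder, "car" receives the same code in the new table as it did in the training data, which is the property that makes encoded features reusable across feature matrices.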

Feature Labs


categorical-encoding is an open source project created by Feature Labs. To see the other open source projects we're working on, visit Feature Labs Open Source. If building impactful data science pipelines is important to you or your business, please get in touch.


