Categorical encoding for Featuretools

Project description

categorical-encoding

categorical-encoding is a Python library for encoding categorical data, intended for use with Featuretools. It encodes categorical columns and plugs into a Featuretools pipeline, so automated feature engineering and encoding happen within the same machine learning workflow.

Install

python -m pip install "featuretools[categorical_encoding]"

Description

Install Demo Guide Requirements

python -m pip install -r demo-requirements.txt

For more general questions about using categorical encoding in a machine learning pipeline, consult the guides in the categorical-encoding GitHub repository.

>>> feature_matrix
    product_id  purchased  value countrycode
id
0    coke zero       True    0.0          US
1    coke zero       True    5.0          US
2    coke zero       True   10.0          US
3          car       True   15.0          US
4          car       True   20.0          US
5   toothpaste       True    0.0          AL

The encoder fits into the standard train/test split procedure used in applied machine learning.

>>> train_data = feature_matrix.iloc[[0, 1, 4, 5]]
>>> train_data
    product_id  purchased  value countrycode
id
0    coke zero       True    0.0          US
1    coke zero       True    5.0          US
4          car       True   20.0          US
5   toothpaste       True    0.0          AL
>>> test_data = feature_matrix.iloc[[2, 3]]
>>> test_data
   product_id  purchased  value countrycode
id
2   coke zero       True   10.0          US
3         car       True   15.0          US
>>> import categorical_encoding as ce
>>> encoder = ce.Encoder(method='leave_one_out')
>>> # `features` holds the Featuretools feature definitions for feature_matrix
>>> train_enc = encoder.fit_transform(train_data, features, train_data['value'])
>>> test_enc = encoder.transform(test_data)

The encoder is fit on the training data and transforms it, then transforms the test data using the encoding it learned during fitting.

>>> train_enc
    PRODUCT_ID_leave_one_out  purchased  value  COUNTRYCODE_leave_one_out
id
0                       5.00       True    0.0                      12.50
1                       0.00       True    5.0                      10.00
4                       6.25       True   20.0                       2.50
5                       6.25       True    0.0                       6.25
>>> test_enc
    PRODUCT_ID_leave_one_out  purchased  value  COUNTRYCODE_leave_one_out
id
2                       2.50       True   10.0                   8.333333
3                       6.25       True   15.0                   8.333333
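The encoded values above are consistent with a simple leave-one-out rule: each training row gets the mean target of the other training rows in its category, and each test row gets the training mean of its category. The sketch below reproduces the numbers with plain pandas. It is an illustration only, not the library's implementation; the helper names `loo_fit_transform` and `loo_transform` are hypothetical, and the fallback to the global mean for single-row and unseen categories is an assumption inferred from the tables.

```python
import pandas as pd

# Training and test rows copied from the example tables above.
train = pd.DataFrame(
    {
        "product_id": ["coke zero", "coke zero", "car", "toothpaste"],
        "countrycode": ["US", "US", "US", "AL"],
        "value": [0.0, 5.0, 20.0, 0.0],
    },
    index=pd.Index([0, 1, 4, 5], name="id"),
)
test = pd.DataFrame(
    {"product_id": ["coke zero", "car"], "countrycode": ["US", "US"]},
    index=pd.Index([2, 3], name="id"),
)


def loo_fit_transform(categories, target):
    """Hypothetical helper: mean target of the *other* rows in each category."""
    grouped = target.groupby(categories)
    sums = grouped.transform("sum")
    counts = grouped.transform("count")
    # Leave NaN where a category has a single row, so nothing is divided by zero.
    others = (counts - 1).where(counts > 1)
    # Assumption: single-row categories fall back to the global target mean.
    return ((sums - target) / others).fillna(target.mean())


def loo_transform(train_categories, train_target, new_categories):
    """Hypothetical helper: encode new rows with statistics learned on train."""
    stats = train_target.groupby(train_categories).agg(["mean", "count"])
    global_mean = train_target.mean()
    # Assumption: single-row and unseen categories use the global mean.
    level_means = stats["mean"].where(stats["count"] > 1, global_mean)
    return new_categories.map(level_means).fillna(global_mean)


train_product = loo_fit_transform(train["product_id"], train["value"])
train_country = loo_fit_transform(train["countrycode"], train["value"])
test_product = loo_transform(train["product_id"], train["value"], test["product_id"])
test_country = loo_transform(train["countrycode"], train["value"], test["countrycode"])

print(train_product.tolist())  # [5.0, 0.0, 6.25, 6.25]
print(train_country.tolist())  # [12.5, 10.0, 2.5, 6.25]
print(test_product.tolist())   # [2.5, 6.25]
print(test_country.tolist())
```

Excluding a row's own target during training keeps that row's label out of its own encoded feature, which is the main reason to prefer leave-one-out over a plain per-category mean.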

The encoder integrates with Featuretools by exposing its learned encodings as features. First, fit an encoder to your data to learn the features. Then, when new data arrives, use those features to generate a new table of encoded data that is ready for your trained machine learning model.

>>> import featuretools as ft
>>> features_encoded = encoder.get_features()
>>> features_encoded
[<Feature: PRODUCT_ID_leave_one_out>,
 <Feature: purchased>,
 <Feature: value>,
 <Feature: COUNTRYCODE_leave_one_out>]
>>> # `es` is the EntitySet that contains the new rows (ids 6 and 7)
>>> fm2_encoded = ft.calculate_feature_matrix(features_encoded, es, instance_ids=[6, 7])
>>> fm2_encoded
    PRODUCT_ID_leave_one_out  purchased  value  COUNTRYCODE_leave_one_out
id
6                       6.25       True    1.0                       6.25
7                       6.25       True    2.0                       6.25

Feature Labs

categorical-encoding is an open source project created by Feature Labs. To see the other open source projects we're working on, visit Feature Labs Open Source. If building impactful data science pipelines is important to you or your business, please get in touch.

Download files

Source Distribution

categorical_encoding-0.4.1.tar.gz (11.1 kB view details)

Uploaded Source

Built Distribution

categorical_encoding-0.4.1-py3-none-any.whl (19.5 kB view details)

Uploaded Python 3

File details

Details for the file categorical_encoding-0.4.1.tar.gz.

File metadata

  • Download URL: categorical_encoding-0.4.1.tar.gz
  • Upload date:
  • Size: 11.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.45.0 CPython/3.7.7

File hashes

Hashes for categorical_encoding-0.4.1.tar.gz
Algorithm Hash digest
SHA256 d5776006bf4541e5aafb887199efd3aa45a71c1503a0c574d5cb38fd5dd334f3
MD5 a1d16fcea9ab1482101977bdfdf6007c
BLAKE2b-256 60767278c974ae1403c95ebe00d8888bfaf12cee0c9ad3582fa544e242cdacf8

File details

Details for the file categorical_encoding-0.4.1-py3-none-any.whl.

File metadata

  • Download URL: categorical_encoding-0.4.1-py3-none-any.whl
  • Upload date:
  • Size: 19.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.45.0 CPython/3.7.7

File hashes

Hashes for categorical_encoding-0.4.1-py3-none-any.whl
Algorithm Hash digest
SHA256 5689715c526c9481f549f0c069807b7f95ff766bf953197680332f45bcd2aba3
MD5 9efd94ba608b88edc7f8898e6599fca4
BLAKE2b-256 610778dc49c63829a16363e189f90f99c88b4f0e711fb6162701b8b64c17b225
