categorical encoding for featuretools
Project description
categorical-encoding
categorical-encoding is a Python library for encoding categorical data, intended for use with Featuretools. It makes it easy to encode data and to integrate the encoders into a Featuretools pipeline for automated feature engineering within a machine learning workflow.
Install
python -m pip install "featuretools[categorical_encoding]"
Description
For more general questions about using categorical encoding in a machine learning pipeline, consult the guides in the categorical-encoding GitHub repository.
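import categorical_encoding as ce

# feature_matrix and features are assumed to be the outputs of a Featuretools run, such as ft.dfs(...)
encoder = ce.Encoder()
# Learn the encodings from the feature matrix
encoder.fit(feature_matrix, features)
# Apply the learned encodings
fm_encoded = encoder.transform(feature_matrix, features)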
feature_matrix
    product_id  purchased  value  countrycode
id
0    coke zero       True    0.0           US
1    coke zero       True    5.0           US
2    coke zero       True   10.0           US
3          car       True   15.0           US
4          car       True   20.0           US
5   toothpaste       True    0.0           AL
fm_encoded
    PRODUCT_ID_ordinal  purchased  value  COUNTRYCODE_ordinal
id
0                    1       True    0.0                    1
1                    1       True    5.0                    1
2                    1       True   10.0                    1
3                    2       True   15.0                    1
4                    2       True   20.0                    1
5                    3       True    0.0                    2
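To make the mapping in the two tables concrete, here is a minimal pandas-only sketch of ordinal encoding. It is not the library's implementation: `ordinal_encode` is a hypothetical helper, and numbering the categories from 1 in order of first appearance is an assumption chosen because it reproduces the values shown in fm_encoded.

```python
import pandas as pd

# The same toy data as the feature_matrix table.
feature_matrix = pd.DataFrame(
    {
        "product_id": ["coke zero", "coke zero", "coke zero", "car", "car", "toothpaste"],
        "purchased": [True] * 6,
        "value": [0.0, 5.0, 10.0, 15.0, 20.0, 0.0],
        "countrycode": ["US", "US", "US", "US", "US", "AL"],
    },
    index=pd.Index(range(6), name="id"),
)


def ordinal_encode(df, columns):
    """Hypothetical helper: replace each listed column with 1-based integer codes."""
    encoded = df.copy()
    for col in columns:
        # factorize assigns 0-based codes in order of first appearance
        codes, _ = pd.factorize(encoded[col])
        encoded[col] = codes + 1
        # Rename in place so the column keeps its original position
        encoded = encoded.rename(columns={col: f"{col.upper()}_ordinal"})
    return encoded


fm_encoded = ordinal_encode(feature_matrix, ["product_id", "countrycode"])
print(fm_encoded)
```

Non-categorical columns such as purchased and value pass through unchanged, as they do in the tables.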
categorical-encoding integrates with Featuretools through features. Fitting an encoder to data learns a set of features, and those features can then be used to generate new tables of encoded data.
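>>> import featuretools as ft
>>> features = encoder.get_features()
>>> features
[<Feature: PRODUCT_ID_ordinal>,
 <Feature: purchased>,
 <Feature: value>,
 <Feature: COUNTRYCODE_ordinal>]
>>> # es is assumed to be an existing EntitySet holding the new data
>>> feature_matrix_2 = ft.calculate_feature_matrix(features, es)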
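The fit-then-reuse idea can be sketched without the library: learn a mapping on one table, then apply the same mapping to a new one. The names `fit_ordinal` and `transform_ordinal` are hypothetical, and encoding unseen values as 0 is an assumption made for this sketch, not documented behavior of categorical-encoding.

```python
import pandas as pd


def fit_ordinal(series):
    """Learn a value -> 1-based code mapping in order of first appearance."""
    return {value: code for code, value in enumerate(pd.unique(series), start=1)}


def transform_ordinal(series, mapping, unknown=0):
    """Apply a learned mapping; values not seen during fitting get `unknown`."""
    return series.map(mapping).fillna(unknown).astype(int)


train = pd.Series(["coke zero", "car", "toothpaste"], name="product_id")
new = pd.Series(["car", "toothpaste", "bicycle"], name="product_id")

mapping = fit_ordinal(train)                   # learned once, on the original data
encoded_new = transform_ordinal(new, mapping)  # reused on a new table
print(encoded_new.tolist())
```

Keeping the learned mapping separate from the data is what lets the same encoding be applied consistently to later tables.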
Feature Labs
categorical-encoding is an open source project created by Feature Labs. To see the other open source projects we're working on visit Feature Labs Open Source. If building impactful data science pipelines is important to you or your business, please get in touch.
Hashes for categorical_encoding-0.0.3.tar.gz
Algorithm | Hash digest
---|---
SHA256 | c8c64a9fe10b84cf2c1af6c3e150dbdc49a6babe70831a1dabdbd3b4023106c8
MD5 | ed95c0c3794594ccedf6fde7b8d93739
BLAKE2b-256 | 8f5a67b6a19036d7d05af266d469e1d72613e99eda67345472e943bb1ba362d2
Hashes for categorical_encoding-0.0.3-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | a32d785b159dd38ca83fc158c41810af39c648a71fc4605cbcc3791493b25617
MD5 | 601a8ec15f4c157fe3b3595b163593b5
BLAKE2b-256 | 7cee1be63369ef56b1fd584a92e6e92a65fdc5bb5f56f7bb3494856824ed028a