
Feature extraction, processing and interpretation algorithms and functions for machine learning and data science.

feature_stuff: a Python machine learning library for advanced feature extraction, processing and interpretation.

What is it

feature_stuff is a Python package providing fast and flexible algorithms and functions for extracting, processing and interpreting features:

Numeric feature extraction

feature_stuff.add_interactions: generic function for adding interaction features to a data frame, either by passing them explicitly as a list or by passing a boosted-trees model from which to extract the interactions.
feature_stuff.target_encoding: target encoding of a feature column using exponential prior smoothing or mean prior smoothing.
feature_stuff.cv_target_encoding: target encoding of a feature column, taking cross-validation folds as input.
feature_stuff.add_knn_values: creates a new feature from the K nearest neighbours of the values of a given feature.
feature_stuff.model_features_insights_extractions.add_group_values: generic and memory-efficient enrichment of a features dataframe with group values (see the plain-pandas sketch below).
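
To illustrate what group-value enrichment means, the plain-pandas sketch below attaches a per-group aggregate back onto every row. It only shows the concept; the actual signature of add_group_values may differ.

import pandas as pd

df = pd.DataFrame({"shop": ["a", "a", "b", "b"], "sales": [10, 20, 5, 15]})
# mean sales per shop, broadcast back onto every row of that shop
df["shop_mean_sales"] = df.groupby("shop")["sales"].transform("mean")
print(df)
#   shop  sales  shop_mean_sales
# 0    a     10             15.0
# 1    a     20             15.0
# 2    b      5             10.0
# 3    b     15             10.0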

Model feature insights extraction

get_xgboost_interactions: takes a trained xgboost model and returns a list of interactions between features, of order up to the maximum depth of the trees.
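
Conceptually, these interactions can be recovered by walking every tree from root to leaf and collecting the set of features tested along each path. The sketch below does this with xgboost's trees_to_dataframe; it only illustrates the idea and is not the library's implementation. Applied to the booster trained in the add_interactions example further down, it would recover pairs such as ('x0', 'x1').

def interactions_from_booster(booster):
    # collect sorted tuples of features that co-occur on a root-to-leaf path
    interactions = set()
    nodes = booster.trees_to_dataframe()              # one row per tree node (needs pandas)
    for _, tree in nodes.groupby("Tree"):
        tree = tree.set_index("ID")
        stack = [(tree.index[0], [])]                 # (node id, features seen so far)
        while stack:
            node_id, feats = stack.pop()
            node = tree.loc[node_id]
            if node["Feature"] == "Leaf":
                if len(set(feats)) > 1:
                    interactions.add(tuple(sorted(set(feats))))
                continue
            stack.append((node["Yes"], feats + [node["Feature"]]))
            stack.append((node["No"], feats + [node["Feature"]]))
    return sorted(interactions)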

Installation

Binary installers for the latest released version are available on the Python Package Index (PyPI).

# install from PyPI
pip install feature_stuff

The source code is currently hosted on GitHub at: https://github.com/hiflyin/Feature-Stuff

Installation from sources

In the Feature-Stuff directory (same one where you found this file after cloning the git repo), execute:

python setup.py install

or for installing in development mode:

python setup.py develop

Alternatively, you can use pip if you want all the dependencies pulled in automatically (the -e option is for installing it in development mode):

pip install -e .
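
Either way, a quick smoke test is to import the package from outside the source tree:

python -c "import feature_stuff; print('feature_stuff imported OK')"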

How to use it

Below are examples for some of the functions. See the API documentation of each function/algorithm for complete details.

feature_stuff.add_interactions

Inputs:
    df: a pandas dataframe
    model: boosted trees model (currently only xgboost is supported). Can be None, in which case the
        interactions have to be provided explicitly
    interactions: list in which each element is a list of features/columns in df, default: None

Output: df with the interaction features added to it

Example of extracting interactions from tree-based models and adding them as new features to your dataset.

import feature_stuff as fs
import pandas as pd
import xgboost as xgb

data = pd.DataFrame({"x0":[0,1,0,1], "x1":range(4), "x2":[1,0,1,0]})
print(data)
   x0  x1  x2
0   0   0   1
1   1   1   0
2   0   2   1
3   1   3   0

target = data.x0 * data.x1 + data.x2 * data.x1
print(target.tolist())
[0, 1, 2, 3]

model = xgb.train({'max_depth': 4, "seed": 123}, xgb.DMatrix(data, label=target), num_boost_round=2)
fs.add_interactions(data, model)

# at least one of the interactions in target must have been discovered by xgboost
print(data)
   x0  x1  x2  inter_0
0   0   0   1        0
1   1   1   0        1
2   0   2   1        0
3   1   3   0        3

# if we want to inspect the interactions extracted
from feature_stuff import model_features_insights_extractions as insights
print(insights.get_xgboost_interactions(model))
[['x0', 'x1']]
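
Interactions can also be supplied explicitly instead of being extracted from a model. The call below follows the inputs listed above (model=None plus an interactions list); the exact keyword names are an assumption, not a verified signature.

# hypothetical keyword usage, based on the Inputs list above
data2 = pd.DataFrame({"x0": [0, 1, 0, 1], "x1": range(4), "x2": [1, 0, 1, 0]})
fs.add_interactions(data2, model=None, interactions=[["x1", "x2"]])
# data2 now holds one extra column with the x1 * x2 interaction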

feature_stuff.target_encoding

Inputs:
    df: a pandas dataframe containing the column for which to calculate target encoding (categ_col)
    ref_df: a pandas dataframe containing the column for which to calculate target encoding and the target (y_col);
        for example, we might want to use train data as ref_df to encode test data
    categ_col: the name of the categorical column for which to calculate target encoding
    y_col: the name of the target column, or target variable to predict
    smoothing_func: the function used for calculating the weights of the corresponding target
        value inside ref_df. Default: exponentialPriorSmoothing.
    aggr_func: the statistic used to aggregate the target variable values inside each category of the categ_col
    smoothing_prior_weight: a prior weight to put on each category. Default 1.

Output: df with a new column called <categ_col + "_bayes_" + aggr_func> containing the encodings of categ_col

Example of extracting target encodings from categorical features and adding them as new features to your dataset.

import feature_stuff as fs
import pandas as pd

train_data = pd.DataFrame({"x0":[0,1,0,1]})
test_data = pd.DataFrame({"x0":[1, 0, 0, 1]})
target = list(range(4))

train_data = fs.target_encoding(train_data, train_data, "x0", target, smoothing_func=fs.exponentialPriorSmoothing,
                                        aggr_func="mean", smoothing_prior_weight=1)
test_data = fs.target_encoding(test_data, train_data, "x0", target, smoothing_func=fs.exponentialPriorSmoothing,
                                        aggr_func="mean", smoothing_prior_weight=1)

#train data with target encoding of "x0"
print(train_data)
   x0  y_xx  g_xx  x0_bayes_mean
0   0     0     0       1.134471
1   1     1     0       1.865529
2   0     2     0       1.134471
3   1     3     0       1.865529

#test data with target encoding of "x0"
print(test_data)
   x0  x0_bayes_mean
0   1       1.865529
1   0       1.134471
2   0       1.134471
3   1       1.865529
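
For intuition, the numbers above are consistent with a sigmoid-style prior smoothing: each category's mean target is blended with the global target mean, with a weight that grows with the category count relative to the prior weight. The plain-pandas sketch below reproduces the values printed above; it illustrates the math only and is not feature_stuff's implementation.

import numpy as np
import pandas as pd

train_data = pd.DataFrame({"x0": [0, 1, 0, 1]})
target = pd.Series([0, 1, 2, 3])

stats = target.groupby(train_data["x0"]).agg(["mean", "count"])
prior = target.mean()                                  # global target mean = 1.5
# sigmoid weight: categories with more rows trust their own mean more
weight = 1.0 / (1.0 + np.exp(-(stats["count"] - 1)))   # smoothing prior weight = 1
encoding = weight * stats["mean"] + (1 - weight) * prior
# -> 1.134471 for x0 == 0 and 1.865529 for x0 == 1, matching x0_bayes_mean above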


feature_stuff.cv_target_encoding

Inputs:
    df: a pandas dataframe containing the columns for which to calculate target encoding (categ_cols) and the target
    categ_cols: a list or array with the names of the categorical columns for which to calculate target encoding
    y_col: a numpy array of the target variable to predict
    cv_folds: a list with fold pairs as tuples of numpy arrays for cross-val target encoding
    smoothing_func: the function used for calculating the weights of the corresponding target
        value within each training fold. Default: exponentialPriorSmoothing.
    aggr_func: the statistic used to aggregate the target variable values inside each category of the categ_col
    smoothing_prior_weight: a prior weight to put on each category. Default 1.
    verbosity: 0-none, 1-high_level, 2-detailed

Output: df with, for each column in categ_cols, a new column called <categ_col + "_bayes_" + aggr_func> containing its encodings

See the feature_stuff.target_encoding example above; the sketch below shows one way to construct the cv_folds argument.
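
This is only a sketch: the fold pairs come from scikit-learn's KFold, and the argument order follows the input list above rather than a verified signature.

import numpy as np
import pandas as pd
from sklearn.model_selection import KFold
import feature_stuff as fs

train_data = pd.DataFrame({"x0": [0, 1, 0, 1, 0, 1]})
target = np.array([0, 1, 2, 3, 4, 5])

# fold pairs as tuples of numpy arrays: (train indices, validation indices)
cv_folds = list(KFold(n_splits=3, shuffle=True, random_state=123).split(train_data))

train_data = fs.cv_target_encoding(train_data, ["x0"], target, cv_folds,
                                   smoothing_func=fs.exponentialPriorSmoothing,
                                   aggr_func="mean", smoothing_prior_weight=1,
                                   verbosity=0)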

Contributing to feature-stuff

All contributions, bug reports, bug fixes, documentation improvements, enhancements and ideas are welcome.

