Feature extraction, processing and interpretation algorithms and functions for machine learning and data science.



feature_stuff: a Python machine learning library for advanced feature extraction, processing and interpretation.


What is it

feature_stuff is a Python package providing fast and flexible algorithms and functions for extracting, processing and interpreting features:

Numeric feature extraction

add_interactions: generic function for adding interaction features to a data frame, either by passing them as a list or by passing a boosted trees model to extract the interactions from.
target_encoding: target encoding of a feature column using exponential prior smoothing or mean prior smoothing.
cv_target_encoding: target encoding of a feature column, taking cross-validation folds as input.
add_knn_values: creates a new feature with the K-nearest-neighbours of the values of a given feature.
add_group_values: generic and memory-efficient enrichment of a features data frame with group values.
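
To illustrate the idea behind target encoding with a smoothing prior, here is a minimal pandas sketch. It is not the library's implementation: the function name smoothed_target_encoding, its prior_weight parameter and the toy data are hypothetical, and the formula shown (shrinking each category mean toward the global target mean) is only one common form of prior smoothing, assumed here for illustration.

```python
import pandas as pd


def smoothed_target_encoding(df, feature, target, prior_weight=1.0):
    """Hypothetical helper: replace each category of `feature` by its target
    mean, shrunk toward the global target mean (the prior)."""
    prior = df[target].mean()
    stats = df.groupby(feature)[target].agg(["sum", "count"])
    # Categories with few rows are pulled more strongly toward the prior.
    encoding = (stats["sum"] + prior_weight * prior) / (stats["count"] + prior_weight)
    return df[feature].map(encoding)


df = pd.DataFrame({"city": ["a", "a", "b", "b", "b", "c"],
                   "y": [1, 0, 1, 1, 1, 0]})
df["city_te"] = smoothed_target_encoding(df, "city", "y", prior_weight=2.0)
print(df["city_te"].round(3).tolist())
# [0.583, 0.583, 0.867, 0.867, 0.867, 0.444]
```

In practice the encoding should be fitted on training folds only and applied to held-out rows, which is the motivation for a cross-validated variant such as cv_target_encoding.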

Model feature insights extraction

get_xgboost_interactions: takes a trained xgboost model and returns a list of interactions between features, up to the order of the maximum depth of all trees.
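
The sketch below shows one way interactions can be read off a decision tree: features that are split on along the same root-to-leaf path are treated as interacting. This is an assumption about the general technique, not a description of how get_xgboost_interactions works internally. The nested-dict tree and the tree_interactions function are hypothetical and use no xgboost API.

```python
def tree_interactions(node, path=()):
    """Hypothetical helper: return the set of feature tuples that are split on
    together along some root-to-leaf path of a nested-dict tree."""
    if "leaf" in node:
        features = sorted(set(path))
        # A path that uses a single feature carries no interaction.
        return {tuple(features)} if len(features) > 1 else set()
    path = path + (node["split"],)
    found = set()
    for child in node["children"]:
        found |= tree_interactions(child, path)
    return found


# Toy tree: split on x0 first, then on x1 in the right branch.
tree = {
    "split": "x0",
    "children": [
        {"leaf": 0.0},
        {"split": "x1", "children": [{"leaf": 1.0}, {"leaf": 3.0}]},
    ],
}
interactions = sorted(tree_interactions(tree))
print(interactions)
# [('x0', 'x1')]
```

Because a path can contain at most as many splits as the tree is deep, the order of any interaction found this way is bounded by the maximum tree depth.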

Installation

Binary installers for the latest released version are available at the Python Package Index (PyPI).

# from PyPI
pip install feature_stuff

The source code is currently hosted on GitHub at: https://github.com/hiflyin/Feature-Stuff

Installation from sources

In the Feature-Stuff directory (same one where you found this file after cloning the git repo), execute:

python setup.py install

or for installing in development mode:

python setup.py develop

Alternatively, you can use pip if you want all the dependencies pulled in automatically (the -e option is for installing it in development mode):

pip install -e .

How to use it

See the API documentation attached to each function/algorithm.

Example of extracting interactions from tree-based models and adding them as new features to your dataset:

import feature_stuff as fs
import pandas as pd
import xgboost as xgb

data = pd.DataFrame({"x0": [0, 1, 0, 1], "x1": range(4), "x2": [1, 0, 1, 0]})
print(data)
#    x0  x1  x2
# 0   0   0   1
# 1   1   1   0
# 2   0   2   1
# 3   1   3   0

target = data.x0 * data.x1 + data.x2 * data.x1
print(target.tolist())
# [0, 1, 2, 3]

model = xgb.train({'max_depth': 4, "seed": 123}, xgb.DMatrix(data, label=target), num_boost_round=2)
fs.add_interactions(data, model)

# at least one of the interactions in target must have been discovered by xgboost
print(data)
#    x0  x1  x2  inter_0
# 0   0   0   1        0
# 1   1   1   0        1
# 2   0   2   1        0
# 3   1   3   0        3

# if we want to inspect the interactions extracted
from feature_stuff import model_features_insights_extractions as insights
print(insights.get_xgboost_interactions(model))
# [['x0', 'x1']]
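
For readers without xgboost at hand, the inter_0 column above can be reproduced with plain pandas, assuming an interaction feature is the product of the interacting columns (which is what the printed values suggest). The helper add_product_interactions is hypothetical and not part of feature_stuff. It takes the interactions as an explicit list rather than a model.

```python
import pandas as pd


def add_product_interactions(df, interactions, prefix="inter_"):
    """Hypothetical helper: add one column per interaction, holding the
    row-wise product of the listed columns. Returns a new data frame."""
    out = df.copy()
    for i, cols in enumerate(interactions):
        out[f"{prefix}{i}"] = out[list(cols)].prod(axis=1)
    return out


data = pd.DataFrame({"x0": [0, 1, 0, 1], "x1": range(4), "x2": [1, 0, 1, 0]})
enriched = add_product_interactions(data, [["x0", "x1"]])
print(enriched["inter_0"].tolist())
# [0, 1, 0, 3]
```

Unlike the example above, this sketch leaves the input data frame untouched and returns a copy.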

Contributing to feature-stuff

All contributions, bug reports, bug fixes, documentation improvements, enhancements and ideas are welcome.
