
Feature extraction, processing and interpretation algorithms and functions for machine learning and data science.

Project description


feature_stuff: a Python distribution of algorithms for advanced feature extraction, processing and interpretation in machine learning, data science and AI.

What is it

feature_stuff is a Python package providing fast and flexible algorithms and functions for extracting, processing and interpreting features. It includes functions for extracting feature interactions from boosted decision tree based models, generic target encoding, and memory-efficient enrichment of a features dataframe with group values.
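To make two of the terms above concrete, here is a minimal sketch of target encoding and group-value enrichment written in plain pandas. It does not use feature_stuff's own functions, whose signatures are not shown in this description; the column names and toy values are assumptions made up for illustration. The encoding is the naive version (no smoothing, no out-of-fold computation), so on real data it would leak the target into the features.

```python
import pandas as pd

# Toy data: one categorical feature and a binary target (illustrative values only).
df = pd.DataFrame({
    "city": ["a", "a", "b", "b", "b", "c"],
    "clicked": [1, 0, 1, 1, 0, 1],
})

# Target encoding: replace each category with the mean target observed for it.
category_means = df.groupby("city")["clicked"].mean()
df["city_target_enc"] = df["city"].map(category_means)

# Group-value enrichment: attach a per-group aggregate to every row.
# transform() returns a result aligned with the original index,
# so no intermediate merge is needed.
df["city_count"] = df.groupby("city")["clicked"].transform("count")

print(df)
```

Using `transform` rather than aggregating and merging back avoids building an intermediate dataframe, which is one simple way to keep memory use down.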

How to get it

Binary installers for the latest released version are available at the Python Package Index (PyPI).

# from PyPI
pip install feature_stuff

The source code is currently hosted on GitHub at: https://github.com/hiflyin/Feature-Stuff

Installation from sources

In the Feature-Stuff directory (same one where you found this file after cloning the git repo), execute:

python setup.py install

or for installing in development mode:

python setup.py develop

Alternatively, you can use pip if you want all the dependencies pulled in automatically (the -e option is for installing it in development mode):

pip install -e .
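After any of the installation routes above, a quick check that the package is importable can save some confusion. The sketch below uses only the standard library; `is_installed` is a hypothetical helper written for this example, not something feature_stuff provides.

```python
import importlib.util


def is_installed(module_name):
    """Return True if the import system can locate `module_name`."""
    return importlib.util.find_spec(module_name) is not None


# Prints True once the installation has succeeded in the active environment.
print(is_installed("feature_stuff"))
```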

## How to use it

Example of extracting interactions from tree-based models and adding
them as new features to your dataset:

```python
import feature_stuff as fs
import pandas as pd
import xgboost as xgb

data = pd.DataFrame({"x0": [0, 1, 0, 1], "x1": range(4), "x2": [1, 0, 1, 0]})
print(data)
#    x0  x1  x2
# 0   0   0   1
# 1   1   1   0
# 2   0   2   1
# 3   1   3   0

# the target is the sum of two pairwise interactions
target = data.x0 * data.x1 + data.x2 * data.x1
print(target.tolist())
# [0, 1, 2, 3]

model = xgb.train({"max_depth": 4, "seed": 123}, xgb.DMatrix(data, label=target), num_boost_round=2)
fs.addInteractions(data, model)

# at least one of the interactions in target must have been discovered by xgboost
print(data)
#    x0  x1  x2  inter_0
# 0   0   0   1        0
# 1   1   1   0        1
# 2   0   2   1        0
# 3   1   3   0        3

# if we want to inspect the interactions extracted
from feature_stuff import model_features_insights_extractions as insights
print(insights.get_xgboost_interactions(model))
# [['x0', 'x1']]
```
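In the output above, `inter_0` equals `x0 * x1` on every row. The following sketch reproduces that column with plain pandas, assuming an interaction feature is simply the product of the interacting columns. That assumption is consistent with the printed output but is not confirmed by this description, and `add_product_features` is a hypothetical helper, not part of feature_stuff.

```python
import pandas as pd

data = pd.DataFrame({"x0": [0, 1, 0, 1], "x1": range(4), "x2": [1, 0, 1, 0]})


def add_product_features(df, interactions, prefix="inter_"):
    """Return a copy of `df` with one product column per interaction.

    Hypothetical helper for illustration: each interaction is a list of
    column names, and the new column is the row-wise product of them.
    """
    out = df.copy()
    for i, cols in enumerate(interactions):
        out[f"{prefix}{i}"] = out[list(cols)].prod(axis=1)
    return out


enriched = add_product_features(data, [["x0", "x1"]])
print(enriched["inter_0"].tolist())  # [0, 1, 0, 3]
```

Unlike the call in the example above, this helper returns a new dataframe instead of modifying its argument, which keeps the sketch free of side effects.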

## Contributing to feature-stuff

All contributions, bug reports, bug fixes, documentation improvements, enhancements and ideas are welcome.


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

feature_stuff-0.0.dev2.tar.gz (7.1 kB)

Uploaded Source

Built Distribution

feature_stuff-0.0.dev2-py2.py3-none-any.whl (12.0 kB)

Uploaded Python 2, Python 3

File details

Details for the file feature_stuff-0.0.dev2.tar.gz.

File hashes

Hashes for feature_stuff-0.0.dev2.tar.gz
Algorithm    Hash digest
SHA256       754a4bea67cce08abd11361d6a1f3bf36628ab45b96e38e861a62060a5f0fc43
MD5          efb3b55dc793c62889cfc889a26820e4
BLAKE2b-256  9e481c9e3c2b2cb9a939ceb9d3a78e9bd28e2bd8ef19ce1257b7b5823a55974e

Details for the file feature_stuff-0.0.dev2-py2.py3-none-any.whl.

File hashes

Hashes for feature_stuff-0.0.dev2-py2.py3-none-any.whl
Algorithm    Hash digest
SHA256       22479593071fc7c977b3c9896ae5ff57c40d3b05940a54eee2c6fe5c0c2917ec
MD5          a8a037c6ac877e4da5e2576bb45e2a00
BLAKE2b-256  29453baa00148d4d74d1ac1ad4dda3c830548726c7892d6dff70dd1c20eacdd3
