
OpenFE: automated feature generation beyond expert-level performance

OpenFE: An efficient automated feature generation tool

| Paper | Documentation | Examples |

OpenFE is a new framework for automated feature generation on tabular data. It is easy to use, effective, and efficient, with the following advantages:

  • OpenFE can discover effective candidate features for improving the learning performance of both GBDT and neural networks.
  • OpenFE is efficient and supports parallel computing.
  • OpenFE covers 23 useful and effective operators for generating candidate features.
  • OpenFE supports binary classification, multiclass classification, and regression tasks.
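
To make "operators for generating candidate features" concrete, here is a minimal, standard-library-only sketch of how applying operators to base columns expands a table into candidate features. It is not OpenFE's implementation: the toy table, the operator set (`UNARY`, `BINARY`), and the function `candidate_features` are all hypothetical and chosen only for illustration. They are not OpenFE's 23 operators.

```python
import math
from itertools import combinations

# Toy table stored column-wise; column names and values are made up.
table = {
    "income": [40.0, 55.0, 72.0],
    "debt": [10.0, 0.0, 18.0],
    "age": [25.0, 40.0, 31.0],
}


def safe_div(a, b):
    """Divide a by b, returning NaN instead of raising on division by zero."""
    return a / b if b != 0 else math.nan


def safe_log(a):
    """Natural logarithm, returning NaN for non-positive inputs."""
    return math.log(a) if a > 0 else math.nan


# Hypothetical operator set for illustration (NOT OpenFE's actual operators).
UNARY = {"log": safe_log, "abs": abs}
BINARY = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": safe_div,
}


def candidate_features(columns):
    """Apply every unary operator to each column and every binary
    operator to each unordered pair of columns."""
    candidates = {}
    for name, values in columns.items():
        for op_name, op in UNARY.items():
            candidates[f"{op_name}({name})"] = [op(v) for v in values]
    for (n1, v1), (n2, v2) in combinations(columns.items(), 2):
        for op_name, op in BINARY.items():
            candidates[f"({n1} {op_name} {n2})"] = [
                op(a, b) for a, b in zip(v1, v2)
            ]
    return candidates


candidates = candidate_features(table)
print(len(candidates), "candidate features from", len(table), "base columns")
```

Even with three columns and six operators the candidate pool is several times larger than the original table. That growth is why a feature generation tool also needs an efficient way to decide which candidates are worth keeping.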

For further details, please refer to the paper.

Extensive comparison experiments on public datasets show that OpenFE outperforms existing feature generation methods in both effectiveness and efficiency. Moreover, we validate OpenFE on the IEEE-CIS Fraud Detection Kaggle competition and show that a simple XGBoost model with features generated by OpenFE beats 99.3% of the 6351 participating data science teams. The features generated by OpenFE result in a larger performance improvement than the features provided by the first-place team in the competition.

Get Started and Documentation

Installation

It is recommended to use pip for installation.

pip install openfe

Please do not run conda install openfe. It installs a different Python package that is unrelated to this project.

A Quick Example

Generating features with OpenFE takes only four lines of code. First, we fit OpenFE on the training data to generate new features. Next, we augment the train and test data with the generated features.

from openfe import openfe, transform

ofe = openfe()
# train_x, train_y, test_x and n_jobs must already be defined by the caller
features = ofe.fit(data=train_x, label=train_y, n_jobs=n_jobs)  # generate new features
train_x, test_x = transform(train_x, test_x, features, n_jobs=n_jobs)  # augment the train and test data with the generated features

We provide an example using the standard california_housing dataset in this link. A more complicated example, demonstrating that OpenFE can outperform machine learning experts in the IEEE-CIS Fraud Detection Kaggle competition, is provided in this link. Users can also refer to our documentation for more advanced usage of OpenFE and an FAQ about feature generation.
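
For readers who want to see where train_x, train_y, and test_x in the quick example might come from, the following is a hedged end-to-end sketch on the california_housing dataset. It is not the project's linked example. The function name `run_quick_example`, the split parameters, and the choice to pass the label as a one-column DataFrame are assumptions made here. The `openfe` and `transform` calls are copied from the quick example above and may differ in other releases. Running it requires scikit-learn, pandas, and openfe to be installed, plus network access to download the dataset.

```python
def run_quick_example(n_jobs=4):
    """Sketch: generate OpenFE features for california_housing.

    Imports are done inside the function so that defining it does not
    require the third-party packages to be installed.
    """
    from sklearn.datasets import fetch_california_housing
    from sklearn.model_selection import train_test_split
    from openfe import openfe, transform  # names as used in the quick example

    # Load the dataset as a single DataFrame that includes the target column.
    data = fetch_california_housing(as_frame=True).frame

    # Assumption: the label is passed as a one-column DataFrame.
    label = data[["MedHouseVal"]]
    data = data.drop(columns=["MedHouseVal"])

    # Hypothetical split settings chosen for illustration.
    train_x, test_x, train_y, test_y = train_test_split(
        data, label, test_size=0.2, random_state=1
    )

    # The same calls as in the quick example above.
    ofe = openfe()
    features = ofe.fit(data=train_x, label=train_y, n_jobs=n_jobs)
    train_x, test_x = transform(train_x, test_x, features, n_jobs=n_jobs)
    return train_x, test_x, train_y, test_y
```

Calling `run_quick_example()` should return the augmented train and test tables together with the labels, ready to be passed to a model of your choice. The `n_jobs=4` default is arbitrary and should be adjusted to the number of available cores.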
