Skip to main content

a framework for automated feature engineering

Project description

Featuretools

"One of the holy grails of machine learning is to automate more and more of the feature engineering process." ― Pedro Domingos, A Few Useful Things to Know about Machine Learning

Tests Coverage Status PyPI version Anaconda-Server Badge StackOverflow Downloads

Featuretools is a python library for automated feature engineering. See the documentation for more information.

Installation

Install with pip

python -m pip install featuretools

or from the Conda-forge channel on conda:

conda install -c conda-forge featuretools

Add-ons

You can install add-ons individually or all at once by running

python -m pip install featuretools[complete]

Update checker - Receive automatic notifications of new Featuretools releases

python -m pip install featuretools[update_checker]

Example

Below is an example of using Deep Feature Synthesis (DFS) to perform automated feature engineering. In this example, we apply DFS to a multi-table dataset consisting of timestamped customer transactions.

>> import featuretools as ft
>> es = ft.demo.load_mock_customer(return_entityset=True)
>> es.plot()

Featuretools can automatically create a single table of features for any "target dataframe"

>> feature_matrix, features_defs = ft.dfs(entityset=es, target_dataframe_name="customers")
>> feature_matrix.head(5)
            zip_code  COUNT(transactions)  COUNT(sessions)  SUM(transactions.amount) MODE(sessions.device)  MIN(transactions.amount)  MAX(transactions.amount)  YEAR(join_date)  SKEW(transactions.amount)  DAY(join_date)                   ...                     SUM(sessions.MIN(transactions.amount))  MAX(sessions.SKEW(transactions.amount))  MAX(sessions.MIN(transactions.amount))  SUM(sessions.MEAN(transactions.amount))  STD(sessions.SUM(transactions.amount))  STD(sessions.MEAN(transactions.amount))  SKEW(sessions.MEAN(transactions.amount))  STD(sessions.MAX(transactions.amount))  NUM_UNIQUE(sessions.DAY(session_start))  MIN(sessions.SKEW(transactions.amount))
customer_id                                                                                                                                                                                                                                  ...
1              60091                  131               10                  10236.77               desktop                      5.60                    149.95             2008                   0.070041               1                   ...                                                     169.77                                 0.610052                                   41.95                               791.976505                              175.939423                                 9.299023                                 -0.377150                                5.857976                                        1                                -0.395358
2              02139                  122                8                   9118.81                mobile                      5.81                    149.15             2008                   0.028647              20                   ...                                                     114.85                                 0.492531                                   42.96                               596.243506                              230.333502                                10.925037                                  0.962350                                7.420480                                        1                                -0.470007
3              02139                   78                5                   5758.24               desktop                      6.78                    147.73             2008                   0.070814              10                   ...                                                      64.98                                 0.645728                                   21.77                               369.770121                              471.048551                                 9.819148                                 -0.244976                               12.537259                                        1                                -0.630425
4              60091                  111                8                   8205.28               desktop                      5.73                    149.56             2008                   0.087986              30                   ...                                                      83.53                                 0.516262                                   17.27                               584.673126                              322.883448                                13.065436                                 -0.548969                               12.738488                                        1                                -0.497169
5              02139                   58                4                   4571.37                tablet                      5.91                    148.17             2008                   0.085883              19                   ...                                                      73.09                                 0.830112                                   27.46                               313.448942                              198.522508                                 8.950528                                  0.098885                                5.599228                                        1                                -0.396571

[5 rows x 69 columns]

We now have a feature vector for each customer that can be used for machine learning. See the documentation on Deep Feature Synthesis for more examples.

Featuretools contains many different types of built-in primitives for creating features. If the primitive you need is not included, Featuretools also allows you to define your own custom primitives.

Demos

Predict Next Purchase

Repository | Notebook

In this demonstration, we use a multi-table dataset of 3 million online grocery orders from Instacart to predict what a customer will buy next. We show how to generate features with automated feature engineering and build an accurate machine learning pipeline using Featuretools, which can be reused for multiple prediction problems. For more advanced users, we show how to scale that pipeline to a large dataset using Dask.

For more examples of how to use Featuretools, check out our demos page.

Testing & Development

The Featuretools community welcomes pull requests. Instructions for testing and development are available here.

Support

The Featuretools community is happy to provide support to users of Featuretools. Project support can be found in four places depending on the type of question:

  1. For usage questions, use Stack Overflow with the featuretools tag.
  2. For bugs, issues, or feature requests start a Github issue.
  3. For discussion regarding development on the core library, use Slack.
  4. For everything else, the core developers can be reached by email at help@featuretools.com.

Citing Featuretools

If you use Featuretools, please consider citing the following paper:

James Max Kanter, Kalyan Veeramachaneni. Deep feature synthesis: Towards automating data science endeavors. IEEE DSAA 2015.

BibTeX entry:

@inproceedings{kanter2015deep,
  author    = {James Max Kanter and Kalyan Veeramachaneni},
  title     = {Deep feature synthesis: Towards automating data science endeavors},
  booktitle = {2015 {IEEE} International Conference on Data Science and Advanced Analytics, DSAA 2015, Paris, France, October 19-21, 2015},
  pages     = {1--10},
  year      = {2015},
  organization={IEEE}
}

Built at Alteryx Innovation Labs

Alteryx Innovation Labs

Project details


Release history Release notifications | RSS feed

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

featuretools-1.0.0.tar.gz (275.6 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

featuretools-1.0.0-py3-none-any.whl (329.1 kB view details)

Uploaded Python 3

File details

Details for the file featuretools-1.0.0.tar.gz.

File metadata

  • Download URL: featuretools-1.0.0.tar.gz
  • Upload date:
  • Size: 275.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.7.12

File hashes

Hashes for featuretools-1.0.0.tar.gz
Algorithm Hash digest
SHA256 04def8fe7dd3d3eec3cbf0a0355fe7729e012bbe39924d47ce3dfb732ef6c568
MD5 493e573a6a0a435aceb6d1c495b70eb5
BLAKE2b-256 ba43f1524521c3f8f7d35a1a27ad138dedfa772605c47eed8223d110f8b01f1a

See more details on using hashes here.

File details

Details for the file featuretools-1.0.0-py3-none-any.whl.

File metadata

  • Download URL: featuretools-1.0.0-py3-none-any.whl
  • Upload date:
  • Size: 329.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.7.12

File hashes

Hashes for featuretools-1.0.0-py3-none-any.whl
Algorithm Hash digest
SHA256 e4958946696d060f90c0c4b1cf48dcaf71f50ec5f506d4713be4c3c3bfbdb9d2
MD5 9d69572a4768c2965732279902739982
BLAKE2b-256 101b522096c713d1bede98b76e47420a206de06d073a4f1e5a8692152cf08759

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page