Featuretools Transformer for Scikit-Learn Pipeline use.

featuretools-sklearn-transformer

Featuretools' DFS as a scikit-learn transformer

Install

pip install featuretools_sklearn_transformer

Use

To use the transformer in a pipeline, initialize it with the parameters you want to use for calculating features. To fit the model and generate features for the training data, pass an EntitySet, or a tuple of entities and relationships, containing only the relevant training data as the X input, along with the training targets as the y input. To generate a feature matrix from test data, pass an EntitySet containing only the relevant test data as the X input.

The input supplied for X can take several formats:

  • To use a Featuretools EntitySet without cutoff times, simply pass in the EntitySet
  • To use a Featuretools EntitySet with a cutoff times DataFrame, pass in a tuple of the form (EntitySet, cutoff_time_df)
  • To use a list of Entities and Relationships without cutoff times, pass a tuple of the form (entities, relationships)
  • To use a list of Entities and Relationships with a cutoff times DataFrame, pass a tuple of the form ((entities, relationships), cutoff_time_df)
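
The four formats above can be told apart by type alone. The following is a minimal sketch of such a dispatch, not the transformer's actual implementation: `FakeEntitySet`, `ParsedInput`, and `parse_x` are hypothetical names introduced for illustration, and the sketch assumes that `entities` is a mapping (not a tuple), so that format 3 and format 4 can be distinguished by checking whether the first element is a tuple.

```python
from typing import Any, NamedTuple, Optional


class FakeEntitySet:
    """Hypothetical stand-in for featuretools.EntitySet so the sketch runs on its own."""


class ParsedInput(NamedTuple):
    entityset: Optional[Any]
    entities: Optional[Any]
    relationships: Optional[Any]
    cutoff_time: Optional[Any]


def parse_x(X: Any, entityset_type: type = FakeEntitySet) -> ParsedInput:
    """Normalize the four documented X formats into one record (illustrative only)."""
    # Format 1: a bare EntitySet, no cutoff times.
    if isinstance(X, entityset_type):
        return ParsedInput(X, None, None, None)
    if not (isinstance(X, tuple) and len(X) == 2):
        raise TypeError("X must be an EntitySet or a 2-tuple")
    first, second = X
    # Format 2: (EntitySet, cutoff_time_df).
    if isinstance(first, entityset_type):
        return ParsedInput(first, None, None, second)
    # Format 4: ((entities, relationships), cutoff_time_df).
    if isinstance(first, tuple):
        if len(first) != 2:
            raise TypeError("expected (entities, relationships) as the first element")
        entities, relationships = first
        return ParsedInput(None, entities, relationships, second)
    # Format 3: (entities, relationships), no cutoff times.
    return ParsedInput(None, first, second, None)
```

With the real library, `entityset_type` would be `featuretools.EntitySet`; the stand-in class only keeps the sketch free of dependencies.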

Note that because this transformer requires a Featuretools EntitySet, or entities and relationships, as input, it does not currently work with methods such as sklearn.model_selection.cross_val_score or sklearn.model_selection.GridSearchCV, which expect X to be an iterable that they can split themselves.
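
One possible workaround, sketched here under assumptions rather than taken from the library, is to split at the level of target IDs (for example, customer IDs) and build a separate training and test EntitySet for each fold by hand. The helper `kfold_ids` below is a hypothetical name and uses only the standard library; constructing the per-fold EntitySets is left out because it depends on your data.

```python
from typing import Iterator, List, Sequence, Tuple


def kfold_ids(ids: Sequence[int], n_splits: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_ids, test_ids) pairs, one per fold, without shuffling."""
    if not 2 <= n_splits <= len(ids):
        raise ValueError("n_splits must be between 2 and len(ids)")
    ids = list(ids)
    # Spread the remainder over the first folds so fold sizes differ by at most one.
    base, extra = divmod(len(ids), n_splits)
    start = 0
    for fold in range(n_splits):
        stop = start + base + (1 if fold < extra else 0)
        test_ids = ids[start:stop]
        train_ids = ids[:start] + ids[stop:]
        yield train_ids, test_ids
        start = stop


for train_ids, test_ids in kfold_ids([1, 2, 3, 4, 5], n_splits=2):
    # Hypothetical next step: build one EntitySet per ID list, then call
    # pipeline.fit(train_es, y_train) and pipeline.predict(test_es).
    print(train_ids, test_ids)
```

Each fold then goes through the pipeline exactly as a single train/test pair would, and the per-fold scores can be averaged manually.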

The example below shows how to use the transformer with an EntitySet, both with and without a cutoff time DataFrame.

import featuretools as ft
import pandas as pd

from featuretools.wrappers import DFSTransformer
from sklearn.pipeline import Pipeline
from sklearn.ensemble import ExtraTreesClassifier

# Get example data
train_es = ft.demo.load_mock_customer(return_entityset=True, n_customers=3)
test_es = ft.demo.load_mock_customer(return_entityset=True, n_customers=2)
y = [True, False, True]

# Build pipeline
pipeline = Pipeline(steps=[
    ('ft', DFSTransformer(target_entity="customers",
                          max_features=2)),
    ('et', ExtraTreesClassifier(n_estimators=100))
])

# Fit and predict
pipeline.fit(X=train_es, y=y) # fit on customers in training entityset
pipeline.predict_proba(test_es) # predict probability of each class on test entityset
pipeline.predict(test_es) # predict on test entityset

# Same as above, but using cutoff times
train_ct = pd.DataFrame()
train_ct['customer_id'] = [1, 2, 3]
train_ct['time'] = pd.to_datetime(['2014-1-1 04:00',
                                   '2014-1-2 17:20',
                                   '2014-1-4 09:53'])

pipeline.fit(X=(train_es, train_ct), y=y)

test_ct = pd.DataFrame()
test_ct['customer_id'] = [1, 2]
test_ct['time'] = pd.to_datetime(['2014-1-4 13:48',
                                  '2014-1-5 15:32'])
pipeline.predict_proba((test_es, test_ct))
pipeline.predict((test_es, test_ct))

Built at Alteryx Innovation Labs

