Featuretools Transformer for Scikit-Learn Pipeline use.
featuretools-sklearn-transformer
Featuretools' DFS as a scikit-learn transformer
Install
pip install featuretools_sklearn_transformer
Use
To use the transformer in a pipeline, initialize an instance of the transformer with the parameters you would like to use for calculating features. To fit the model and generate features for the training data, pass in an EntitySet, or a list of entities and relationships, containing only the relevant training data as the X input, along with the training targets as the y input. To generate a feature matrix from test data, pass in an EntitySet containing only the relevant test data as the X input.
The input supplied for X can take several formats:
- To use a Featuretools EntitySet without cutoff times, simply pass in the EntitySet
- To use a Featuretools EntitySet with a cutoff times DataFrame, pass in a tuple of the form (EntitySet, cutoff_time_df)
- To use a list of Entities and Relationships without cutoff times, pass a tuple of the form (entities, relationships)
- To use a list of Entities and Relationships with a cutoff times DataFrame, pass a tuple of the form ((entities, relationships), cutoff_time_df)
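The four input shapes above can be told apart by their structure alone. The sketch below is illustrative only: `describe_x_format` is a hypothetical helper, not part of this package, and it assumes that any object exposing `columns` and `index` attributes is a cutoff times DataFrame. It is not how the transformer itself inspects its input.

```python
def describe_x_format(X):
    """Return a label naming which of the four documented input shapes X has.

    Hypothetical helper for illustration; not part of the package.
    """
    def looks_like_dataframe(obj):
        # Assumption: an object with `columns` and `index` is a DataFrame.
        return hasattr(obj, "columns") and hasattr(obj, "index")

    if isinstance(X, tuple) and len(X) == 2:
        first, second = X
        if looks_like_dataframe(second):
            # Second element is a cutoff times DataFrame.
            if isinstance(first, tuple):
                return "entities and relationships, with cutoff times"
            return "EntitySet, with cutoff times"
        # A plain 2-tuple without a DataFrame: (entities, relationships).
        return "entities and relationships, no cutoff times"
    # Anything else is treated as a bare EntitySet.
    return "EntitySet, no cutoff times"
```

Checking the shape before calling `fit` or `predict` can make a mistake such as passing `(cutoff_time_df, EntitySet)` in the wrong order easier to spot.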
Note that because this transformer requires a Featuretools EntitySet, or entities and relationships, as input, it does not currently work with certain methods such as sklearn.model_selection.cross_val_score or sklearn.model_selection.GridSearchCV, which expect the X values to be an iterable that the method can split.
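One possible workaround, sketched here under assumptions and not provided by this package, is to split the instance IDs yourself, build a cutoff times DataFrame for each fold, and pass `(EntitySet, cutoff_time_df)` to `fit` and `predict` for each split. The helper name `kfold_ids` is hypothetical and uses only the standard library. Whether restricting the cutoff times DataFrame is enough to keep folds independent depends on your data, so treat this as a starting point rather than a drop-in replacement for cross_val_score.

```python
def kfold_ids(ids, n_splits):
    """Yield (train_ids, test_ids) pairs, holding out one contiguous fold at a time.

    Hypothetical helper for illustration; not part of the package.
    """
    ids = list(ids)
    if not 2 <= n_splits <= len(ids):
        raise ValueError("n_splits must be between 2 and the number of ids")

    # Spread any remainder over the first folds so sizes differ by at most one.
    base, extra = divmod(len(ids), n_splits)
    start = 0
    for fold in range(n_splits):
        size = base + (1 if fold < extra else 0)
        test_ids = ids[start:start + size]
        train_ids = ids[:start] + ids[start + size:]
        yield train_ids, test_ids
        start += size


# Example: two folds over five customer IDs.
for train_ids, test_ids in kfold_ids([1, 2, 3, 4, 5], n_splits=2):
    print("train:", train_ids, "test:", test_ids)
```

Each list of IDs can then be used as the `customer_id` column of a cutoff times DataFrame, in the same way as `train_ct` and `test_ct` in the example that follows.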
The example below shows how to use the transformer with an EntitySet, both with and without a cutoff time DataFrame.
import featuretools as ft
import pandas as pd
from featuretools.wrappers import DFSTransformer
from sklearn.pipeline import Pipeline
from sklearn.ensemble import ExtraTreesClassifier
# Get example data
train_es = ft.demo.load_mock_customer(return_entityset=True, n_customers=3)
test_es = ft.demo.load_mock_customer(return_entityset=True, n_customers=2)
y = [True, False, True]
# Build pipeline
pipeline = Pipeline(steps=[
    ('ft', DFSTransformer(target_entity="customers",
                          max_features=2)),
    ('et', ExtraTreesClassifier(n_estimators=100))
])
# Fit and predict
pipeline.fit(X=train_es, y=y) # fit on customers in training entityset
pipeline.predict_proba(test_es) # predict probability of each class on test entityset
pipeline.predict(test_es) # predict on test entityset
# Same as above, but using cutoff times
train_ct = pd.DataFrame()
train_ct['customer_id'] = [1, 2, 3]
train_ct['time'] = pd.to_datetime(['2014-1-1 04:00',
                                   '2014-1-2 17:20',
                                   '2014-1-4 09:53'])
pipeline.fit(X=(train_es, train_ct), y=y)
test_ct = pd.DataFrame()
test_ct['customer_id'] = [1, 2]
test_ct['time'] = pd.to_datetime(['2014-1-4 13:48',
                                  '2014-1-5 15:32'])
pipeline.predict_proba((test_es, test_ct))
pipeline.predict((test_es, test_ct))
Built at Alteryx Innovation Labs