Compose is a Python library for automated prediction engineering. An end user defines an outcome of interest over the data by writing a "labeling function". Compose then automatically searches the historical data and extracts training examples for machine learning, balancing them across time, entities, and label categories to reduce bias in the learning process. See the documentation for more information.
The result is then provided to the automated feature engineering tool Featuretools and subsequently to AutoML/ML libraries to develop a model. Automating this very early stage of the ML pipeline lets an end user easily define a task and solve it. The workflow of an applied machine learning engineer then becomes: engineer labels with Compose, generate features with Featuretools, and train and evaluate a model with AutoML/ML libraries.
Installation
Compose can be installed by running the following command.
pip install composeml
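To confirm the installation, you can print the installed version. This optional check is not part of the original instructions and assumes the package exposes a __version__ attribute, as most Python packages do.
import composeml as cp
# Print the installed version to confirm the installation succeeded.
print(cp.__version__)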
Example
In this example, we will generate labels on a mock dataset of transactions. For each customer, we want to label whether the total purchase amount over the next hour of transactions will exceed $300. Additionally, we want to predict one hour in advance.
Load Data
With the package installed, we load in the data. To get an idea of what the transactions look like, we preview the data frame.
import composeml as cp
df = cp.demos.load_transactions()
df[df.columns[:7]].head()
transaction_id | session_id | transaction_time | product_id | amount | customer_id | device |
---|---|---|---|---|---|---|
298 | 1 | 2014-01-01 00:00:00 | 5 | 127.64 | 2 | desktop |
10 | 1 | 2014-01-01 00:09:45 | 5 | 57.39 | 2 | desktop |
495 | 1 | 2014-01-01 00:14:05 | 5 | 69.45 | 2 | desktop |
460 | 10 | 2014-01-01 02:33:50 | 5 | 123.19 | 2 | tablet |
302 | 10 | 2014-01-01 02:37:05 | 5 | 64.47 | 2 | tablet |
Create Labeling Function
To get started, we define the labeling function that will return the total purchase amount given an hour of transactions.
def total_spent(df):
total = df['amount'].sum()
return total
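Before handing this function to Compose, it can help to try it on a slice of data by hand. The check below is not part of the original example; it assumes the column names shown in the preview above and simply sums one customer's first hour of transactions.
import pandas as pd

# Take one customer's transactions and keep only the first hour of activity.
customer = df[df['customer_id'] == 1].sort_values('transaction_time')
start = customer['transaction_time'].min()
one_hour = customer[customer['transaction_time'] < start + pd.Timedelta('1h')]

# The labeling function should return the total amount spent in that window.
print(total_spent(one_hour))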
Construct Label Maker
With the labeling function, we create the LabelMaker for our prediction problem. To process one hour of transactions for each customer, we set the target_entity to the customer ID and the window_size to one hour.
label_maker = cp.LabelMaker(
target_entity="customer_id",
time_index="transaction_time",
labeling_function=total_spent,
window_size="1h",
)
Search Labels
Next, we automatically search and extract the labels by using LabelMaker.search. For more details on how the label maker works, see Main Concepts.
labels = label_maker.search(
df.sort_values('transaction_time'),
num_examples_per_instance=-1,
gap=1,
verbose=True,
)
labels.head()
customer_id | cutoff_time | total_spent |
---|---|---|
1 | 2014-01-01 00:45:30 | 914.73 |
1 | 2014-01-01 00:46:35 | 806.62 |
1 | 2014-01-01 00:47:40 | 694.09 |
1 | 2014-01-01 00:52:00 | 687.80 |
1 | 2014-01-01 00:53:05 | 656.43 |
Transform Labels
With the generated LabelTimes, we will apply specific transforms for our prediction problem. To make the labels binary, a threshold is applied for amounts exceeding $300.
labels = labels.threshold(300)
labels.head()
customer_id | cutoff_time | total_spent |
---|---|---|
1 | 2014-01-01 00:45:30 | True |
1 | 2014-01-01 00:46:35 | True |
1 | 2014-01-01 00:47:40 | True |
1 | 2014-01-01 00:52:00 | True |
1 | 2014-01-01 00:53:05 | True |
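For intuition, the threshold transform behaves like a boolean comparison on the numeric label column. The line below is only a rough pandas-style approximation, not the library's implementation; it assumes a strict greater-than comparison and would be applied to the numeric labels returned by search, before threshold is called.
# Rough equivalent of labels.threshold(300), shown for intuition only. Assumes
# it runs on the numeric labels from search, before the threshold is applied.
binary = labels['total_spent'] > 300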
Additionally, the label times are shifted one hour earlier so that we can predict in advance.
labels = labels.apply_lead('1h')
labels.head()
customer_id | cutoff_time | total_spent |
---|---|---|
1 | 2013-12-31 23:45:30 | True |
1 | 2013-12-31 23:46:35 | True |
1 | 2013-12-31 23:47:40 | True |
1 | 2013-12-31 23:52:00 | True |
1 | 2013-12-31 23:53:05 | True |
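Conceptually, the lead moves every cutoff time one hour earlier, so features are calculated an hour before the labeled window begins. The snippet below is a rough, illustrative equivalent rather than the library's implementation, and it assumes cutoff_time is a plain datetime column.
import pandas as pd

# Illustrative equivalent of labels.apply_lead('1h'): shift every cutoff time
# one hour earlier. Not the actual implementation.
shifted = labels.copy()
shifted['cutoff_time'] = shifted['cutoff_time'] - pd.Timedelta('1h')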
Describe Labels
After transforming the labels, we can use LabelTimes.describe to print out the distribution along with the settings and transforms that were used to make these labels. This is useful as a reference for understanding how the labels were generated from the raw data. The label distribution is also helpful for determining whether the labels are imbalanced.
labels.describe()
Label Distribution
------------------
False 56
True 44
Total: 100
Settings
--------
num_examples_per_instance -1
minimum_data None
window_size <Hour>
gap 1
Transforms
----------
1. threshold
- value: 300
2. apply_lead
- value: 1h
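With labels in hand, the next step described above is feature engineering. The sketch below is not part of the original example; it assumes the Featuretools 0.x API (entity_from_dataframe, normalize_entity, dfs) from around the time of this release and shows one way the label times could serve as cutoff times so that no future data leaks into the features.
import featuretools as ft

# Build an EntitySet from the raw transactions (a minimal sketch).
es = ft.EntitySet(id='transactions')
es = es.entity_from_dataframe(
    entity_id='transactions',
    dataframe=df,
    index='transaction_id',
    time_index='transaction_time',
)
es = es.normalize_entity(
    base_entity_id='transactions',
    new_entity_id='customers',
    index='customer_id',
)

# Calculate features for each customer at each label's cutoff time.
feature_matrix, feature_defs = ft.dfs(
    entityset=es,
    target_entity='customers',
    cutoff_time=labels,
)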
Testing & Development
The Feature Labs community welcomes pull requests. Instructions for testing and development are available here.
Support
The Feature Labs open source community is happy to provide support to users of Compose. Project support can be found in four places depending on the type of question:
- For usage questions, use Stack Overflow with the composeml tag.
- For bugs, issues, or feature requests, start a GitHub issue.
- For discussion regarding development on the core library, use Slack.
- For everything else, the core developers can be reached by email at help@featurelabs.com.
Citing Compose
Compose is built upon a newly defined part of the machine learning process: prediction engineering. If you use Compose, please consider citing this paper: James Max Kanter, Owen Gillespie, and Kalyan Veeramachaneni. Label, Segment, Featurize: A Cross Domain Framework for Prediction Engineering. IEEE DSAA 2016.
BibTeX entry:
@inproceedings{kanter2016label,
title={Label, segment, featurize: a cross domain framework for prediction engineering},
author={Kanter, James Max and Gillespie, Owen and Veeramachaneni, Kalyan},
booktitle={2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA)},
pages={430--439},
year={2016},
organization={IEEE}
}
Acknowledgements
Compose has been developed by the Feature Labs engineering team. The open source development has been supported in part by DARPA's Data-Driven Discovery of Models (D3M) program.
Feature Labs
Compose is an open source project created by Feature Labs. We developed Compose to enable flexible definition of the machine learning task. Read more about our rationale behind automating and developing this stage of the machine learning process here.
To see the other open source projects we're working on visit Feature Labs Open Source. If building impactful data science pipelines is important to you or your business, please get in touch.