
Automatically generate prediction problems and labels for supervised learning.


Automatically formulating machine learning tasks for temporal datasets

Trane is a software package that automatically generates prediction problems and labels for supervised learning. It is designed to advance the automation of the machine learning problem-solving pipeline.



To install Trane, run the following command:

python -m pip install trane

Prediction Problems

In data science, people usually have a few records of an entity and want to predict what will happen to that entity in the future. Trane is designed to generate time-related prediction problems. It transforms metadata about a dataset into lists of relevant prediction problems and cutoff times. Prediction problems are structured in a formal language described in Operations below. A cutoff time is the last point in time whose data is used for training the classifier. Data after the cutoff time is used to evaluate the classifier's accuracy. Cutoff times are necessary to prevent the classifier from training on test data.


A bank wants to predict how many transactions over $100 a customer will make in the next year. Assume we have all the transaction records for each client from 2015 to 2017, and we want to build a machine learning model to solve this prediction problem. Here is the example database.

User_id  Time  Transaction_id  Amount
u1       2015  1-2015-1        10
u1       2015  1-2015-2        200
u2       2015  2-2015-1        50
u1       2016  1-2016-1        10
u1       2017  1-2017-1        1000
u1       2017  1-2017-2        20
u2       2017  2-2017-1        10

First, we separate the data by entity. Here the entity is user_id. User u1, for example, has the following records:

User_id  Time  Transaction_id  Amount
u1       2015  1-2015-1        10
u1       2015  1-2015-2        200
u1       2016  1-2016-1        10
u1       2017  1-2017-1        1000
u1       2017  1-2017-2        20
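
The entity split can be sketched in plain Python. This is only an illustration of the idea, not Trane's implementation: the tuple layout and the helper name split_by_entity are assumptions made for this example.

```python
from collections import defaultdict

# Example records from the table above: (user_id, time, transaction_id, amount).
RECORDS = [
    ("u1", 2015, "1-2015-1", 10),
    ("u1", 2015, "1-2015-2", 200),
    ("u2", 2015, "2-2015-1", 50),
    ("u1", 2016, "1-2016-1", 10),
    ("u1", 2017, "1-2017-1", 1000),
    ("u1", 2017, "1-2017-2", 20),
    ("u2", 2017, "2-2017-1", 10),
]


def split_by_entity(records):
    """Group records by entity id (the first field), preserving row order.

    Hypothetical helper for illustration; not part of Trane's API.
    """
    by_entity = defaultdict(list)
    for record in records:
        by_entity[record[0]].append(record)
    return dict(by_entity)


entity_records = split_by_entity(RECORDS)
for row in entity_records["u1"]:
    print(row)
```

Each entity's records can then be processed independently when generating labels.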

Let's consider a cutoff time of 2016. The data from 2015 to 2016 is used as training data for the machine learning model, and the data after 2016, that is, from 2016 to 2017, is used to evaluate the trained model. Trane outputs a tuple of (entity, cutoff, label) for each prediction problem; the label is generated by applying the prediction problem to the entity's data. The output of Trane can be fed directly into Featuretools to perform feature engineering.
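
As a rough sketch of how the (entity, cutoff, label) tuples could be computed for the bank example, the snippet below counts each user's transactions over 100 that occur after the cutoff. It is not Trane's code: the function names are hypothetical, and it assumes that records with Time greater than the cutoff are the ones that generate the label.

```python
# Example records: (user_id, time, transaction_id, amount).
RECORDS = [
    ("u1", 2015, "1-2015-1", 10),
    ("u1", 2015, "1-2015-2", 200),
    ("u2", 2015, "2-2015-1", 50),
    ("u1", 2016, "1-2016-1", 10),
    ("u1", 2017, "1-2017-1", 1000),
    ("u1", 2017, "1-2017-2", 20),
    ("u2", 2017, "2-2017-1", 10),
]

CUTOFF = 2016      # cutoff time from the example
THRESHOLD = 100    # "transactions over $100"


def label_entity(records, cutoff, threshold):
    """Count transactions above `threshold` that happen after `cutoff`.

    Assumption: a record belongs to the label window when time > cutoff.
    """
    future = [r for r in records if r[1] > cutoff]
    return sum(1 for r in future if r[3] > threshold)


def make_labels(records, cutoff, threshold):
    """Return one (entity, cutoff, label) tuple per entity, sorted by entity id."""
    entities = sorted({r[0] for r in records})
    return [
        (entity, cutoff,
         label_entity([r for r in records if r[0] == entity], cutoff, threshold))
        for entity in entities
    ]


labels = make_labels(RECORDS, CUTOFF, THRESHOLD)
for entity, cutoff, label in labels:
    print(entity, cutoff, label)
```

Only records at or before the cutoff would then be available for feature engineering, which keeps the label window out of the training features.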

Prediction Problem Generation

As shown in the example, a prediction problem consists of a sequence of operations applied to the data, together with a cutoff time.

In Trane, we generate prediction problems with four operations: Filter Operations, Row Operations, Transformation Operations and Aggregation Operations. Filter operations are applied on the filter_column. Row, Transformation and Aggregation Operations are applied on the label_generating_column.
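
To make the four-stage structure concrete, here is a minimal sketch that chains a filter, a row operation, a transformation, and an aggregation. All function names and signatures below are hypothetical stand-ins written for this example; they are not Trane's classes, and the real operations may behave differently.

```python
def greater_filter(rows, filter_column, threshold):
    """Filter stage: keep rows whose filter_column value exceeds threshold."""
    return [row for row in rows if row[filter_column] > threshold]


def identity_row(value):
    """Row stage: leave each value unchanged."""
    return value


def identity_transformation(values):
    """Transformation stage: leave the sequence unchanged."""
    return list(values)


def count_aggregation(values):
    """Aggregation stage: number of remaining values."""
    return len(values)


def sum_aggregation(values):
    """Aggregation stage: sum of remaining values."""
    return sum(values)


def apply_problem(rows, filter_column, threshold, label_generating_column,
                  row_op, transformation_op, aggregation_op):
    """Apply filter -> row -> transformation -> aggregation and return a label."""
    kept = greater_filter(rows, filter_column, threshold)
    values = [row_op(row[label_generating_column]) for row in kept]
    return aggregation_op(transformation_op(values))


# u1's records after the 2016 cutoff in the bank example.
rows_after_cutoff = [
    {"Time": 2017, "Amount": 1000},
    {"Time": 2017, "Amount": 20},
]

count_label = apply_problem(rows_after_cutoff, "Amount", 100, "Amount",
                            identity_row, identity_transformation,
                            count_aggregation)
sum_label = apply_problem(rows_after_cutoff, "Amount", 100, "Amount",
                          identity_row, identity_transformation,
                          sum_aggregation)
print(count_label, sum_label)
```

Swapping only the aggregation changes the question from "how many transactions over 100" to "how much was spent in transactions over 100", which is how a small set of operations can yield many distinct prediction problems.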


The workflow of using Trane on a database is as follows:

  • The data scientist writes a meta.json describing the columns and data types in the new database.
  • PredictionProblemGenerator reads the metadata and generates possible prediction problems. The prediction problems are saved to problems.json.
  • The data scientist can change the parameters of the prediction problems in problems.json.
  • The labeler applies the prediction problems in problems.json to the database data.csv.

Built-in Operations

  • FilterOp
    • IdentityFilterOp
    • GreaterFilterOp
  • RowOp
    • IdentityRowOp
    • GreaterRowOp
  • TransformationOp
    • IdentityTransformationOp
    • DiffTransformationOp
  • AggregationOp
    • FirstAggregationOp
    • CountAggregationOp
    • SumAggregationOp
    • LastAggregationOp
    • LMFAggregationOp
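
One way to see how many candidate problems these operations can produce is to enumerate every combination of one operation per stage. The sketch below uses only the operation names listed above as strings; it does not call Trane, and it assumes no pruning, whereas a real generator would likely discard combinations that do not fit the column types.

```python
from itertools import product

# Operation names taken from the list above, grouped by stage.
FILTER_OPS = ["IdentityFilterOp", "GreaterFilterOp"]
ROW_OPS = ["IdentityRowOp", "GreaterRowOp"]
TRANSFORMATION_OPS = ["IdentityTransformationOp", "DiffTransformationOp"]
AGGREGATION_OPS = [
    "FirstAggregationOp",
    "CountAggregationOp",
    "SumAggregationOp",
    "LastAggregationOp",
    "LMFAggregationOp",
]

# Every (filter, row, transformation, aggregation) combination.
candidates = list(product(FILTER_OPS, ROW_OPS, TRANSFORMATION_OPS, AGGREGATION_OPS))

print(len(candidates))  # 2 * 2 * 2 * 5 = 40
print(candidates[0])
```

This count is per choice of filter_column and label_generating_column, so the space of candidate problems grows quickly with the number of columns.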

Quick Usage

We have tutorial notebooks here.


We started working on Trane in 2015. In its first iteration in 2016, we showed that prediction problems can be formally specified using a language, and we created algorithms to generate prediction problems automatically. Using other tools that synthesize features and build models for a given prediction problem, we were able to solve problems end-to-end. You can read our paper here. Ben Schreck's thesis goes further and investigates whether we can learn to filter out uninteresting problems.

This repository is the second iteration, in which we focus on usability and APIs, show more use cases, and ultimately apply Trane to real-world datasets. Stay tuned for more demos and examples.

You can find the related theses here:

Citing Trane

If you use Trane, please consider citing the following paper:

Ben Schreck, Kalyan Veeramachaneni. What Would a Data Scientist Ask? Automatically Formulating and Solving Predictive Problems. IEEE DSAA 2016, 440-451

BibTeX entry:

  title={What Would a Data Scientist Ask? Automatically Formulating and Solving Predictive Problems},
  author={Schreck, Benjamin and Veeramachaneni, Kalyan},
  booktitle={Data Science and Advanced Analytics (DSAA), 2016 IEEE International Conference on},
