Automatically generate prediction problems and labels for supervised learning.

Trane is a software package that automatically generates prediction problems for temporal datasets and produces labels for supervised learning. Its goal is to streamline the machine learning problem-solving process.

Install

Install Trane using pip:

python -m pip install trane
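
To confirm the installation from Python, the installed version can be read from the package metadata. This is a minimal sketch using only the standard library; `installed_version` is a hypothetical helper written for this example, not part of Trane.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("trane")
    print(f"trane version: {version}" if version else "trane is not installed")
```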

Usage

Here's a quick demonstration of Trane in action:

import trane

data, metadata = trane.load_airbnb()
problem_generator = trane.ProblemGenerator(
    metadata=metadata,
    entity_columns=["location"],
)
problems = problem_generator.generate()

for problem in problems[:5]:
    print(problem)

A few of the generated problems:

==================================================
Generated 40 total problems
--------------------------------------------------
Classification problems: 5
Regression problems: 35
==================================================
For each <location> predict if there exists a record
For each <location> predict if there exists a record with <location> equal to <str>
For each <location> predict if there exists a record with <location> not equal to <str>
For each <location> predict if there exists a record with <rating> equal to <str>
For each <location> predict if there exists a record with <rating> not equal to <str>
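
To make these problem statements concrete, the sketch below computes labels for a problem of the form "For each <location> predict if there exists a record with <rating> equal to <value>" in plain Python. It only illustrates what such a statement means and is not Trane's implementation: it ignores time windows and cutoff times, and the toy records, the `exists_label` helper, and the chosen value are all made up for this example.

```python
from collections import defaultdict


def exists_label(records, entity_column, target_column, value):
    """For each entity, label True if any of its records has target_column == value."""
    labels = defaultdict(bool)
    for record in records:
        entity = record[entity_column]
        # Once an entity has a matching record, its label stays True.
        labels[entity] = labels[entity] or record[target_column] == value
    return dict(labels)


# Made-up toy records; the real Airbnb dataset has many more rows and columns.
records = [
    {"location": "Paris", "rating": 5},
    {"location": "Paris", "rating": 3},
    {"location": "Rome", "rating": 4},
]

labels = exists_label(records, "location", "rating", 5)
print(labels)  # {'Paris': True, 'Rome': False}
```

Each generated problem describes a labeling rule of this kind, with one label per entity.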

With Trane's LLM add-on (pip install "trane[llm]"), the most relevant problems can be selected with an OpenAI model:

from trane.llm import analyze

instructions = "determine 5 most relevant problems about user's booking preferences. Do not include 'predict the first/last X' problems"
context = "Airbnb data listings in major cities, including information about hosts, pricing, location, and room type, along with over 5 million historical reviews."
relevant_problems = analyze(
    problems=problems,
    instructions=instructions,
    context=context,
    model="gpt-3.5-turbo-16k"
)
for problem in relevant_problems:
    print(problem)
    print(f'Reasoning: {problem.get_reasoning()}\n')

Output

For each <location> predict if there exists a record
Reasoning: This problem can help identify locations with missing data or locations that have not been booked at all.

For each <location> predict the first <location> in all related records
Reasoning: Predicting the first location in all related records can provide insights into the most frequently booked locations for each city.

For each <location> predict the first <rating> in all related records
Reasoning: Predicting the first rating in all related records can provide insights into the average satisfaction level of guests for each location.

For each <location> predict the last <location> in all related records
Reasoning: Predicting the last location in all related records can provide insights into the most recent bookings for each city.

For each <location> predict the last <rating> in all related records
Reasoning: Predicting the last rating in all related records can provide insights into the recent satisfaction level of guests for each location.
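
The model's selection is not guaranteed to follow every instruction: the output above still contains "first" and "last" problems. A plain string filter can be applied afterwards to enforce such exclusions deterministically. This is a minimal sketch that assumes only that each problem prints as its description, as in the output above; `drop_matching` is a hypothetical helper, not part of Trane, and plain strings stand in for problem objects.

```python
def drop_matching(problems, excluded_phrases):
    """Keep only problems whose printed description contains none of the phrases."""
    kept = []
    for problem in problems:
        description = str(problem).lower()
        if not any(phrase.lower() in description for phrase in excluded_phrases):
            kept.append(problem)
    return kept


# Plain strings stand in for problem objects; descriptions are copied from the output above.
candidates = [
    "For each <location> predict if there exists a record",
    "For each <location> predict the first <rating> in all related records",
    "For each <location> predict the last <rating> in all related records",
]

for problem in drop_matching(candidates, ["predict the first", "predict the last"]):
    print(problem)
```

The same helper could be applied to the full list of generated problems before calling analyze, which would also shorten the prompt sent to the model.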

Cite Trane

If you find Trane beneficial, consider citing our paper:

Ben Schreck, Kalyan Veeramachaneni. What Would a Data Scientist Ask? Automatically Formulating and Solving Predictive Problems. IEEE DSAA 2016, 440-451.

BibTeX entry:

@inproceedings{schreck2016would,
  title={What Would a Data Scientist Ask? Automatically Formulating and Solving Predictive Problems},
  author={Schreck, Benjamin and Veeramachaneni, Kalyan},
  booktitle={Data Science and Advanced Analytics (DSAA), 2016 IEEE International Conference on},
  pages={440--451},
  year={2016},
  organization={IEEE}
}
