Automatically generate prediction problems and labels for supervised learning.
Project description
Trane is a software package that automatically generates prediction problems from temporal datasets and produces the corresponding labels for supervised learning. Its goal is to streamline the process of formulating and solving machine learning problems.
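To make "produces labels" concrete: a generated problem such as "For each <location> predict if there exists a record" can be read as a labeling function that inspects each entity's records inside a prediction window that starts at a cutoff time. The sketch below illustrates that idea with the standard library only. It is not Trane's implementation; the (entity, timestamp, value) record layout, the fixed window, and the hypothetical make_labels function are assumptions made for illustration.

```python
from datetime import datetime, timedelta


def make_labels(
    records: list[tuple[str, datetime, float]],
    entities: list[str],
    cutoff: datetime,
    window: timedelta,
) -> dict[str, bool]:
    """Hypothetical helper: label an entity True if it has a record in [cutoff, cutoff + window)."""
    end = cutoff + window
    # Start every entity at False so entities with no records still get a label.
    labels = {entity: False for entity in entities}
    for entity, timestamp, _value in records:
        # Only records inside the window decide the label; records before the
        # cutoff are the history a model would be allowed to learn from.
        if entity in labels and cutoff <= timestamp < end:
            labels[entity] = True
    return labels


records = [
    ("Paris", datetime(2020, 1, 3), 4.5),
    ("Paris", datetime(2020, 1, 20), 5.0),
    ("Rome", datetime(2019, 12, 30), 3.0),
]
labels = make_labels(
    records, ["Paris", "Rome"], cutoff=datetime(2020, 1, 1), window=timedelta(days=30)
)
print(labels)  # {'Paris': True, 'Rome': False}
```

Separating the cutoff from the window keeps the label strictly in the "future" relative to any features computed from earlier records.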
Install
Install Trane using pip:
python -m pip install trane
Usage
Here's a quick demonstration of Trane in action:
import trane
data, metadata = trane.load_airbnb()
problem_generator = trane.ProblemGenerator(
metadata=metadata,
entity_columns=["location"]
)
problems = problem_generator.generate()
for problem in problems[:5]:
    print(problem)
A few of the generated problems:
==================================================
Generated 40 total problems
--------------------------------------------------
Classification problems: 5
Regression problems: 35
==================================================
For each <location> predict if there exists a record
For each <location> predict if there exists a record with <location> equal to <str>
For each <location> predict if there exists a record with <location> not equal to <str>
For each <location> predict if there exists a record with <rating> equal to <str>
For each <location> predict if there exists a record with <rating> not equal to <str>
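The five classification problems above follow a visible pattern: one unfiltered "exists" problem, plus an "equal to" and a "not equal to" variant for each of the two filter columns (<location> and <rating>), giving 1 + 2 × 2 = 5. The sketch below reproduces those five descriptions by simple enumeration. The hypothetical exists_problems helper and its column list are read off the printed output; this is an illustration of the combinatorics, not Trane's actual generator.

```python
from itertools import product


def exists_problems(entity: str, filter_columns: list[str]) -> list[str]:
    """Hypothetical helper: enumerate 'exists' problem descriptions."""
    base = f"For each <{entity}> predict if there exists a record"
    problems = [base]  # the unfiltered problem comes first
    # Each (column, operator) pair yields one filtered variant of the base problem.
    for column, op in product(filter_columns, ["equal to", "not equal to"]):
        problems.append(f"{base} with <{column}> {op} <str>")
    return problems


for description in exists_problems("location", ["location", "rating"]):
    print(description)
```

Adding more filter columns or operators grows the problem list multiplicatively, which is why the full run reports many more regression problems than classification ones.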
With Trane's LLM add-on (pip install trane[llm]), we can use an OpenAI model to pick out the most relevant problems:
from trane.llm import analyze
instructions = "determine 5 most relevant problems about user's booking preferences. Do not include 'predict the first/last X' problems"
context = "Airbnb data listings in major cities, including information about hosts, pricing, location, and room type, along with over 5 million historical reviews."
relevant_problems = analyze(
problems=problems,
instructions=instructions,
context=context,
model="gpt-3.5-turbo-16k"
)
for problem in relevant_problems:
    print(problem)
    print(f'Reasoning: {problem.get_reasoning()}\n')
Output
For each <location> predict if there exists a record
Reasoning: This problem can help identify locations with missing data or locations that have not been booked at all.
For each <location> predict the first <location> in all related records
Reasoning: Predicting the first location in all related records can provide insights into the most frequently booked locations for each city.
For each <location> predict the first <rating> in all related records
Reasoning: Predicting the first rating in all related records can provide insights into the average satisfaction level of guests for each location.
For each <location> predict the last <location> in all related records
Reasoning: Predicting the last location in all related records can provide insights into the most recent bookings for each city.
For each <location> predict the last <rating> in all related records
Reasoning: Predicting the last rating in all related records can provide insights into the recent satisfaction level of guests for each location.
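The instructions ask the model to leave out "predict the first/last X" problems, yet such problems appear in the output above, so LLM instructions are best treated as soft guidance. A hard constraint like this can be enforced deterministically on the problem descriptions, for example before passing problems to analyze. A minimal sketch: since print(problem) displays str(problem), the filter matches on str(problem); the drop_first_last helper and the regular expression are assumptions for illustration, not part of Trane.

```python
import re

# Matches descriptions such as "... predict the first <rating> in all related records".
FIRST_LAST = re.compile(r"\bpredict the (first|last)\b")


def drop_first_last(problems):
    """Hypothetical helper: drop 'predict the first/last X' problems."""
    return [p for p in problems if not FIRST_LAST.search(str(p))]


descriptions = [
    "For each <location> predict if there exists a record",
    "For each <location> predict the first <rating> in all related records",
    "For each <location> predict the last <location> in all related records",
]
print(drop_first_last(descriptions))
```

Because the helper only calls str() on each item, the same function works on plain strings (as here) and on the problem objects returned by generate().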
Community
- Questions or Issues? Create a GitHub issue.
- Want to Chat? Join our Slack community.
Cite Trane
If you find Trane beneficial, consider citing our paper:
Ben Schreck, Kalyan Veeramachaneni. What Would a Data Scientist Ask? Automatically Formulating and Solving Predictive Problems. IEEE DSAA 2016, 440-451.
BibTeX entry:
@inproceedings{schreck2016would,
title={What Would a Data Scientist Ask? Automatically Formulating and Solving Predictive Problems},
author={Schreck, Benjamin and Veeramachaneni, Kalyan},
booktitle={Data Science and Advanced Analytics (DSAA), 2016 IEEE International Conference on},
pages={440--451},
year={2016},
organization={IEEE}
}
File details: trane-0.8.0.tar.gz
File metadata
- Download URL: trane-0.8.0.tar.gz
- Size: 4.4 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.11.7
File hashes
Algorithm | Hash digest
---|---
SHA256 | 677514a691ba5a49a4b4569605a23990005549cd7943c71c8fc8e4ccef60684f
MD5 | 1ce664566a94b7eb49792eb64af887cf
BLAKE2b-256 | 2c8777b9b61a74c9b66392b9b383efdd2c572bffd4b40fd4d64e9fdb3f19a805
File details: trane-0.8.0-py3-none-any.whl
File metadata
- Download URL: trane-0.8.0-py3-none-any.whl
- Size: 4.4 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.11.7
File hashes
Algorithm | Hash digest
---|---
SHA256 | 9f69b86da4bd3226a1b25bb7f6fafb91ae47b9e7ef21a9dc99d4e200f6c9a8b5
MD5 | 7fd6e736471214a7059e6ce19fe38a18
BLAKE2b-256 | f2f01755d68322eca0c1344c5786650ec0d4d1f2d141d1b3e9135fff28090d64