
Context-Aware Automated Feature Engineering (CAAFE) is an automated machine learning tool that uses large language models for feature engineering on tabular datasets. It generates Python code for new features together with an explanation of each feature's utility, which aids interpretability.

Project description

Usage


CAAFE lets you semi-automate feature engineering, using your description of the dataset and the help of a language model. It is based on the paper "LLMs for Semi-Automated Data Science: Introducing CAAFE for Context-Aware Automated Feature Engineering" by Hollmann, Müller, and Hutter (2023). CAAFE systematically verifies each generated feature so that only features that actually prove useful are added to the dataset.
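The verification idea can be pictured as comparing cross-validated scores with and without a candidate feature. The following is a minimal sketch of that idea under stated assumptions, not CAAFE's actual implementation: the nearest-centroid classifier, the helpers `cv_accuracy` and `keep_feature`, and the toy data are all hypothetical and chosen only so the example runs with the standard library.

```python
from statistics import mean


def nearest_centroid_predict(train_X, train_y, test_X):
    """Predict each test row's label as the class with the closest mean vector."""
    centroids = {}
    for label in set(train_y):
        rows = [x for x, y in zip(train_X, train_y) if y == label]
        centroids[label] = [mean(col) for col in zip(*rows)]

    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    return [min(centroids, key=lambda c: sq_dist(x, centroids[c])) for x in test_X]


def cv_accuracy(X, y, folds=5):
    """Mean accuracy over k folds (rows are assigned to folds round-robin)."""
    scores = []
    for k in range(folds):
        test_idx = [i for i in range(len(X)) if i % folds == k]
        train_idx = [i for i in range(len(X)) if i % folds != k]
        preds = nearest_centroid_predict(
            [X[i] for i in train_idx],
            [y[i] for i in train_idx],
            [X[i] for i in test_idx],
        )
        hits = sum(p == y[i] for p, i in zip(preds, test_idx))
        scores.append(hits / len(test_idx))
    return mean(scores)


def keep_feature(X, y, new_column, min_gain=0.0):
    """Return True only if appending new_column improves cross-validated accuracy."""
    baseline = cv_accuracy(X, y)
    augmented = [row + [value] for row, value in zip(X, new_column)]
    return cv_accuracy(augmented, y) > baseline + min_gain


# Toy XOR data: the label depends on the interaction of the two columns.
X = [[(i % 4) // 2, (i % 4) % 2] for i in range(20)]
y = [a ^ b for a, b in X]

interaction = [abs(a - b) for a, b in X]  # candidate that captures the interaction
constant = [1 for _ in X]                 # candidate that carries no information

print(keep_feature(X, y, interaction))  # True
print(keep_feature(X, y, constant))     # False
```

The informative candidate is kept because it raises the held-out accuracy, while the constant column is rejected because it leaves the score unchanged. A real setup would use a stronger base classifier and a metric suited to the task.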

To use CAAFE, first create a CAAFEClassifier object with the desired parameters:

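from caafe import CAAFEClassifier

# clf_no_feat_eng is the base classifier that CAAFE wraps; define it beforehand
caafe_clf = CAAFEClassifier(base_classifier=clf_no_feat_eng,
                            llm_model="gpt-4",
                            iterations=2)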

Then, fit the classifier to your training data:

caafe_clf.fit_pandas(df_train,
                     target_column_name=target_column_name,
                     dataset_description=dataset_description,
                     disable_caafe=False)
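The `dataset_description` argument in the call above is the textual context the language model works from. As a rough sketch, such a description could be assembled from per-column notes. The helper `build_dataset_description`, its layout, and the example column names are hypothetical, shown for illustration only, and are not a format the source prescribes.

```python
def build_dataset_description(title, target, columns):
    """Assemble a plain-text dataset description from per-column notes.

    `columns` maps each column name to a short human-written explanation.
    """
    lines = [title, "", f"Target column: {target}", "", "Columns:"]
    for name, note in columns.items():
        lines.append(f"- {name}: {note}")
    return "\n".join(lines)


# Hypothetical example dataset; replace with notes about your own columns.
dataset_description = build_dataset_description(
    title="Customer churn records",
    target="churned",
    columns={
        "age": "customer age in years",
        "monthly_fee": "subscription fee in USD",
    },
)
target_column_name = "churned"
print(dataset_description)
```

Writing the column meanings and units down explicitly gives the language model the same context a human analyst would rely on when proposing features.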

Finally, use the classifier to make predictions on your test data:

pred = caafe_clf.predict(df_test)

You can also try out the demo at: https://colab.research.google.com/drive/1mCA8xOAJZ4MaB_alZvyARTMjhl6RZf0a

For a minimal example of how to use CAAFE on your dataset, use CAFE_minimal.ipynb. To reproduce the experiments from the paper, use CAAFE.ipynb.

Paper

Hollmann, N., Müller, S., & Hutter, F. (2023). LLMs for Semi-Automated Data Science: Introducing CAAFE for Context-Aware Automated Feature Engineering. https://arxiv.org/abs/2305.03403

License

CC BY-NC-SA 4.0

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Download files

Source Distribution

caafe-0.1.2.tar.gz (19.2 kB)

Built Distribution

caafe-0.1.2-py3-none-any.whl (21.5 kB, Python 3)
