Automatic Feature Engineering and Selection Linear Prediction Model

`autofeat` library

Linear Prediction Models with Automated Feature Engineering and Selection

This library contains the AutoFeatRegressor and AutoFeatClassifier models, which have an interface similar to that of scikit-learn models:

- fit() function to fit the model parameters
- predict() function to predict the target variable given the input
- predict_proba() function to predict probabilities of the target variable given the input (classifier only)
- score() function to calculate the goodness of the fit (R^2/accuracy)
- fit_transform() and transform() functions, which extend the given data by the additional features that were engineered and selected by the model

When calling fit(), the fit_transform() function is called internally, so if you're planning to call transform() on the same data anyway, just call fit_transform() right away. transform() is mostly useful if you've split your data into training and test data and did not call fit_transform() on your whole dataset. The predict() and score() functions can be given data either in the format of the original dataframe that was used when calling fit()/fit_transform() or as an already transformed dataframe.

In addition, the feature selection part alone is available as the FeatureSelector model.

Furthermore (as of version 2.0.0), minimal feature selection (removing zero-variance and redundant features), engineering (simple products and ratios of features), and scaling (a power transform to make features more normally distributed) are also available in the AutoFeatLight model.
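To make these three steps concrete, here is a library-independent sketch with pandas and scikit-learn. It is not autofeat's AutoFeatLight implementation: the hypothetical helper `light_feature_engineering`, the 0.995 correlation threshold, the greedy "keep the first column of a correlated pair" rule and the use of scikit-learn's Yeo-Johnson `PowerTransformer` are all assumptions made for illustration.

```python
from itertools import combinations

import numpy as np
import pandas as pd
from sklearn.preprocessing import PowerTransformer


def light_feature_engineering(df: pd.DataFrame, corr_threshold: float = 0.995) -> pd.DataFrame:
    """Hypothetical helper illustrating the idea; NOT autofeat's AutoFeatLight code."""
    # 1a. selection: drop zero-variance columns
    df = df.loc[:, df.std() > 0]
    # 1b. selection: greedily drop a column that is (almost) perfectly
    #     correlated with a column that was already kept
    corr = df.corr().abs()
    keep: list[str] = []
    for col in df.columns:
        if all(corr.loc[col, k] < corr_threshold for k in keep):
            keep.append(col)
    df = df[keep]
    # 2. engineering: pairwise products and (where no division by zero occurs) ratios
    new_cols = {}
    for a, b in combinations(df.columns, 2):
        new_cols[f"{a}*{b}"] = df[a] * df[b]
        if (df[b] != 0).all():
            new_cols[f"{a}/{b}"] = df[a] / df[b]
    df = pd.concat([df, pd.DataFrame(new_cols, index=df.index)], axis=1)
    # 3. scaling: Yeo-Johnson power transform (standardized output) to make
    #    the features more normally distributed
    scaled = PowerTransformer(method="yeo-johnson").fit_transform(df)
    return pd.DataFrame(scaled, columns=df.columns, index=df.index)


# Toy data (assumption): one constant and one redundant column
rng = np.random.default_rng(3)
raw = pd.DataFrame({"x1": rng.uniform(1, 10, 100), "x2": rng.uniform(1, 10, 100), "const": np.ones(100)})
raw["x1_copy"] = 2 * raw["x1"]
out = light_feature_engineering(raw)
print(list(out.columns))  # ['x1', 'x2', 'x1*x2', 'x1/x2']
```

The constant column is dropped for having zero variance, `x1_copy` is dropped as redundant with `x1`, and the remaining pair yields one product and one ratio.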

The AutoFeatRegressor, AutoFeatClassifier, and FeatureSelector models need to be fit on data without NaNs, as they internally call the sklearn LassoLarsCV model, which cannot handle NaNs. When calling transform(), NaNs (but not np.inf) are okay.
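A small illustration of why this restriction exists, using scikit-learn directly: LassoLarsCV rejects input containing NaNs with a ValueError, so incomplete rows have to be handled before fitting. Dropping them, as done here, is only one option (imputation is another); the data is synthetic.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LassoLarsCV

# Toy data (assumption) with two missing values in column "b"
rng = np.random.default_rng(4)
X = pd.DataFrame(rng.normal(size=(50, 3)), columns=["a", "b", "c"])
y = X["a"].to_numpy() + rng.normal(scale=0.1, size=len(X))
X.iloc[[3, 17], 1] = np.nan

try:
    LassoLarsCV(cv=3).fit(X, y)
    raised = False
except ValueError:
    # scikit-learn's input validation rejects NaNs for this estimator
    raised = True

# One simple remedy before fitting: drop incomplete rows together with their targets
mask = X.notna().all(axis=1).to_numpy()
model = LassoLarsCV(cv=3).fit(X[mask], y[mask])
print(raised, int(mask.sum()), model.coef_.round(2))
```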

The autofeat examples notebook contains a simple usage example - try it out! :) Additional examples can be found in the autofeat benchmark notebooks for regression (which also contains the code to reproduce the results from the paper mentioned below) and classification, as well as the testing scripts.

Please keep in mind that since the AutoFeatRegressor and AutoFeatClassifier models can generate very complex features, they might overfit on noise in the dataset, i.e., find some new features that lead to good predictions on the training set but result in poor performance on new test samples. While this usually only happens for datasets with very few samples, we suggest you carefully inspect the features found by autofeat and use those that make sense to you to train your own models.

Depending on the number of feateng_steps (default 2) and the number of input features, autofeat can generate a very large feature matrix (before selecting the most appropriate features from this large feature pool). By specifying in feateng_cols only those columns that you expect to be most valuable in the feature engineering part, the number of generated features can be greatly reduced. Additionally, the transformations can be limited to only those that make sense for your data. Last but not least, you can subsample the data used for training the model to limit the memory requirements. After the model has been fit, you can call transform() on your whole dataset to generate only those few features that were selected during fit()/fit_transform().

Installation

You can either download the code from here and include the autofeat folder in your $PYTHONPATH or install (the library components only) via pip:

$ pip install autofeat

The library requires Python 3! Other dependencies: numpy, pandas, scikit-learn, sympy, joblib, pint and numba.

Paper

For further details on the model and implementation please refer to the paper - and of course if any of this code was helpful for your research, please consider citing it:

@inproceedings{horn2019autofeat,
  title={The autofeat Python Library for Automated Feature Engineering and Selection},
  author={Horn, Franziska and Pack, Robert and Rieger, Michael},
  booktitle={Joint European Conference on Machine Learning and Knowledge Discovery in Databases},
  pages={111--120},
  year={2019},
  organization={Springer}
}

If you don't like reading, you can also watch a video of my talk at the PyData conference about automated feature engineering and selection with autofeat.

The code is intended for research purposes.

If you have any questions, please don't hesitate to send me an email, and of course if you find any bugs or want to contribute other improvements, pull requests are very welcome!

Acknowledgments

This project was made possible thanks to support by BASF.

Release history

- 2.1.3 (Jul 4, 2024)
- 2.1.2 (Jul 28, 2023), this version
- 2.1.1 (Jun 25, 2023)
- 2.1.0 (May 14, 2023)
- 2.0.10 (Oct 28, 2021)
- 2.0.9 (Jul 12, 2021)
- 2.0.8 (Jun 3, 2021)
- 2.0.7 (Jun 2, 2021)
- 2.0.6 (Jun 2, 2021)
- 2.0.5 (Jan 15, 2021)
- 2.0.4 (Nov 30, 2020)
- 2.0.3 (Nov 11, 2020)
- 2.0.2 (Nov 10, 2020)
- 2.0.1 (Nov 10, 2020)
- 2.0.0 (Nov 7, 2020)
- 1.1.3 (Jul 21, 2020)
- 1.1.2 (Feb 28, 2020)
- 1.1.1 (Feb 25, 2020)
- 1.1.0 (Feb 24, 2020)
- 1.0.0 (Feb 24, 2020)
- 0.2.5 (May 12, 2019)
- 0.2.4 (May 12, 2019)
- 0.2.3 (May 11, 2019)
- 0.2.2 (May 9, 2019)
- 0.2.1 (May 9, 2019)
- 0.2.0 (May 2, 2019)
- 0.1.1 (Jan 23, 2019)
- 0.1 (Jan 22, 2019)

Download files

Source Distribution

autofeat-2.1.2.tar.gz (25.1 kB), uploaded Jul 28, 2023

Built Distribution

autofeat-2.1.2-py3-none-any.whl (25.2 kB), uploaded Jul 28, 2023 (Python 3)

File details

Details for the file autofeat-2.1.2.tar.gz.

File metadata

Download URL: autofeat-2.1.2.tar.gz
Upload date: Jul 28, 2023
Size: 25.1 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: poetry/1.5.1 CPython/3.10.2 Darwin/22.5.0

File hashes

Hashes for autofeat-2.1.2.tar.gz
Algorithm	Hash digest
SHA256	`d2fa3c618b7c9c51467577b9dc0a0e1d10db7c25c196e47de32634856870b11f`
MD5	`a8e9576051b06489b7687e26990c8ba3`
BLAKE2b-256	`0c996a64d96dc056a1c3c9822c02ad010053d678d644a6e5462ee263833fe981`

File details

Details for the file autofeat-2.1.2-py3-none-any.whl.

File metadata

Download URL: autofeat-2.1.2-py3-none-any.whl
Upload date: Jul 28, 2023
Size: 25.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: poetry/1.5.1 CPython/3.10.2 Darwin/22.5.0

File hashes

Hashes for autofeat-2.1.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`2125d31fb59d084e8ff31687caa182bb3fb2927e5e762222cec9397bcb68b0f4`
MD5	`ecd8def4f6cbf699b505499741134e0c`
BLAKE2b-256	`17e35a1cf2cd0035a7e02d4e022aead457256e6917ba77eedc3550ea9fe29530`
