Creates new DataFrame columns by applying strategically selected operations.
FeaturesCreation
FeaturesCreation creates new DataFrame columns by applying operations to pairs of existing columns and keeping only the results that a classifier ranks as most relevant. It supports a range of operations, selects among them automatically, and works with pandas DataFrames and classifiers such as LGBMClassifier.
How it works: Transformation Process
The transformation process creates new DataFrame columns from selected operations in four steps:
- Instantiation and Fitting:
First, instantiate the FeaturesCreation class, for example fe_cr = FeaturesCreation(). The classifier used for selecting operations is not passed at this point; it is supplied when fitting.
Then, you fit the FeaturesCreation instance to your data by calling fe_cr.fit(x, y, classifier, n_new_features), where x represents the feature data (input), y is the target column (output), classifier is the chosen classifier (e.g., LGBMClassifier), and n_new_features is the desired number of new features to create.
- Transformation Selection:
During fitting, the FeaturesCreation class uses the provided classifier to evaluate the importance of each candidate transformation and keeps the top-ranked operations, up to the requested number of new features.
- Application of Transformations:
After fitting, the selected transformations are ready to be applied to the original DataFrame. To apply these transformations, call fe_cr.apply_transformation(df, transformations), where df is the original DataFrame, and transformations contains the chosen operations.
- Resulting DataFrame:
The apply_transformation method returns a new DataFrame containing the newly created columns. To keep the original columns alongside them, concatenate the result with the original DataFrame, for example with pd.concat.
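The selection step described above can be sketched roughly as follows. This is not the library's implementation: the candidate operations (`mod`, `truediv`), the helper names `build_candidates` and `select_top_features`, and the use of scikit-learn's `DecisionTreeClassifier` and iris dataset are all assumptions made for illustration. The sketch only shows the general idea of ranking candidate columns by a fitted classifier's feature importances.

```python
import operator
from itertools import permutations

import pandas as pd
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

# Hypothetical candidate operations; the library's real set may differ.
OPERATIONS = ("mod", "truediv")


def build_candidates(x: pd.DataFrame) -> pd.DataFrame:
    """Build one candidate column per ordered column pair and operation."""
    candidates = {}
    for left, right in permutations(x.columns, 2):
        for op_name in OPERATIONS:
            values = getattr(operator, op_name)(x[left], x[right])
            candidates[f"{left}__{op_name}__{right}"] = values
    return pd.DataFrame(candidates, index=x.index)


def select_top_features(x, y, classifier, n_new_features):
    """Fit the classifier on all candidates and keep the most important ones."""
    candidates = build_candidates(x)
    classifier.fit(candidates, y)
    importances = pd.Series(
        classifier.feature_importances_, index=candidates.columns
    )
    return importances.nlargest(n_new_features).index.tolist()


# Assumed demo data: scikit-learn's iris dataset (no zero values, so
# division and modulo are safe here).
iris = load_iris(as_frame=True)
selected = select_top_features(
    iris.data, iris.target, DecisionTreeClassifier(random_state=0), n_new_features=3
)
print(selected)
```

Any estimator that exposes `feature_importances_` after fitting could stand in for the decision tree in this sketch. Real data with zeros would need extra handling before dividing.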
DataFrame Before Transformations
Consider the original DataFrame as follows:
| | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) |
|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 |
DataFrame After Transformations
Now apply the selected transformations to the original DataFrame. The resulting DataFrame contains only the newly created columns:
| | sepal length (cm)__mod__petal length (cm) | sepal length (cm)__truediv__petal length (cm) | sepal width (cm)__truediv__petal width (cm) |
|---|---|---|---|
| 0 | 0.9 | 3.642857 | 17.5 |
| 1 | 0.7 | 3.500000 | 15.0 |
| 2 | 0.8 | 3.615385 | 16.0 |
| 3 | 0.1 | 3.066667 | 15.5 |
| 4 | 0.8 | 3.571429 | 18.0 |
The new columns are named in the format "feature1__operation__feature2" and contain the transformed values generated by applying the specified operations to the original data.
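As a rough illustration of the naming format, the sketch below recomputes the three example columns from their names using plain pandas. It assumes that the operation part of the name (`mod`, `truediv`) corresponds to a function in Python's `operator` module; the helper `compute_named_column` is hypothetical and not part of the library.

```python
import operator

import pandas as pd

# First five rows of the example DataFrame shown above.
df = pd.DataFrame(
    {
        "sepal length (cm)": [5.1, 4.9, 4.7, 4.6, 5.0],
        "sepal width (cm)": [3.5, 3.0, 3.2, 3.1, 3.6],
        "petal length (cm)": [1.4, 1.4, 1.3, 1.5, 1.4],
        "petal width (cm)": [0.2, 0.2, 0.2, 0.2, 0.2],
    }
)


def compute_named_column(frame: pd.DataFrame, name: str) -> pd.Series:
    """Split "feature1__operation__feature2" and apply the named operator."""
    left, op_name, right = name.split("__")
    result = getattr(operator, op_name)(frame[left], frame[right])
    return result.rename(name)


names = [
    "sepal length (cm)__mod__petal length (cm)",
    "sepal length (cm)__truediv__petal length (cm)",
    "sepal width (cm)__truediv__petal width (cm)",
]
new_columns = pd.concat([compute_named_column(df, n) for n in names], axis=1)
print(new_columns.round(6))
```

Note that this simple split would fail for source columns whose own names contain a double underscore.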
Examples
Examples can be found in examples/.
import pandas as pd
from lightgbm import LGBMClassifier
# Assumed import path, inferred from the package name; adjust if it differs
from features_creation import FeaturesCreation

# df is the input DataFrame and target_column the name of its target column
# Instantiate the FeaturesCreation class and the classifier
fe_cr = FeaturesCreation()
classifier = LGBMClassifier(verbose=-1)
# Define the number of new features to create
n_new_features = 3
# Separate the features (x) and the target column (y)
x, y = df.drop(columns=[target_column]), df[target_column]
# Select the transformations with FeaturesCreation.fit()
transformations = fe_cr.fit(x, y, classifier, n_new_features)
# Build the new columns with FeaturesCreation.apply_transformation()
transformed_df = fe_cr.apply_transformation(df, transformations)
# Concatenate the original DataFrame with the new columns
transformed_df = pd.concat([df, transformed_df], axis=1)
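The example above leaves `df` and `target_column` undefined. One way to prepare them is sketched below, assuming the tables in this description come from scikit-learn's iris dataset (the column names match, but the source does not state this explicitly).

```python
from sklearn.datasets import load_iris

# Assumption: the example data is scikit-learn's iris dataset.
iris = load_iris(as_frame=True)
df = iris.frame  # four feature columns plus a "target" column
target_column = "target"

# Same split as in the example above
x, y = df.drop(columns=[target_column]), df[target_column]
print(df.head())
```

With `df` and `target_column` defined this way, `x` holds the four measurement columns and `y` the class labels.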