Creates new DataFrame columns by applying strategically selected operations.

Project description

FeaturesCreation

FeaturesCreation creates new DataFrame columns by combining pairs of existing columns with arithmetic operations (for example, modulo or division) and keeping only the combinations that a classifier ranks as most relevant to the target. It works directly on pandas DataFrames and accepts a classifier such as LGBMClassifier to score the candidate operations.

How it works: Transformation Process

The transformation process has four steps: instantiate and fit, select transformations, apply them, and collect the resulting DataFrame.

  1. Instantiation and Fitting:

First, instantiate the FeaturesCreation class, for example fe_cr = FeaturesCreation(). The constructor takes no classifier; the classifier used for selecting operations is passed to fit.

Then, fit the instance to your data by calling fe_cr.fit(x, y, classifier, n_new_features), where x is the feature data (input), y is the target column (output), classifier is the chosen classifier (e.g., LGBMClassifier), and n_new_features is the number of new features to create. In the example below, fit returns the selected transformations.

  2. Transformation Selection:

During fitting, FeaturesCreation uses the provided classifier to evaluate the importance of each candidate transformation and keeps the top-ranked operations, as many as requested through n_new_features.

  3. Application of Transformations:

After fitting, the selected transformations are ready to be applied to the original DataFrame. To apply them, call fe_cr.apply_transformation(df, transformations), where df is the original DataFrame and transformations is the set of chosen operations returned by fit.

  4. Resulting DataFrame:

The apply_transformation method returns a new DataFrame containing the newly created columns. As the tables and example below show, the original columns are not included in that result, so concatenate it with the original DataFrame if you want both.
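
To give a sense of the search space that the selection step has to rank, the sketch below enumerates candidate column names for ordered pairs of columns. It is a hypothetical illustration, not the library's code: the function candidate_names and the OPERATIONS list are assumptions, and only the two operations visible in the example output further down are included.

```python
from itertools import permutations

# Assumption: only the operations that appear in the example output are listed.
# The library's actual operation set is not documented in this description.
OPERATIONS = ["mod", "truediv"]


def candidate_names(columns, operations=OPERATIONS):
    """Return a 'left__op__right' name for every ordered pair of distinct columns."""
    return [
        f"{left}__{op}__{right}"
        for left, right in permutations(columns, 2)
        for op in operations
    ]


columns = [
    "sepal length (cm)",
    "sepal width (cm)",
    "petal length (cm)",
    "petal width (cm)",
]
candidates = candidate_names(columns)

# 4 columns give 4 * 3 = 12 ordered pairs; with 2 operations that is 24 candidates.
print(len(candidates))
```

Even with four columns and two operations there are 24 candidates, which is why a scoring step that keeps only n_new_features of them is useful.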

DataFrame Before Transformations

Consider the original DataFrame as follows:

   sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)
0                5.1               3.5                1.4               0.2
1                4.9               3.0                1.4               0.2
2                4.7               3.2                1.3               0.2
3                4.6               3.1                1.5               0.2
4                5.0               3.6                1.4               0.2

DataFrame After Transformations

Now, let's apply the transformations to the original DataFrame. The resulting DataFrame contains only the newly created columns produced by the selected operations:

   sepal length (cm)__mod__petal length (cm)  sepal length (cm)__truediv__petal length (cm)  sepal width (cm)__truediv__petal width (cm)
0                                        0.9                                       3.642857                                         17.5
1                                        0.7                                       3.500000                                         15.0
2                                        0.8                                       3.615385                                         16.0
3                                        0.1                                       3.066667                                         15.5
4                                        0.8                                       3.571429                                         18.0

The new columns are named in the format "feature1__operation__feature2" and contain the transformed values generated by applying the specified operations to the original data.
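
The naming scheme can be checked by hand. The sketch below splits a column name into its three parts and recomputes the values with Python's standard operator module, using plain lists instead of pandas so it runs without dependencies. It assumes that the middle part of the name maps to a function of the same name in operator (mod, truediv) and that feature names contain no double underscore; the helpers parse_column_name and compute_column are hypothetical and not part of the library.

```python
import operator

# Two columns from the original DataFrame above, as plain lists.
data = {
    "sepal length (cm)": [5.1, 4.9, 4.7, 4.6, 5.0],
    "petal length (cm)": [1.4, 1.4, 1.3, 1.5, 1.4],
}


def parse_column_name(name):
    """Split 'feature1__operation__feature2' into its three parts."""
    left, op, right = name.split("__")
    return left, op, right


def compute_column(columns, name):
    """Recompute a derived column from its name (illustrative helper)."""
    left, op, right = parse_column_name(name)
    func = getattr(operator, op)  # e.g. operator.mod or operator.truediv
    return [func(a, b) for a, b in zip(columns[left], columns[right])]


mod_values = compute_column(data, "sepal length (cm)__mod__petal length (cm)")
div_values = compute_column(data, "sepal length (cm)__truediv__petal length (cm)")

# Rounded for display; the raw values carry floating-point noise.
print([round(v, 1) for v in mod_values])
print([round(v, 6) for v in div_values])
```

The rounded results match the first two columns of the table above, which suggests the operation part of each name corresponds to the usual Python arithmetic operators.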

Examples

Examples can be found in examples/.

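import pandas as pd
from lightgbm import LGBMClassifier
# Assumed import path (not shown in this description); check the installed package
from features_creation import FeaturesCreation

# df is an existing pandas DataFrame and target_column is the name of its target column

# Instantiate the FeaturesCreation class and the classifier
fe_cr = FeaturesCreation()
classifier = LGBMClassifier(verbose=-1)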

# Define the number of new features to create
n_new_features = 3

# Separate the features (X) and the target column (y)
x, y = df.drop(columns=[target_column]), df[target_column]

# Create new transformations using FeaturesCreation.fit()
transformations = fe_cr.fit(x, y, classifier, n_new_features)

# Apply the transformations to the DataFrame using FeaturesCreation.apply_transformation()
transformed_df = fe_cr.apply_transformation(df, transformations)

# Concatenate the transformed DataFrame with the original DataFrame
transformed_df = pd.concat([df, transformed_df], axis=1)
