feat_engine

feat_engine is a feature engineering library for preparing data for machine learning models. It provides tools for encoding categorical features, handling missing values, scaling and normalizing data, reducing dimensionality, visualizing data, and more.

Features

The feat_engine package provides the following modules to help with various stages of feature engineering:

  • encode_category.py: Encode categorical features with methods like one-hot encoding and label encoding.
  • group_features.py: Group and aggregate features based on categorical or time-based columns.
  • handle_missing_values.py: Handle missing data by filling or dropping missing values and visualizing missing data patterns.
  • handle_outliers.py: Detect and handle outliers using methods like Z-Score, IQR, Isolation Forest, DBSCAN, Winsorization, and more.
  • interact_features.py: Create interaction features, polynomial combinations, and more from existing features.
  • normalize_scaling.py: Apply normalization and scaling techniques including Min-Max scaling, Z-score standardization, and robust scaling.
  • reduce_dimension.py: Reduce the dimensionality of feature sets using methods such as PCA, LDA, t-SNE, UMAP, and autoencoders.
  • target_based_features.py: Create target-based encodings like target mean encoding, smoothed target mean encoding, and cross-validated target encoding.
  • temporal_features.py: Extract and transform time-based features, including creating rolling windows, lag features, and cyclical transformations.
  • transform_features.py: Apply mathematical transformations such as logarithmic, square root, power transformations, and more.
  • visualize_data.py: Visualize datasets using correlation heatmaps, distribution plots, missing value heatmaps, outlier detection, and more.
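Three of the techniques named in the list above (the IQR outlier rule, cyclical time encoding, and smoothed target mean encoding) can be sketched in plain pandas and NumPy. This is an illustration of the ideas only, not feat_engine's API: the column names, the sample data, and the smoothing weight m are assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, invented for this sketch.
df = pd.DataFrame({
    "city":   ["A", "A", "A", "B", "B", "C"],
    "hour":   [0, 6, 12, 18, 23, 6],
    "amount": [10.0, 12.0, 11.0, 13.0, 95.0, 12.0],
    "target": [1, 0, 1, 0, 0, 1],
})

# IQR rule: flag values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR].
q1, q3 = df["amount"].quantile([0.25, 0.75])
iqr = q3 - q1
df["amount_is_outlier"] = ~df["amount"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)

# Cyclical encoding: map the hour onto a circle so 23:00 and 00:00 end up close.
angle = 2 * np.pi * df["hour"] / 24
df["hour_sin"] = np.sin(angle)
df["hour_cos"] = np.cos(angle)

# Smoothed target mean encoding: shrink each category mean toward the global
# mean, with more shrinkage for categories that have few rows.
m = 2  # smoothing weight (assumed value)
global_mean = df["target"].mean()
stats = df.groupby("city")["target"].agg(["mean", "count"])
smoothed = (stats["count"] * stats["mean"] + m * global_mean) / (stats["count"] + m)
df["city_target_enc"] = df["city"].map(smoothed)

print(df)
```

Computing the encoding on the same rows it is applied to leaks the target into the feature. In practice the category statistics are fitted on training folds only, which is the motivation for the cross-validated variant mentioned in the list.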

Installation

You can install the package from PyPI with pip install feat_engine, or by cloning the repository and installing its dependencies:

git clone https://github.com/your-username/feat_engine.git
cd feat_engine
pip install -r requirements.txt

Usage

Here's a brief overview of how to use some of the key features from feat_engine:

Encoding Categorical Features

from feat_engine.encode_category import CategoryEncoder
import pandas as pd

df = pd.DataFrame({
    'category': ['A', 'B', 'C', 'A', 'B']
})

encoder = CategoryEncoder()
encoded_df = encoder.one_hot_encode(df, 'category')
print(encoded_df)
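For orientation, one-hot encoding turns each distinct category into its own indicator column. The sketch below uses pandas' get_dummies rather than feat_engine, so the column names and dtypes that CategoryEncoder.one_hot_encode returns may differ.

```python
import pandas as pd

df = pd.DataFrame({"category": ["A", "B", "C", "A", "B"]})

# One indicator column per distinct value; dtype=int gives 0/1 instead of booleans.
one_hot = pd.get_dummies(df, columns=["category"], dtype=int)
print(one_hot)
```

Each row has exactly one 1 across the three new columns, which is the defining property of a one-hot encoding.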

Handling Missing Values

import pandas as pd
from feat_engine.handle_missing_values import MissingValueHandler

df = pd.DataFrame({
    'feature1': [1, 2, None, 4],
    'feature2': [None, 2, 3, 4]
})

mv_handler = MissingValueHandler()
filled_df = mv_handler.fill_missing_values(df, method='mean')
print(filled_df)
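Mean imputation replaces each missing value with the mean of the observed values in the same column. A minimal plain-pandas sketch of that idea on the same data follows; it is not feat_engine's implementation, and what fill_missing_values does internally is not documented here.

```python
import pandas as pd

df = pd.DataFrame({
    "feature1": [1, 2, None, 4],
    "feature2": [None, 2, 3, 4],
})

print(df.isna().sum())       # number of missing values per column

column_means = df.mean()     # NaN values are skipped by default
filled = df.fillna(column_means)
print(filled)
```

Here the gap in feature1 is filled with 7/3 (the mean of 1, 2, and 4) and the gap in feature2 with 3.0.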

Scaling Features

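from feat_engine.normalize_scaling import FeatureScaler

# Scale the imputed frame from the previous example; df itself still contains missing values.
scaler = FeatureScaler()
scaled_df = scaler.min_max_scale(filled_df, columns=['feature1', 'feature2'])
print(scaled_df)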
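Min-Max scaling maps each column to the range [0, 1] using x' = (x - min) / (max - min). The sketch below applies that formula directly in pandas, with sample values chosen for the example; it illustrates the transformation and is not a description of how FeatureScaler is implemented.

```python
import pandas as pd

# Hypothetical sample data, invented for this sketch.
df = pd.DataFrame({
    "feature1": [1.0, 2.0, 3.0, 5.0],
    "feature2": [10.0, 20.0, 40.0, 30.0],
})

col_min = df.min()
col_max = df.max()

# x' = (x - min) / (max - min), applied column by column.
# A constant column would divide by zero here and produce NaN.
scaled = (df - col_min) / (col_max - col_min)
print(scaled)
```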

Reducing Dimensions

import pandas as pd
from feat_engine.reduce_dimension import DimensionReducer

df = pd.DataFrame({
    'feature1': [1, 2, 3, 4],
    'feature2': [2, 3, 4, 5],
    'feature3': [3, 4, 5, 6]
})

reducer = DimensionReducer()
reduced_df = reducer.pca(df, n_components=2)
print(reduced_df)
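To show what PCA computes, the sketch below centers the same data and projects it onto its leading principal axes using NumPy's SVD. It is an independent illustration, not DimensionReducer's code. Note that the three sample columns are perfectly collinear (each is the previous one plus 1), so the first component carries all of the variance and the second is numerically zero.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "feature1": [1, 2, 3, 4],
    "feature2": [2, 3, 4, 5],
    "feature3": [3, 4, 5, 6],
})

X = df.to_numpy(dtype=float)
X_centered = X - X.mean(axis=0)   # PCA operates on mean-centered data

# Rows of Vt are the principal axes, ordered by decreasing singular value.
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)

n_components = 2
scores = X_centered @ Vt[:n_components].T   # project onto the first two axes
explained_variance_ratio = S**2 / np.sum(S**2)

print(pd.DataFrame(scores, columns=["pc1", "pc2"]))
print(explained_variance_ratio)
```

The sign of each component is arbitrary, so the projected values may come out negated relative to another PCA implementation.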

Visualizing Data

from feat_engine.visualize_data import DataVisualizer

# df is the three-feature DataFrame from the previous example
visualizer = DataVisualizer()
visualizer.plot_correlation_heatmap(df)
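A correlation heatmap is a color-coded view of the pairwise correlation matrix. The sketch below builds one with pandas and matplotlib directly, using sample data and an output filename chosen for the example; the figure that plot_correlation_heatmap produces may be styled differently.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; works without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample data, invented for this sketch.
df = pd.DataFrame({
    "feature1": [1, 2, 3, 4, 5],
    "feature2": [2, 1, 4, 3, 5],
    "feature3": [5, 4, 3, 2, 1],
})

corr = df.corr()  # pairwise Pearson correlation coefficients

fig, ax = plt.subplots()
im = ax.imshow(corr.to_numpy(), vmin=-1, vmax=1, cmap="coolwarm")
ax.set_xticks(range(len(corr.columns)))
ax.set_xticklabels(corr.columns, rotation=45, ha="right")
ax.set_yticks(range(len(corr.index)))
ax.set_yticklabels(corr.index)
fig.colorbar(im, ax=ax, label="Pearson r")
fig.tight_layout()
fig.savefig("correlation_heatmap.png")  # assumed output path
```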

Testing

The package includes test cases to ensure functionality. Run tests with:

pytest tests/

When running tests in a non-GUI environment (for example, on a CI server), set the matplotlib backend to Agg before importing matplotlib.pyplot to avoid Tkinter errors:

import matplotlib
matplotlib.use('Agg')
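One way to apply this setting to a whole test run is to select the backend before any test module imports matplotlib. The sketch below assumes a hypothetical tests/conftest.py (the project may or may not already have one) and relies on matplotlib reading the MPLBACKEND environment variable when it is first imported.

```python
# tests/conftest.py (hypothetical file; not confirmed by the project)
import os

# pytest imports conftest.py before collecting the test modules in the same
# directory, so the variable is set before those modules import matplotlib.
# setdefault leaves an explicitly chosen backend untouched.
os.environ.setdefault("MPLBACKEND", "Agg")
```

On POSIX shells the same effect can be had for a single run with MPLBACKEND=Agg pytest tests/.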

Contributing

Contributions are welcome! Feel free to open issues or submit pull requests.

License

This project is licensed under the MIT License. See the LICENSE file for details.
