feat_engine

feat_engine is a feature engineering library that simplifies feature processing for machine learning models. It provides tools for encoding categorical features, handling missing values and outliers, scaling and normalizing data, reducing dimensionality, building temporal and target-based features, and visualizing data.

Features

The feat_engine package provides the following modules to help with various stages of feature engineering:

  • encode_category.py: Encode categorical features with methods like one-hot encoding and label encoding.
  • group_features.py: Group and aggregate features based on categorical or time-based columns.
  • handle_missing_values.py: Handle missing data by filling or dropping missing values and visualizing missing data patterns.
  • handle_outliers.py: Detect and handle outliers using methods like Z-Score, IQR, Isolation Forest, DBSCAN, Winsorization, and more.
  • interact_features.py: Create interaction features, polynomial combinations, and more from existing features.
  • normalize_scaling.py: Apply normalization and scaling techniques including Min-Max scaling, Z-score standardization, and robust scaling.
  • reduce_dimension.py: Reduce the dimensionality of feature sets using methods such as PCA, LDA, t-SNE, UMAP, and autoencoders.
  • target_based_features.py: Create target-based encodings like target mean encoding, smoothed target mean encoding, and cross-validated target encoding.
  • temporal_features.py: Extract and transform time-based features, including creating rolling windows, lag features, and cyclical transformations.
  • transform_features.py: Apply mathematical transformations such as logarithmic, square root, power transformations, and more.
  • visualize_data.py: Visualize datasets using correlation heatmaps, distribution plots, missing value heatmaps, outlier detection, and more.
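
Two of the techniques named above, cyclical transformations and smoothed target mean encoding, are easier to follow with the arithmetic written out. The sketch below uses plain pandas and NumPy; the function names `cyclical_encode` and `smoothed_target_mean` and the smoothing weight `m` are assumptions made for illustration and are not taken from feat_engine's API.

```python
import numpy as np
import pandas as pd


def cyclical_encode(values: pd.Series, period: float) -> pd.DataFrame:
    """Map a periodic value (e.g. hour of day) onto the unit circle."""
    angle = 2 * np.pi * values / period
    return pd.DataFrame({
        f"{values.name}_sin": np.sin(angle),
        f"{values.name}_cos": np.cos(angle),
    })


def smoothed_target_mean(df: pd.DataFrame, column: str, target: str,
                         m: float = 10.0) -> pd.Series:
    """Blend each category's target mean with the global mean.

    encoded = (n * category_mean + m * global_mean) / (n + m),
    where n is the number of rows in the category.
    """
    global_mean = df[target].mean()
    stats = df.groupby(column)[target].agg(["mean", "count"])
    smooth = (stats["count"] * stats["mean"] + m * global_mean) / (stats["count"] + m)
    return df[column].map(smooth)


# Hour 0 and hour 24 land on the same point, so midnight is "close" to 23:00
hours = pd.Series([0, 6, 12, 18], name="hour")
encoded_hours = cyclical_encode(hours, period=24)
print(encoded_hours)

# Rare categories are pulled toward the global mean more strongly
data = pd.DataFrame({"city": ["A", "A", "B"], "y": [1, 0, 1]})
encoded_city = smoothed_target_mean(data, "city", "y", m=2.0)
print(encoded_city)
```

Computing the encoding on the same rows it is later applied to leaks the target into the features, which is why the cross-validated variant mentioned above exists.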

Installation

You can install the package by cloning the repository and installing dependencies:

git clone https://github.com/your-username/feat_engine.git
cd feat_engine
pip install -r requirements.txt

Usage

The examples below show how to use a few of the main modules in feat_engine:

Encoding Categorical Features

from feat_engine.encode_category import CategoryEncoder
import pandas as pd

df = pd.DataFrame({
    'category': ['A', 'B', 'C', 'A', 'B']
})

encoder = CategoryEncoder()
encoded_df = encoder.one_hot_encode(df, 'category')
print(encoded_df)
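
For comparison, the same transformation can be written with plain pandas. This is a sketch of what one-hot encoding produces in general; the column names and dtype returned by `CategoryEncoder.one_hot_encode` are not documented here and may differ.

```python
import pandas as pd

df = pd.DataFrame({
    'category': ['A', 'B', 'C', 'A', 'B']
})

# One indicator column per distinct value; dtype=int gives 0/1 instead of booleans
one_hot = pd.get_dummies(df, columns=['category'], dtype=int)
print(one_hot)
```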

Handling Missing Values

import pandas as pd
from feat_engine.handle_missing_values import MissingValueHandler

# Each column has one missing entry
df = pd.DataFrame({
    'feature1': [1, 2, None, 4],
    'feature2': [None, 2, 3, 4]
})

mv_handler = MissingValueHandler()
# Fill the missing entries using the 'mean' method
filled_df = mv_handler.fill_missing_values(df, method='mean')
print(filled_df)
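
Mean imputation itself is simple to express in pandas. The sketch below assumes that `method='mean'` refers to the per-column mean, which is the usual meaning but is not confirmed by this README.

```python
import pandas as pd

df = pd.DataFrame({
    'feature1': [1, 2, None, 4],
    'feature2': [None, 2, 3, 4]
})

# df.mean() skips NaN by default, so each column's mean uses only observed values
column_means = df.mean()
filled = df.fillna(column_means)
print(filled)
```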

Scaling Features

from feat_engine.normalize_scaling import FeatureScaler

# Reuses df from the missing-values example above
scaler = FeatureScaler()
scaled_df = scaler.min_max_scale(df, columns=['feature1', 'feature2'])
print(scaled_df)
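
Min-Max scaling maps each column linearly onto the range [0, 1]. A minimal pandas sketch of that formula follows; it illustrates the technique and is not feat_engine's implementation. The example data is made up.

```python
import pandas as pd

df = pd.DataFrame({
    'feature1': [1, 2, 3, 4],
    'feature2': [10, 20, 30, 50]
})

# x_scaled = (x - min) / (max - min), computed column by column
col_min = df.min()
col_range = df.max() - col_min
scaled = (df - col_min) / col_range
print(scaled)
```

A constant column has a range of zero and would produce NaN here, so such columns need separate handling.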

Reducing Dimensions

import pandas as pd
from feat_engine.reduce_dimension import DimensionReducer

df = pd.DataFrame({
    'feature1': [1, 2, 3, 4],
    'feature2': [2, 3, 4, 5],
    'feature3': [3, 4, 5, 6]
})

reducer = DimensionReducer()
# Project the three features onto two principal components
reduced_df = reducer.pca(df, n_components=2)
print(reduced_df)
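
To show what PCA computes, here is a minimal NumPy sketch based on the singular value decomposition of the centered data. It is an illustration under stated assumptions, not the code behind `DimensionReducer.pca`; the `PC1`/`PC2` column names are invented for the example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'feature1': [1, 2, 3, 4],
    'feature2': [2, 3, 4, 5],
    'feature3': [3, 4, 5, 6]
})

n_components = 2

# Center each column, then decompose; the rows of Vt are the principal axes
X = df.to_numpy(dtype=float)
X_centered = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)

# Project the centered data onto the leading axes
scores = X_centered @ Vt[:n_components].T
reduced = pd.DataFrame(scores, columns=[f"PC{i + 1}" for i in range(n_components)])

# Share of total variance captured by each retained component
explained_variance_ratio = (S ** 2 / np.sum(S ** 2))[:n_components]

print(reduced)
print(explained_variance_ratio)
```

In this sample data the three features differ only by a constant offset, so the first component carries all of the variance and the second is numerically zero. The sign of each component is arbitrary.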

Visualizing Data

from feat_engine.visualize_data import DataVisualizer

# Reuses df from the dimensionality-reduction example above
visualizer = DataVisualizer()
visualizer.plot_correlation_heatmap(df)
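
A correlation heatmap is a colored rendering of the pairwise correlation matrix. The sketch below computes that matrix with pandas so the numbers behind the plot can be inspected directly; whether `plot_correlation_heatmap` uses Pearson correlation is an assumption.

```python
import pandas as pd

df = pd.DataFrame({
    'feature1': [1, 2, 3, 4],
    'feature2': [2, 3, 4, 5],
    'feature3': [3, 4, 5, 6]
})

# Pearson correlation between every pair of columns (pandas default)
corr = df.corr()
print(corr)
```

Because the sample columns are shifted copies of one another, every entry of this matrix is 1, so the resulting heatmap would be a single uniform color.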

Testing

The package includes test cases to ensure functionality. Run tests with:

pytest tests/

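When running tests, set the matplotlib backend to Agg so that plots render without a display. This avoids Tkinter errors in non-GUI environments such as CI runners. Select the backend before importing matplotlib.pyplot:

import matplotlib
matplotlib.use('Agg')  # non-interactive backend, no GUI toolkit required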
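
One way to apply this to the whole test suite is to set matplotlib's `MPLBACKEND` environment variable from a pytest `conftest.py`, which pytest loads before collecting test modules. The file location `tests/conftest.py` is an assumption; this README does not say whether the repository already has one.

```python
# tests/conftest.py (hypothetical location)
import os

# matplotlib reads MPLBACKEND when it is first imported, so this must run
# before any test module imports matplotlib. setdefault keeps a backend that
# the caller has already chosen.
os.environ.setdefault("MPLBACKEND", "Agg")
```

This keeps the backend choice out of the individual test files and works even if matplotlib is not installed.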

Contributing

Contributions are welcome! Feel free to open issues or submit pull requests.

License

This project is licensed under the MIT License. See the LICENSE file for details.
