feat_engine
feat_engine is a comprehensive feature engineering library designed to simplify and streamline feature processing tasks for machine learning models. It offers a wide range of tools for encoding categorical features, handling missing values, scaling and normalizing data, dimensionality reduction, visualizing data, and more.
Features
The feat_engine package provides the following modules to help with various stages of feature engineering:
- encode_category.py: Encode categorical features with methods like one-hot encoding and label encoding.
- group_features.py: Group and aggregate features based on categorical or time-based columns.
- handle_missing_values.py: Handle missing data by filling or dropping missing values and visualizing missing data patterns.
- handle_outliers.py: Detect and handle outliers using methods like Z-Score, IQR, Isolation Forest, DBSCAN, Winsorization, and more.
- interact_features.py: Create interaction features, polynomial combinations, and more from existing features.
- normalize_scaling.py: Apply normalization and scaling techniques including Min-Max scaling, Z-score standardization, and robust scaling.
- reduce_dimension.py: Reduce the dimensionality of feature sets using methods such as PCA, LDA, t-SNE, UMAP, and autoencoders.
- target_based_features.py: Create target-based encodings like target mean encoding, smoothed target mean encoding, and cross-validated target encoding.
- temporal_features.py: Extract and transform time-based features, including creating rolling windows, lag features, and cyclical transformations.
- transform_features.py: Apply mathematical transformations such as logarithmic, square root, power transformations, and more.
- visualize_data.py: Visualize datasets using correlation heatmaps, distribution plots, missing value heatmaps, outlier detection, and more.
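Several techniques named in the list above are not covered in the Usage section. The sketch below illustrates three of them (the IQR outlier rule, a lag feature, and a cyclical transformation) in plain pandas and NumPy. It does not use feat_engine's API, and the column names and sample values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative data only; not taken from feat_engine.
df = pd.DataFrame({
    "hour": [0, 6, 12, 18, 23],
    "value": [10.0, 12.0, 11.0, 13.0, 100.0],
})

# IQR rule: flag points outside [Q1 - 1.5 * IQR, Q3 + 1.5 * IQR].
q1, q3 = df["value"].quantile([0.25, 0.75])
iqr = q3 - q1
df["is_outlier"] = (df["value"] < q1 - 1.5 * iqr) | (df["value"] > q3 + 1.5 * iqr)

# Lag feature: the previous row's value (the first row has no predecessor, so NaN).
df["value_lag1"] = df["value"].shift(1)

# Cyclical transformation: map hour-of-day onto a circle so 23 and 0 end up close.
angle = 2 * np.pi * df["hour"] / 24
df["hour_sin"] = np.sin(angle)
df["hour_cos"] = np.cos(angle)

print(df)
```

With this data only the last row (value 100.0) falls outside the IQR fences.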
Installation
You can install the package by cloning the repository and installing dependencies:
git clone https://github.com/your-username/feat_engine.git
cd feat_engine
pip install -r requirements.txt
Usage
Here's a brief overview of how to use some of the key features from feat_engine:
Encoding Categorical Features
from feat_engine.encode_category import CategoryEncoder
import pandas as pd
df = pd.DataFrame({
    'category': ['A', 'B', 'C', 'A', 'B']
})
encoder = CategoryEncoder()
encoded_df = encoder.one_hot_encode(df, 'category')
print(encoded_df)
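For reference, one-hot encoding of the same column can be sketched with plain pandas. This uses `pd.get_dummies` rather than feat_engine, so the column names and dtypes that `CategoryEncoder.one_hot_encode` returns may differ.

```python
import pandas as pd

df = pd.DataFrame({
    "category": ["A", "B", "C", "A", "B"]
})

# One indicator column per distinct value; dtype=int gives 0/1 instead of booleans.
encoded = pd.get_dummies(df, columns=["category"], dtype=int)
print(encoded)
```

Each row has exactly one 1 across the three indicator columns `category_A`, `category_B`, and `category_C`.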
Handling Missing Values
from feat_engine.handle_missing_values import MissingValueHandler
df = pd.DataFrame({
    'feature1': [1, 2, None, 4],
    'feature2': [None, 2, 3, 4]
})
mv_handler = MissingValueHandler()
filled_df = mv_handler.fill_missing_values(df, method='mean')
print(filled_df)
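Mean imputation replaces each missing value with the mean of the observed values in the same column. A minimal plain-pandas sketch of that idea follows; it is not feat_engine's implementation, which may handle edge cases differently.

```python
import pandas as pd

df = pd.DataFrame({
    "feature1": [1, 2, None, 4],
    "feature2": [None, 2, 3, 4],
})

# df.mean() skips NaN by default, so each column mean uses observed values only.
column_means = df.mean()
filled = df.fillna(column_means)
print(filled)
```

Here the gap in `feature1` is filled with 7/3 (about 2.33) and the gap in `feature2` with 3.0.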
Scaling Features
from feat_engine.normalize_scaling import FeatureScaler
# Reuses `df` (columns feature1 and feature2) from the previous example.
scaler = FeatureScaler()
scaled_df = scaler.min_max_scale(df, columns=['feature1', 'feature2'])
print(scaled_df)
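Min-Max scaling maps each column to the range [0, 1] via (x - min) / (max - min). The sketch below shows the arithmetic in plain pandas on made-up, complete data; it is not feat_engine's code.

```python
import pandas as pd

# Illustrative data without missing values.
df = pd.DataFrame({
    "feature1": [1.0, 2.0, 3.0, 4.0],
    "feature2": [10.0, 20.0, 30.0, 50.0],
})

cols = ["feature1", "feature2"]
col_min = df[cols].min()
col_max = df[cols].max()

# Column-wise (x - min) / (max - min); pandas aligns the Series on column names.
scaled = (df[cols] - col_min) / (col_max - col_min)
print(scaled)
```

A constant column has max equal to min, so this formula divides by zero and yields NaN; a real scaler has to decide how to treat that case.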
Reducing Dimensions
from feat_engine.reduce_dimension import DimensionReducer
df = pd.DataFrame({
    'feature1': [1, 2, 3, 4],
    'feature2': [2, 3, 4, 5],
    'feature3': [3, 4, 5, 6]
})
reducer = DimensionReducer()
reduced_df = reducer.pca(df, n_components=2)
print(reduced_df)
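To make the PCA step concrete, here is a rough NumPy sketch that centers the data, takes a singular value decomposition, and projects onto the first two principal axes. It is an illustration of the technique, not what `DimensionReducer.pca` does internally; the names `PC1` and `PC2` are chosen here.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "feature1": [1, 2, 3, 4],
    "feature2": [2, 3, 4, 5],
    "feature3": [3, 4, 5, 6],
})

# Center each column; PCA is defined on mean-centered data.
X = df.to_numpy(dtype=float)
Xc = X - X.mean(axis=0)

# Rows of Vt are the principal axes, ordered by decreasing singular value.
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

n_components = 2
scores = Xc @ Vt[:n_components].T
reduced = pd.DataFrame(scores, columns=["PC1", "PC2"])

# Fraction of total variance captured by each axis.
explained = S**2 / (S**2).sum()
print(reduced)
print(explained)
```

In this sample the three columns differ only by a constant offset, so essentially all variance lies on the first component and the second is numerically zero. The sign of each component is arbitrary.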
Visualizing Data
from feat_engine.visualize_data import DataVisualizer
visualizer = DataVisualizer()
visualizer.plot_correlation_heatmap(df)
Testing
The package includes test cases to ensure functionality. Run tests with:
pytest tests/
Set the matplotlib backend to Agg when running tests to avoid Tkinter errors in headless (non-GUI) environments:
import matplotlib
matplotlib.use('Agg')
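An alternative that avoids editing each test module is to set the `MPLBACKEND` environment variable, which matplotlib reads when it is first imported. The sketch below assumes a `tests/conftest.py` file, which may not exist in this repository; pytest loads `conftest.py` before it imports the test modules.

```python
# tests/conftest.py (hypothetical file; assumed location, not confirmed by the project)
import os

# Must run before anything imports matplotlib, because the variable is
# read at matplotlib import time to pick the default backend.
os.environ["MPLBACKEND"] = "Agg"
```

If matplotlib has already been imported elsewhere by the time this runs, the variable has no effect and `matplotlib.use('Agg')` is still needed.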
Contributing
Contributions are welcome! Feel free to open issues or submit pull requests.
License
This project is licensed under the MIT License. See the LICENSE file for details.
File details
Details for the file feat_engine-1.0.0.tar.gz.
File metadata
- Download URL: feat_engine-1.0.0.tar.gz
- Upload date:
- Size: 42.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.8.3 CPython/3.11.9 Windows/10
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0b2614c1282bb9117b33a24f3ba7ebc168c83a4f9f92d6a20f6594380d2dab65 |
| MD5 | 07f7054df0e9dc03cc2c9de019982cfa |
| BLAKE2b-256 | 7cd46cadb0b145f30956576e9fdfd7e88d85d9d489235adecfa27c43d39550b0 |
File details
Details for the file feat_engine-1.0.0-py3-none-any.whl.
File metadata
- Download URL: feat_engine-1.0.0-py3-none-any.whl
- Upload date:
- Size: 54.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.8.3 CPython/3.11.9 Windows/10
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ba024fa8bedf4c5d15685c33b6efa85b81e278dfb843bbe63a38f1c0c490a8e3 |
| MD5 | 4acd85ca2b156a53e0bd74b000386fb3 |
| BLAKE2b-256 | 1b554cd2b0cfe5eec2ae495c354ba4f88a7e1b546c1b60175cab0cebaa34153b |