feat_engine

feat_engine is a feature engineering library that simplifies feature processing for machine learning models. It provides tools for encoding categorical features, handling missing values, scaling and normalizing data, reducing dimensionality, visualizing data, and more.

Features

The feat_engine package provides the following modules to help with various stages of feature engineering:

  • encode_category.py: Encode categorical features with methods like one-hot encoding and label encoding.
  • group_features.py: Group and aggregate features based on categorical or time-based columns.
  • handle_missing_values.py: Handle missing data by filling or dropping missing values and visualizing missing data patterns.
  • handle_outliers.py: Detect and handle outliers using methods like Z-Score, IQR, Isolation Forest, DBSCAN, Winsorization, and more.
  • interact_features.py: Create interaction features, polynomial combinations, and more from existing features.
  • normalize_scaling.py: Apply normalization and scaling techniques including Min-Max scaling, Z-score standardization, and robust scaling.
  • reduce_dimension.py: Reduce the dimensionality of feature sets using methods such as PCA, LDA, t-SNE, UMAP, and autoencoders.
  • target_based_features.py: Create target-based encodings like target mean encoding, smoothed target mean encoding, and cross-validated target encoding.
  • temporal_features.py: Extract and transform time-based features, including creating rolling windows, lag features, and cyclical transformations.
  • transform_features.py: Apply mathematical transformations such as logarithmic, square root, power transformations, and more.
  • visualize_data.py: Visualize datasets using correlation heatmaps, distribution plots, missing value heatmaps, outlier detection, and more.
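
As an illustration of two techniques named in the list above, here is a minimal plain-Python sketch of smoothed target mean encoding and a cyclical (sine/cosine) transformation. It is not feat_engine's implementation: the function names, the smoothing parameter m, and the weighting formula are assumptions chosen for illustration, and the library's own behavior may differ.

```python
import math
from collections import defaultdict


def smoothed_target_mean(categories, targets, m=2.0):
    """Hypothetical helper: blend each category's mean target with the global mean.

    encoding = (n * category_mean + m * global_mean) / (n + m)
    where n is the number of rows in the category and m controls the smoothing.
    """
    global_mean = sum(targets) / len(targets)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for cat, y in zip(categories, targets):
        sums[cat] += y
        counts[cat] += 1
    # sums[cat] equals n * category_mean, so the formula simplifies to this
    return {
        cat: (sums[cat] + m * global_mean) / (counts[cat] + m)
        for cat in counts
    }


def cyclical_encode(value, period):
    """Hypothetical helper: map a periodic value (e.g. hour of day) to (sin, cos)."""
    angle = 2 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


categories = ['A', 'B', 'C', 'A', 'B']
targets = [1, 0, 1, 1, 0]
encoding = smoothed_target_mean(categories, targets, m=2.0)
print(encoding)

# Hour 0 and hour 24 land on the same point of the circle
print(cyclical_encode(0, 24), cyclical_encode(24, 24))
```

Smoothing pulls categories with few rows toward the global mean, which reduces the risk of overfitting to rare categories. The sine/cosine pair keeps values at both ends of a cycle (such as 23:00 and 00:00) close together.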

Installation

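You can install the released package from PyPI:

pip install feat_engine

Alternatively, clone the repository and install its dependencies (your-username is a placeholder for the repository owner):

git clone https://github.com/your-username/feat_engine.git
cd feat_engine
pip install -r requirements.txt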

Usage

Here's a brief overview of how to use some of the key features from feat_engine:

Encoding Categorical Features

from feat_engine.encode_category import CategoryEncoder
import pandas as pd

df = pd.DataFrame({
    'category': ['A', 'B', 'C', 'A', 'B']
})

encoder = CategoryEncoder()
encoded_df = encoder.one_hot_encode(df, 'category')
print(encoded_df)
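
To make the idea of one-hot encoding concrete, the following plain-Python sketch shows the transformation on the same values. It is only an illustration: the helper name and the category_A style column names are assumptions, and the exact output of one_hot_encode may differ.

```python
def one_hot(values, prefix):
    """Hypothetical helper: one indicator column per distinct value."""
    levels = sorted(set(values))
    return [
        {f"{prefix}_{level}": int(value == level) for level in levels}
        for value in values
    ]


rows = one_hot(['A', 'B', 'C', 'A', 'B'], 'category')
for row in rows:
    print(row)
```

Each input row becomes a row of 0/1 indicators with exactly one 1, so three distinct categories produce three columns.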

Handling Missing Values

import pandas as pd
from feat_engine.handle_missing_values import MissingValueHandler

df = pd.DataFrame({
    'feature1': [1, 2, None, 4],
    'feature2': [None, 2, 3, 4]
})

mv_handler = MissingValueHandler()
filled_df = mv_handler.fill_missing_values(df, method='mean')
print(filled_df)
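
As a rough sketch of what mean imputation computes, the plain-Python snippet below fills each missing entry with the mean of the observed values in the same column. The helper name is hypothetical, and it assumes missing values are excluded when computing the mean; fill_missing_values itself may behave differently.

```python
def fill_with_mean(column):
    """Hypothetical helper: replace None with the mean of the non-missing values."""
    observed = [v for v in column if v is not None]
    if not observed:
        # Nothing to average over; return the column unchanged
        return list(column)
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in column]


feature1 = fill_with_mean([1, 2, None, 4])  # mean of 1, 2, 4
feature2 = fill_with_mean([None, 2, 3, 4])  # mean of 2, 3, 4
print(feature1)
print(feature2)
```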

Scaling Features

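import pandas as pd
from feat_engine.normalize_scaling import FeatureScaler

# Example data without missing values
df = pd.DataFrame({'feature1': [1, 2, 3, 4], 'feature2': [10, 20, 30, 40]})

scaler = FeatureScaler()
scaled_df = scaler.min_max_scale(df, columns=['feature1', 'feature2'])
print(scaled_df)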
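
Min-max scaling maps each value x to (x - min) / (max - min), so the column ends up in the range [0, 1]. The plain-Python sketch below shows that arithmetic; the helper name and the handling of constant columns are assumptions, not a description of what min_max_scale does internally.

```python
def min_max_scale(column):
    """Hypothetical helper: rescale values linearly to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant column has no spread; map every value to 0.0
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


scaled = min_max_scale([1, 2, 3, 4])
print(scaled)
```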

Reducing Dimensions

import pandas as pd
from feat_engine.reduce_dimension import DimensionReducer

df = pd.DataFrame({
    'feature1': [1, 2, 3, 4],
    'feature2': [2, 3, 4, 5],
    'feature3': [3, 4, 5, 6]
})

reducer = DimensionReducer()
reduced_df = reducer.pca(df, n_components=2)
print(reduced_df)
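
For intuition about what PCA computes, the sketch below finds the first principal component of the same toy data in plain Python, using power iteration on the sample covariance matrix. This is an illustrative stand-in, not how DimensionReducer works internally, and the function name is hypothetical.

```python
import math


def first_principal_component(rows, iterations=100):
    """Hypothetical helper: return (direction, scores) for the top component."""
    n, d = len(rows), len(rows[0])
    # Center each column on its mean
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (divides by n - 1)
    cov = [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    # Power iteration; assumes the start vector is not orthogonal to the answer
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    # Project the centered rows onto the component
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, scores


rows = [[1, 2, 3], [2, 3, 4], [3, 4, 5], [4, 5, 6]]
component, scores = first_principal_component(rows)
print(component)
print(scores)
```

In this toy data every feature is the previous one plus a constant, so all of the variance lies along a single direction and any second component carries no additional variance.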

Visualizing Data

import pandas as pd
from feat_engine.visualize_data import DataVisualizer

df = pd.DataFrame({'feature1': [1, 2, 3, 4], 'feature2': [2, 3, 4, 5]})

visualizer = DataVisualizer()
visualizer.plot_correlation_heatmap(df)
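
A correlation heatmap is typically a color-coded view of the pairwise correlation matrix. The plain-Python sketch below computes Pearson correlations for a few example columns; it assumes Pearson correlation is the quantity being plotted, which the description above does not state, and the helper name and data are hypothetical.

```python
import math


def pearson(x, y):
    """Hypothetical helper: Pearson correlation coefficient of two sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        # Correlation is undefined for a constant column
        return float('nan')
    return cov / (spread_x * spread_y)


columns = {
    'feature1': [1, 2, 3, 4],
    'feature2': [2, 4, 6, 8],
    'feature3': [4, 3, 2, 1],
}
matrix = {
    a: {b: pearson(columns[a], columns[b]) for b in columns}
    for a in columns
}
for name, row in matrix.items():
    print(name, row)
```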

Testing

The package includes test cases to ensure functionality. Run tests with:

pytest tests/

Set the matplotlib backend to Agg before running tests so that plotting code does not depend on Tkinter or a display in non-GUI environments:

import matplotlib
# Select the non-interactive Agg backend before importing matplotlib.pyplot
matplotlib.use('Agg')
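
One possible way to apply this to the whole test suite is to set matplotlib's MPLBACKEND environment variable from a conftest.py file, which pytest imports before collecting the test modules in its directory. The file location tests/conftest.py is an assumption about the project layout, and this only takes effect if matplotlib has not already been imported.

```python
# tests/conftest.py (hypothetical location)
import os

# matplotlib reads MPLBACKEND when it is first imported,
# so this must run before any test module imports matplotlib
os.environ["MPLBACKEND"] = "Agg"
```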

Documentation

For more information on the package, see the documentation on Read the Docs.

Contributing

Contributions are welcome! Feel free to open issues or submit pull requests.

License

This project is licensed under the MIT License. See the LICENSE file for details.
