Feature-Gen: A Robust Feature Engineering Framework

Feature-Gen is a Python library that automates feature engineering for classification tasks. It combines genetic algorithms, ensemble learning, and feature transformations to find feature subsets that maximize model performance while staying small enough to interpret. Multithreading and multiprocessing let it scale to large datasets.

Key Features

Automated Feature Engineering: Automatically identifies and optimizes feature subsets for classification tasks.
Advanced Transformations: Includes transformations like logarithmic, square, cubic, sigmoid, and tanh to uncover complex, non-linear relationships.
Multi-objective Optimization: Leverages the NSGA-II genetic algorithm to optimize both classification accuracy and feature subset size.
Ensemble Learning Integration: Combines Logistic Regression, SVM, and XGBoost so that each feature subset is judged by linear, margin-based, and tree-based models.
Flexible Ensemble Methods: Supports strategies like Majority Voting, Weighted Averaging, and Greedy Selection for robust feature evaluation.
Scalable Architecture: Uses multithreading and multiprocessing to handle large datasets efficiently.
Extensive Validation: Tested on over 100 datasets, demonstrating robustness and adaptability across domains.
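The transformations listed above can be illustrated with a short pandas/NumPy sketch. It is not Feature-Gen's implementation: the helper `apply_transformations`, the `TRANSFORMS` table, the column naming scheme, and the shift applied before the logarithm (to keep it defined for zero or negative values) are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical transformation table; Feature-Gen's exact formulas and
# safeguards are not documented here.
TRANSFORMS = {
    "log": lambda s: np.log1p(s - s.min()),  # shift to >= 0 so log1p is defined (assumption)
    "square": lambda s: s ** 2,
    "cubic": lambda s: s ** 3,
    "sigmoid": lambda s: 1.0 / (1.0 + np.exp(-s)),
    "tanh": np.tanh,
}


def apply_transformations(df: pd.DataFrame, target: str) -> pd.DataFrame:
    """Return a copy of df with one new column per (feature, transformation) pair."""
    out = df.copy()
    for col in df.columns.drop(target):
        for name, fn in TRANSFORMS.items():
            out[f"{name}_{col}"] = fn(df[col])
    return out


df = pd.DataFrame({"x": [0.0, 1.0, 2.0], "target": [0, 1, 0]})
expanded = apply_transformations(df, "target")
print(list(expanded.columns))
# ['x', 'target', 'log_x', 'square_x', 'cubic_x', 'sigmoid_x', 'tanh_x']
```

Generating every (feature, transformation) pair grows the candidate pool quickly, which is one reason a subset-selection step like the genetic algorithm below is needed afterwards.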

Installation

Install the library directly from PyPI:

pip install feature-gen

Getting Started

Example Usage

The following example shows a minimal Feature-Gen run that selects two ensemble methods and leaves all other parameters at their defaults:

import pandas as pd
from sklearn.datasets import load_wine

from feature_gen.feature_gen_master import FeatureGenMaster
from feature_gen.implementation.constants import EnsembleMethod

# Load and prepare dataset
data = load_wine()
df = pd.DataFrame(data.data, columns=data.feature_names)
df['target'] = data.target

# Show the current features
print(df.columns)

# Initialize FeatureGenMaster
f_g = FeatureGenMaster(df, 'target')

# Start the feature engineering process
f_g.start(
    ensemble_methods=[EnsembleMethod.GREEDY, EnsembleMethod.WEIGHTED_AVERAGING]
)

# Retrieve results
print("Best New Features:", f_g.get_best_new_features())
print("Best Original Features:", f_g.get_best_original_features())
print("All Ensemble Methods Scores:", f_g.get_all_ensemble_methods_scores())

The following example shows how to run Feature-Gen with explicit values for its configurable parameters:

import pandas as pd
from sklearn.datasets import load_wine

from feature_gen.feature_gen_master import FeatureGenMaster
from feature_gen.implementation.constants import EnsembleMethod

all_ensemble_methods = [
    EnsembleMethod.GREEDY,
    EnsembleMethod.WEIGHTED_MAJORITY_VOTING
]

# Load the Wine dataset
data = load_wine()

# Create a DataFrame with the features and target
df = pd.DataFrame(data.data, columns=data.feature_names)
df['target'] = data.target

# Show the current features
print(df.columns)

f_g = FeatureGenMaster(df, 'target', min_number_of_target_unique_values=20)

f_g.start(
    ensemble_methods=all_ensemble_methods,
    random_state=42,
    max_iter=50,
    C=1e5,
    solver='liblinear',
    gamma=1,
    n_components=100,
    sgd_loss='hinge',
    sgd_max_iter=1000,
    sgd_tol=1e-2,
    xgb_n_estimators=100,
    generations_num=2,
    bootstrap_samples_count=1,
    first_population_size=4
)

print('Best new features', f_g.get_best_new_features())
print('Best original features', f_g.get_best_original_features())
print('Best all features', f_g.get_all_best_features())
print("All ensemble methods scores", f_g.get_all_ensemble_methods_scores())

Framework Architecture

1. Micro-Step Genetic Algorithm

Bootstrap Sampling: Generates three independent bootstrap samples to ensure robustness and diversity.
Population Initialization: Creates a population of binary chromosomes representing feature subsets.
Evaluation Metrics:
- Maximizes the F1 score of an ensemble model (Logistic Regression, SVM, XGBoost).
- Minimizes the number of selected features for interpretability.
Genetic Operations:
- Selection: Binary tournament selection; of two randomly drawn chromosomes, the better one becomes a parent.
- Crossover: Uniform crossover for generating offspring.
- Mutation: Flip-bit mutation to introduce diversity.
- Population Update: Merges parents and offspring and keeps the best individuals using NSGA-II's non-dominated sorting and crowding distance.
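A minimal sketch of the micro-step operators, using only the standard library. The function names (`bootstrap_indices`, `random_population`, `binary_tournament`, `uniform_crossover`, `flip_bit_mutation`, `make_offspring`) and the mutation rate are hypothetical, and `toy_fitness` is a stand-in for the real objective (ensemble F1 on a bootstrap sample, plus feature count). Tournament winners are decided by Pareto dominance with a random tie-break, which simplifies NSGA-II's rank-and-crowding comparison.

```python
import random
from typing import List, Tuple

Chromosome = List[int]        # 1 = feature selected, 0 = feature dropped
Fitness = Tuple[float, int]   # (F1 score to maximize, feature count to minimize)


def bootstrap_indices(n_rows: int, rng: random.Random) -> List[int]:
    """One bootstrap sample: n_rows row indices drawn with replacement."""
    return [rng.randrange(n_rows) for _ in range(n_rows)]


def random_population(size: int, n_features: int, rng: random.Random) -> List[Chromosome]:
    """Initial population of random binary chromosomes."""
    return [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(size)]


def dominates(a: Fitness, b: Fitness) -> bool:
    """True if a is no worse than b on both objectives and strictly better on one."""
    no_worse = a[0] >= b[0] and a[1] <= b[1]
    strictly_better = a[0] > b[0] or a[1] < b[1]
    return no_worse and strictly_better


def binary_tournament(pop: List[Chromosome], fits: List[Fitness], rng: random.Random) -> Chromosome:
    """Draw two distinct individuals; the dominating one wins, otherwise pick at random."""
    i, j = rng.sample(range(len(pop)), 2)
    if dominates(fits[i], fits[j]):
        return pop[i]
    if dominates(fits[j], fits[i]):
        return pop[j]
    return pop[rng.choice((i, j))]


def uniform_crossover(p1: Chromosome, p2: Chromosome, rng: random.Random) -> Chromosome:
    """Each gene is copied from either parent with probability 0.5."""
    return [a if rng.random() < 0.5 else b for a, b in zip(p1, p2)]


def flip_bit_mutation(chrom: Chromosome, rate: float, rng: random.Random) -> Chromosome:
    """Flip each bit independently with probability `rate`."""
    return [1 - g if rng.random() < rate else g for g in chrom]


def make_offspring(pop: List[Chromosome], fits: List[Fitness], rng: random.Random,
                   mutation_rate: float = 0.1) -> List[Chromosome]:
    """Selection -> crossover -> mutation until the offspring match the population size."""
    children: List[Chromosome] = []
    while len(children) < len(pop):
        p1 = binary_tournament(pop, fits, rng)
        p2 = binary_tournament(pop, fits, rng)
        children.append(flip_bit_mutation(uniform_crossover(p1, p2, rng), mutation_rate, rng))
    return children


# Toy stand-in for the real objective (ensemble F1 on a bootstrap sample):
# it simply rewards selecting the first two features.
def toy_fitness(c: Chromosome) -> Fitness:
    return (sum(c[:2]) / 2, sum(c))


rng = random.Random(0)
sample_rows = bootstrap_indices(10, rng)
population = random_population(size=6, n_features=5, rng=rng)
fitnesses = [toy_fitness(c) for c in population]
offspring = make_offspring(population, fitnesses, rng)
print(len(offspring), offspring[0])
```

In the full algorithm, parents and offspring would then be merged and truncated back to the population size by NSGA-II survival; that step is omitted here.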

2. Macro-Step Genetic Algorithm

Feature Aggregation: Takes the union of the feature subsets found in the micro-step.
Global Optimization: Refines the macro-feature set using NSGA-II.
Final Feature Set: Outputs a feature set that balances accuracy against interpretability.
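The macro step's two ingredients, union aggregation and keeping only non-dominated subsets, can be sketched as follows. The helpers `union_of_subsets` and `pareto_front` are hypothetical, the feature names come from the Wine dataset, and the F1 values are made-up toy numbers rather than results. Only the first non-dominated front is computed; NSGA-II's crowding distance and generational loop are omitted.

```python
from typing import Dict, Iterable, List, Set, Tuple

Subset = Tuple[str, ...]
Fitness = Tuple[float, int]  # (F1 to maximize, feature count to minimize)


def union_of_subsets(subsets: Iterable[Set[str]]) -> List[str]:
    """Macro-step candidate pool: every feature chosen by at least one micro-step run."""
    pool: Set[str] = set()
    for s in subsets:
        pool |= s
    return sorted(pool)


def dominates(a: Fitness, b: Fitness) -> bool:
    """a is no worse on both objectives and strictly better on at least one."""
    return a[0] >= b[0] and a[1] <= b[1] and (a[0] > b[0] or a[1] < b[1])


def pareto_front(candidates: Dict[Subset, Fitness]) -> List[Subset]:
    """Subsets that no other candidate dominates (NSGA-II's first front only)."""
    return [
        subset for subset, fit in candidates.items()
        if not any(dominates(other, fit) for other in candidates.values())
    ]


# Toy inputs: Wine feature names, made-up scores.
micro_subsets = [{"alcohol", "proline"}, {"proline", "hue"}, {"flavanoids"}]
print(union_of_subsets(micro_subsets))
# ['alcohol', 'flavanoids', 'hue', 'proline']

candidates = {
    ("proline",): (0.80, 1),
    ("alcohol", "proline"): (0.90, 2),
    ("alcohol", "hue", "proline"): (0.88, 3),  # dominated by the 2-feature subset
}
print(pareto_front(candidates))
# [('proline',), ('alcohol', 'proline')]
```

The front keeps both the 1-feature and 2-feature subsets because neither beats the other on both objectives; choosing between them is the accuracy/interpretability trade-off described above.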

Core Features and Functionality

Multithreading and Multiprocessing:
- Uses multithreading for concurrent evaluations and multiprocessing for parallelizing resource-intensive tasks.
- Ensures scalability and efficient execution for large datasets.
Built-in Ensemble Methods:
- Supports flexible aggregation strategies like Majority Voting, Weighted Averaging, and Greedy Selection.
Advanced Feature Transformations:
- Includes transformations such as logarithmic, sigmoid, and tanh to capture non-linear relationships.
Extensive Validation:
- Tested on over 100 datasets, ensuring robustness and reliability.
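The aggregation strategies named above are standard ensemble techniques; the sketch below shows generic versions that combine per-model predictions with NumPy and scikit-learn. It assumes the predictions are already available as arrays, and it interprets Greedy Selection as Caruana-style forward selection of models with replacement. The function names are hypothetical, and Feature-Gen's `EnsembleMethod` options may be defined differently.

```python
import numpy as np
from sklearn.metrics import f1_score


def majority_vote(preds, weights=None):
    """preds: (n_models, n_samples) non-negative int labels.

    Returns the (optionally weighted) most frequent label per sample;
    ties go to the smallest label because argmax returns the first maximum.
    """
    preds = np.asarray(preds)
    return np.array([np.bincount(col, weights=weights).argmax() for col in preds.T])


def weighted_average(probas, weights):
    """probas: (n_models, n_samples, n_classes). Arg-max of the weighted mean probability."""
    w = np.asarray(weights, dtype=float)
    avg = np.tensordot(w / w.sum(), probas, axes=1)  # -> (n_samples, n_classes)
    return avg.argmax(axis=1)


def greedy_selection(probas, y_true, n_rounds=5):
    """Forward selection with replacement: each round adds the model whose
    inclusion gives the best macro F1 of the averaged probabilities."""
    probas = np.asarray(probas)
    chosen = []
    for _ in range(n_rounds):
        best_m, best_score = None, -1.0
        for m in range(len(probas)):
            avg = probas[chosen + [m]].mean(axis=0)
            score = f1_score(y_true, avg.argmax(axis=1), average="macro", zero_division=0)
            if score > best_score:
                best_m, best_score = m, score
        chosen.append(best_m)
    return chosen


preds = np.array([[0, 1, 2], [0, 1, 1], [1, 1, 2]])
print(majority_vote(preds))                     # [0 1 2]
print(majority_vote(preds, weights=[1, 1, 3]))  # [1 1 2]

probas = np.array([
    [[0.9, 0.1], [0.4, 0.6]],  # model 0
    [[0.2, 0.8], [0.3, 0.7]],  # model 1
])
print(weighted_average(probas, [3, 1]))                         # [0 1]
print(greedy_selection(probas, np.array([0, 1]), n_rounds=3))   # [0, 0, 0]
```

Majority voting uses only hard labels, whereas weighted averaging and greedy selection use class probabilities, so the latter two can reward a model that is confidently right on samples where the others disagree.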

Strengths

Robust Optimization: Balances competing objectives through the micro-macro genetic algorithm.
Integration of Transformations: Enhances predictive performance by uncovering non-linear relationships.
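Generalizability: Evaluating subsets with a linear model (Logistic Regression), a margin-based model (SVM), and a tree ensemble (XGBoost) aims to keep the selected features useful across linear, boundary-based, and non-linear problems.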
Interpretability: Achieves significant feature set reductions without compromising accuracy.

Future Directions

Scalability Enhancements: Expand support for distributed systems to handle even larger datasets.
Dynamic Transformation Framework: Introduce dataset-specific transformation selection for enhanced adaptability.
Additional Ensemble Methods: Integrate more aggregation strategies to improve robustness and flexibility.
User Interface: Develop visualization tools for better insights into feature engineering results.

Resources

Documentation: Feature-Gen Documentation (available on PyPI)
Source Code: Hosted on your development repository.

Contributing

Contributions are welcome! For major changes, please open an issue to discuss proposed updates. Ensure all pull requests align with the project's goals and include relevant tests.

License

This project is licensed under the MIT License. See the LICENSE file for more details.

Project details

License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3

Release history

- 0.1.4 (this version): Jan 31, 2025
- 0.1.3: Jan 30, 2025
- 0.1.2: Dec 20, 2024
- 0.1.1: Dec 20, 2024
- 0.1.0: Dec 19, 2024

Download files

Source Distribution

feature_gen-0.1.4.tar.gz (14.6 kB)

Uploaded Jan 31, 2025 Source

Built Distribution

feature_gen-0.1.4-py3-none-any.whl (14.8 kB)

Uploaded Jan 31, 2025 Python 3

File details

Details for the file feature_gen-0.1.4.tar.gz.

File metadata

Download URL: feature_gen-0.1.4.tar.gz
Upload date: Jan 31, 2025
Size: 14.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.0.1 CPython/3.13.0

File hashes

Hashes for feature_gen-0.1.4.tar.gz
Algorithm	Hash digest
SHA256	`5af135e8488c06bd2f4fcef65a5ec6fef9493fac3c2a96b717d7b1e1956713a3`
MD5	`3162a5d156bbc86a7045cb40c2b10061`
BLAKE2b-256	`99a8df02ae61830d8cce4fed5bb0f4b39faa91f38420a6d7b0e4ccf90a137124`

File details

Details for the file feature_gen-0.1.4-py3-none-any.whl.

File metadata

Download URL: feature_gen-0.1.4-py3-none-any.whl
Upload date: Jan 31, 2025
Size: 14.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.0.1 CPython/3.13.0

File hashes

Hashes for feature_gen-0.1.4-py3-none-any.whl
Algorithm	Hash digest
SHA256	`a792dea31fa9b00e0f2dcfb55998fac2fbb1a3ad1130f3a4752f12a1d588197a`
MD5	`36ab9079527768f7718d8e3a8932c127`
BLAKE2b-256	`ff7c5a846dbc0498cfeb360dceff6c1754ffbc262af167b708e432e2e3499b8c`
