Machine learning-specific feature engineering utilities including models and evaluation tools.
dsr-feature-eng-ml
Comprehensive machine learning model evaluation and feature engineering framework.
Features
- Model Evaluation: Automatic hyperparameter tuning and model comparison for Decision Trees, Random Forests, and Logistic Regression
- Data Balancing: Support for imbalanced dataset handling (upsampling, downsampling, balanced class weights)
- Feature Importance: Automatic feature selection and importance ranking
- Data Splitting: Train/validation/test splitting with optional feature scaling (the scaler is fitted on the training set only)
- Result Tracking: Comprehensive model configuration and performance metrics tracking
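The balancing options listed above can be approximated with plain scikit-learn. The following is a minimal sketch, not the package's own implementation: the toy DataFrame and the variable names are made up for illustration, and it assumes resampling is applied to the training data only.

```python
import numpy as np
import pandas as pd
from sklearn.utils import resample
from sklearn.utils.class_weight import compute_class_weight

# Hypothetical imbalanced training set: 8 negatives, 2 positives.
train = pd.DataFrame({
    "feature1": range(10),
    "target": [0] * 8 + [1] * 2,
})
majority = train[train["target"] == 0]
minority = train[train["target"] == 1]

# Upsampling: draw minority rows with replacement until the classes match.
upsampled = pd.concat([
    majority,
    resample(minority, replace=True, n_samples=len(majority), random_state=42),
])

# Downsampling: draw majority rows without replacement down to the minority size.
downsampled = pd.concat([
    resample(majority, replace=False, n_samples=len(minority), random_state=42),
    minority,
])

# Balanced class weights: n_samples / (n_classes * class_count), no resampling.
classes = np.array([0, 1])
weights = compute_class_weight(
    class_weight="balanced", classes=classes, y=train["target"].to_numpy()
)
class_weights = dict(zip(classes.tolist(), weights.tolist()))
```

Resampling changes the data the model sees, while class weights change only the loss, so the two approaches are usually compared rather than combined.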
Installation
pip install dsr-feature-eng-ml
Quick Start
import pandas as pd
from dsr_feature_eng_ml import DataSplits, ModelEvaluation

# Load your data
df = pd.read_csv('data.csv')

# Create data splits (with automatic scaling)
data_splits = DataSplits.from_data_source(
    src=df,
    features_to_include=['feature1', 'feature2', 'feature3'],
    target_column='target',
    test_size=0.2,
    valid_size=0.25,
    random_state=42,
    scale_features=True
)

# Evaluate models
results = ModelEvaluation.evaluate_dataset(
    data_splits=data_splits,
    dtree_param_grid={'max_depth': [5, 10, 20]},
    rf_param_grid={'n_estimators': [50, 100]},
    lr_param_grid={'C': [0.1, 1.0, 10.0]},
    cv=5,
    n_iter=50,
    max_iter=1000,
    scoring='f1',
    n_jobs=-1,
    viable_f1_gap=0.01,
    report_title='Model Evaluation',
    perform_dtree_feature_selection=True,
    perform_rf_feature_selection=True
)
Key Components
DataSplits
Manages train/validation/test splits with automatic feature scaling:
- Fits scaler on training data only (prevents data leakage)
- Transforms validation and test sets consistently
- Supports upsampling and downsampling for class imbalance
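To make the leakage point concrete, here is a rough sketch of the same idea using scikit-learn directly. It is not the DataSplits implementation. It assumes a `StandardScaler` and assumes that the validation fraction is taken from what remains after the test split; the source does not state either detail.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic data standing in for a real feature matrix and binary target.
rng = np.random.default_rng(42)
X = rng.normal(loc=5.0, scale=2.0, size=(200, 3))
y = rng.integers(0, 2, size=200)

# Hold out the test set first, then carve the validation set from the rest.
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
X_train, X_valid, y_train, y_valid = train_test_split(
    X_rest, y_rest, test_size=0.25, random_state=42, stratify=y_rest
)

# Fit the scaler on the training rows only, so validation and test
# statistics never influence the transformation.
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_valid_scaled = scaler.transform(X_valid)
X_test_scaled = scaler.transform(X_test)
```

Under these assumptions the 200 rows end up as 120 training, 40 validation, and 40 test rows. Only the training columns are guaranteed to have zero mean and unit variance after scaling.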
ModelEvaluation
Orchestrates comprehensive model evaluation:
- Evaluates multiple model types in parallel
- Supports four balancing strategies
- Tracks best performing models
- Generates detailed evaluation reports
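The kind of comparison described above can be sketched with scikit-learn's `RandomizedSearchCV`. This is an illustration under assumptions, not the ModelEvaluation internals: the parameter grids and `cv`, `scoring`, and `max_iter` values mirror the Quick Start call, while the synthetic dataset, the `candidates` dictionary, and the rule for picking the best model are invented for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data in place of real splits.
X, y = make_classification(n_samples=300, n_features=6, random_state=42)
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# One (estimator, search space) pair per model type.
candidates = {
    "dtree": (DecisionTreeClassifier(random_state=42), {"max_depth": [5, 10, 20]}),
    "rf": (RandomForestClassifier(random_state=42), {"n_estimators": [50, 100]}),
    "lr": (LogisticRegression(max_iter=1000), {"C": [0.1, 1.0, 10.0]}),
}

scores = {}
for name, (estimator, grid) in candidates.items():
    # n_iter is capped at the grid size here; scikit-learn falls back to
    # the full grid (with a warning) when n_iter exceeds it.
    search = RandomizedSearchCV(
        estimator, grid, n_iter=min(3, len(next(iter(grid.values())))),
        cv=5, scoring="f1", random_state=42,
    )
    search.fit(X_train, y_train)
    # Score the refitted best estimator on the held-out validation set.
    scores[name] = f1_score(y_valid, search.predict(X_valid))

best_model = max(scores, key=scores.get)
```

Selecting on a validation set rather than on the cross-validation score keeps the test set untouched until a final model has been chosen.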
Model Classes
- DecisionTree: Decision Tree classifier with feature importance
- RandomForest: Random Forest classifier with ensemble methods
- LogisticRegression: Logistic Regression with convergence control
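Feature importance ranking for the tree-based models can be illustrated with scikit-learn's `feature_importances_` attribute. The sketch below is an assumption about the general technique, not the package's selection logic: the feature names are hypothetical and the "keep features at or above the mean importance" threshold is chosen only for the example.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data with two informative features out of five.
X, y = make_classification(
    n_samples=300, n_features=5, n_informative=2, n_redundant=0, random_state=42
)
feature_names = [f"feature{i + 1}" for i in range(X.shape[1])]

forest = RandomForestClassifier(n_estimators=50, random_state=42).fit(X, y)

# Impurity-based importances sum to 1; sort them from most to least important.
ranking = pd.Series(forest.feature_importances_, index=feature_names)
ranking = ranking.sort_values(ascending=False)

# Hypothetical selection rule: keep features at or above the mean importance.
selected = ranking[ranking >= ranking.mean()].index.tolist()
```

Impurity-based importances can favor high-cardinality features, so a ranking like this is best treated as a starting point for selection.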
Requirements
- Python >= 3.9
- pandas
- numpy
- scikit-learn >= 1.0
- dsr-data-tools
- dsr-utils
Architecture
The library uses a modular approach:
- evaluation/: Core evaluation pipeline (DataSplits, ModelEvaluation, ModelResults)
- models/: Model implementations and hyperparameter tuning
- enums.py: Enumeration types for model states and configurations
- constants.py: Global configuration and defaults
License
MIT License - see LICENSE file for details
File details
Details for the file dsr_feature_eng_ml-0.0.2.tar.gz.
File metadata
- Download URL: dsr_feature_eng_ml-0.0.2.tar.gz
- Upload date:
- Size: 5.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 8cfe403b6a60b3ffc8d67031d0729a2eff27248a99c13bd8ea0fcce48ddcf0ad |
| MD5 | 9733c53f7faa9592dd5cc425de8fc505 |
| BLAKE2b-256 | e9c30cc5315f7eb8ccbf2a72a619f56f450ffa7d3f52c6d3e4e870da06ba4d54 |
File details
Details for the file dsr_feature_eng_ml-0.0.2-py3-none-any.whl.
File metadata
- Download URL: dsr_feature_eng_ml-0.0.2-py3-none-any.whl
- Upload date:
- Size: 5.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | cc69538ed36c7f1ac5eac821a7aa42f88231afe0f367ed5bfa4a3681aaeea61f |
| MD5 | cab9b35002486a3eaf8f8bc28b9d1ea5 |
| BLAKE2b-256 | 6506627bdf8cc0e11d7c39e7716106cade3a57b8690a6fd31d51c27e7ed8ace5 |