Machine learning-specific feature engineering utilities including models and evaluation tools.

dsr-feature-eng-ml

A framework for evaluating machine learning classifiers and engineering features.

Features

  • Model Evaluation: Automatic hyperparameter tuning and model comparison for Decision Trees, Random Forests, and Logistic Regression
  • Data Balancing: Support for imbalanced dataset handling (upsampling, downsampling, balanced class weights)
  • Feature Importance: Automatic feature selection and importance ranking
  • Data Splitting: Train/validation/test splitting with optional feature scaling (the scaler is fitted on the training split only)
  • Result Tracking: Records each model's configuration and performance metrics
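
To make the data balancing feature more concrete, here is a minimal sketch of upsampling written directly against pandas and scikit-learn. It is not this library's implementation: the helper name upsample_minority and the toy DataFrame are hypothetical, and the package may balance classes differently.

```python
import pandas as pd
from sklearn.utils import resample


def upsample_minority(df: pd.DataFrame, target: str, random_state: int = 42) -> pd.DataFrame:
    """Hypothetical helper: resample each smaller class, with replacement,
    up to the size of the largest class."""
    counts = df[target].value_counts()
    largest = counts.max()
    parts = []
    for label, count in counts.items():
        group = df[df[target] == label]
        if count < largest:
            # Draw rows with replacement until this class matches the largest one.
            group = resample(group, replace=True, n_samples=largest, random_state=random_state)
        parts.append(group)
    return pd.concat(parts).reset_index(drop=True)


# Toy data: 8 rows of class 0 and 2 rows of class 1.
df = pd.DataFrame({"feature1": range(10), "target": [0] * 8 + [1] * 2})
balanced = upsample_minority(df, "target")
print(balanced["target"].value_counts())
```

Resampling is normally applied to the training split only, so that validation and test sets keep the original class distribution. Balanced class weights are a resampling-free alternative; in scikit-learn they are requested with class_weight="balanced" on the estimator.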

Installation

pip install dsr-feature-eng-ml

Quick Start

import pandas as pd
from dsr_feature_eng_ml import DataSplits, ModelEvaluation

# Load your data
df = pd.read_csv('data.csv')

# Create data splits (with automatic scaling)
data_splits = DataSplits.from_data_source(
    src=df,
    features_to_include=['feature1', 'feature2', 'feature3'],
    target_column='target',
    test_size=0.2,
    valid_size=0.25,
    random_state=42,
    scale_features=True
)

# Evaluate models
results = ModelEvaluation.evaluate_dataset(
    data_splits=data_splits,
    dtree_param_grid={'max_depth': [5, 10, 20]},
    rf_param_grid={'n_estimators': [50, 100]},
    lr_param_grid={'C': [0.1, 1.0, 10.0]},
    cv=5,
    n_iter=50,
    max_iter=1000,
    scoring='f1',
    n_jobs=-1,
    viable_f1_gap=0.01,
    report_title='Model Evaluation',
    perform_dtree_feature_selection=True,
    perform_rf_feature_selection=True
)

Key Components

DataSplits

Manages train/validation/test splits with automatic feature scaling:

  • Fits scaler on training data only (prevents data leakage)
  • Transforms validation and test sets consistently
  • Supports upsampling and downsampling for class imbalance
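
The leakage-safe scaling described above can be sketched with plain scikit-learn. This is an illustration under stated assumptions, not the DataSplits source: the synthetic data is made up, and it assumes the validation fraction is taken from what remains after the test split, which the package may or may not do.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic data: 200 rows, 3 features, binary target.
rng = np.random.default_rng(42)
X = rng.normal(loc=5.0, scale=2.0, size=(200, 3))
y = rng.integers(0, 2, size=200)

# Hold out the test set first, then carve a validation set out of the remainder.
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
X_train, X_valid, y_train, y_valid = train_test_split(
    X_rest, y_rest, test_size=0.25, random_state=42, stratify=y_rest
)

# Fit the scaler on the training split only, so no statistics from the
# validation or test data leak into the transformation.
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_valid_s = scaler.transform(X_valid)
X_test_s = scaler.transform(X_test)

print(X_train_s.shape, X_valid_s.shape, X_test_s.shape)
```

Only the training split is guaranteed to have zero mean and unit variance after scaling; the validation and test splits are transformed with the training statistics and will usually be close to, but not exactly, standardized.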

ModelEvaluation

Orchestrates comprehensive model evaluation:

  • Evaluates multiple model types in parallel
  • Supports four balancing strategies
  • Tracks best performing models
  • Generates detailed evaluation reports
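
As a rough picture of what tuning and comparing several model types involves, the sketch below runs a grid search per model with scikit-learn and keeps the best validation F1 score. It is an assumption-laden stand-in for ModelEvaluation, not its implementation: it uses GridSearchCV (the package's n_iter parameter suggests it may use a randomized search instead), runs the models sequentially, and works on synthetic data.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data.
X, y = make_classification(n_samples=300, n_features=6, n_informative=4, random_state=42)
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# One (estimator, parameter grid) pair per model type, mirroring the Quick Start grids.
candidates = {
    "decision_tree": (DecisionTreeClassifier(random_state=42), {"max_depth": [5, 10, 20]}),
    "random_forest": (RandomForestClassifier(random_state=42), {"n_estimators": [50, 100]}),
    "logistic_regression": (LogisticRegression(max_iter=1000), {"C": [0.1, 1.0, 10.0]}),
}

scores = {}
for name, (estimator, grid) in candidates.items():
    # 5-fold cross-validated search on the training split, scored by F1.
    search = GridSearchCV(estimator, grid, cv=5, scoring="f1")
    search.fit(X_train, y_train)
    # Score the refitted best estimator on the held-out validation split.
    scores[name] = f1_score(y_valid, search.predict(X_valid))

best_name = max(scores, key=scores.get)
print(best_name, scores)
```

Comparing on a held-out validation split rather than on the cross-validation score keeps the test set untouched until a final model has been chosen.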

Model Classes

  • DecisionTree: Decision Tree classifier with feature importance
  • RandomForest: Random Forest classifier with ensemble methods
  • LogisticRegression: Logistic Regression with convergence control
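
Feature importance ranking with a decision tree can be sketched as follows, using scikit-learn directly. The feature names, the synthetic data, and the 0.05 selection threshold are hypothetical choices for illustration; the package's own DecisionTree class and selection rule may differ.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic data with 2 informative features out of 5.
X, y = make_classification(
    n_samples=300, n_features=5, n_informative=2, n_redundant=0,
    shuffle=False, random_state=42,
)
names = [f"feature{i + 1}" for i in range(5)]

tree = DecisionTreeClassifier(max_depth=5, random_state=42).fit(X, y)

# Impurity-based importances, normalized by scikit-learn to sum to 1.
importances = pd.Series(tree.feature_importances_, index=names).sort_values(ascending=False)

# Hypothetical selection rule: keep features above a fixed importance threshold.
selected = importances[importances > 0.05].index.tolist()
print(importances)
print(selected)
```

Impurity-based importances are cheap to compute but can favor high-cardinality features, so a ranking like this is best treated as a starting point for feature selection.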

Requirements

  • Python >= 3.9
  • pandas
  • numpy
  • scikit-learn >= 1.0
  • dsr-data-tools
  • dsr-utils

Architecture

The library is organized into the following modules:

  • evaluation/: Core evaluation pipeline (DataSplits, ModelEvaluation, ModelResults)
  • models/: Model implementations and hyperparameter tuning
  • enums.py: Enumeration types for model states and configurations
  • constants.py: Global configuration and defaults

License

MIT License - see LICENSE file for details
