Machine learning-specific feature engineering utilities including models and evaluation tools.
dsr-feature-eng-ml
Comprehensive machine learning model evaluation and feature engineering framework.
Version 1.0.0: This release is breaking and not backward-compatible with prior 0.x versions.
Release scope: Regression workflows have been tested. Classification workflows are implemented but not yet tested; a follow-up release will expand validation and coverage.
Features
- Model Evaluation: Automatic hyperparameter tuning and model comparison for Decision Trees, Random Forests, and Logistic Regression
- Data Balancing: Support for imbalanced dataset handling (upsampling, downsampling, balanced class weights)
- Feature Importance: Automatic feature selection and importance ranking
- Data Splitting: Train/validation/test splitting with automatic feature scaling (the scaler is fitted on the training split only)
- Result Tracking: Comprehensive model configuration and performance metrics tracking
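To make the data balancing options concrete, here is a minimal sketch of upsampling, downsampling, and balanced class weights using plain scikit-learn. It is not this library's implementation; the toy data and all variable names are assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.utils import resample

# Toy imbalanced data (assumption): 90 majority rows, 10 minority rows.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = np.array([0] * 90 + [1] * 10)
X_major, X_minor = X[y == 0], X[y == 1]

# Upsampling: draw minority rows with replacement until the classes match.
X_minor_up = resample(X_minor, replace=True, n_samples=len(X_major), random_state=0)
X_up = np.vstack([X_major, X_minor_up])
y_up = np.array([0] * len(X_major) + [1] * len(X_minor_up))

# Downsampling: keep a random subset of majority rows, without replacement.
X_major_down = resample(X_major, replace=False, n_samples=len(X_minor), random_state=0)
X_down = np.vstack([X_major_down, X_minor])
y_down = np.array([0] * len(X_major_down) + [1] * len(X_minor))

# Balanced class weights: leave the data alone and reweight the loss instead.
weighted_model = LogisticRegression(class_weight="balanced", max_iter=1000).fit(X, y)
```

Resampling is normally applied to the training split only, so that validation and test scores reflect the original class distribution.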
Installation
pip install dsr-feature-eng-ml
Quick Start
import pandas as pd
from dsr_feature_eng_ml import DataSplits, ModelEvaluation

# Load your data
df = pd.read_csv('data.csv')

# Create data splits (with automatic scaling)
data_splits = DataSplits.from_data_source(
    src=df,
    features_to_include=['feature1', 'feature2', 'feature3'],
    target_column='target',
    test_size=0.2,
    valid_size=0.25,
    random_state=42,
    scale_features=True
)

# Evaluate models
results = ModelEvaluation.evaluate_dataset(
    data_splits=data_splits,
    dtree_param_grid={'max_depth': [5, 10, 20]},
    rf_param_grid={'n_estimators': [50, 100]},
    lr_param_grid={'C': [0.1, 1.0, 10.0]},
    cv=5,
    n_iter=50,
    max_iter=1000,
    scoring='f1',
    n_jobs=-1,
    viable_f1_gap=0.01,
    report_title='Model Evaluation',
    perform_dtree_feature_selection=True,
    perform_rf_feature_selection=True
)
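If you do not have a data.csv at hand, a synthetic frame with the column names used above is enough to try the Quick Start. The sketch below relies only on pandas and scikit-learn; the sample size and generator settings are arbitrary assumptions.

```python
import pandas as pd
from sklearn.datasets import make_classification

# Synthetic binary-classification data with three informative features.
X, y = make_classification(
    n_samples=500,
    n_features=3,
    n_informative=3,
    n_redundant=0,
    random_state=42,
)

# Column names match the Quick Start: feature1..feature3 plus target.
df = pd.DataFrame(X, columns=["feature1", "feature2", "feature3"])
df["target"] = y
```

The resulting df can stand in for the frame loaded with pd.read_csv in the Quick Start.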
Key Components
DataSplits
Manages train/validation/test splits with automatic feature scaling:
- Fits scaler on training data only (prevents data leakage)
- Transforms validation and test sets consistently
- Supports upsampling and downsampling for class imbalance
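The leakage-free scaling described above can be sketched with scikit-learn alone. This is an illustration of the technique, not the code inside DataSplits. In particular, it assumes valid_size is a fraction of the data left after the test split, which the documentation above does not confirm.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Toy data (assumption): 200 rows, 3 features, binary target.
rng = np.random.default_rng(42)
X = rng.normal(loc=5.0, scale=3.0, size=(200, 3))
y = (X[:, 0] + X[:, 1] > 10.0).astype(int)

# Hold out the test set first, then carve validation out of the remainder.
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
X_train, X_valid, y_train, y_valid = train_test_split(
    X_rest, y_rest, test_size=0.25, random_state=42
)

# Fit the scaler on the training split only, so that validation and test
# statistics never influence the learned mean and standard deviation.
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_valid_s = scaler.transform(X_valid)
X_test_s = scaler.transform(X_test)
```

After this, the training split has zero mean and unit variance per feature, while the validation and test splits are only approximately standardized. That is the expected result when nothing leaks from them into the scaler.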
ModelEvaluation
Orchestrates comprehensive model evaluation:
- Evaluates multiple model types in parallel
- Supports four balancing strategies
- Tracks best performing models
- Generates detailed evaluation reports
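As a rough picture of what tuning and comparing several model types involves, the sketch below runs a cross-validated search per model and compares validation F1 scores. It uses scikit-learn's exhaustive GridSearchCV for brevity. Whether ModelEvaluation uses a grid or randomized search internally is not stated here, and the results dictionary layout is an assumption, not the library's ModelResults.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Toy data (assumption) split into training and validation sets.
X, y = make_classification(
    n_samples=300, n_features=6, n_informative=4, random_state=42
)
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# One estimator and parameter grid per model type, mirroring the Quick Start grids.
candidates = {
    "decision_tree": (DecisionTreeClassifier(random_state=42), {"max_depth": [5, 10, 20]}),
    "random_forest": (RandomForestClassifier(random_state=42), {"n_estimators": [50, 100]}),
    "logistic_regression": (LogisticRegression(max_iter=1000), {"C": [0.1, 1.0, 10.0]}),
}

results = {}
for name, (estimator, grid) in candidates.items():
    # Tune on the training split with 5-fold cross-validation, scored by F1.
    search = GridSearchCV(estimator, grid, cv=5, scoring="f1", n_jobs=1)
    search.fit(X_train, y_train)
    results[name] = {
        "best_params": search.best_params_,
        "cv_f1": search.best_score_,
        "valid_f1": f1_score(y_valid, search.predict(X_valid)),
    }

# Pick the model with the highest validation F1.
best_name = max(results, key=lambda n: results[n]["valid_f1"])
```

Comparing cv_f1 with valid_f1 for each model gives a quick check for overfitting before a winner is chosen.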
Model Classes
- DecisionTree: Decision Tree classifier with feature importance
- RandomForest: Random Forest classifier with ensemble methods
- LogisticRegression: Logistic Regression with convergence control
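Feature importance ranking with a tree model can be sketched as follows, using scikit-learn's DecisionTreeClassifier directly rather than the DecisionTree class listed above. The 0.05 selection threshold is a hypothetical value chosen for illustration, not a library default.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Toy data (assumption): 5 features, of which 2 are informative.
X, y = make_classification(
    n_samples=300,
    n_features=5,
    n_informative=2,
    n_redundant=0,
    shuffle=False,
    random_state=0,
)
feature_names = [f"feature{i + 1}" for i in range(X.shape[1])]

# Fit a shallow tree and read its impurity-based importances.
tree = DecisionTreeClassifier(max_depth=5, random_state=0).fit(X, y)
importances = pd.Series(
    tree.feature_importances_, index=feature_names
).sort_values(ascending=False)

# Keep features whose importance clears a hypothetical threshold.
IMPORTANCE_THRESHOLD = 0.05
selected = importances[importances >= IMPORTANCE_THRESHOLD].index.tolist()
```

The importances sum to 1 across features, so the threshold reads as a minimum share of the tree's total impurity reduction.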
Requirements
- Python >= 3.10
- pandas
- numpy
- scikit-learn >= 1.5.0
- seaborn >= 0.13.0
- dsr-data-tools >= 1.0.0
- dsr-utils >= 1.0.0
Architecture
The library uses a modular approach:
- evaluation/: Core evaluation pipeline (DataSplits, ModelEvaluation, ModelResults)
- models/: Model implementations and hyperparameter tuning
- enums.py: Enumeration types for model states and configurations
- constants.py: Global configuration and defaults
Preferences and Overrides
You can override library defaults (like constants used in evaluation and reporting) without changing code in the library.
Precedence (highest to lowest)
- Runtime override via set_pref()
- Environment variables prefixed with DSR_FEML_
- User config file in ~/.config/dsr-feature-eng-ml/config.toml or ~/Library/Application Support/dsr-feature-eng-ml/config.toml
- Project-level ./dsr_feature_eng_ml.toml
- In-library default value
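The precedence order above amounts to a first-match lookup across layers. Here is a minimal sketch of that idea, with each source represented as a plain dictionary. The function name resolve and the layer variables are hypothetical, not part of the library's API.

```python
from typing import Any, Mapping, Sequence

_MISSING = object()


def resolve(name: str, default: Any, layers: Sequence[Mapping[str, Any]]) -> Any:
    """Return the value from the first layer that defines `name`, else `default`."""
    for layer in layers:
        value = layer.get(name, _MISSING)
        if value is not _MISSING:
            return value
    return default


# Hypothetical layer contents, ordered from highest to lowest precedence.
runtime = {"SCORE_FORMAT": ".3f"}
environment = {"REPORT_WIDTH": 120, "SCORE_FORMAT": ".2f"}
user_config = {"REPORT_WIDTH": 110, "DEFAULT_ACCEPTABLE_GAP": 0.03}
project_config = {"DEFAULT_ACCEPTABLE_GAP": 0.05}
layers = [runtime, environment, user_config, project_config]

score_format = resolve("SCORE_FORMAT", ".4f", layers)        # runtime wins
report_width = resolve("REPORT_WIDTH", 100, layers)          # environment wins
acceptable_gap = resolve("DEFAULT_ACCEPTABLE_GAP", 0.01, layers)  # user config wins
```

A sentinel object is used instead of None so that a layer can deliberately set a value to None without being skipped.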
Examples
- Runtime (Python):
    from dsr_feature_eng_ml import set_pref
    set_pref("REPORT_WIDTH", 120)
    set_pref("SCORE_FORMAT", ".3f")
- Environment (shell):
    export DSR_FEML_REPORT_WIDTH=120
    export DSR_FEML_SCORE_FORMAT=.3f
    export DSR_FEML_DEFAULT_ACCEPTABLE_GAP=0.03
- Config file (TOML):
    [constants]
    REPORT_WIDTH = 120
    SCORE_FORMAT = ".3f"
    DEFAULT_ACCEPTABLE_GAP = 0.03
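Environment variables always arrive as strings, so an override such as DSR_FEML_REPORT_WIDTH=120 has to be converted before it can replace an integer default. How the library performs this conversion is not documented here. The sketch below shows one plausible approach that coerces the raw string to the type of the default. The helper env_override is hypothetical.

```python
import os
from typing import Any, Mapping, Optional

ENV_PREFIX = "DSR_FEML_"


def env_override(name: str, default: Any, environ: Optional[Mapping[str, str]] = None) -> Any:
    """Look up ENV_PREFIX + name and coerce it to the type of `default`."""
    environ = os.environ if environ is None else environ
    raw = environ.get(ENV_PREFIX + name)
    if raw is None:
        return default
    # bool must be checked before int, because bool is a subclass of int.
    if isinstance(default, bool):
        return raw.strip().lower() in {"1", "true", "yes", "on"}
    if isinstance(default, (int, float)):
        return type(default)(raw)
    return raw


# A fake environment keeps the example independent of the real shell.
fake_env = {
    "DSR_FEML_REPORT_WIDTH": "120",
    "DSR_FEML_SCORE_FORMAT": ".3f",
    "DSR_FEML_DEFAULT_ACCEPTABLE_GAP": "0.03",
}
width = env_override("REPORT_WIDTH", 100, fake_env)
fmt = env_override("SCORE_FORMAT", ".4f", fake_env)
gap = env_override("DEFAULT_ACCEPTABLE_GAP", 0.01, fake_env)
```

Coercing by the default's type keeps the conversion rules in one place, at the cost of raising ValueError when a variable holds text that cannot be parsed as a number.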
How it works
- constants.py defines defaults and resolves effective values through the preferences system:
    from dsr_feature_eng_ml.preferences import resolve_constant
    SCORE_FORMAT = resolve_constant("SCORE_FORMAT", ".4f")
    REPORT_WIDTH = resolve_constant("REPORT_WIDTH", 100)
- Most code should continue to import these constants (e.g., from dsr_feature_eng_ml import REPORT_WIDTH).
Should I call resolve_constant() directly?
- No, for typical usage: import constants as usual; they already reflect the preferences in effect at import time.
- Yes, if you need late binding (e.g., to react to set_pref() calls made after modules are imported). In that case, call get_pref("REPORT_WIDTH", 100) or resolve_constant("REPORT_WIDTH", 100) at the point where you need the value.
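The difference between an import-time constant and a late-bound lookup can be seen in a self-contained toy. The functions toy_set_pref and toy_get_pref below are stand-ins written for this example. They only imitate the behaviour described above and are not the library's set_pref() and get_pref().

```python
# Toy preference store (stand-in for the library's preferences system).
_prefs = {}


def toy_set_pref(name, value):
    """Record a runtime override."""
    _prefs[name] = value


def toy_get_pref(name, default):
    """Return the current override if one exists, else the default."""
    return _prefs.get(name, default)


# Import-time snapshot: evaluated once, when this module-level line runs.
REPORT_WIDTH = toy_get_pref("REPORT_WIDTH", 100)


def banner_snapshot():
    # Uses the value captured above; later overrides are not seen.
    return "=" * REPORT_WIDTH


def banner_late():
    # Looks the value up on every call, so later overrides take effect.
    return "=" * toy_get_pref("REPORT_WIDTH", 100)


# An override made after the snapshot was taken.
toy_set_pref("REPORT_WIDTH", 120)
snapshot_len = len(banner_snapshot())
late_len = len(banner_late())
```

Here banner_snapshot() still produces a 100-character line, while banner_late() produces 120 characters. The late-binding option exists for this situation.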
This keeps defaults centralized while giving users clean override hooks at runtime, via environment, or via config files.
License
MIT License - see LICENSE file for details