Machine-learning-specific feature engineering utilities, including models and evaluation tools.

Project description

dsr-feature-eng-ml

Comprehensive machine learning model evaluation and feature engineering framework.

Version 1.0.0 is a breaking release and is not backward-compatible with earlier 0.x versions.

Release scope: Regression workflows have been tested. Classification workflows are implemented but not yet tested; a follow-up release will expand validation and coverage.

Features

Model Evaluation: Automatic hyperparameter tuning and model comparison for Decision Trees, Random Forests, and Logistic Regression
Data Balancing: Support for imbalanced dataset handling (upsampling, downsampling, balanced class weights)
Feature Importance: Automatic feature selection and importance ranking
Data Splitting: Train/validation/test splitting with optional feature scaling (the scaler is fitted on the training set only)
Result Tracking: Comprehensive model configuration and performance metrics tracking
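The three balancing options listed above can be illustrated with the rough sketch below. It implements random upsampling, random downsampling, and "balanced" class weights in plain Python. It is not the library's implementation. The function names are hypothetical, and rows are plain Python objects rather than DataFrame rows. The weight formula n_samples / (n_classes * class_count) is the one scikit-learn uses for class_weight="balanced", and the sketch assumes that is what the library means by balanced class weights.

```python
import random
from collections import Counter


def upsample(rows, labels, seed=0):
    """Hypothetical helper: duplicate random minority rows until all classes match the largest."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        members = [r for r, y in zip(rows, labels) if y == cls]
        extra = rng.choices(members, k=target - n)  # sampling with replacement
        out_rows.extend(extra)
        out_labels.extend([cls] * len(extra))
    return out_rows, out_labels


def downsample(rows, labels, seed=0):
    """Hypothetical helper: keep a random subset of each class, sized to the smallest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = min(counts.values())
    out_rows, out_labels = [], []
    for cls in counts:
        members = [r for r, y in zip(rows, labels) if y == cls]
        out_rows.extend(rng.sample(members, target))  # sampling without replacement
        out_labels.extend([cls] * target)
    return out_rows, out_labels


def balanced_class_weights(labels):
    """Weight per class = n_samples / (n_classes * class_count), as in scikit-learn."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * n) for cls, n in counts.items()}


rows = [f"r{i}" for i in range(10)]
labels = ["no"] * 8 + ["yes"] * 2
print(Counter(upsample(rows, labels)[1]))    # Counter({'no': 8, 'yes': 8})
print(Counter(downsample(rows, labels)[1]))  # Counter({'no': 2, 'yes': 2})
print(balanced_class_weights(labels))        # {'no': 0.625, 'yes': 2.5}
```

Normally you resample only the training split. Resampling before splitting would leak duplicated rows into the validation and test sets.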

Installation

pip install dsr-feature-eng-ml

Quick Start

import pandas as pd
from dsr_feature_eng_ml import DataSplits, ModelEvaluation

# Load your data
df = pd.read_csv('data.csv')

# Create data splits (with automatic scaling)
data_splits = DataSplits.from_data_source(
    src=df,
    features_to_include=['feature1', 'feature2', 'feature3'],
    target_column='target',
    test_size=0.2,
    valid_size=0.25,
    random_state=42,
    scale_features=True
)

# Evaluate models
results = ModelEvaluation.evaluate_dataset(
    data_splits=data_splits,
    dtree_param_grid={'max_depth': [5, 10, 20]},
    rf_param_grid={'n_estimators': [50, 100]},
    lr_param_grid={'C': [0.1, 1.0, 10.0]},
    cv=5,
    n_iter=50,
    max_iter=1000,
    scoring='f1',
    n_jobs=-1,
    viable_f1_gap=0.01,
    report_title='Model Evaluation',
    perform_dtree_feature_selection=True,
    perform_rf_feature_selection=True
)
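The Quick Start passes test_size=0.2 and valid_size=0.25. A common convention treats valid_size as a fraction of the rows left after the test set is removed. This is an assumption here, so check the DataSplits docstring. Under that convention, the effective split is 60/20/20. The hypothetical helper below works through the arithmetic. It rounds held-out sets up, the way scikit-learn's train_test_split does for float sizes.

```python
import math


def split_sizes(n_rows: int, test_size: float, valid_size: float) -> tuple[int, int, int]:
    """Hypothetical helper: row counts for a two-stage train/valid/test split.

    Assumes test_size is a fraction of all rows and valid_size is a fraction
    of the rows left after the test set is removed.
    """
    # Stage 1: hold out the test set, rounding up like train_test_split.
    n_test = math.ceil(n_rows * test_size)
    n_remaining = n_rows - n_test
    # Stage 2: carve the validation set out of the remaining rows.
    n_valid = math.ceil(n_remaining * valid_size)
    n_train = n_remaining - n_valid
    return n_train, n_valid, n_test


# Quick Start values on a 1,000-row dataset.
print(split_sizes(1000, test_size=0.2, valid_size=0.25))  # (600, 200, 200)
```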

Key Components

DataSplits

Manages train/validation/test splits with automatic feature scaling:

Fits scaler on training data only (prevents data leakage)
Transforms validation and test sets consistently
Supports upsampling and downsampling for class imbalance
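The sketch below makes the leakage rule concrete with a minimal stand-in for a standard scaler in plain Python. It is an illustration, not the DataSplits code. The class name is hypothetical, and the sketch assumes standardization (zero mean, unit variance) rather than some other scaling method. The key point is that fit() sees only the training rows. Validation and test rows are transformed with those same statistics.

```python
from statistics import fmean, pstdev


class TrainOnlyScaler:
    """Hypothetical stand-in for a standard scaler (zero mean, unit variance)."""

    def fit(self, rows):
        columns = list(zip(*rows))
        # Statistics come from the training rows only.
        self.means = [fmean(col) for col in columns]
        # Population standard deviation, as in scikit-learn's StandardScaler;
        # a constant column gets scale 1.0 to avoid dividing by zero.
        self.stds = [pstdev(col) or 1.0 for col in columns]
        return self

    def transform(self, rows):
        return [
            [(x - m) / s for x, m, s in zip(row, self.means, self.stds)]
            for row in rows
        ]


train = [[1.0, 10.0], [3.0, 10.0]]
test = [[5.0, 12.0]]

scaler = TrainOnlyScaler().fit(train)  # never sees the test rows
print(scaler.transform(train))  # [[-1.0, 0.0], [1.0, 0.0]]
print(scaler.transform(test))   # [[3.0, 2.0]]
```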

ModelEvaluation

Orchestrates comprehensive model evaluation:

Evaluates multiple model types in parallel
Supports four balancing strategies
Tracks best performing models
Generates detailed evaluation reports

Model Classes

DecisionTree: Decision Tree classifier with feature importance
RandomForest: Random Forest classifier with ensemble methods
LogisticRegression: Logistic Regression with convergence control
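Each model class is tuned over the parameter grid passed to evaluate_dataset. The Quick Start's n_iter argument suggests a randomized search. The sketch below assumes it behaves like scikit-learn's RandomizedSearchCV with list-valued grids, which samples without replacement and caps the number of candidates at the grid size. Under that assumption, the hypothetical helpers estimate how many cross-validation fits the Quick Start grids trigger, not counting any final refit.

```python
from itertools import product


def expand_grid(param_grid: dict) -> list[dict]:
    """Every combination of the listed hyperparameter values."""
    keys = sorted(param_grid)
    return [dict(zip(keys, combo)) for combo in product(*(param_grid[k] for k in keys))]


def cv_fit_count(param_grid: dict, n_iter: int, cv: int) -> int:
    """Hypothetical estimate: candidates are capped at the grid size, and each
    candidate is fitted once per CV fold (final refit not counted)."""
    n_candidates = min(n_iter, len(expand_grid(param_grid)))
    return n_candidates * cv


grids = {
    "DecisionTree": {"max_depth": [5, 10, 20]},
    "RandomForest": {"n_estimators": [50, 100]},
    "LogisticRegression": {"C": [0.1, 1.0, 10.0]},
}
for name, grid in grids.items():
    print(name, cv_fit_count(grid, n_iter=50, cv=5))
# DecisionTree 15, RandomForest 10, LogisticRegression 15
```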

Requirements

Python >= 3.10
pandas
numpy
scikit-learn >= 1.5.0
seaborn >= 0.13.0
dsr-data-tools >= 1.0.0
dsr-utils >= 1.0.0

Architecture

The library uses a modular approach:

evaluation/: Core evaluation pipeline (DataSplits, ModelEvaluation, ModelResults)
models/: Model implementations and hyperparameter tuning
enums.py: Enumeration types for model states and configurations
constants.py: Global configuration and defaults

Preferences and Overrides

You can override library defaults (like constants used in evaluation and reporting) without changing code in the library.

Precedence (highest to lowest)

Runtime override via set_pref()
Environment variables prefixed with DSR_FEML_
User config file in ~/.config/dsr-feature-eng-ml/config.toml or ~/Library/Application Support/dsr-feature-eng-ml/config.toml
Project-level ./dsr_feature_eng_ml.toml
In-library default value
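The order above amounts to "the first source that defines the name wins". The sketch below is a hypothetical resolver written to show that order. It is not the library's resolve_constant(). For simplicity, config files are passed in as already-parsed dicts. The sketch also assumes that environment-variable strings are converted to the type of the default value, which is an assumption rather than documented behavior.

```python
import os

_MISSING = object()


def resolve(name, default, *, runtime=None, environ=None,
            user_config=None, project_config=None, prefix="DSR_FEML_"):
    """Hypothetical resolver mirroring the documented precedence order."""
    environ = os.environ if environ is None else environ
    sources = [
        runtime or {},                                  # 1. set_pref()
        {k[len(prefix):]: v for k, v in environ.items()
         if k.startswith(prefix)},                      # 2. DSR_FEML_* variables
        user_config or {},                              # 3. user config file
        project_config or {},                           # 4. ./dsr_feature_eng_ml.toml
    ]
    for source in sources:
        value = source.get(name, _MISSING)
        if value is not _MISSING:
            return _coerce(value, default)
    return default                                      # 5. in-library default


def _coerce(value, default):
    """Environment values are strings; convert them to the default's type (assumed rule)."""
    if isinstance(value, str) and not isinstance(default, str):
        if isinstance(default, bool):  # check bool before int: bool subclasses int
            return value.strip().lower() in {"1", "true", "yes", "on"}
        return type(default)(value)
    return value


env = {"DSR_FEML_REPORT_WIDTH": "120"}
print(resolve("REPORT_WIDTH", 100, environ=env, project_config={"REPORT_WIDTH": 90}))  # 120
print(resolve("REPORT_WIDTH", 100, runtime={"REPORT_WIDTH": 150}, environ=env))       # 150
print(resolve("SCORE_FORMAT", ".4f", environ={}))                                     # .4f
```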

Examples

Runtime (Python):

from dsr_feature_eng_ml import set_pref
set_pref("REPORT_WIDTH", 120)
set_pref("SCORE_FORMAT", ".3f")

Environment (shell):

export DSR_FEML_REPORT_WIDTH=120
export DSR_FEML_SCORE_FORMAT=.3f
export DSR_FEML_DEFAULT_ACCEPTABLE_GAP=0.03

Config file (TOML):

[constants]
REPORT_WIDTH = 120
SCORE_FORMAT = ".3f"
DEFAULT_ACCEPTABLE_GAP = 0.03

How it works

constants.py defines defaults and resolves effective values through the preferences system:

from dsr_feature_eng_ml.preferences import resolve_constant
SCORE_FORMAT = resolve_constant("SCORE_FORMAT", ".4f")
REPORT_WIDTH = resolve_constant("REPORT_WIDTH", 100)

Most code should continue to import these constants (e.g., from dsr_feature_eng_ml import REPORT_WIDTH).

Should I call resolve_constant() directly?

No, for typical usage: import the constants as usual; they already reflect the preferences in effect at import time.
Yes, if you need late binding (for example, to pick up set_pref() calls made after the modules were imported). In that case, call get_pref("REPORT_WIDTH", 100) or resolve_constant("REPORT_WIDTH", 100) at the point where you need the value.
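The difference between the two answers comes down to when the lookup happens. The sketch below mimics it with a toy, in-memory preference store. Here _prefs, set_pref, and get_pref are stand-ins written for illustration, not the library's functions. A single file plays the role of both constants.py and the calling code.

```python
# Toy stand-ins for the library's preference functions (hypothetical).
_prefs = {}


def set_pref(name, value):
    _prefs[name] = value


def get_pref(name, default):
    return _prefs.get(name, default)


# Module-level constant: resolved once, when this module is first imported.
REPORT_WIDTH = get_pref("REPORT_WIDTH", 100)


def report_width_late():
    # Late binding: look the value up every time it is needed.
    return get_pref("REPORT_WIDTH", 100)


set_pref("REPORT_WIDTH", 120)
print(REPORT_WIDTH)         # 100: captured before set_pref() ran
print(report_width_late())  # 120: sees the runtime override
```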

This keeps defaults centralized while giving users clean override hooks at runtime, via environment, or via config files.

License

MIT License. See the LICENSE file for details.

Project details

Release history

1.2.3: Apr 11, 2026
1.2.2: Apr 11, 2026
1.2.1: Apr 11, 2026
1.2.0: Apr 11, 2026
1.1.0: Feb 10, 2026
1.0.0 (this version): Feb 9, 2026
0.0.2: Dec 19, 2025
0.0.1: Dec 17, 2025

Download files

Source Distribution

dsr_feature_eng_ml-1.0.0.tar.gz (15.7 kB)

Uploaded Feb 9, 2026 Source

Built Distribution

dsr_feature_eng_ml-1.0.0-py3-none-any.whl (12.9 kB)

Uploaded Feb 9, 2026 Python 3

File details

Details for the file dsr_feature_eng_ml-1.0.0.tar.gz.

File metadata

Download URL: dsr_feature_eng_ml-1.0.0.tar.gz
Upload date: Feb 9, 2026
Size: 15.7 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for dsr_feature_eng_ml-1.0.0.tar.gz
Algorithm	Hash digest
SHA256	`1bff5a1b00e994c064be42c649b710940892c72b500e102a8d486bc8df511be3`
MD5	`62470a0d02d3eec44d8088223f43e97d`
BLAKE2b-256	`a524fd6358d827dc3b24d896209111bc00bd8e308e888b31cdda9ce5df06e750`
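You can check the published SHA256 digest locally before installing. A minimal sketch using Python's standard hashlib module follows. The helper names are hypothetical. The file is read in chunks, so large downloads do not need to fit in memory.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size: int = 65536) -> str:
    """Hex SHA-256 digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_sha256: str) -> bool:
    """True if the file's digest matches the published one (case-insensitive)."""
    return sha256_of(path) == expected_sha256.strip().lower()


expected = "1bff5a1b00e994c064be42c649b710940892c72b500e102a8d486bc8df511be3"
archive = Path("dsr_feature_eng_ml-1.0.0.tar.gz")
if archive.exists():  # only check if the sdist has been downloaded here
    print(verify_download(archive, expected))
```

pip can also enforce this check itself in hash-checking mode (--require-hashes, with --hash=sha256:... entries in a requirements file).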

Provenance

The following attestation bundles were made for dsr_feature_eng_ml-1.0.0.tar.gz:

Publisher: python-publish.yml on scottroberts140/dsr-feature-eng-ml

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: dsr_feature_eng_ml-1.0.0.tar.gz
- Subject digest: 1bff5a1b00e994c064be42c649b710940892c72b500e102a8d486bc8df511be3
- Sigstore transparency entry: 929593339
- Sigstore integration time: Feb 9, 2026
Source repository:
- Permalink: scottroberts140/dsr-feature-eng-ml@8bee377e8992c8856514621e4304a622b8334c92
- Branch / Tag: refs/tags/v1.0.2
- Owner: https://github.com/scottroberts140
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python-publish.yml@8bee377e8992c8856514621e4304a622b8334c92
- Trigger Event: release

File details

Details for the file dsr_feature_eng_ml-1.0.0-py3-none-any.whl.

File metadata

Download URL: dsr_feature_eng_ml-1.0.0-py3-none-any.whl
Upload date: Feb 9, 2026
Size: 12.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for dsr_feature_eng_ml-1.0.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`81def7f6e6a6d7955caeb1eafe43f296e2032f44927f475476554ffd9ff0af86`
MD5	`e3bb9120f3561fd8ef853bff70cc776d`
BLAKE2b-256	`2687aec0f4e1edf745f1bab24634032ffe9ce2fcd56f915801c8afc2feb61f2d`

Provenance

The following attestation bundles were made for dsr_feature_eng_ml-1.0.0-py3-none-any.whl:

Publisher: python-publish.yml on scottroberts140/dsr-feature-eng-ml

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: dsr_feature_eng_ml-1.0.0-py3-none-any.whl
- Subject digest: 81def7f6e6a6d7955caeb1eafe43f296e2032f44927f475476554ffd9ff0af86
- Sigstore transparency entry: 929593345
- Sigstore integration time: Feb 9, 2026
Source repository:
- Permalink: scottroberts140/dsr-feature-eng-ml@8bee377e8992c8856514621e4304a622b8334c92
- Branch / Tag: refs/tags/v1.0.2
- Owner: https://github.com/scottroberts140
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python-publish.yml@8bee377e8992c8856514621e4304a622b8334c92
- Trigger Event: release

dsr-feature-eng-ml 1.0.0
