Guided baseline ML workflows (regression / classification / clustering) for engineers

Project description

EnginML

An educational Python package designed to help engineers and students learn and apply baseline machine learning workflows (regression, classification, clustering) even with minimal programming experience. This package reproduces the tutorial in AI in Civil Engineering (2025) by M. Z. Naser.

Overview

EnginML wraps each baseline workflow in a small set of functions, so you can go from a data file to an evaluated model and a report with very little code. It provides:

  • Simple, one-line functions for common ML tasks
  • Automatic data loading from CSV and Excel files
  • Built-in model training and evaluation
  • Visualization of results and model explanations
  • HTML report generation
  • Command-line interface for easy use

Installation

# Basic installation
pip install EnginML

# With all optional dependencies (quoted so shells such as zsh do not expand the brackets)
pip install "EnginML[full]"

Quick Start

Command Line Usage

The simplest way to use EnginML is through the command line:

# For regression
enginml your_data.csv --task regression --target your_target_column

# For classification
enginml your_data.csv --task classification --target your_target_column

# For clustering
enginml your_data.csv --task clustering --n-clusters 3

This will automatically:

  1. Load your data
  2. Train an appropriate model
  3. Evaluate the model
  4. Generate an HTML report with visualizations
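To make the four steps concrete, here is a minimal sketch of the same sequence for a regression task, written directly against pandas and scikit-learn. It is not EnginML's internal code: the helper name `run_baseline_regression`, the 80/20 split, the column names, and the bare-bones HTML are all assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split


def run_baseline_regression(df, target, seed=0):
    """Hypothetical helper (not part of EnginML): load, train, evaluate, report."""
    # 1. "Load": split the table into features and target.
    X = df.drop(columns=[target])
    y = df[target]

    # 2. Train a random forest on 80% of the rows (the split ratio is an assumption).
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    model = RandomForestRegressor(n_estimators=100, random_state=seed)
    model.fit(X_train, y_train)

    # 3. Evaluate on the held-out 20%.
    y_pred = model.predict(X_test)
    metrics = {
        "r2": r2_score(y_test, y_pred),
        "mae": mean_absolute_error(y_test, y_pred),
    }

    # 4. Build a minimal HTML report (metrics table only, no plots).
    rows = "".join(
        f"<tr><td>{name}</td><td>{value:.3f}</td></tr>"
        for name, value in metrics.items()
    )
    html = f"<html><body><h1>Regression report</h1><table>{rows}</table></body></html>"
    return {"model": model, "metrics": metrics, "html": html}


# Synthetic stand-in for your_data.csv; the column names are made up.
rng = np.random.default_rng(0)
demo = pd.DataFrame(
    {
        "span_m": rng.uniform(5, 30, 200),
        "depth_mm": rng.uniform(300, 900, 200),
    }
)
demo["capacity_kn"] = (
    4.0 * demo["span_m"] + 0.1 * demo["depth_mm"] + rng.normal(0, 1, 200)
)

result = run_baseline_regression(demo, target="capacity_kn")
print(result["metrics"])
```

Evaluating on rows the model never saw during training is what makes the reported metrics a fair estimate of how it will do on new data.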

Python API Usage

import pandas as pd
from EnginML import fit_regression, save_report

# Load your data
df = pd.read_csv('your_data.csv')

# Prepare features and target
features = df.drop(columns=['target_column'])
X = features.values
y = df['target_column'].values

# Fit a regression model
result = fit_regression(X, y, model='random_forest')

# Print metrics
print(result['metrics'])

# Save HTML report; take feature names from the same columns used to build X,
# so they stay correct even when the target is not the last column
save_report(result, X, y, feature_names=features.columns)

Features

Supported Tasks

  • Regression: Predict continuous values

    • Models: Random Forest, K-Nearest Neighbors
    • Metrics: R², MAE
  • Classification: Predict categorical values

    • Models: Random Forest, K-Nearest Neighbors
    • Metrics: Accuracy, F1 Score
  • Clustering: Group similar data points

    • Models: K-Means, BIRCH, Gaussian Mixture
    • Metrics: Silhouette Score, Davies-Bouldin Index
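The two clustering metrics are easier to read with numbers in front of you. The sketch below fits the three listed model families with scikit-learn on synthetic, well-separated blobs and scores each one. The data, the cluster centers, and the hyperparameters are assumptions for illustration, and this is plain scikit-learn rather than EnginML's own code path.

```python
from sklearn.cluster import Birch, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import davies_bouldin_score, silhouette_score
from sklearn.mixture import GaussianMixture

# Three well-separated synthetic clusters in 2-D (centers chosen by hand).
X, _ = make_blobs(
    n_samples=300,
    centers=[[0, 0], [6, 6], [-6, 6]],
    cluster_std=0.8,
    random_state=0,
)

models = {
    "kmeans": KMeans(n_clusters=3, n_init=10, random_state=0),
    "birch": Birch(n_clusters=3),
    "gaussian_mixture": GaussianMixture(n_components=3, random_state=0),
}

scores = {}
for name, model in models.items():
    labels = model.fit_predict(X)  # cluster index for every row
    scores[name] = {
        "silhouette": silhouette_score(X, labels),
        "davies_bouldin": davies_bouldin_score(X, labels),
    }

for name, s in scores.items():
    print(f"{name}: silhouette={s['silhouette']:.3f}, "
          f"davies_bouldin={s['davies_bouldin']:.3f}")
```

The silhouette score ranges from -1 to 1 and higher is better. The Davies-Bouldin index is non-negative and lower is better, so the two should be read in opposite directions.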

Explainability

When the optional SHAP dependency is installed, the package uses SHAP (SHapley Additive exPlanations) for model interpretability, helping engineers see which features contribute most to each prediction.

Visualization

Automatic generation of relevant plots for each task type:

  • Regression: Actual vs. Predicted, Residuals, Feature Importance
  • Classification: Feature Importance, SHAP Summary
  • Clustering: Cluster Assignments
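For readers who want to see what the three regression plots contain, the following sketch draws them by hand with matplotlib. The synthetic data, the feature names `x0` to `x2`, and the figure layout are assumptions. EnginML's own plots may be styled and arranged differently.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic data: the target depends on x0 and x1; x2 is pure noise.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 3))
y = 3.0 * X[:, 0] + X[:, 1] ** 2 + rng.normal(0, 1, 200)
names = ["x0", "x1", "x2"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

fig, axes = plt.subplots(1, 3, figsize=(15, 4))

# Actual vs. predicted: points on the dashed diagonal are perfect predictions.
axes[0].scatter(y_test, y_pred, s=12)
lims = [min(y_test.min(), y_pred.min()), max(y_test.max(), y_pred.max())]
axes[0].plot(lims, lims, linestyle="--", color="gray")
axes[0].set_xlabel("Actual")
axes[0].set_ylabel("Predicted")

# Residuals: should scatter around zero with no visible trend.
axes[1].scatter(y_pred, y_test - y_pred, s=12)
axes[1].axhline(0, linestyle="--", color="gray")
axes[1].set_xlabel("Predicted")
axes[1].set_ylabel("Residual (actual - predicted)")

# Feature importance: the random forest's impurity-based importances.
axes[2].barh(names, model.feature_importances_)
axes[2].set_xlabel("Importance")

fig.tight_layout()
# fig.savefig("regression_plots.png") would write the figure to disk.
```

A residual plot that fans out or curves suggests the model is missing structure in the data, which a single metric such as R² can hide.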

Advanced Usage

Customizing Models

from EnginML import fit_classification

# Use K-Nearest Neighbors instead of Random Forest
# (X and y prepared as in the Python API Usage example)
result = fit_classification(X, y, model='knn')
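To see what swapping models changes, the sketch below trains both classifier families on the same synthetic data with scikit-learn and reports the two metrics listed under Supported Tasks. That `'random_forest'` and `'knn'` correspond to `RandomForestClassifier` and `KNeighborsClassifier` is an assumption here, as are the hyperparameters and the 75/25 split.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic binary classification problem with 6 features.
X, y = make_classification(
    n_samples=400, n_features=6, n_informative=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Assumed scikit-learn counterparts of the two model names.
candidates = {
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "knn": KNeighborsClassifier(n_neighbors=5),
}

results = {}
for name, clf in candidates.items():
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    results[name] = {
        "accuracy": accuracy_score(y_test, y_pred),
        "f1": f1_score(y_test, y_pred),
    }

for name, m in results.items():
    print(f"{name}: accuracy={m['accuracy']:.3f}, f1={m['f1']:.3f}")
```

K-Nearest Neighbors is distance-based, so its results depend on feature scaling, whereas random forests are insensitive to it. That is worth keeping in mind when comparing the two on real engineering data with mixed units.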

Disabling SHAP Explanations

result = fit_regression(X, y, explain=False)

Custom Report Path

save_report(result, X, y, output_path='custom_path/my_report.html')

Future Features & Enhancements

We're always thinking about how to make EnginML even more helpful for engineers learning ML! Here are some ideas we're exploring:

  • More Model Options: Add other popular and easy-to-understand models (e.g., Linear Regression, Logistic Regression, Decision Trees).
  • Data Preprocessing Guidance: Include optional, guided steps for common preprocessing tasks like handling missing values or scaling features.
  • Interactive Visualizations: Enhance reports with more interactive plots (e.g., using Plotly or Bokeh).
  • Hyperparameter Tuning Basics: Introduce a simple way to experiment with basic hyperparameter tuning for selected models.
  • Time Series Forecasting: Add a basic module for introductory time series analysis and forecasting tasks.
  • Anomaly Detection: Include simple methods for identifying outliers or anomalies in datasets.
  • Model Comparison: Allow users to easily train and compare the performance of multiple models on the same dataset.
  • Code Generation Snippets: Offer snippets of the underlying scikit-learn code used, helping users transition to more direct library usage.
  • Expanded Documentation & Tutorials: Add more examples and detailed explanations for different engineering domains.

Requirements

  • Python 3.9+
  • NumPy
  • pandas
  • scikit-learn
  • matplotlib

Optional Dependencies

  • SHAP (for model explanations)
  • Jinja2 (for HTML reports)
  • XGBoost (for additional models)

License

MIT

Citation

This package was created by Amir Rafe (amir.rafe@usu.edu) based on the paper:

Naser, M. Z. (2025). A step-by-step tutorial on machine learning for engineers unfamiliar with programming. AI in Civil Engineering. https://doi.org/10.1007/s43503-025-00053-x

If you use this package in your research, please cite the original paper.
