Guided baseline ML workflows (regression / classification / clustering) for engineers
EnginML
An educational Python package designed to help engineers and students learn and apply baseline machine learning workflows (regression, classification, clustering) even with minimal programming experience. This package reproduces the tutorial in AI in Civil Engineering (2025) by M. Z. Naser.
Overview
EnginML provides:
- Simple, one-line functions for common ML tasks
- Automatic data loading from CSV and Excel files
- Built-in model training and evaluation
- Visualization of results and model explanations
- HTML report generation
- Command-line interface for easy use
Installation
# Basic installation
pip install EnginML
# With all optional dependencies
# (quoted so that shells such as zsh do not expand the square brackets)
pip install "EnginML[full]"
Quick Start
Command Line Usage
The simplest way to use EnginML is through the command line:
# For regression
enginml your_data.csv --task regression --target your_target_column
# For classification
enginml your_data.csv --task classification --target your_target_column
# For clustering
enginml your_data.csv --task clustering --n-clusters 3
This will automatically:
- Load your data
- Train an appropriate model
- Evaluate the model
- Generate an HTML report with visualizations
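The first three steps above can be sketched directly in scikit-learn. This is a minimal illustration under stated assumptions, not EnginML's actual implementation: the column names (load, span, deflection), the 75/25 split, and the hyperparameters are all made up for the example, and the HTML report step is omitted.

```python
import io

import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# Build a small synthetic CSV in memory so the example runs without a file.
# The column names are hypothetical and chosen only for illustration.
rng = np.random.default_rng(0)
load = rng.uniform(1.0, 10.0, size=200)
span = rng.uniform(2.0, 8.0, size=200)
deflection = 0.5 * load + 0.2 * span**2 + rng.normal(0.0, 0.1, size=200)
csv_text = pd.DataFrame(
    {"load": load, "span": span, "deflection": deflection}
).to_csv(index=False)

# 1. Load the data.
df = pd.read_csv(io.StringIO(csv_text))

# 2. Separate features from the target and hold out a test set.
X = df.drop(columns=["deflection"]).to_numpy()
y = df["deflection"].to_numpy()
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# 3. Train a baseline model.
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# 4. Evaluate on the held-out rows.
y_pred = model.predict(X_test)
metrics = {
    "r2": r2_score(y_test, y_pred),
    "mae": mean_absolute_error(y_test, y_pred),
}
print(metrics)
```

Evaluating on rows the model never saw during training is what makes the reported metrics a fair estimate of performance on new data.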
Python API Usage
import pandas as pd
from EnginML import fit_regression, save_report
# Load your data
df = pd.read_csv('your_data.csv')
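# Prepare features and target
features = df.drop(columns=['target_column'])
X = features.values
y = df['target_column'].values
# Fit a regression model
result = fit_regression(X, y, model='random_forest')
# Print metrics
print(result['metrics'])
# Save HTML report (feature names come from the same columns used to build X,
# so they stay correct even when the target is not the last column)
save_report(result, X, y, feature_names=list(features.columns))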
Features
Supported Tasks
- Regression: Predict continuous values
  - Models: Random Forest, K-Nearest Neighbors
  - Metrics: R², MAE
- Classification: Predict categorical values
  - Models: Random Forest, K-Nearest Neighbors
  - Metrics: Accuracy, F1 Score
- Clustering: Group similar data points
  - Models: K-Means, BIRCH, Gaussian Mixture
  - Metrics: Silhouette Score, Davies-Bouldin Index
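For readers who want to see what the metrics listed above measure, the sketch below computes each one with scikit-learn on tiny, hand-checkable inputs. It is an illustration only: the arrays and blob centers are invented, and the F1 averaging mode that EnginML uses is not stated in this README, so the binary default is an assumption.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import (
    accuracy_score,
    davies_bouldin_score,
    f1_score,
    mean_absolute_error,
    r2_score,
    silhouette_score,
)

# Regression: R² and MAE on four points.
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.0, 7.5, 9.0])
r2 = r2_score(y_true, y_pred)              # 1 - 0.5 / 20 = 0.975
mae = mean_absolute_error(y_true, y_pred)  # (0.5 + 0 + 0.5 + 0) / 4 = 0.25

# Classification: accuracy and F1 on six labels.
labels_true = np.array([0, 0, 1, 1, 1, 0])
labels_pred = np.array([0, 1, 1, 1, 0, 0])
accuracy = accuracy_score(labels_true, labels_pred)  # 4 of 6 correct
f1 = f1_score(labels_true, labels_pred)  # binary F1 for class 1 (assumed mode)

# Clustering: silhouette (higher is better, range -1 to 1) and
# Davies-Bouldin (lower is better, minimum 0) on well-separated blobs.
centers = [[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]]
points, _ = make_blobs(
    n_samples=150, centers=centers, cluster_std=0.5, random_state=0
)
assignments = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(points)
silhouette = silhouette_score(points, assignments)
davies_bouldin = davies_bouldin_score(points, assignments)

print(f"R2={r2:.3f} MAE={mae:.2f}")
print(f"accuracy={accuracy:.3f} F1={f1:.3f}")
print(f"silhouette={silhouette:.3f} Davies-Bouldin={davies_bouldin:.3f}")
```

Note that the clustering metrics need no ground-truth labels: they score how compact and well separated the clusters are using only the points and their assignments.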
Explainability
The package includes SHAP (SHapley Additive exPlanations) integration for model interpretability, helping engineers understand which features are most important for predictions.
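As a rough idea of what a SHAP-based feature ranking looks like outside the package, the sketch below calls the shap library directly on a scikit-learn random forest. How EnginML wires SHAP in internally is not shown in this README, so treat the model, the synthetic data, and the ranking by mean absolute SHAP value as assumptions made for illustration. The import is guarded because SHAP is an optional dependency.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

try:
    import shap  # optional dependency
except ImportError:
    shap = None

# Synthetic data: 4 features, of which only 2 actually drive the target.
X, y = make_regression(
    n_samples=100, n_features=4, n_informative=2, random_state=0
)
model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

if shap is not None:
    # TreeExplainer is the SHAP explainer for tree ensembles.
    explainer = shap.TreeExplainer(model)
    shap_values = explainer.shap_values(X)  # one value per sample and feature
    # Rank features by their average absolute contribution.
    mean_abs = np.abs(shap_values).mean(axis=0)
    ranking = np.argsort(mean_abs)[::-1]
    print("Features ranked by mean |SHAP|:", ranking)
else:
    shap_values = None
    print("shap is not installed; skipping explanations")
```

Each SHAP value is one feature's contribution to one prediction, so averaging the absolute values over all samples gives a global importance ranking.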
Visualization
Automatic generation of relevant plots for each task type:
- Regression: Actual vs. Predicted, Residuals, Feature Importance
- Classification: Feature Importance, SHAP Summary
- Clustering: Cluster Assignments
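The two regression diagnostics named above can be reproduced with plain matplotlib. The sketch below is an assumption about what such plots typically look like, not the package's own plotting code. The data are synthetic, and the figure is written to an in-memory buffer so that no file is created.

```python
import io

import matplotlib

matplotlib.use("Agg")  # render off-screen; no display required
import matplotlib.pyplot as plt
import numpy as np

# Synthetic "actual" values and noisy "predictions" for illustration.
rng = np.random.default_rng(0)
y_actual = rng.uniform(0.0, 10.0, size=50)
y_predicted = y_actual + rng.normal(0.0, 0.5, size=50)
residuals = y_actual - y_predicted

fig, (ax_fit, ax_res) = plt.subplots(1, 2, figsize=(10, 4))

# Actual vs. predicted: points on the dashed diagonal are perfect predictions.
ax_fit.scatter(y_actual, y_predicted, alpha=0.7)
limits = [y_actual.min(), y_actual.max()]
ax_fit.plot(limits, limits, linestyle="--", color="gray")
ax_fit.set_xlabel("Actual")
ax_fit.set_ylabel("Predicted")
ax_fit.set_title("Actual vs. Predicted")

# Residuals: a good model scatters evenly around zero with no visible trend.
ax_res.scatter(y_predicted, residuals, alpha=0.7)
ax_res.axhline(0.0, linestyle="--", color="gray")
ax_res.set_xlabel("Predicted")
ax_res.set_ylabel("Residual (actual - predicted)")
ax_res.set_title("Residuals")

fig.tight_layout()
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
plt.close(fig)
```

A curved or funnel-shaped pattern in the residual plot usually signals that the model is missing a nonlinear effect or that the error grows with the predicted value.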
Advanced Usage
Customizing Models
from EnginML import fit_classification
# Use K-Nearest Neighbors instead of Random Forest
result = fit_classification(X, y, model='knn')
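To see what switching between the two model names amounts to in plain scikit-learn, here is a small sketch. The helper make_classifier is hypothetical and is not part of EnginML. The hyperparameters and the scaling step in front of KNN are likewise assumptions, not a description of what the package does internally.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def make_classifier(name):
    """Hypothetical helper: map a short model name to a scikit-learn estimator."""
    if name == "random_forest":
        return RandomForestClassifier(n_estimators=100, random_state=0)
    if name == "knn":
        # KNN is distance-based, so features are standardized first (assumption).
        return make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
    raise ValueError(f"Unknown model name: {name!r}")


# Synthetic classification data for illustration.
X, y = make_classification(
    n_samples=200, n_features=6, n_informative=4, random_state=0
)

# Compare both models with 5-fold cross-validated accuracy.
scores = {}
for name in ("random_forest", "knn"):
    scores[name] = cross_val_score(make_classifier(name), X, y, cv=5).mean()
print(scores)
```

Trying both models on the same folds is a quick way to check whether the simpler KNN baseline is already good enough for a given dataset.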
Disabling SHAP Explanations
from EnginML import fit_regression
# Skip the SHAP explanation step
result = fit_regression(X, y, explain=False)
Custom Report Path
from EnginML import save_report
# Write the report to a custom location
save_report(result, X, y, output_path='custom_path/my_report.html')
Future Features & Enhancements
We're always thinking about how to make EnginML even more helpful for engineers learning ML! Here are some ideas we're exploring:
- More Model Options: Add other popular and easy-to-understand models (e.g., Linear Regression, Logistic Regression, Decision Trees).
- Data Preprocessing Guidance: Include optional, guided steps for common preprocessing tasks like handling missing values or scaling features.
- Interactive Visualizations: Enhance reports with more interactive plots (e.g., using Plotly or Bokeh).
- Hyperparameter Tuning Basics: Introduce a simple way to experiment with basic hyperparameter tuning for selected models.
- Time Series Forecasting: Add a basic module for introductory time series analysis and forecasting tasks.
- Anomaly Detection: Include simple methods for identifying outliers or anomalies in datasets.
- Model Comparison: Allow users to easily train and compare the performance of multiple models on the same dataset.
- Code Generation Snippets: Offer snippets of the underlying scikit-learn code used, helping users transition to more direct library usage.
- Expanded Documentation & Tutorials: Add more examples and detailed explanations for different engineering domains.
Requirements
- Python 3.9+
- NumPy
- pandas
- scikit-learn
- matplotlib
Optional Dependencies
- SHAP (for model explanations)
- Jinja2 (for HTML reports)
- XGBoost (for additional models)
License
MIT
Citation
This package was created by Amir Rafe (amir.rafe@usu.edu) based on the paper:
Naser, M. Z. (2025). A step-by-step tutorial on machine learning for engineers unfamiliar with programming. AI in Civil Engineering. https://doi.org/10.1007/s43503-025-00053-x
If you use this package in your research, please cite the original paper.