A collection of reusable machine learning pipeline helpers
semiq-ml - Machine Learning Workflow Simplifier
Welcome to the semiq-ml documentation. This package provides helper functions and classes to simplify common machine learning workflows, including baseline model training, evaluation, and hyperparameter tuning.
Overview
semiq-ml is designed to:
- Quickly compare multiple machine learning models on your dataset
- Automate hyperparameter tuning with Optuna
- Provide consistent preprocessing and evaluation
- Support both classification and regression tasks
- Handle categorical features correctly, especially for tree-based models
- Offer flexible model selection with 'all', 'trees', or 'gbm' options
Key Components
BaselineModel
The BaselineModel class automates the training and evaluation of multiple ML models, providing:
- Automatic handling of preprocessing (scaling, encoding, imputation)
- Performance comparison across standard algorithms
- Support for common evaluation metrics
- Special handling for boosting libraries (LightGBM, XGBoost, CatBoost)
- Visualization of ROC curves and precision-recall curves
- Flexible model selection with 'all', 'trees', or 'gbm' options
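As a rough illustration of how a group name such as 'all', 'trees', or 'gbm' could resolve to a concrete list of models, here is a minimal sketch. The `MODEL_GROUPS` registry, the `select_models` helper, and the non-boosting model names are hypothetical and are not taken from semiq-ml's source; only the three boosting libraries are named in this document.

```python
# Hypothetical registry: the group contents are illustrative assumptions,
# not semiq-ml's actual internals.
MODEL_GROUPS = {
    "gbm": ["LightGBM", "XGBoost", "CatBoost"],
    "trees": ["DecisionTree", "RandomForest", "LightGBM", "XGBoost", "CatBoost"],
    "all": [
        "LogisticRegression",
        "DecisionTree",
        "RandomForest",
        "LightGBM",
        "XGBoost",
        "CatBoost",
    ],
}


def select_models(models):
    """Return the model names for a group, rejecting unknown group names."""
    try:
        # Copy the list so callers cannot mutate the registry by accident.
        return list(MODEL_GROUPS[models])
    except KeyError:
        valid = ", ".join(sorted(MODEL_GROUPS))
        raise ValueError(f"models must be one of: {valid}; got {models!r}") from None


print(select_models("gbm"))
```

Failing fast on an unknown group name keeps a typo such as `models="tree"` from silently training nothing.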
OptunaOptimizer
The OptunaOptimizer class builds on BaselineModel by adding:
- Efficient hyperparameter tuning with Optuna
- Smart parameter space sampling for all supported models
- Detailed tuning results and best parameter reporting
- Visualization of optimization history and parameter importance
- Flexible control over trials and cross-validation
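To make the idea of a trial budget concrete, the sketch below runs a plain random search using only the standard library. It is not how OptunaOptimizer works internally: Optuna's default sampler is TPE rather than uniform random sampling, and a real objective would be a cross-validated model score. The `random_search` helper, the `toy_objective` function, and the parameter ranges are all assumptions made for illustration.

```python
import random


def random_search(objective, space, n_trials=20, seed=0):
    """Sample n_trials parameter sets uniformly and keep the best score."""
    rng = random.Random(seed)
    history = []
    for _ in range(n_trials):
        # Draw each parameter independently from its (low, high) range.
        params = {name: rng.uniform(low, high) for name, (low, high) in space.items()}
        history.append((objective(params), params))
    best_score, best_params = max(history, key=lambda item: item[0])
    return best_params, best_score, history


def toy_objective(params):
    # Stand-in for a cross-validated score (higher is better);
    # it peaks at learning_rate=0.1 and subsample=0.8.
    return -((params["learning_rate"] - 0.1) ** 2 + (params["subsample"] - 0.8) ** 2)


space = {"learning_rate": (0.01, 0.3), "subsample": (0.5, 1.0)}
best_params, best_score, history = random_search(toy_objective, space, n_trials=20)
print(best_params, best_score)
```

Raising `n_trials` trades compute time for a better chance of landing near the optimum, which is the same trade-off the `n_trials` argument controls in a tuner.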
Getting Started
Please refer to these guides to get started with semiq-ml:
- Installation Guide - Setup instructions and requirements
- Basic Usage Examples - Simple examples to get you started
- API Reference - Complete documentation of all classes and methods
Example Usage
The following example demonstrates a typical semiq-ml workflow:
# Import required libraries
from semiq_ml import BaselineModel
from semiq_ml.tuning import OptunaOptimizer
import pandas as pd
from sklearn.model_selection import train_test_split

# 1. Load your dataset
data = pd.read_csv('your_data.csv')
X = data.drop('target', axis=1)  # Features
y = data['target']  # Target variable

# 2. Split data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# 3. Train and evaluate baseline models
baseline = BaselineModel(
    task_type="classification",  # Use "regression" for regression tasks
    metric="f1_weighted",  # Choose an appropriate evaluation metric
    models="trees"  # Only use tree-based models (options: 'all', 'trees', 'gbm')
)
baseline.fit(X_train, y_train)
results = baseline.get_results()
print(results)

# 4. Tune the best performing model with OptunaOptimizer
best_model_name = results.iloc[0]['model']
tuner = OptunaOptimizer(
    task_type="classification",
    metric="f1_weighted",
    n_trials=20  # Number of parameter combinations to try
)
tuned_model = tuner.tune_model(best_model_name, X_train, y_train)
tuning_results = tuner.get_tuning_results()
print(tuning_results)
For more examples and advanced usage, see the Basic Usage Examples guide.
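The example passes `metric="f1_weighted"`. Assuming this name follows the scikit-learn scoring convention (an assumption, since the page does not define it), the metric is the mean of the per-class F1 scores, weighted by how many true samples each class has. The standalone `f1_weighted` function below is a hand-rolled sketch of that formula for illustration, not semiq-ml code.

```python
from collections import Counter


def f1_weighted(y_true, y_pred):
    """Support-weighted mean of per-class F1 scores."""
    support = Counter(y_true)  # number of true samples per class
    total = len(y_true)
    score = 0.0
    for label, n_true in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        n_pred = sum(1 for p in y_pred if p == label)
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_true
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        # Weight each class by its share of the true labels.
        score += (n_true / total) * f1
    return score


y_true = [0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 1, 1, 1, 2]
print(round(f1_weighted(y_true, y_pred), 3))  # 0.833
```

Because each class is weighted by its support, frequent classes dominate the score, which is worth keeping in mind on imbalanced datasets.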
Support
If you encounter issues or have questions about semiq-ml:
- Bug Reports: Please open an issue with a detailed description of the problem, steps to reproduce it, and your environment details.
- Feature Requests: Submit your ideas through the issue tracker using the "Feature Request" template.
- Questions: For usage questions, reach out via GitHub Discussions.
Contributing
We welcome contributions to semiq-ml! Here's how you can help:
- Code Contributions: Fork the repository, create a feature branch, and submit a pull request.
- Documentation: Help improve or translate documentation.
- Issues: Report bugs or suggest features via the issue tracker.
Please review our Contributing Guidelines for more details on code style, testing requirements, and the pull request process.
License
This project is licensed under the MIT License - see the LICENSE file for details.
File details
Details for the file semiq_ml-0.2.4.tar.gz.
File metadata
- Download URL: semiq_ml-0.2.4.tar.gz
- Upload date:
- Size: 24.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a449f6b293c712212bf13777bb4a03510ede1efd7f83955f239f91902946fa74 |
| MD5 | 3cadb1b103191dc1bc1f6b4b60f8645a |
| BLAKE2b-256 | 9e5cef598f416b7f1a465538098551f35bfe53566deba8cfde53f49c00019c18 |
File details
Details for the file semiq_ml-0.2.4-py3-none-any.whl.
File metadata
- Download URL: semiq_ml-0.2.4-py3-none-any.whl
- Upload date:
- Size: 23.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 358d2b19a2732e6df9e20a94882f9a25df41a20816e172450f9c50e93ced87ab |
| MD5 | 4b5b21c5ea06429f442bd2f5b8ab0ec5 |
| BLAKE2b-256 | c85698954d680b5b7cc486c55c1c3df0d402b8fb3beb02ba23d4ffc4f2768e04 |
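The published digests can be used to check a manually downloaded file. The following is a minimal sketch using Python's standard `hashlib` module; the helper names `sha256_of_file` and `matches_published_digest` are hypothetical and are not part of semiq-ml or pip.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files are never loaded whole.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, calling `matches_published_digest("semiq_ml-0.2.4.tar.gz", "a449f6b293c712212bf13777bb4a03510ede1efd7f83955f239f91902946fa74")` should return `True` for an intact copy of the source distribution saved under that name in the working directory.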