A collection of reusable machine learning pipeline helpers

Project description

semiq-ml - Machine Learning Workflow Simplifier

Welcome to the semiq-ml documentation. This package provides helper functions and classes to simplify common machine learning workflows, including baseline model training, evaluation, hyperparameter tuning, and image processing.

Overview

semiq-ml is designed to:

  • Quickly compare multiple machine learning models on your dataset
  • Automate hyperparameter tuning with Optuna
  • Provide consistent preprocessing and evaluation
  • Support both classification and regression tasks
  • Handle categorical features correctly, especially for tree-based models
  • Offer flexible model selection with 'all', 'trees', or 'gbm' options
  • Simplify image dataset preparation for computer vision tasks
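For tree-based libraries such as LightGBM and CatBoost, "handling categorical features correctly" usually means passing them as pandas category dtype rather than one-hot encoding. A minimal sketch of that conversion (an illustration of the general idea, not semiq-ml's internal code):

```python
import pandas as pd

def as_categorical(df: pd.DataFrame) -> pd.DataFrame:
    """Convert non-numeric columns to pandas 'category' dtype,
    which gradient-boosting libraries can consume natively."""
    out = df.copy()
    for col in out.select_dtypes(exclude="number").columns:
        out[col] = out[col].astype("category")
    return out

df = pd.DataFrame({"city": ["ams", "ber", "ams"], "temp": [8.0, 5.5, 9.1]})
converted = as_categorical(df)
print(converted.dtypes["city"])  # category
```

Numeric columns are left untouched; only the non-numeric ones are recast.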

Key Components

BaselineModel

The BaselineModel class automates the training and evaluation of multiple ML models, providing:

  • Automatic handling of preprocessing (scaling, encoding, imputation)
  • Performance comparison across standard algorithms
  • Support for common evaluation metrics
  • Special handling for boosting libraries (LightGBM, XGBoost, CatBoost)
  • Visualization of ROC curves and precision-recall curves
  • Flexible model selection with 'all', 'trees', or 'gbm' options
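The automatic preprocessing described above (scaling, encoding, imputation) corresponds to a common scikit-learn pattern; the following is a hedged sketch of that pattern, not semiq-ml's exact internals:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy frame: one numeric column with a missing value, one categorical column
X = pd.DataFrame({"age": [25, None, 40], "color": ["red", "blue", "red"]})

numeric = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])
categorical = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("encode", OneHotEncoder(handle_unknown="ignore")),
])

preprocess = ColumnTransformer([
    ("num", numeric, ["age"]),
    ("cat", categorical, ["color"]),
])

Xt = preprocess.fit_transform(X)
print(Xt.shape)  # 3 rows: 1 scaled numeric column + 2 one-hot columns
```

Wrapping each model in such a pipeline is what lets a baseline comparison run on raw, mixed-type data without manual feature preparation.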

OptunaOptimizer

The OptunaOptimizer class enhances the BaselineModel by adding:

  • Efficient hyperparameter tuning with Optuna
  • Smart parameter space sampling for all supported models
  • Detailed tuning results and best parameter reporting
  • Visualization of optimization history and parameter importance
  • Flexible control over trials and cross-validation
  • GPU acceleration for XGBoost, LightGBM, and CatBoost (set gpu=True)
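Optuna's samplers are smarter than random search, but the core loop it automates can be pictured with a plain random-search sketch over a RandomForest parameter space (this stands in for the tuner's behavior; it is not the package's code and does not use Optuna itself):

```python
import random
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=200, n_features=10, random_state=42)

rng = random.Random(0)
best_score, best_params = -1.0, None
for _ in range(5):  # analogous to n_trials
    # Sample one point from the parameter space
    params = {
        "n_estimators": rng.choice([50, 100, 200]),
        "max_depth": rng.choice([3, 5, None]),
    }
    model = RandomForestClassifier(random_state=0, **params)
    # Score it with cross-validation, keep the best so far
    score = cross_val_score(model, X, y, cv=3, scoring="f1_weighted").mean()
    if score > best_score:
        best_score, best_params = score, params

print(best_params)
```

Optuna replaces the random sampling with an adaptive sampler and records every trial, which is what enables the optimization-history and parameter-importance plots mentioned above.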

GPU

If you have a compatible GPU and the required libraries installed, you can enable GPU acceleration for supported models (XGBoost, LightGBM, CatBoost) by passing gpu=True to the OptunaOptimizer constructor. This will automatically inject the correct GPU parameters for each library during hyperparameter search.

Example:

from semiq_ml.tuning import OptunaOptimizer

tuner = OptunaOptimizer(
    task_type="classification",
    metric="f1_weighted",
    n_trials=20,
    gpu=True  # Enable GPU acceleration for supported models
)

If gpu=True, the following parameters are set automatically:

  • XGBoost: tree_method='gpu_hist', predictor='gpu_predictor'
  • LightGBM: device='gpu', gpu_platform_id=0, gpu_device_id=0
  • CatBoost: task_type='GPU', devices='0'
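Based on the list above, the injection can be pictured as a simple lookup of library-specific keyword arguments (a hypothetical sketch; the `with_gpu` helper is illustrative, not part of the package's API):

```python
# GPU parameter sets, per the list above
GPU_PARAMS = {
    "xgboost": {"tree_method": "gpu_hist", "predictor": "gpu_predictor"},
    "lightgbm": {"device": "gpu", "gpu_platform_id": 0, "gpu_device_id": 0},
    "catboost": {"task_type": "GPU", "devices": "0"},
}

def with_gpu(library: str, params: dict, gpu: bool = True) -> dict:
    """Merge the library's GPU settings into a parameter dict when gpu=True."""
    merged = dict(params)
    if gpu:
        merged.update(GPU_PARAMS.get(library, {}))
    return merged

print(with_gpu("xgboost", {"max_depth": 6}))
```

Libraries outside the supported set pass through unchanged, so enabling the flag is harmless for CPU-only models.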

Image Module

The image module provides utilities for working with image datasets:

  • Easy scanning of directory structures to create image DataFrames
  • Automatic label inference from directory hierarchies
  • Convenient image loading with resizing, normalization and format conversion
  • Batch image loading from DataFrames with detailed control over transformations
  • Image visualization tools for single images or batches, with optional label and prediction display
  • Image sampling utilities for exploring large datasets
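Inferring labels from a class_name/image.jpg hierarchy can be sketched with os.walk and pandas (an illustration of the pattern, not the module's actual implementation; the demo builds a throwaway directory to scan):

```python
import os
import tempfile
import pandas as pd

def scan_image_dir(root: str) -> pd.DataFrame:
    """Collect image paths under root, using the parent folder name as label."""
    rows = []
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith((".jpg", ".jpeg", ".png")):
                rows.append({
                    "path": os.path.join(dirpath, name),
                    "label": os.path.basename(dirpath),
                })
    return pd.DataFrame(rows)

# Demo on a temporary structure: <root>/cats/a.jpg, <root>/dogs/b.png
with tempfile.TemporaryDirectory() as root:
    for label, fname in [("cats", "a.jpg"), ("dogs", "b.png")]:
        os.makedirs(os.path.join(root, label), exist_ok=True)
        open(os.path.join(root, label, fname), "wb").close()
    df = scan_image_dir(root)
    print(sorted(df["label"]))  # ['cats', 'dogs']
```

The resulting DataFrame of paths and labels is the natural input for batch loading and visualization utilities.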

Getting Started

The following examples show how to get started with semiq-ml.

Example Usage

The following example demonstrates a typical semiq-ml workflow:

# Import required libraries
from semiq_ml import BaselineModel
from semiq_ml.tuning import OptunaOptimizer
import pandas as pd
from sklearn.model_selection import train_test_split

# 1. Load your dataset
data = pd.read_csv('your_data.csv')
X = data.drop('target', axis=1)  # Features
y = data['target']               # Target variable

# 2. Split data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# 3. Train and evaluate baseline models
baseline = BaselineModel(
    task_type="classification",  # Use "regression" for regression tasks
    metric="f1_weighted",        # Choose an appropriate evaluation metric
    models="trees"               # Only use tree-based models (options: 'all', 'trees', 'gbm')
)
baseline.fit(X_train, y_train)
results = baseline.get_results()
print(results)

# 4. Tune the best performing model with OptunaOptimizer
best_model_name = results.iloc[0]['model']
tuner = OptunaOptimizer(
    task_type="classification", 
    metric="f1_weighted",
    n_trials=20,                 # Number of parameter combinations to try
    gpu=True                     # Enable GPU acceleration for supported models
)
tuned_model = tuner.tune_model(best_model_name, X_train, y_train)
tuning_results = tuner.get_tuning_results()
print(tuning_results)
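After tuning, the chosen metric is typically re-checked on the held-out test set. The "f1_weighted" metric used above is scikit-learn's class-frequency-weighted F1; on its own it looks like this:

```python
from sklearn.metrics import f1_score

# Toy predictions: class 1 is more frequent, so it gets more weight
y_true = [0, 0, 1, 1, 1]
y_pred = [0, 1, 1, 1, 1]

# Per-class F1 scores are averaged, weighted by each class's support
score = f1_score(y_true, y_pred, average="weighted")
print(round(score, 3))  # 0.781
```

The weighted average keeps a majority class from being drowned out by rare-class noise, which is why it is a reasonable default for imbalanced classification.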

Image Processing Example

# Import the image module
import semiq_ml.image as img_utils
import matplotlib.pyplot as plt

# Create a DataFrame from a directory of images (e.g., for classification)
# Assumes a folder structure like: dataset/class_name/image.jpg
image_df = img_utils.path_to_dataframe_with_labels('path/to/dataset')
print(f"Found {len(image_df)} images with labels: {image_df['label'].unique()}")

# Load images with preprocessing
images, labels = img_utils.load_images_from_dataframe(
    image_df,
    size=(224, 224),  # Resize all images to 224x224
    normalize=True,   # Normalize pixel values to [0,1]
    show_progress=True
)

# Display a sample of images with labels
img_utils.display_images(
    images[:5], 
    labels=labels[:5],
    n_cols=5
)
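The normalize=True option above maps 8-bit pixel values into [0, 1]; the underlying transform is just a cast and divide, sketched with NumPy:

```python
import numpy as np

# A tiny 2x2 grayscale "image" with 8-bit pixel values
img = np.array([[0, 128], [255, 64]], dtype=np.uint8)

# Cast to float before dividing so values land in [0.0, 1.0]
normalized = img.astype(np.float32) / 255.0
print(normalized.min(), normalized.max())  # 0.0 1.0
```

Most neural-network frameworks expect float inputs in this range, which is why normalization is bundled into the loading step.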

For more examples and advanced usage, see the Basic Usage Examples guide.

Support

If you encounter issues or have questions about semiq-ml:

  • Bug Reports: Please open an issue with a detailed description of the problem, steps to reproduce it, and your environment details.
  • Feature Requests: Submit your ideas through the issue tracker using the "Feature Request" template.
  • Questions: For usage questions, reach out via GitHub Discussions.

Contributing

We welcome contributions to semiq-ml! Here's how you can help:

  1. Code Contributions: Fork the repository, create a feature branch, and submit a pull request.
  2. Documentation: Help improve or translate documentation.
  3. Bug Reports: Report bugs or suggest features via the issue tracker.

Please review our Contributing Guidelines for more details on code style, testing requirements, and the pull request process.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Download files

Download the file for your platform.

Source Distribution

semiq_ml-0.4.9.tar.gz (26.9 kB)

Built Distribution

semiq_ml-0.4.9-py3-none-any.whl (25.6 kB)

File details

Details for the file semiq_ml-0.4.9.tar.gz.

File metadata

  • Download URL: semiq_ml-0.4.9.tar.gz
  • Upload date:
  • Size: 26.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for semiq_ml-0.4.9.tar.gz:

  • SHA256: 9d28574ddeb323dea675ed0f5cd267fa4140c0af267c6f7ddf90db04cffa1348
  • MD5: a11d6b9a742f87916edb4508b722c4c8
  • BLAKE2b-256: 77252ab7019c479e6fbc148e33e76977b8f076faa7cdada4ae8df6ee09a84edb

File details

Details for the file semiq_ml-0.4.9-py3-none-any.whl.

File metadata

  • Download URL: semiq_ml-0.4.9-py3-none-any.whl
  • Upload date:
  • Size: 25.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for semiq_ml-0.4.9-py3-none-any.whl:

  • SHA256: 13d9ca01244cb723ed9a026be540a5341b21932c1bcf88fbad8385d86ad9afcb
  • MD5: 08fb6e833ebc3c144ce184c1bcec5674
  • BLAKE2b-256: 3f1873f073c504e92d36bbf500fc4858d0a5cf494ac47da325cfe3e518e72e11
