A simple machine learning benchmarking library

mlbench-lite

A comprehensive machine learning benchmarking library that provides an easy way to compare multiple ML models on your dataset. Built with scikit-learn, XGBoost, LightGBM, CatBoost, and pandas for seamless integration into your ML workflow.

🚀 Features

Comprehensive Model Support: 20+ ML models from multiple libraries
Flexible Model Selection: Choose specific models, categories, or exclude models
Multiple ML Libraries: scikit-learn, XGBoost, LightGBM, CatBoost
Simple API: One function call to benchmark multiple models
Comprehensive Metrics: Returns Accuracy, Precision, Recall, and F1 scores
Custom Dataset: Includes the load_clover dataset for testing
Easy Integration: Works seamlessly with scikit-learn datasets
Pandas Output: Results returned as a clean pandas DataFrame
Reproducible: Consistent results with random state control
Model Information: Get detailed info about available models

📦 Installation

pip install mlbench-lite

🎯 Quick Start

from mlbench_lite import benchmark, load_clover

# Load the clover dataset
X, y = load_clover(return_X_y=True)

# Benchmark all available models
results = benchmark(X, y)
print(results)

Output:

                 Model           Category  Accuracy  Precision  Recall      F1
0        Random Forest  Tree-based Models    0.9500     0.9565  0.9512  0.9505
1                  SVM         SVM Models    0.9250     0.9337  0.9255  0.9254
2  Logistic Regression      Linear Models    0.9125     0.9131  0.9117  0.9115
3              XGBoost            XGBoost    0.9000     0.9024  0.9000  0.8997
4             LightGBM           LightGBM    0.8875     0.8891  0.8875  0.8873

📚 API Reference

`benchmark(X, y, test_size=0.2, random_state=42, models=None, model_categories=None, exclude_models=None)`

Benchmark multiple machine learning models on a dataset.

Parameters:

X (array-like): Training vectors of shape (n_samples, n_features)
y (array-like): Target values of shape (n_samples,)
test_size (float, optional): Proportion of dataset for testing (default: 0.2)
random_state (int, optional): Random seed for reproducibility (default: 42)
models (list of str, optional): Specific models to use. If None, uses all available models.
model_categories (list of str, optional): Categories of models to use. If None, uses all categories.
exclude_models (list of str, optional): Models to exclude from benchmarking.

Returns:

pandas.DataFrame: Results with columns:
- Model: Name of the model
- Category: Category of the model
- Accuracy: Accuracy score
- Precision: Precision score (macro-averaged)
- Recall: Recall score (macro-averaged)
- F1: F1 score (macro-averaged)
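
To make the metric columns concrete, the sketch below computes the same four scores for a single model using scikit-learn directly. It is an assumption about what one row of the results table represents, not the library's actual implementation; the Wine dataset and the Random Forest classifier are arbitrary choices for illustration.

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split

X, y = load_wine(return_X_y=True)

# Hold out 20% of the data, mirroring the documented defaults
# (test_size=0.2, random_state=42).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Macro averaging computes each metric per class, then takes the unweighted mean.
row = {
    "Model": "Random Forest",
    "Accuracy": accuracy_score(y_test, y_pred),
    "Precision": precision_score(y_test, y_pred, average="macro", zero_division=0),
    "Recall": recall_score(y_test, y_pred, average="macro", zero_division=0),
    "F1": f1_score(y_test, y_pred, average="macro", zero_division=0),
}
print(row)
```

Because macro averaging weights every class equally, these scores can differ noticeably from accuracy on imbalanced datasets.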

`list_available_models()`

List all available models and their categories.

Returns:

dict: Dictionary with model categories as keys and lists of model names as values
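
A dictionary of this shape is easy to invert when you need to look up the category of a given model. The sketch below uses a hand-written excerpt of the mapping (names taken from the Models Included section) rather than calling the library, so the `available` variable is a hypothetical stand-in for the real return value.

```python
# Hypothetical excerpt of the mapping, copied from the "Models Included" section.
available = {
    "Linear Models": ["Logistic Regression", "Ridge Classifier"],
    "Tree-based Models": ["Decision Tree", "Random Forest"],
    "SVM Models": ["SVM (RBF)", "SVM (Linear)"],
}

# Invert the mapping so each model name points to its category.
category_of = {
    model: category
    for category, model_list in available.items()
    for model in model_list
}

print(category_of["Random Forest"])  # Tree-based Models
print(sum(len(model_list) for model_list in available.values()))  # 6
```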

`get_model_info()`

Get detailed information about available models.

Returns:

pandas.DataFrame: DataFrame with model information including category, name, and description

`load_clover(return_X_y=False)`

Load the custom clover dataset.

Parameters:

return_X_y (bool, default=False): If True, returns (data, target) instead of a Bunch object

Returns:

Bunch or tuple: A Bunch with data, target, feature_names, target_names, and DESCR, or a (data, target) tuple when return_X_y=True
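
The `return_X_y` flag follows the convention of scikit-learn's own dataset loaders. The sketch below shows both return styles using `load_iris` as a stand-in; it assumes `load_clover` behaves the same way, which the signature suggests but this example does not verify.

```python
from sklearn.datasets import load_iris

# Default: a Bunch, which allows both attribute and key access.
bunch = load_iris()
print(bunch.data.shape, bunch.target.shape)
print(list(bunch.target_names))

# return_X_y=True: just the (data, target) tuple.
X, y = load_iris(return_X_y=True)
print(X.shape, y.shape)
```

The Bunch form is handy when you need feature or class names for reporting; the tuple form is the shortest path into `benchmark(X, y)`.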

💡 Code Examples

1. Basic Usage with All Models

from mlbench_lite import benchmark, load_clover

# Load the clover dataset
X, y = load_clover(return_X_y=True)
print(f"Dataset shape: {X.shape}")
print(f"Number of classes: {len(set(y))}")

# Benchmark all available models
results = benchmark(X, y)
print("\nBenchmark Results:")
print(results)

# Get the best model (selected by accuracy, independent of row order)
best_model = results.loc[results['Accuracy'].idxmax()]
print(f"\n🏆 Best Model: {best_model['Model']} (Accuracy: {best_model['Accuracy']:.4f})")

2. Model Selection - Specific Models

from mlbench_lite import benchmark, load_clover

X, y = load_clover(return_X_y=True)

# Benchmark only specific models
results = benchmark(X, y, models=['Random Forest', 'XGBoost', 'LightGBM', 'Logistic Regression'])
print("Selected Models Results:")
print(results)

3. Model Selection - By Categories

from mlbench_lite import benchmark, load_clover

X, y = load_clover(return_X_y=True)

# Benchmark only tree-based models
results = benchmark(X, y, model_categories=['Tree-based Models'])
print("Tree-based Models Results:")
print(results)

# Benchmark multiple categories
results = benchmark(X, y, model_categories=['Linear Models', 'SVM Models'])
print("\nLinear and SVM Models Results:")
print(results)

4. Exclude Specific Models

from mlbench_lite import benchmark, load_clover

X, y = load_clover(return_X_y=True)

# Exclude slow models
results = benchmark(X, y, exclude_models=['Gaussian Process', 'Multi-layer Perceptron'])
print("Results without slow models:")
print(results)

5. List Available Models

from mlbench_lite import list_available_models, get_model_info

# List all available models by category
models = list_available_models()
print("Available Models by Category:")
for category, model_list in models.items():
    print(f"\n{category}:")
    for model in model_list:
        print(f"  - {model}")

# Get detailed model information
model_info = get_model_info()
print("\nDetailed Model Information:")
print(model_info)

6. Advanced Model Selection

from mlbench_lite import benchmark, load_clover

X, y = load_clover(return_X_y=True)

# Combine an explicit model list with an exclusion list.
# 'SVM (Linear)' is not in `models`, so the exclusion presumably has no effect here.
results = benchmark(
    X, y,
    models=['Random Forest', 'XGBoost', 'SVM (RBF)', 'Logistic Regression'],
    exclude_models=['SVM (Linear)']
)
print("Custom Selection Results:")
print(results)
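
The README does not spell out how `models`, `model_categories`, and `exclude_models` combine. One plausible reading is that the two include filters narrow the candidate set and exclusions are applied last. The sketch below implements that reading over a small hand-written registry; `REGISTRY` and `select_models` are hypothetical names for illustration, not part of mlbench-lite.

```python
from typing import Dict, List, Optional

# Hypothetical registry excerpt: category -> model names (from "Models Included").
REGISTRY: Dict[str, List[str]] = {
    "Linear Models": ["Logistic Regression", "Ridge Classifier"],
    "Tree-based Models": ["Decision Tree", "Random Forest"],
    "SVM Models": ["SVM (RBF)", "SVM (Linear)"],
}


def select_models(
    models: Optional[List[str]] = None,
    model_categories: Optional[List[str]] = None,
    exclude_models: Optional[List[str]] = None,
) -> List[str]:
    """Assumed semantics: include filters are intersected, then exclusions apply."""
    selected = [
        name
        for category, names in REGISTRY.items()
        if model_categories is None or category in model_categories
        for name in names
        if models is None or name in models
    ]
    excluded = set(exclude_models or [])
    return [name for name in selected if name not in excluded]


print(select_models(model_categories=["SVM Models"], exclude_models=["SVM (Linear)"]))
# ['SVM (RBF)']
print(select_models(models=["Random Forest", "SVM (RBF)"], exclude_models=["SVM (Linear)"]))
# ['Random Forest', 'SVM (RBF)']
```

Under these assumed semantics, excluding a model that was never selected is harmless, which matches the second call.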

7. Using with Scikit-learn Datasets

from mlbench_lite import benchmark
from sklearn.datasets import load_wine, load_breast_cancer

# Test with Wine dataset
print("=== Wine Dataset ===")
X, y = load_wine(return_X_y=True)
results = benchmark(X, y)
print(results)

# Test with Breast Cancer dataset
print("\n=== Breast Cancer Dataset ===")
X, y = load_breast_cancer(return_X_y=True)
results = benchmark(X, y)
print(results)

8. Custom Test Size

from mlbench_lite import benchmark, load_clover

X, y = load_clover(return_X_y=True)

# Use 30% of data for testing
results = benchmark(X, y, test_size=0.3)
print("Results with 30% test size:")
print(results)

# Use 10% of data for testing
results = benchmark(X, y, test_size=0.1)
print("\nResults with 10% test size:")
print(results)

9. Reproducible Results

from mlbench_lite import benchmark, load_clover

X, y = load_clover(return_X_y=True)

# Set random seed for reproducible results
results1 = benchmark(X, y, random_state=123)
results2 = benchmark(X, y, random_state=123)

print("Results with random_state=123:")
print(results1)
print(f"\nResults are identical: {results1.equals(results2)}")

# A different random state usually (though not necessarily) produces different results
results3 = benchmark(X, y, random_state=456)
print(f"\nDifferent random state produces different results: {not results1.equals(results3)}")
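
Part of this reproducibility presumably comes from seeding the train/test split. The sketch below demonstrates that effect with scikit-learn's `train_test_split` alone; that `benchmark` forwards `random_state` to the split in this way is an assumption, and the Wine dataset is used only as a convenient stand-in.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split

X, y = load_wine(return_X_y=True)

# Same seed -> the same rows land in the test set.
_, X_test_a, _, y_test_a = train_test_split(X, y, test_size=0.2, random_state=123)
_, X_test_b, _, y_test_b = train_test_split(X, y, test_size=0.2, random_state=123)
print(np.array_equal(X_test_a, X_test_b))  # True

# A different seed shuffles differently, so the test rows generally differ.
_, X_test_c, _, _ = train_test_split(X, y, test_size=0.2, random_state=456)
print(np.array_equal(X_test_a, X_test_c))
```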

10. Working with Synthetic Data

from mlbench_lite import benchmark
from sklearn.datasets import make_classification

# Create synthetic dataset
X, y = make_classification(
    n_samples=1000,
    n_features=20,
    n_informative=15,
    n_classes=4,
    random_state=42
)

print(f"Synthetic dataset shape: {X.shape}")
print(f"Number of classes: {len(set(y))}")

results = benchmark(X, y)
print("\nBenchmark Results:")
print(results)

11. Analyzing Results

from mlbench_lite import benchmark, load_clover

X, y = load_clover(return_X_y=True)
results = benchmark(X, y)

# Display results with better formatting
print("Detailed Results:")
print("=" * 60)
for idx, row in results.iterrows():
    print(f"{row['Model']:20} | Acc: {row['Accuracy']:.4f} | "
          f"Prec: {row['Precision']:.4f} | Rec: {row['Recall']:.4f} | "
          f"F1: {row['F1']:.4f}")

# Find models with accuracy > 0.9
high_accuracy = results[results['Accuracy'] > 0.9]
print(f"\nModels with accuracy > 0.9: {len(high_accuracy)}")

# Calculate average metrics
avg_metrics = results[['Accuracy', 'Precision', 'Recall', 'F1']].mean()
print("\nAverage metrics across all models:")
for metric, value in avg_metrics.items():
    print(f"  {metric}: {value:.4f}")
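
Since the results are an ordinary pandas DataFrame, the usual pandas operations apply. The sketch below works on a stand-in DataFrame whose values are copied from the Quick Start sample output, so it runs without the library installed; the "PR gap" column is an illustrative addition, not something `benchmark` returns.

```python
import pandas as pd

# Stand-in for the DataFrame returned by benchmark(); values are copied from
# the Quick Start sample output.
results = pd.DataFrame(
    {
        "Model": ["Random Forest", "SVM", "Logistic Regression", "XGBoost", "LightGBM"],
        "Category": ["Tree-based Models", "SVM Models", "Linear Models", "XGBoost", "LightGBM"],
        "Accuracy": [0.9500, 0.9250, 0.9125, 0.9000, 0.8875],
        "Precision": [0.9565, 0.9337, 0.9131, 0.9024, 0.8891],
        "Recall": [0.9512, 0.9255, 0.9117, 0.9000, 0.8875],
        "F1": [0.9505, 0.9254, 0.9115, 0.8997, 0.8873],
    }
)

# Rank by F1 instead of relying on the row order of the returned frame.
ranked = results.sort_values("F1", ascending=False).reset_index(drop=True)
print(ranked[["Model", "F1"]])

# Select the best row explicitly by metric value.
best = results.loc[results["Accuracy"].idxmax()]
print(best["Model"])

# Absolute gap between precision and recall, to see where the two diverge.
results["PR gap"] = (results["Precision"] - results["Recall"]).abs()
print(results[["Model", "PR gap"]])
```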

12. Comparing Different Datasets

from mlbench_lite import benchmark, load_clover
from sklearn.datasets import load_wine, load_breast_cancer

datasets = [
    ("Clover", load_clover(return_X_y=True)),
    ("Wine", load_wine(return_X_y=True)),
    ("Breast Cancer", load_breast_cancer(return_X_y=True))
]

print("Dataset Comparison:")
print("=" * 80)

for name, (X, y) in datasets:
    print(f"\n{name} Dataset:")
    print(f"  Shape: {X.shape}, Classes: {len(set(y))}")
    
    results = benchmark(X, y)
    best_acc = results.iloc[0]['Accuracy']
    best_model = results.iloc[0]['Model']
    
    print(f"  Best Model: {best_model} (Accuracy: {best_acc:.4f})")
    
    # Show top 2 models
    print("  Top 2 Models:")
    for idx, row in results.head(2).iterrows():
        print(f"    {row['Model']}: {row['Accuracy']:.4f}")

🔬 Models Included

The library includes 20+ machine learning models from multiple categories:

Linear Models

Logistic Regression: Linear model for classification using logistic function
Ridge Classifier: Linear classifier with L2 regularization
SGD Classifier: Linear classifier using Stochastic Gradient Descent
Perceptron: Simple linear classifier
Passive Aggressive: Online learning algorithm for classification

Tree-based Models

Decision Tree: Non-parametric supervised learning method
Random Forest: Ensemble of decision trees with bagging
Extra Trees: Extremely randomized trees ensemble
Gradient Boosting: Boosting ensemble method using gradient descent
AdaBoost: Adaptive boosting ensemble method
Bagging Classifier: Bootstrap aggregating ensemble method

SVM Models

SVM (RBF): Support Vector Machine with RBF kernel
SVM (Linear): Support Vector Machine with linear kernel

Neighbors

K-Nearest Neighbors: Instance-based learning algorithm

Naive Bayes

Gaussian Naive Bayes: Naive Bayes classifier for Gaussian features
Multinomial Naive Bayes: Naive Bayes classifier for multinomial features
Bernoulli Naive Bayes: Naive Bayes classifier for binary features

Discriminant Analysis

Linear Discriminant Analysis: Linear dimensionality reduction and classification
Quadratic Discriminant Analysis: Quadratic classifier with Gaussian assumptions

Neural Networks

Multi-layer Perceptron: Feedforward artificial neural network

Gaussian Process

Gaussian Process: Probabilistic classifier using Gaussian processes

Advanced Gradient Boosting

XGBoost: Extreme gradient boosting framework (if installed)
LightGBM: Light gradient boosting machine (if installed)
CatBoost: Categorical boosting framework (if installed)

All models use their default parameters with appropriate random seeds for reproducibility.

📊 Clover Dataset Details

The load_clover function provides a custom synthetic dataset:

Samples: 400
Features: 4
Classes: 4

Features:

leaf_length: Length of the leaf in cm
leaf_width: Width of the leaf in cm
petiole_length: Length of the petiole in cm
leaflet_count: Number of leaflets per leaf

Classes:

white_clover: Trifolium repens
red_clover: Trifolium pratense
crimson_clover: Trifolium incarnatum
alsike_clover: Trifolium hybridum

🛠️ Requirements

Core Dependencies

Python >= 3.8
scikit-learn >= 1.0.0
pandas >= 1.3.0
numpy >= 1.20.0

Optional Dependencies (for additional models)

xgboost >= 1.5.0 (for XGBoost models)
lightgbm >= 3.2.0 (for LightGBM models)
catboost >= 1.0.0 (for CatBoost models)
scikit-optimize >= 0.9.0 (for advanced optimization)

Note: The library works with just the core dependencies. The optional dependencies are installed automatically along with the package; if one of them is nevertheless unavailable in your environment, its models are skipped gracefully.
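
To check ahead of time which optional libraries are present in your environment, a standard-library probe is enough. This is a minimal sketch using `importlib.util.find_spec`; how mlbench-lite itself detects the libraries is not documented here, and `OPTIONAL_LIBRARIES` is a hypothetical name.

```python
import importlib.util

# Optional libraries named in the requirements list above.
OPTIONAL_LIBRARIES = ["xgboost", "lightgbm", "catboost"]

# find_spec returns None when a top-level package cannot be located,
# so nothing is imported (or fails) at detection time.
availability = {
    name: importlib.util.find_spec(name) is not None
    for name in OPTIONAL_LIBRARIES
}

for name, is_available in availability.items():
    status = "available" if is_available else "not installed, models skipped"
    print(f"{name}: {status}")
```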

🧪 Testing

Run the test suite to verify everything works:

# Run all tests
python -m pytest tests/ -v

# Run with coverage
python -m pytest tests/ --cov=mlbench_lite

# Quick functionality test
python -c "from mlbench_lite import benchmark, load_clover; X, y = load_clover(return_X_y=True); results = benchmark(X, y); print(results)"

🚀 Development

Setup Development Environment

git clone https://github.com/Arefin994/mlbench-lite.git
cd mlbench-lite
pip install -e ".[dev]"

Code Quality

# Format code
black mlbench_lite tests

# Lint code
flake8 mlbench_lite tests

# Type checking
mypy mlbench_lite

Building for Distribution

# Build package
python -m build

# Upload to PyPI
twine upload dist/*

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Fork the repository
Create your feature branch (git checkout -b feature/amazing-feature)
Commit your changes (git commit -m 'Add some amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

📈 Changelog

2.0.0 (2024-01-XX)

MAJOR UPDATE: Added 20+ machine learning models
NEW: Flexible model selection (specific models, categories, exclusions)
NEW: Support for XGBoost, LightGBM, and CatBoost
NEW: Model information and listing functions
NEW: Comprehensive model categories (Linear, Tree-based, SVM, etc.)
IMPROVED: Enhanced API with more parameters
IMPROVED: Better error handling and graceful degradation
IMPROVED: Updated documentation with extensive examples

0.1.0 (2024-01-XX)

Initial release
Basic benchmarking functionality
Support for Logistic Regression, Random Forest, and SVM
Comprehensive metrics (Accuracy, Precision, Recall, F1)
Custom clover dataset
Full test coverage
PyPI ready

🆘 Support

If you encounter any issues or have questions:

Check the Issues page
Create a new issue with detailed information
Include code examples and error messages

🙏 Acknowledgments

Built with scikit-learn
Uses pandas for data handling
Inspired by the need for simple ML benchmarking tools

Release history

3.0.1: Feb 7, 2026
3.0.0: Feb 7, 2026
2.0.3: Sep 18, 2025
2.0.2: Sep 18, 2025 (this version)
2.0.1: Sep 18, 2025
2.0.0: Sep 18, 2025
0.1.0: Sep 18, 2025

Download files

Source Distribution: mlbench_lite-2.0.2.tar.gz (17.2 kB), uploaded Sep 18, 2025
Built Distribution: mlbench_lite-2.0.2-py3-none-any.whl (12.5 kB), uploaded Sep 18, 2025, Python 3

File details

Details for the file mlbench_lite-2.0.2.tar.gz.

File metadata

Download URL: mlbench_lite-2.0.2.tar.gz
Upload date: Sep 18, 2025
Size: 17.2 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.10.9

File hashes

Hashes for mlbench_lite-2.0.2.tar.gz
Algorithm	Hash digest
SHA256	`4d0161509b54044d321757ffb31c51d040464d66ceefcf1fc9bf18574d57689c`
MD5	`016054bb67a03ff1b7fd4d92e414823c`
BLAKE2b-256	`2633970e97e3e14b11c65d5e26c8911c140ea075bebba757dfd839a7b5f85c85`

File details

Details for the file mlbench_lite-2.0.2-py3-none-any.whl.

File metadata

Download URL: mlbench_lite-2.0.2-py3-none-any.whl
Upload date: Sep 18, 2025
Size: 12.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.10.9

File hashes

Hashes for mlbench_lite-2.0.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`617dbcbfd8d64cd2c85cad64f38a37ed4718f7d7d1c816f3826e9b6c5b398b2b`
MD5	`ecb7a1ad575f4708a900b54434458439`
BLAKE2b-256	`ddc19d6429db600f6121cf627e7a0b7223aaa76b8f8155575ad2443af9513760`
