A Python library to automate machine learning model training and tuning using a simple configuration file.
aydie-mllib
aydie-mllib is a Python library designed to automate and simplify the process of training and tuning machine learning models. By leveraging a simple YAML configuration file, you can easily test multiple algorithms, perform hyperparameter tuning with GridSearchCV, and find the best model for your data without writing repetitive boilerplate code.
This library is built to be extensible and supports any model that follows the scikit-learn estimator API, including models from popular libraries such as XGBoost.
Features
- Configuration-Driven: Define your entire model training pipeline in a single YAML file.
- Automated Grid Search: Automatically performs hyperparameter tuning for multiple models.
- Model Agnostic: Works with any scikit-learn compatible model (e.g., RandomForestRegressor, SVR, XGBClassifier).
- Find the Best: Compares the tuned models and returns the one with the highest score.
- Easy to Use: Includes a helper function to generate a sample configuration file to get you started instantly.
Installation
You can install aydie-mllib directly from PyPI:
pip install aydie-mllib
Or, install it directly from the source for the latest version:
git clone https://github.com/aydiegithub/aydie-mllib.git
cd aydie-mllib
pip install .
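To confirm that the installation succeeded, you can query the installed distribution's version from Python. The snippet below is a minimal sketch that uses only the standard library; the helper name installed_version is hypothetical and is not part of aydie-mllib.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("aydie-mllib")
    if found:
        print(f"aydie-mllib version: {found}")
    else:
        print("aydie-mllib is not installed in this environment")
```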
Quickstart Guide
Here's how to get up and running with aydie-mllib in just a few steps.
1. Generate the Configuration File
First, create a Python script to generate a sample model_config.yaml file. This will be the blueprint for your training pipeline.
generate_config.py
from aydie_mllib.config import generate_sample_model_config
# This will create a 'config' directory and place 'model_config.yaml' inside it.
file_path = generate_sample_model_config(export_dir="config")
print(f"Sample config file has been generated at: {file_path}")
2. Customize model_config.yaml
Now, open the newly created config/model_config.yaml and customize it for the models you want to test. Let's set it up to compare a RandomForestRegressor and an XGBRegressor.
grid_search:
  class: GridSearchCV
  module: sklearn.model_selection
  params:
    cv: 5
    verbose: 1
model_selection:
  module_0:
    class: RandomForestRegressor
    module: sklearn.ensemble
    params:
      n_estimators: 100
      random_state: 42
    search_param_grid:
      n_estimators:
        - 100
        - 200
      max_depth:
        - 5
        - 10
        - null
  module_1:
    class: XGBRegressor
    module: xgboost
    params:
      objective: reg:squarederror
    search_param_grid:
      n_estimators:
        - 50
        - 100
      learning_rate:
        - 0.05
        - 0.1
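Before running a grid search, it helps to know how much work the configuration implies. The sketch below expands the two search_param_grid blocks from the configuration above into their parameter combinations and counts the cross-validation fits for cv: 5. It uses only the standard library; the helper expand_grid is hypothetical and only mirrors how an exhaustive grid search enumerates candidates.

```python
from itertools import product


def expand_grid(param_grid):
    """List every parameter combination in a grid, iterating keys in sorted order."""
    keys = sorted(param_grid)
    value_lists = [param_grid[key] for key in keys]
    return [dict(zip(keys, values)) for values in product(*value_lists)]


# Grids copied from the configuration above (YAML null becomes Python None).
rf_grid = {"n_estimators": [100, 200], "max_depth": [5, 10, None]}
xgb_grid = {"n_estimators": [50, 100], "learning_rate": [0.05, 0.1]}
cv_folds = 5

rf_candidates = expand_grid(rf_grid)    # 2 * 3 = 6 combinations
xgb_candidates = expand_grid(xgb_grid)  # 2 * 2 = 4 combinations

# Each candidate is fitted once per fold; the final refit on all data is not counted.
total_fits = (len(rf_candidates) + len(xgb_candidates)) * cv_folds
print(len(rf_candidates), len(xgb_candidates), total_fits)  # prints: 6 4 50
```

Adding one more value to any list multiplies that model's candidate count, so keep grids small while experimenting.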
3. Find the Best Model
Finally, use the ModelBuilder to load your configuration, train the models, and find the best one.
run_training.py
import pandas as pd
from sklearn.datasets import make_regression
from aydie_mllib import ModelBuilder
# --- 1. Load your data ---
# As an example, create some dummy regression data
X, y = make_regression(n_samples=1000, n_features=20, noise=0.1, random_state=42)
X = pd.DataFrame(X, columns=[f'feature_{i}' for i in range(X.shape[1])])
# --- 2. Initialize the ModelBuilder ---
# Point it to your configuration file
model_builder = ModelBuilder(model_config_path="config/model_config.yaml")
# --- 3. Get the best model ---
# The get_best_model method runs the entire pipeline
best_model_detail = model_builder.get_best_model(X=X, y=y, base_accuracy=0.6)
# --- 4. Print the results ---
print("\n--- Best Model Found ---")
print(f"Model Class: {best_model_detail.best_model.__class__.__name__}")
print(f"Best Score (R^2): {best_model_detail.best_score:.4f}")
print(f"Best Parameters: {best_model_detail.best_parameters}")
# You can now use this best model for predictions
# best_model = best_model_detail.best_model
# predictions = best_model.predict(X)
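The comparison step can be pictured with a small standalone sketch. It assumes that base_accuracy acts as a minimum acceptable score, which this README does not state explicitly, so treat that as a guess about the behavior. The TunedModel class and pick_best function are hypothetical names for illustration, not the library's own types, and the scores are placeholders rather than measured results.

```python
from dataclasses import dataclass


@dataclass
class TunedModel:
    """Hypothetical record of one tuned candidate (not the library's own type)."""
    name: str
    best_score: float
    best_parameters: dict


def pick_best(candidates, base_accuracy):
    """Return the highest-scoring candidate, or raise if it falls below base_accuracy."""
    if not candidates:
        raise ValueError("no candidates to compare")
    best = max(candidates, key=lambda candidate: candidate.best_score)
    # Assumption: base_accuracy is a floor that the winning score must reach.
    if best.best_score < base_accuracy:
        raise ValueError(
            f"best score {best.best_score:.4f} is below the baseline {base_accuracy}"
        )
    return best


# Placeholder scores for illustration only; these are not measured results.
candidates = [
    TunedModel("RandomForestRegressor", 0.91, {"max_depth": 10, "n_estimators": 200}),
    TunedModel("XGBRegressor", 0.94, {"learning_rate": 0.1, "n_estimators": 100}),
]
best = pick_best(candidates, base_accuracy=0.6)
print(best.name, best.best_parameters)
```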
How it Works
The library is centered around the ModelBuilder class, which orchestrates the entire process based on your model_config.yaml file.
- grid_search section: Defines the hyperparameter search strategy. By default, it uses sklearn.model_selection.GridSearchCV. You can customize its parameters, such as cv (the number of cross-validation folds).
- model_selection section: A dictionary where each key (e.g., module_0) represents a model to be evaluated. Each entry has the following fields:
  - module: The Python module where the model class is located (e.g., sklearn.ensemble or xgboost).
  - class: The name of the model class (e.g., RandomForestRegressor).
  - params: A dictionary of fixed parameters that will be passed to the model's constructor.
  - search_param_grid: The dictionary of hyperparameters to be tuned by the grid search.
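A configuration that names classes by module and class strings is typically resolved with dynamic imports. The sketch below shows one way this could work using importlib from the standard library; it is an illustration under that assumption, not the library's actual implementation, and build_estimator is a hypothetical helper. It is demonstrated with a standard-library class so that it runs without scikit-learn installed.

```python
import importlib


def build_estimator(spec):
    """Instantiate the class named by spec['module'] and spec['class'].

    Fixed constructor arguments are taken from spec['params'] when present.
    """
    module = importlib.import_module(spec["module"])
    cls = getattr(module, spec["class"])
    return cls(**spec.get("params", {}))


# Stand-in entry shaped like one item of the model_selection section.
spec = {
    "module": "fractions",
    "class": "Fraction",
    "params": {"numerator": 3, "denominator": 4},
}
obj = build_estimator(spec)
print(type(obj).__name__, obj)  # prints: Fraction 3/4
```

With scikit-learn installed, the same call with module set to sklearn.ensemble and class set to RandomForestRegressor would return an unfitted estimator built from the entry's params.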
Connect with Me
- 🌐 Website: aydie.in
- 💼 LinkedIn: @aydiemusic
- 🐦 X (Twitter): @aydiemusic
- 📸 Instagram: @aydiemusic
- 📺 YouTube: @aydiemusic
- 📧 Contact: business@aydie.in
Contributing
Contributions are welcome! If you have ideas for improvements or find a bug, please open an issue or submit a pull request.
License
This project is licensed under the MIT License. See the LICENSE file for details.
File details
Details for the file aydie_mllib-1.2.1.tar.gz.
File metadata
- Download URL: aydie_mllib-1.2.1.tar.gz
- Upload date:
- Size: 12.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0598e7c0da5ce9d5f1432c7412bc8358b6ef0464f1abc42280cba5957f5e8f8c |
| MD5 | 193f36ce1055bc490b7aed5aafcf5477 |
| BLAKE2b-256 | 96d647681189694131840d458eceb023de3114c7afc0b924ff06c7cff2ca2aca |
Provenance
The following attestation bundles were made for aydie_mllib-1.2.1.tar.gz:
Publisher: publish.yaml on aydiegithub/aydie-mllib
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: aydie_mllib-1.2.1.tar.gz
- Subject digest: 0598e7c0da5ce9d5f1432c7412bc8358b6ef0464f1abc42280cba5957f5e8f8c
- Sigstore transparency entry: 297437816
- Sigstore integration time:
- Permalink: aydiegithub/aydie-mllib@4502258796700f03de0eab8d6114968546d3eb09
- Branch / Tag: refs/tags/v1.2.1
- Owner: https://github.com/aydiegithub
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yaml@4502258796700f03de0eab8d6114968546d3eb09
- Trigger Event: release
File details
Details for the file aydie_mllib-1.2.1-py3-none-any.whl.
File metadata
- Download URL: aydie_mllib-1.2.1-py3-none-any.whl
- Upload date:
- Size: 10.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c1f902096858e0a8355cc5f21b4cf85468a7e876eb452dbadd0dd6c08e9a800f |
| MD5 | 19c5677c9a404f217262880ea0a41c04 |
| BLAKE2b-256 | 8e19deadd0fb8ea6bc657a2f99dc12cb243b6932e14809d4246408dff18bda60 |
Provenance
The following attestation bundles were made for aydie_mllib-1.2.1-py3-none-any.whl:
Publisher: publish.yaml on aydiegithub/aydie-mllib
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: aydie_mllib-1.2.1-py3-none-any.whl
- Subject digest: c1f902096858e0a8355cc5f21b4cf85468a7e876eb452dbadd0dd6c08e9a800f
- Sigstore transparency entry: 297437848
- Sigstore integration time:
- Permalink: aydiegithub/aydie-mllib@4502258796700f03de0eab8d6114968546d3eb09
- Branch / Tag: refs/tags/v1.2.1
- Owner: https://github.com/aydiegithub
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yaml@4502258796700f03de0eab8d6114968546d3eb09
- Trigger Event: release