Machine Learning Training Base (ml-training-base)
A base library for supervised machine learning training, with environment setup and logging utilities.
ml-training-base is a Python library providing base classes and utilities for supervised machine learning projects. It includes:
- A configurable logging setup for both console and file outputs.
- Base classes for data loaders (BaseSupervisedDataLoader).
- An environment setup class for deterministic training (TrainingEnvironment), ensuring reproducible runs.
- A base trainer class (BaseSupervisedTrainer) that outlines a typical training workflow in supervised learning.
By using these abstractions, you can quickly spin up a new ML pipeline with consistent structure and easily extend or override specific components to suit your needs.
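A base trainer like this typically follows the template-method pattern: a concrete run() drives the abstract steps in a fixed order, and subclasses fill in each step. A minimal sketch of that pattern (class and method names here are illustrative; the real BaseSupervisedTrainer's internals may differ):

```python
# Template-method sketch: the base class fixes the workflow order,
# subclasses supply the concrete steps.
from abc import ABC, abstractmethod


class SketchSupervisedTrainer(ABC):
    def __init__(self):
        self.steps_run = []

    @abstractmethod
    def _train(self):
        """Subclasses implement the actual training step."""

    def run(self):
        # The workflow order lives in the base class; subclasses
        # override the steps, not the sequence.
        self.steps_run.append("setup")
        self._train()
        self.steps_run.append("evaluate")


class TinyTrainer(SketchSupervisedTrainer):
    def _train(self):
        self.steps_run.append("train")


trainer = TinyTrainer()
trainer.run()
```

The payoff is consistency: every pipeline built on the base class runs its stages in the same order, while each project customizes only the stage bodies.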
Features
- Reusable Base Classes: Standard building blocks for data loading, training, callbacks, and environment management.
- Logging Utilities: Automatically configure logging to both console and file, with customizable logging paths.
- Deterministic Environment Setup: Control Python, NumPy, and TensorFlow seeds for reproducible ML experiments.
- Clear Project Structure: Easily extend or override abstract methods in your own data loaders, trainers, or environment logic.
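Deterministic setup comes down to seeding every source of randomness before training starts. A stdlib-only sketch of the idea (set_deterministic_seeds is a hypothetical helper; the real TrainingEnvironment also seeds NumPy and TensorFlow):

```python
import os
import random


def set_deterministic_seeds(python_seed: str, random_seed: int) -> None:
    # PYTHONHASHSEED must be a string; note it only affects hash
    # randomization in child processes or the next interpreter start.
    os.environ["PYTHONHASHSEED"] = python_seed
    random.seed(random_seed)
    # A full implementation would also call numpy.random.seed(...)
    # and tf.random.set_seed(...) here.


set_deterministic_seeds("44478977", 440651)
first_draws = [random.random() for _ in range(3)]

set_deterministic_seeds("44478977", 440651)
second_draws = [random.random() for _ in range(3)]
# Re-seeding with the same values reproduces the exact same draws.
```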
Installation
You can install the package from PyPI via:
pip install ml-training-base
Quick Start
- Install the package and its dependencies.
- Create a YAML configuration file (e.g. config.yaml) with your environment, logging, and data settings.
- Import the classes in your script or Jupyter notebook:
import logging
from ml_training_base.data.utils.logging_utils import configure_logger
from ml_training_base.training.environment.environment import TrainingEnvironment
from ml_training_base.training.trainer import BaseSupervisedTrainer
- Set up your environment and trainer:
# For example, a custom trainer that inherits from BaseSupervisedTrainer
class MyCustomTrainer(BaseSupervisedTrainer):
    def _setup_model(self):
        # Initialize your model here, e.g., a TensorFlow/Keras or PyTorch model
        pass

    def _build_model(self):
        # Compile or build your model
        pass

    def _setup_callbacks(self):
        # Set up your training callbacks, checkpointing, etc.
        pass

    def _train(self):
        # Implement your training loop or model.fit(...) call
        pass

    def _save_model(self):
        # Save the trained model to disk
        pass

    def _evaluate(self):
        # Evaluate your model on the test set
        pass

# Usage:
trainer = MyCustomTrainer(
    config_path="path/to/config.yaml",
    training_env=TrainingEnvironment(logger=logging.getLogger(__name__))
)
trainer.run()
Package Structure
ml-training-base/
│
├── src/
│ └── ml_training_base/
│ ├── __init__.py
│ ├── data/
│ │ ├── __init__.py
│ │ └── utils/
│ │ ├── __init__.py
│ │ └── logging_utils.py
│ ├── training/
│ │ ├── __init__.py
│ │ ├── environment/
│ │ │ ├── __init__.py
│ │ │ ├── base_environment.py
│ │ │ └── environment.py
│ │ ├── trainer.py
│ │ └── ...
│ └── ...
├── tests/
│ ├── __init__.py
│ ├── test_data_loader.py
│ ├── test_environment.py
│ ├── test_logging_utils.py
│ └── test_trainer.py
├── README.md
├── LICENSE
└── pyproject.toml
Key Modules
data/utils/logging_utils.py:
- Contains configure_logger(log_path), which sets up console/file logging.

training/environment/base_environment.py:
- Abstract base class BaseEnvironment for environment setup tasks.

training/environment/environment.py:
- Implementation of TrainingEnvironment, enabling deterministic training (sets seeds, configures TensorFlow ops, etc.).

training/trainer.py:
- Contains BaseSupervisedTrainer, an abstract class to streamline a typical training workflow (environment setup, model creation, training loop, evaluation).
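The logging utility's behavior can be approximated as follows. This is a hedged sketch: beyond the log_path parameter, the handler choices, format string, and return value are assumptions, not the library's actual implementation.

```python
import logging
import os
import tempfile


def configure_logger(log_path: str) -> logging.Logger:
    """Sketch of a console + file logging setup.

    The real logging_utils.configure_logger may differ in format,
    levels, and return value.
    """
    logger = logging.getLogger("ml_training_base_sketch")
    logger.setLevel(logging.INFO)
    fmt = logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
    # One handler for the console, one for the log file.
    for handler in (logging.StreamHandler(), logging.FileHandler(log_path)):
        handler.setFormatter(fmt)
        logger.addHandler(handler)
    return logger


log_path = os.path.join(tempfile.mkdtemp(), "training.log")
logger = configure_logger(log_path)
logger.info("environment ready")
```

After the call, every record reaches both the console and the file at log_path, which is what "console/file logging" refers to above.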
Configuration File
You can define your runtime settings (e.g., logger paths, environment determinism seeds, model hyperparameters) in a YAML file.
For example:
# Data Configuration and Hyperparameters
data:
  x_data_path: 'data/processed/x_data'
  y_data_path: 'data/processed/y_data'
  logger_path: 'var/log/training.log'
  batch_size: 32
  test_split: 0.1
  validation_split: 0.1

# Model Configuration and Hyperparameters
model:
  attention_dim: 512
  encoder_embedding_dim: 512
  decoder_embedding_dim: 512
  units: 512
  encoder_num_layers: 2
  decoder_num_layers: 4

# Training Configuration and Hyperparameters
training:
  epochs: 100
  early_stop_patience: 5
  weight_decay: null
  dropout_rate: 0.2
  learning_rate: 1e-4

# Environment Configuration
env:
  determinism:
    python_seed: "44478977"
    random_seed: 440651
    numpy_seed: 110789
    tf_seed: 61592
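Once parsed (e.g. with yaml.safe_load from PyYAML), the file becomes a nested dict that the trainer can index by section. A sketch using an equivalent dict literal (PyYAML itself is left out so the example stays stdlib-only; only a subset of the keys above is shown):

```python
# Nested dict mirroring part of the YAML config above.
config = {
    "data": {
        "x_data_path": "data/processed/x_data",
        "y_data_path": "data/processed/y_data",
        "logger_path": "var/log/training.log",
        "batch_size": 32,
        "test_split": 0.1,
        "validation_split": 0.1,
    },
    "model": {"attention_dim": 512, "units": 512},
    "training": {"epochs": 100, "weight_decay": None},
    "env": {"determinism": {"python_seed": "44478977", "numpy_seed": 110789}},
}

# A trainer would pull hyperparameters out by section:
batch_size = config["data"]["batch_size"]
epochs = config["training"]["epochs"]
# YAML's `null` parses to Python None.
weight_decay = config["training"]["weight_decay"]
```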
License
This project is licensed under the terms of the MIT License. Feel free to copy, modify, and distribute per its terms.
File details
Details for the file ml_training_base-0.3.0.tar.gz.
File metadata
- Download URL: ml_training_base-0.3.0.tar.gz
- Size: 16.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.9.23
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1f2f86928f146791c6fee72b40d37c142bd8e3225529a85ac728c8b5282e6316 |
| MD5 | ae65608f9ef521a422926f59f09adc5f |
| BLAKE2b-256 | eafc14da7ccd5d869442df59c161c6f3741dd8cd68142ffe885674c18f1fe489 |
File details
Details for the file ml_training_base-0.3.0-py3-none-any.whl.
File metadata
- Download URL: ml_training_base-0.3.0-py3-none-any.whl
- Size: 14.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.9.23
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5d0263422834540254cb30269504141e7936355b5a7d128dd95cc56f0106a3ef |
| MD5 | 8d07242df6e602012ee485b5d809c1f4 |
| BLAKE2b-256 | 2f173f6af9f91ce8ba9f8894073e1df5524f32e470e78a4a58bb1860f82732ef |