A collection of utilities for machine learning applications.
iclearn
iclearn is a tool for standardizing distributed machine-learning workflows at ICHEC. It will allow us to develop a common set of performance benchmarking, profiling and optimization tools and apply them to ML workflows across scientific domains.
Design
The top-level library architecture is shown above. A machine learning experiment is defined via a YAML file and launched via the CLI. Resources addressed in the YAML are loaded from a range of libraries, which are built out per-domain (e.g. Earth Observation) or per-framework (e.g. PyTorch). Libraries can include ML models and datasets, but also specialized metrics calculators, output handlers and profiling tools.
Once resources are loaded, a machine learning experiment is executed in a Session using a supported framework. This is primarily the PyTorch ecosystem at the moment, but others are planned.
Practical integration of a third-party library is shown in the figure above. A config file is read through the CLI. Models, dataloaders and similar resources are loaded from third-party libraries by 'provider' callbacks, which take 'resource IDs' from the config and return the corresponding Python objects. The Python objects are derived from iclearn base classes and implement event handlers for different stages of a machine learning workflow, such as training steps, testing or inference.
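The provider idea can be sketched in a few lines of plain Python. This is an illustration only, not the iclearn API: the ProviderRegistry class, the provider signature and the MyGridPlotter stand-in are all hypothetical names chosen for the example.

```python
from typing import Any, Callable, List, Optional

# A provider takes a resource ID and returns an object, or None if it does
# not recognise the ID. (Hypothetical signature, assumed for illustration.)
Provider = Callable[[str], Optional[Any]]


class ProviderRegistry:
    """Hypothetical registry that asks each provider in turn for a resource."""

    def __init__(self) -> None:
        self._providers: List[Provider] = []

    def register(self, provider: Provider) -> None:
        self._providers.append(provider)

    def resolve(self, resource_id: str) -> Any:
        # The first provider that returns a non-None object wins.
        for provider in self._providers:
            resource = provider(resource_id)
            if resource is not None:
                return resource
        raise KeyError(f"No provider found for resource ID '{resource_id}'")


class MyGridPlotter:
    """Stand-in for an output handler supplied by a third-party library."""


def my_library_provider(resource_id: str) -> Optional[Any]:
    # Only answer for IDs this (hypothetical) library owns.
    if resource_id == "my_library.my_grid_plotter":
        return MyGridPlotter()
    return None


registry = ProviderRegistry()
registry.register(my_library_provider)
plotter = registry.resolve("my_library.my_grid_plotter")
print(type(plotter).__name__)
```

Returning None for unknown IDs lets several libraries register providers side by side without needing to know about each other.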
A sample YAML file for a machine learning training session is shown below:
name: linear_train
dataloader:
  batch_size: 64
  dataset:
    name: linear
model:
  name: "torch.linear"
  framework: "pytorch"
  optimizer:
    name: "torch.SGD"
    learning_rate: 0.001
    loss_function: "torch.MSELoss"
outputs:
  - name: "logging"
  - name: "plotting"
    active: false
with_profiling: false
num_epochs: 10
num_batches: 0
This includes named PyTorch models or model elements (e.g. torch.linear and torch.SGD) and their parameters, a named dataset (linear), and named output handlers (plotting and logging).
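Several of the names in the sample config carry a dotted prefix (torch.linear, torch.SGD, torch.MSELoss) while others are bare (linear, logging). As a minimal sketch, assuming the part before the first dot identifies the library that owns the resource, such names could be split as follows. The ResourceId type and parse_resource_id function are hypothetical, and iclearn may resolve names differently.

```python
from typing import NamedTuple, Optional


class ResourceId(NamedTuple):
    namespace: Optional[str]  # e.g. "torch"; None when the name has no prefix
    name: str


def parse_resource_id(raw: str) -> ResourceId:
    """Split 'namespace.name' on the first dot; bare names get no namespace."""
    namespace, sep, name = raw.partition(".")
    if not sep:
        # No dot present, so the whole string is the resource name.
        return ResourceId(None, raw)
    return ResourceId(namespace, name)


for raw in ["torch.linear", "torch.SGD", "torch.MSELoss", "linear", "logging"]:
    print(parse_resource_id(raw))
```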
A third-party library may expose custom datasets (e.g. my_library.my_dataset) or output handlers (e.g. my_library.mlflow, my_library.my_grid_plotter), with a simple implementation via inheritance from iclearn templates, as shown below.
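from pathlib import Path

from iclearn.data import Dataloader, Splits
from iclearn.model import Model, Metrics


class MyModel(Model):
    def __init__(self, metrics: Metrics):
        # MyOptimizer and MyLossFunc are placeholders for your own classes.
        # Positional arguments must precede keyword arguments in Python.
        super().__init__(MyOptimizer(MyLossFunc()), metrics=metrics)

    def predict(self, x):
        return ...


class MyDataloader(Dataloader):
    def load_dataset(self, root: Path, name: str, splits):
        return ...

    def load_dataloader(self, name: str):
        return ...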
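The event-handler pattern that these templates rely on can be illustrated without iclearn at all. The following is a self-contained sketch under stated assumptions: StageHandler, LossRecorder, run_session and the hook names are hypothetical and are not taken from the iclearn base classes.

```python
class StageHandler:
    """Hypothetical base class with no-op hooks for each workflow stage."""

    def on_epoch_start(self, epoch):
        pass

    def on_batch_end(self, epoch, batch, loss):
        pass

    def on_epoch_end(self, epoch):
        pass


class LossRecorder(StageHandler):
    """Overrides only the hooks it needs and records the mean loss per epoch."""

    def __init__(self):
        self.epoch_means = []
        self._losses = []

    def on_epoch_start(self, epoch):
        self._losses = []

    def on_batch_end(self, epoch, batch, loss):
        self._losses.append(loss)

    def on_epoch_end(self, epoch):
        # Skip epochs with no batches to avoid dividing by zero.
        if self._losses:
            self.epoch_means.append(sum(self._losses) / len(self._losses))


def run_session(handlers, num_epochs, batch_losses):
    """Toy loop that fires the hooks; batch_losses stands in for real training."""
    for epoch in range(num_epochs):
        for handler in handlers:
            handler.on_epoch_start(epoch)
        for batch, loss in enumerate(batch_losses):
            for handler in handlers:
                handler.on_batch_end(epoch, batch, loss)
        for handler in handlers:
            handler.on_epoch_end(epoch)


recorder = LossRecorder()
run_session([recorder], num_epochs=2, batch_losses=[1.0, 3.0])
print(recorder.epoch_means)  # [2.0, 2.0]
```

Because the base class supplies no-op defaults, a handler such as a logger or plotter only has to override the stages it cares about.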
As a real example of launching the CLI with a config, you can train a simple built-in linear regression model with:
iclearn train --config test/data/experiments/linear_train.yaml
In practice you would launch your own program that includes functionality for providing your custom library resources via callbacks, giving something like:
my_custom_pipeline train --config my_experiment.yaml
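A custom entry point of this shape could be sketched with the standard library's argparse, as below. Only the command-line parsing is shown. The step that registers provider callbacks and hands the config to iclearn is left as a comment, because that API is not shown in this README. The build_parser and main functions are hypothetical names.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="my_custom_pipeline")
    subparsers = parser.add_subparsers(dest="command", required=True)
    train = subparsers.add_parser("train", help="Run a training session")
    train.add_argument(
        "--config", type=Path, required=True, help="Path to the experiment YAML file"
    )
    return parser


def main(argv=None) -> int:
    args = build_parser().parse_args(argv)
    if args.command == "train":
        # A real program would register its provider callbacks here and
        # pass the config to iclearn; that call is deliberately omitted.
        print(f"Would train using config: {args.config}")
    return 0


# Equivalent to: my_custom_pipeline train --config my_experiment.yaml
exit_code = main(["train", "--config", "my_experiment.yaml"])
```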
Installing
The package is available on PyPI. You can install the base package with:
pip install iclearn
Most functionality so far uses PyTorch. You can install the PyTorch add-ons with:
pip install 'iclearn[torch]'
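To check whether the optional PyTorch dependency is present in the current environment, a small standard-library test such as the following may help. The has_module helper is a hypothetical name, and the check looks for the torch package itself rather than for anything specific to iclearn.

```python
from importlib.util import find_spec


def has_module(name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    return find_spec(name) is not None


if has_module("torch"):
    print("PyTorch found")
else:
    print("PyTorch not found: try pip install 'iclearn[torch]'")
```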
License
This software is Copyright ICHEC 2024 and can be re-used under the terms of the GPL v3+. See the included LICENSE file for details.
File details
Details for the file iclearn-0.1.6.tar.gz.
File metadata
- Download URL: iclearn-0.1.6.tar.gz
- Upload date:
- Size: 47.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7d552cb9abb415de3bebf04afee901674b9c54ff36659724f774783f6b99f48e |
| MD5 | 461bff00e1146e22d263756b45bdad5d |
| BLAKE2b-256 | c9822e8adfa29d448b0823508e3ca287f1d5880ec8c1798ed10e75ed80a1d543 |
File details
Details for the file iclearn-0.1.6-py3-none-any.whl.
File metadata
- Download URL: iclearn-0.1.6-py3-none-any.whl
- Upload date:
- Size: 56.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 60bb2730ba487a2feb20ad8a708bc9061f5d85d2237b78fe4644f3295aa35610 |
| MD5 | ea9aa9928fe47065f89acb28540028ff |
| BLAKE2b-256 | f01cf9575727559fb1dad88408cd8c48ddf19d7e06d7b4ba08778d9bb0fd2e6f |