
A collection of utilities for machine learning applications.


iclearn

iclearn is a tool for standardizing distributed machine-learning workflows at ICHEC. It will allow us to develop a common set of performance benchmarking, profiling and optimization tools and apply them to ML workflows across scientific domains.

Design

Top-level library architecture

The top-level library architecture is shown above. A machine learning experiment is defined via a YAML file and launched via the CLI. Resources addressed in the YAML are loaded from a range of libraries, which are built out per-domain (e.g. Earth Observation) or per-framework (e.g. PyTorch). Libraries can include ML models and datasets, but also specialized metrics calculators, output handlers and profiling tools.

Once resources are loaded, a machine learning experiment is executed in a Session using a supported framework. PyTorch is the primary framework at the moment, with others planned.

Library integration

Practical integration of a third-party library is shown in the figure above. A config file is read through the CLI. Models, dataloaders and similar resources are loaded from third-party libraries by 'provider' callbacks, which take 'resource IDs' from the config and return the corresponding Python objects. These objects derive from iclearn base classes and implement event handlers for the different stages of a machine learning workflow, such as training steps, testing or inference.
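The provider idea can be sketched in a few lines of plain Python. This is only an illustration of the pattern, not iclearn's actual API: the names ProviderRegistry, register, provide and MyDataset are all hypothetical and do not come from the library.

```python
from typing import Any, Callable, Dict

# Hypothetical type: a provider takes a config fragment and returns an object.
Provider = Callable[[Dict[str, Any]], Any]


class ProviderRegistry:
    """Illustrative registry mapping resource IDs to provider callbacks."""

    def __init__(self) -> None:
        self._providers: Dict[str, Provider] = {}

    def register(self, resource_id: str, provider: Provider) -> None:
        self._providers[resource_id] = provider

    def provide(self, resource_id: str, config: Dict[str, Any]) -> Any:
        # Look up the callback for this resource ID and let it build the object.
        if resource_id not in self._providers:
            raise KeyError(f"No provider registered for '{resource_id}'")
        return self._providers[resource_id](config)


class MyDataset:
    """Stand-in for a dataset class from a third-party library."""

    def __init__(self, batch_size: int) -> None:
        self.batch_size = batch_size


registry = ProviderRegistry()
registry.register(
    "my_library.my_dataset",
    lambda cfg: MyDataset(cfg.get("batch_size", 1)),
)

# The resource ID would normally come from the experiment config.
dataset = registry.provide("my_library.my_dataset", {"batch_size": 64})
print(type(dataset).__name__, dataset.batch_size)
```

Keeping the lookup keyed on a string ID means the config file never needs to import or name Python classes directly.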

A sample YAML file for a machine learning training session is shown below:

name: linear_train
dataloader:
  batch_size: 64
  dataset:
    name: linear
model:
  name: "torch.linear"
  framework: "pytorch"
  optimizer:
    name: "torch.SGD"
    learning_rate: 0.001
  loss_function: "torch.MSELoss"
outputs:
  - name: "logging"
  - name: "plotting"
    active: false
with_profiling: false
num_epochs: 10
num_batches: 0

The file names PyTorch models and model elements (e.g. torch.linear and torch.SGD) along with their parameters, a dataset named linear, and the output handlers logging and plotting.
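To make the structure of the sample configuration above concrete, the sketch below writes the same content as a Python dictionary and picks out the output handlers that are switched on. The helper active_outputs is hypothetical, and the rule that a handler without an active key counts as active is an assumption, not documented iclearn behaviour.

```python
from typing import Any, Dict, List

# The sample experiment, written out as the nested dict a YAML loader would return.
config: Dict[str, Any] = {
    "name": "linear_train",
    "dataloader": {"batch_size": 64, "dataset": {"name": "linear"}},
    "model": {
        "name": "torch.linear",
        "framework": "pytorch",
        "optimizer": {"name": "torch.SGD", "learning_rate": 0.001},
        "loss_function": "torch.MSELoss",
    },
    "outputs": [{"name": "logging"}, {"name": "plotting", "active": False}],
    "with_profiling": False,
    "num_epochs": 10,
    "num_batches": 0,
}


def active_outputs(cfg: Dict[str, Any]) -> List[str]:
    """Return names of output handlers that are not explicitly disabled.

    Assumption: a handler with no 'active' key is treated as active.
    """
    return [
        out["name"]
        for out in cfg.get("outputs", [])
        if out.get("active", True)
    ]


print(active_outputs(config))
```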

A third-party library may expose custom datasets such as my_library.my_dataset, or output handlers such as my_library.mlflow and my_library.my_grid_plotter, with a simple implementation via inheritance from iclearn templates, as shown below.

from pathlib import Path

from iclearn.data import Dataloader, Splits
from iclearn.model import Model, Metrics


# MyOptimizer and MyLossFunc stand for user-defined classes.
class MyModel(Model):

    def __init__(self, metrics: Metrics):
        # Pass the metrics and the optimizer on to the iclearn base class.
        super().__init__(metrics, MyOptimizer(MyLossFunc()))

    def predict(self, x):
        return ...


class MyDataloader(Dataloader):

    def load_dataset(self, root: Path, name: str, splits):
        return ...

    def load_dataloader(self, name: str):
        return ...

As a concrete example of launching the CLI with a config, you can train a simple built-in linear regression model with:

iclearn train --config test/data/experiments/linear_train.yaml

In practice you would launch your own program that includes functionality for providing your custom library resources via callbacks, giving something like:

my_custom_pipeline train --config my_experiment.yaml
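A custom entry point of this shape could be sketched with the standard library's argparse. This shows only the command-line layer for a hypothetical my_custom_pipeline program; how provider callbacks are registered and how control is handed to iclearn is not described here, so that step is left as a comment.

```python
import argparse
from pathlib import Path
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="my_custom_pipeline")
    subparsers = parser.add_subparsers(dest="command", required=True)

    # 'train' subcommand mirroring the invocation shown above.
    train = subparsers.add_parser("train", help="Run a training session")
    train.add_argument(
        "--config",
        type=Path,
        required=True,
        help="Path to the experiment YAML file",
    )
    return parser


def main(argv: Optional[List[str]] = None) -> argparse.Namespace:
    args = build_parser().parse_args(argv)
    # A real pipeline would register its provider callbacks here and then
    # start the iclearn session; both steps are omitted in this sketch.
    return args


# Equivalent to: my_custom_pipeline train --config my_experiment.yaml
args = main(["train", "--config", "my_experiment.yaml"])
print(args.command, args.config)
```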

Installing

The package is available on PyPI. Install the base package with:

pip install iclearn

Most functionality so far uses PyTorch. Install the PyTorch add-ons with:

pip install 'iclearn[torch]'

License

This software is Copyright ICHEC 2024 and can be re-used under the terms of the GPL v3+. See the included LICENSE file for details.
