
IrisML

Proof of concept for a simple framework to create an ML pipeline.

Features

  • Run ML training/inference with a simple JSON configuration.
  • Modularized interfaces for task components.
  • Cache task outputs for faster experiments.

Getting started

Installation

Prerequisite: Python 3.8+

# Install the core framework and standard tasks.
pip install irisml irisml-tasks irisml-tasks-training

Run an example job

# Install additional packages that are required for the example
pip install irisml-tasks-torchvision

# Run on local machine
irisml_run docs/examples/mobilenetv2_mnist_training.json

Available commands

# Run the specified pipeline. You can provide environment variables with the "-e" option; they are accessible through the $env variable in the JSON config.
irisml_run <pipeline_json_path> [-e <ENV_NAME>=<env_value>] [--no_cache] [-v]

# Show information about the specified task. If <task_name> is not provided, it shows a list of the tasks available in the current environment.
irisml_show [<task_name>]

Pipeline definition

PipelineDefinition = {"tasks": <list of TaskDefinition>}

TaskDefinition = {
    "task": <task module name>,
    "name": <optional unique name of the task>,
    "inputs": <list of input objects>,
    "config": <config for the task. Use irisml_show command to find the available configurations.>
}

In TaskDefinition.inputs and TaskDefinition.config, you can use the following two variables.

  • $env.<variable_name> This variable is replaced by the environment variable that was provided as an argument to the irisml_run command.
  • $outputs.<task_name>.<field_name> This variable is replaced by the corresponding output field of the specified previous task.

An exception is raised at runtime if the specified variable is not found.
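
For illustration, here is a minimal, hypothetical pipeline that chains two made-up tasks ("my_dataset_task" and "my_training_task"); the config keys and output field names are placeholders, and the exact shape of "inputs" and "config" for a real task should be checked with irisml_show.

{
    "tasks": [
        {
            "task": "my_dataset_task",
            "name": "dataset",
            "config": {"path": "$env.DATASET_PATH"}
        },
        {
            "task": "my_training_task",
            "name": "train",
            "inputs": {"dataset": "$outputs.dataset.dataset"}
        }
    ]
}

Running it with irisml_run my_pipeline.json -e DATASET_PATH=/data/my_dataset would resolve $env.DATASET_PATH to /data/my_dataset, and $outputs.dataset.dataset to the "dataset" output field of the first task.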

Pipeline cache

With caching, you can modify and re-run a pipeline config at minimal cost. If the cache is enabled, IrisML calculates hash values for each task's inputs and config and uploads the task outputs to the specified storage. When it finds a task with the same hash values, it downloads the cached outputs and skips the task execution.

To enable caching, specify the cache storage location by setting the IRISML_CACHE_URL environment variable. Currently, Azure Blob Storage and the local filesystem are supported.

To use Azure Blob Storage, a container URL must be provided. If the URL contains a SAS token, it is used for authentication. Otherwise, interactive authentication and Managed Identity authentication are used.

List of available tasks

To show the detailed help for each task, run the following command after installing the package.

irisml_show <task_name>

irisml-tasks

  • assert
  • download_azure_blob
  • get_dataset_stats
  • get_dataset_subset
  • get_fake_image_classification_dataset
  • get_fake_object_detection_dataset
  • get_item
  • load_state_dict
  • run_parallel
  • run_sequential
  • save_file
  • save_state_dict
  • search_grid_sequential
  • upload_azure_blob

irisml-tasks-training

This package contains tasks related to PyTorch training.

  • append_classifier
  • build_classification_prompt_dataset
  • build_zero_shot_classifier
  • create_classification_prompt_generator
  • evaluate_accuracy
  • evaluate_detection_average_precision
  • export_onnx
  • get_targets_from_dataset
  • load_state_dict
  • make_feature_extractor_model
  • make_image_text_contrastive_model
  • make_image_text_transform
  • predict
  • save_state_dict
  • split_image_text_model
  • train

irisml-tasks-torchvision

  • load_torchvision_dataset
  • create_torchvision_model
  • create_torchvision_transform

irisml-tasks-timm

Adapter for models in the timm library.

  • create_timm_model

irisml-tasks-azureml

  • run_azureml_child

irisml-tasks-fiftyone

  • launch_fiftyone

Development

Create a new task

To create a Task, you must define a module that contains a "Task" class. Here is a simple example:

# irisml/tasks/my_custom_task.py
import dataclasses
import irisml.core

class Task(irisml.core.TaskBase):  # The class name must be "Task".
  VERSION = '1.0.0'
  CACHE_ENABLED = True  # (default: True) This is optional.

  @dataclasses.dataclass
  class Inputs:  # You can remove this class if the task doesn't require inputs.
    int_value: int
    float_value: float

  @dataclasses.dataclass
  class Config:  # If there is no configuration, you can remove this class. All fields must be JSON-serializable.
    another_float: float
    # If you'd like to define a nested config, declare another dataclass and use it as a field type here.

  @dataclasses.dataclass
  class Outputs:  # Can be removed if the task doesn't have outputs.
    float_value: float = 0  # If dry_run() is not implemented, Outputs fields must have a default value or a default factory.

  def execute(self, inputs: Inputs) -> Outputs:
    return self.Outputs(inputs.int_value * inputs.float_value * self.config.another_float)

  def dry_run(self, inputs: Inputs) -> Outputs:  # This method is optional.
    return self.Outputs(0)  # Must return immediately without actual processing.

Each task must define an "execute" method. The base class provides empty implementations for Inputs, Config, Outputs, and dry_run(). For details, please see the documentation for the TaskBase class.
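
As a sketch, the task above could be referenced from a pipeline config roughly as follows; the exact JSON layout of the "inputs" and "config" sections should be verified with irisml_show my_custom_task (the nested config mentioned in the comment is omitted here).

{
    "tasks": [
        {
            "task": "my_custom_task",
            "name": "my_task",
            "inputs": {"int_value": 3, "float_value": 1.5},
            "config": {"another_float": 2.0}
        }
    ]
}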

Related repositories
