
valohai-utils

Python helper library for the Valohai machine learning platform.

Install

pip install valohai-utils

Execution

Run locally

python mycode.py

Run in the cloud

vh yaml step mycode.py
vh exec run -a mystep

What does valohai-utils do?

  • Generates and updates the valohai.yaml configuration file based on the source code
  • Handles inputs agnostically (single file, multiple files, zip, tar)
  • Parses command-line parameters
  • Compresses outputs
  • Downloads inputs for local experiments
  • Offers a straightforward way to print metrics as Valohai metadata
  • Provides code parity between local and cloud runs
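
In practice a script using valohai-utils follows roughly the pattern below. This is a condensed sketch that combines the APIs covered in the sections that follow; the step, parameter, input and output names are placeholders.

import valohai

valohai.prepare(
    step="train",  # placeholder step name
    default_parameters={"epochs": 5},
    default_inputs={"dataset": "https://example.com/data.csv"},
)

# Parameters and inputs resolve the same way locally and in the cloud.
for epoch in range(valohai.parameters("epochs").value):
    with open(valohai.inputs("dataset").path()) as f:
        pass  # ... train on the data ...
    with valohai.metadata.logger() as logger:
        logger.log("epoch", epoch)

# Files saved to an outputs path are picked up as Valohai outputs.
model_path = valohai.outputs("model").path("model.bin")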

Parameters

Valohai parameters are variables and hyperparameters that are parsed from the command line. You define parameters in a dictionary:

default_parameters = {
    'iterations': 100,
    'learning_rate': 0.001
}

The dictionary is passed to the valohai.prepare() method.

The values given are defaults. You can override them from the command line or in the Valohai web UI.

Example

import valohai

default_parameters = {
    'iterations': 10,
}

valohai.prepare(step="helloworld", default_parameters=default_parameters)

for i in range(valohai.parameters('iterations').value):
    print("Iteration %s" % i)

Inputs

Valohai inputs are the data files required by the experiment. They are automatically downloaded for you if the data is from a public source. You define inputs with a dictionary:

default_inputs = {
    'input_name': 'http://example.com/1.png'
}

An input can also be a list of URLs or a folder:

default_inputs = {
    'input_name': [
        'http://example.com/1.png',
        'http://example.com/2.png'
    ],
    'input_folder': [
        's3://mybucket/images/*',
        'azure://mycontainer/images/*',
        'gs://mybucket/images/*'
    ]
}

Or it can be an archive full of files (uncompressed automagically on-demand):

default_inputs = {
    'images': 'http://example.com/myimages.zip'
}

The dictionary is passed to the valohai.prepare() method.

The URLs given are defaults. You can override them from the command line or in the Valohai web UI.

Example

import csv
import valohai

default_inputs = {
    'myinput': 'https://pokemon-images-example.s3-eu-west-1.amazonaws.com/pokemon.csv'
}

valohai.prepare(step="test", default_inputs=default_inputs)

with open(valohai.inputs("myinput").path()) as csv_file:
    reader = csv.reader(csv_file, delimiter=',')
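
For multi-file and archive inputs, every file can be iterated with paths(), as the full example at the end of this page also does. A minimal sketch, with placeholder step and input names:

import valohai

default_inputs = {
    'images': 'http://example.com/myimages.zip'
}

valohai.prepare(step="example", default_inputs=default_inputs)

for path in valohai.inputs('images').paths():
    print(path)  # local path of each extracted file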

Outputs

Valohai outputs are the files that your step produces as its end result.

When you are ready to save an output file, you can ask valohai-utils for the correct path.

Example

import valohai
from PIL import Image

# in_path, width and height come from inputs and parameters (see the full example below)
image = Image.open(in_path)
new_image = image.resize((width, height))
out_path = valohai.outputs('resized').path('resized_image.png')
new_image.save(out_path)

Sometimes there are so many outputs that you may want to compress them into a single file.

In this case, once you have all your outputs saved, you can finalize the output with the compress() method.

Example

valohai.outputs('resized').compress("*.png", "images.zip", remove_originals=True)

Logging

You can log metrics using the Valohai metadata system and then render interactive graphs on the web interface. The valohai-utils logger will print JSON logs that Valohai will parse as metadata.

It is important for visualization that the logs for a single epoch are flushed out as a single JSON object.
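
With the examples below, each epoch therefore produces one flushed metadata entry on standard output, roughly of the form {"epoch": 0, "accuracy": 0.95, "loss": 0.12} (the exact formatting is handled by the library).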

Example

import valohai

for epoch in range(100):
    with valohai.metadata.logger() as logger:
        logger.log("epoch", epoch)
        logger.log("accuracy", accuracy)
        logger.log("loss", loss)

Example 2

import valohai

logger = valohai.logger()
for epoch in range(100):
    logger.log("epoch", epoch)
    logger.log("accuracy", accuracy)
    logger.log("loss", loss)
    logger.flush()

Distributed Workloads

valohai.distributed contains a toolset for running distributed tasks on Valohai.

import valohai

if valohai.distributed.is_distributed_task():

    # `master()` reports the same worker on all contexts
    master = valohai.distributed.master()
    master_url = f'tcp://{master.primary_local_ip}:1234'

    # `members()` contains all workers in the distributed task
    member_public_ips = ",".join([
        m.primary_public_ip
        for m
        in valohai.distributed.members()
    ])

    # `me()` has full details about the current worker context
    details = valohai.distributed.me()

    size = valohai.distributed.required_count
    rank = valohai.distributed.rank  # 0, 1, 2, etc. depending on run context
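
These values are typically fed into whatever distributed framework the code uses. The following is a hedged sketch assuming PyTorch (torch is not a valohai-utils dependency, and the port is an arbitrary choice):

import torch.distributed as dist
import valohai

if valohai.distributed.is_distributed_task():
    master = valohai.distributed.master()
    dist.init_process_group(
        backend="gloo",
        # assumes port 1234 is reachable between the workers
        init_method=f"tcp://{master.primary_local_ip}:1234",
        world_size=valohai.distributed.required_count,
        rank=valohai.distributed.rank,
    )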

Full example

Preprocess step for resizing image files

This example step will do the following:

  1. Take image files (or an archive containing images) as input.
  2. Resize each image to the size provided by the width & height parameters.
  3. Compress the resized images into a single resized/images.zip Valohai output file.

import os
import valohai
from PIL import Image

default_parameters = {
    "width": 640,
    "height": 480,
}
default_inputs = {
    "images": [
        "https://dist.valohai.com/valohai-utils-tests/Example.jpg",
        "https://dist.valohai.com/valohai-utils-tests/planeshark.jpg",
    ],
}

valohai.prepare(step="resize", default_parameters=default_parameters, default_inputs=default_inputs)


def resize_image(in_path, out_path, width, height, logger):
    image = Image.open(in_path)
    logger.log("from_width", image.size[0])
    logger.log("from_height", image.size[1])
    logger.log("to_width", width)
    logger.log("to_height", height)
    new_image = image.resize((width, height))
    new_image.save(out_path)


if __name__ == '__main__':
    for image_path in valohai.inputs('images').paths():
        with valohai.metadata.logger() as logger:
            filename = os.path.basename(image_path)
            resize_image(
                in_path=image_path,
                out_path=valohai.outputs('resized').path(filename),
                width=valohai.parameters('width').value,
                height=valohai.parameters('height').value,
                logger=logger
            )
    valohai.outputs('resized').compress("*", "images.zip", remove_originals=True)

CLI command:

vh yaml step resize.py

This will produce the following valohai.yaml config:

- step:
    name: resize
    image: python:3.7
    command: python ./resize.py {parameters}
    parameters:
    - name: width
      default: 640
      multiple-separator: ','
      optional: false
      type: integer
    - name: height
      default: 480
      multiple-separator: ','
      optional: false
      type: integer
    inputs:
    - name: images
      default:
      - https://dist.valohai.com/valohai-utils-tests/Example.jpg
      - https://dist.valohai.com/valohai-utils-tests/planeshark.jpg
      optional: false
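
The generated step can then be run in the cloud as shown at the top of this page:

vh exec run -a resize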

Configuration

There are some environment variables that affect how valohai-utils works when not running within a Valohai execution context.

  • VH_FLAT_LOCAL_OUTPUTS
    • If set, flattens the local outputs directory structure into a single directory. This means that outputs from subsequent runs can clobber old files.
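
For example, to run a script locally with flattened outputs (assuming a Unix-like shell):

VH_FLAT_LOCAL_OUTPUTS=1 python mycode.py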

Development

If you wish to further develop valohai-utils, remember to install development dependencies and write tests for your additions.

Linting

Lints are run via pre-commit.

If you want pre-commit to check your commits via git hooks, install it and set up the hooks:

pip install pre-commit
pre-commit install

You can also run the lints manually with pre-commit run --all-files.

Testing

pip install -e . -r requirements-dev.txt
pytest
