valohai-utils
Python helper library for the Valohai machine learning platform.
Install
pip install valohai-utils
Execution
Run locally
python mycode.py
Run in the cloud
vh yaml step mycode.py
vh exec run -a mystep
What does valohai-utils do?
- Generates and updates the valohai.yaml configuration file based on the source code
- Agnostic input handling (single file, multiple files, zip, tar)
- Parse command-line parameters
- Compress outputs
- Download inputs for local experiments
- Straightforward way to print metrics as Valohai metadata
- Code parity between local vs. cloud
Parameters
Valohai parameters are variables & hyper-parameters that are parsed from the command-line. You define parameters in a dictionary:
default_parameters = {
    'iterations': 100,
    'learning_rate': 0.001
}
The dictionary is passed to the valohai.prepare() method.
The values given are defaults. You can override them from the command line or in the Valohai web UI.
Example
import valohai
default_parameters = {
    'iterations': 10,
}
valohai.prepare(step="helloworld", default_parameters=default_parameters)
for i in range(valohai.parameters('iterations').value):
    print("Iteration %s" % i)
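The override behaviour can be pictured with a small standard-library sketch. This is not how valohai-utils is implemented: `parse_with_defaults` is a hypothetical helper, and it assumes that parameters arrive as `--name=value` flags and that each value takes the type of its default.

```python
import argparse


def parse_with_defaults(defaults, argv):
    """Return `defaults` with any --name=value overrides from argv applied.

    Hypothetical helper for illustration; not part of valohai-utils.
    """
    parser = argparse.ArgumentParser()
    for name, default in defaults.items():
        # Reuse the type of the default so "5" becomes 5 and "0.01" becomes 0.01.
        parser.add_argument(f"--{name}", type=type(default), default=default)
    # parse_known_args leaves flags that are not declared parameters untouched.
    known, _unknown = parser.parse_known_args(argv)
    return vars(known)


defaults = {"iterations": 100, "learning_rate": 0.001}
print(parse_with_defaults(defaults, []))
print(parse_with_defaults(defaults, ["--iterations=5"]))
```

The first call returns the defaults unchanged; the second replaces only `iterations`, which mirrors how a default stays in effect until something overrides it.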
Inputs
Valohai inputs are the data files required by the experiment. They are downloaded for you automatically if the data comes from a public source. You define inputs with a dictionary:
default_inputs = {
    'input_name': 'http://example.com/1.png'
}
An input can also be a list of URLs or a folder:
default_inputs = {
    'input_name': [
        'http://example.com/1.png',
        'http://example.com/2.png'
    ],
    'input_folder': [
        's3://mybucket/images/*',
        'azure://mycontainer/images/*',
        'gs://mybucket/images/*'
    ]
}
Or it can be an archive full of files (uncompressed automagically on-demand):
default_inputs = {
    'images': 'http://example.com/myimages.zip'
}
The dictionary is passed to the valohai.prepare() method.
The URL(s) given are defaults. You can override them from the command line or in the Valohai web UI.
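To illustrate what treating a single file, a folder, and an archive uniformly can look like, here is a minimal standard-library sketch. It is only an illustration of the idea, not the library's implementation: `list_input_files` is a hypothetical helper, and it handles zip archives only.

```python
import tempfile
import zipfile
from pathlib import Path


def list_input_files(source):
    """Return the files behind `source`: a file, a directory, or a zip archive.

    Hypothetical helper for illustration; not part of valohai-utils.
    """
    source = Path(source)
    if source.is_dir():
        return sorted(p for p in source.rglob("*") if p.is_file())
    if zipfile.is_zipfile(source):
        # Unpack into a fresh temporary directory and list what came out.
        target = Path(tempfile.mkdtemp(prefix="unpacked-"))
        with zipfile.ZipFile(source) as archive:
            archive.extractall(target)
        return sorted(p for p in target.rglob("*") if p.is_file())
    return [source]


# Demo: build a plain file and a small archive, then read both the same way.
workdir = Path(tempfile.mkdtemp())
plain = workdir / "1.txt"
plain.write_text("one")
archive_path = workdir / "files.zip"
with zipfile.ZipFile(archive_path, "w") as archive:
    archive.writestr("a.txt", "a")
    archive.writestr("b.txt", "b")

print([p.name for p in list_input_files(plain)])
print([p.name for p in list_input_files(archive_path)])
```

The caller iterates over the returned paths without caring which kind of input was supplied.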
Example
import csv
import valohai
default_inputs = {
    'myinput': 'https://pokemon-images-example.s3-eu-west-1.amazonaws.com/pokemon.csv'
}
valohai.prepare(step="test", default_inputs=default_inputs)
with open(valohai.inputs("myinput").path()) as csv_file:
    reader = csv.reader(csv_file, delimiter=',')
Outputs
Valohai outputs are the files that your step produces as an end result.
When you are ready to save an output file, you can ask valohai-utils for the correct path.
Example
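import valohai
from PIL import Image
# in_path, width and height are assumed to be defined earlier in the script
image = Image.open(in_path)
new_image = image.resize((width, height))
out_path = valohai.outputs('resized').path('resized_image.png')
new_image.save(out_path)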
Sometimes there are so many outputs that you may want to compress them into a single file.
In this case, once you have all your outputs saved, you can finalize the output with the compress() method.
Example
valohai.outputs('resized').compress("*.png", "images.zip", remove_originals=True)
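As a rough picture of what compressing matching outputs and removing the originals involves, the following standard-library sketch zips the files that match a glob pattern. It is an assumption-laden illustration rather than the behaviour of `compress()` itself: `compress_matching` is a hypothetical helper, and the file names are made up.

```python
import tempfile
import zipfile
from pathlib import Path


def compress_matching(directory, pattern, archive_name, remove_originals=False):
    """Zip files in `directory` matching `pattern`; optionally delete the originals.

    Hypothetical helper for illustration; not part of valohai-utils.
    """
    directory = Path(directory)
    archive_path = directory / archive_name
    # Collect matches before creating the archive so it cannot include itself.
    matches = sorted(
        p for p in directory.glob(pattern) if p.is_file() and p != archive_path
    )
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in matches:
            archive.write(path, arcname=path.name)
    if remove_originals:
        for path in matches:
            path.unlink()
    return archive_path


# Demo with placeholder files in a temporary directory.
out_dir = Path(tempfile.mkdtemp())
for name in ("a.png", "b.png", "notes.txt"):
    (out_dir / name).write_bytes(b"data")

result = compress_matching(out_dir, "*.png", "images.zip", remove_originals=True)
print(sorted(p.name for p in out_dir.iterdir()))
```

Only the files matching the pattern are archived and removed; anything else in the directory is left in place.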
Logging
You can log metrics using the Valohai metadata system and then render interactive graphs on the web interface. The valohai-utils logger will print JSON logs that Valohai will parse as metadata.
For visualization, it is important that the logs for a single epoch are flushed as a single JSON object.
Example
import valohai
for epoch in range(100):
    with valohai.metadata.logger() as logger:
        logger.log("epoch", epoch)
        logger.log("accuracy", accuracy)
        logger.log("loss", loss)
Example 2
import valohai
logger = valohai.logger()
for epoch in range(100):
    logger.log("epoch", epoch)
    logger.log("accuracy", accuracy)
    logger.log("loss", loss)
    logger.flush()
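To make the "single JSON object per epoch" point concrete, the sketch below prints one JSON line per epoch using only the standard library. It assumes, as described above, that metadata is picked up from JSON printed by the script; `emit_metadata` is a hypothetical helper, and the accuracy and loss numbers are placeholders rather than real results.

```python
import json


def emit_metadata(**metrics):
    """Print all metrics for one step as a single JSON line and return that line."""
    line = json.dumps(metrics)
    print(line, flush=True)
    return line


for epoch in range(3):
    # Placeholder numbers; a real script would log values from training.
    emit_metadata(epoch=epoch, accuracy=epoch / 10, loss=1.0 - epoch / 10)
```

Because every metric for an epoch ends up in the same object, a graph can line up accuracy and loss against the same epoch value.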
Execution Info
valohai.execution contains information about the current execution context.
import valohai
execution_config = valohai.execution().config
print(f"Execution ID: {execution_config.id}")
print(f"Execution title: {execution_config.title}")
print(f"Execution counter: {execution_config.counter}")
Distributed Workloads
valohai.distributed contains a toolset for running distributed tasks on Valohai.
import valohai
if valohai.distributed.is_distributed_task():
    # `master()` reports the same worker on all contexts
    master = valohai.distributed.master()
    master_url = f'tcp://{master.primary_local_ip}:1234'
    # `members()` contains all workers in the distributed task
    member_public_ips = ",".join([
        m.primary_public_ip
        for m
        in valohai.distributed.members()
    ])
    # `me()` has full details about the current worker context
    details = valohai.distributed.me()
    size = valohai.distributed.required_count
    rank = valohai.distributed.rank  # 0, 1, 2, etc. depending on run context
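One plausible use of these values is to hand them to a distributed training backend. The sketch below is an assumption about how that wiring might look, not something valohai-utils provides: `build_init_settings` is a hypothetical helper, its keyword names follow those of `torch.distributed.init_process_group`, and in a real execution its arguments would come from `master.primary_local_ip`, `valohai.distributed.rank`, and `valohai.distributed.required_count`.

```python
def build_init_settings(master_ip, rank, size, port=1234):
    """Bundle the connection settings a process-group initializer typically needs.

    Hypothetical helper for illustration; not part of valohai-utils.
    """
    if not 0 <= rank < size:
        raise ValueError(f"rank {rank} is outside 0..{size - 1}")
    return {
        "init_method": f"tcp://{master_ip}:{port}",
        "rank": rank,
        "world_size": size,
    }


# Stand-in values; on Valohai they would be read from valohai.distributed.
settings = build_init_settings("10.0.0.5", rank=1, size=4)
print(settings)
```

Every worker builds the same `init_method` because `master()` reports the same worker in all contexts, while `rank` differs per worker.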
Full example
Preprocess step for resizing image files
This example step will do the following:
- Take image files (or an archive containing images) as input.
- Resize each image to the size provided by the width & height parameters.
- Compress the resized images into the resized/images.zip Valohai output file.
import os
import valohai
from PIL import Image
default_parameters = {
    "width": 640,
    "height": 480,
}
default_inputs = {
    "images": [
        "https://dist.valohai.com/valohai-utils-tests/Example.jpg",
        "https://dist.valohai.com/valohai-utils-tests/planeshark.jpg",
    ],
}
valohai.prepare(step="resize", default_parameters=default_parameters, default_inputs=default_inputs)

def resize_image(in_path, out_path, width, height, logger):
    image = Image.open(in_path)
    logger.log("from_width", image.size[0])
    logger.log("from_height", image.size[1])
    logger.log("to_width", width)
    logger.log("to_height", height)
    new_image = image.resize((width, height))
    new_image.save(out_path)

if __name__ == '__main__':
    for image_path in valohai.inputs('images').paths():
        with valohai.metadata.logger() as logger:
            filename = os.path.basename(image_path)
            resize_image(
                in_path=image_path,
                out_path=valohai.outputs('resized').path(filename),
                width=valohai.parameters('width').value,
                height=valohai.parameters('height').value,
                logger=logger
            )
    valohai.outputs('resized').compress("*", "images.zip", remove_originals=True)
CLI command:
vh yaml step resize.py
Will produce this valohai.yaml config:
- step:
    name: resize
    image: python:3.11-slim
    command: python ./resize.py {parameters}
    parameters:
    - name: width
      default: 640
      multiple-separator: ","
      optional: false
      type: integer
    - name: height
      default: 480
      multiple-separator: ","
      optional: false
      type: integer
    inputs:
    - name: images
      default:
      - https://dist.valohai.com/valohai-utils-tests/Example.jpg
      - https://dist.valohai.com/valohai-utils-tests/planeshark.jpg
      optional: false
Configuration
There are some environment variables that affect how valohai-utils works when not running within a Valohai execution context.
- VH_FLAT_LOCAL_OUTPUTS: If set, flattens the local outputs directory structure into a single directory. This means that outputs from subsequent runs can clobber old files.
Development
If you wish to further develop valohai-utils, remember to install development dependencies and write tests for your additions.
Linting
Lints are run via pre-commit.
If you want pre-commit to check your commits via git hooks, install it and register the hooks:
pip install pre-commit
pre-commit install
You can also run the lints manually with pre-commit run --all-files.
Testing
pip install -e . -r requirements-dev.txt
pytest