
trainML client SDK and command line utilities

trainML Python SDK and Command Line Tools

Provides programmatic access to the trainML platform.

Installation

Python 3.8 or above is required.

pip install trainml

Authentication

Prerequisites

You must have a valid trainML account. On the account settings page, click the Create button in the API Keys section. This will automatically download a credentials.json file. This file can only be generated once per API key. Treat this file as a password, as anyone with access to your API key can create and control resources in your trainML account. You can deactivate any API key by clicking the Remove button.

Creating resources on the trainML platform requires a non-zero credit balance. To purchase credits or sign up for automatic credit top-ups, visit the billing page.

Methods

Credentials File

The easiest way to authenticate is to place the downloaded credentials file in the .trainml folder of your home directory and ensure that only you have access to it. From the directory where the credentials.json file was downloaded, run the following commands:

mkdir -p ~/.trainml
mv credentials.json ~/.trainml/credentials.json
chmod 600 ~/.trainml/credentials.json
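
The result of these commands can be checked from Python. The following is a minimal sketch and not part of the trainML SDK: the helper name credentials_are_private is hypothetical, and it assumes a POSIX file system where mode 600 means no group or other permission bits are set.

```python
import stat
from pathlib import Path


def credentials_are_private(path: Path) -> bool:
    """Return True if the file exists and grants no access to group or others.

    Hypothetical helper for illustration; not provided by the trainML SDK.
    """
    if not path.is_file():
        return False
    mode = stat.S_IMODE(path.stat().st_mode)
    # Any group/other bit set means someone besides the owner may access it.
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0
```

Calling credentials_are_private(Path.home() / ".trainml" / "credentials.json") should return True after the chmod 600 step above.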

Environment Variables

You can also set the environment variables TRAINML_USER and TRAINML_KEY to their respective values from the credentials.json file.

export TRAINML_USER=<'user' field from credentials.json>
export TRAINML_KEY=<'key' field from credentials.json>
python create_job.py

Environment variables override any credentials stored in ~/.trainml/credentials.json.

Runtime Variables

API credentials can also be passed directly to the TrainML object constructor at runtime.

import trainml
trainml_client = trainml.TrainML(user="<'user' field from credentials.json>", key="<'key' field from credentials.json>")
await trainml_client.jobs.create(...)

Passing credentials to the TrainML constructor will override all other methods for setting credentials.
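
Taken together, the three methods form a precedence chain: constructor arguments, then environment variables, then the credentials file. The sketch below illustrates that ordering only; it is not the SDK's actual implementation. The function name resolve_credentials and its parameters are hypothetical, and resolving the user and key fields independently of each other is an assumption.

```python
import json
import os
from pathlib import Path
from typing import Mapping, Optional, Tuple


def resolve_credentials(
    user: Optional[str] = None,
    key: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
    credentials_file: Optional[Path] = None,
) -> Tuple[Optional[str], Optional[str]]:
    """Illustrative only: mimic the documented precedence of credential sources."""
    env = os.environ if env is None else env
    if credentials_file is None:
        credentials_file = Path.home() / ".trainml" / "credentials.json"

    # Lowest priority: the 'user' and 'key' fields of the credentials file.
    stored = {}
    if credentials_file.is_file():
        stored = json.loads(credentials_file.read_text())

    # Environment variables override the file; explicit arguments override both.
    resolved_user = user or env.get("TRAINML_USER") or stored.get("user")
    resolved_key = key or env.get("TRAINML_KEY") or stored.get("key")
    return resolved_user, resolved_key
```

Passing env and credentials_file explicitly makes the ordering easy to exercise in a test without touching the real home directory.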

Configuration

By default, all operations using the trainML SDK/CLI use the Personal project of the trainML account the API keys were generated from. To change the active project, run the configure command:

trainml configure

This command will output the currently configured active project (UNSET defaults to Personal) and allows you to specify any project you have access to as the new active project.

Current Active Project: Personal
Select Active Project: (My Other Project, Personal, Project Shared With Me) [Personal]:

Once you select a project, your selection is stored in the config.json file in the TRAINML_CONFIG_DIR folder (~/.trainml by default). All subsequent operations then use the selected project.

This setting can also be overridden at runtime using the environment variable TRAINML_PROJECT:

TRAINML_PROJECT=<PROJECT ID> python create_job.py

or by instantiating the trainml client with the project keyword argument:

import trainml
trainml_client = trainml.TrainML(project="PROJECT ID")
await trainml_client.jobs.create(...)

You must specify the project ID (not name) when using the runtime options. The project ID can be found by running trainml project list.
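
The lookup order for the active project can be sketched in the same spirit. This is an illustration under stated assumptions, not the SDK's code: the function name resolve_project is hypothetical, the "project" key inside config.json is an assumed name, and letting the keyword argument win over TRAINML_PROJECT is assumed by analogy with credentials rather than stated in the text above.

```python
import json
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_project(
    project: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Illustrative only: return the active project ID, or None if unset."""
    env = os.environ if env is None else env
    if project:
        return project
    if env.get("TRAINML_PROJECT"):
        return env["TRAINML_PROJECT"]

    # Fall back to the stored selection in TRAINML_CONFIG_DIR (~/.trainml by default).
    config_dir = Path(env.get("TRAINML_CONFIG_DIR", "~/.trainml")).expanduser()
    config_file = config_dir / "config.json"
    if config_file.is_file():
        # "project" is an assumed key name, not confirmed by the documentation.
        return json.loads(config_file.read_text()).get("project")
    return None  # unset: operations default to the Personal project
```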

Usage

Python SDK

The trainML SDK uses the asyncio library to ease the concurrent execution of long-running tasks. The following example creates a dataset from an S3 bucket and immediately runs a training job on that dataset:

from trainml.trainml import TrainML
import asyncio


trainml_client = TrainML()

# Create the dataset
dataset = asyncio.run(
    trainml_client.datasets.create(
        name="Example Dataset",
        source_type="aws",
        source_uri="s3://trainml-examples/data/cifar10",
    )
)

print(dataset)

# Watch the log output, attach will return when data transfer is complete
asyncio.run(dataset.attach())

# Create the job
job = asyncio.run(
    trainml_client.jobs.create(
        name="Example Training Job",
        type="training",
        gpu_type="GTX 1060",
        gpu_count=1,
        disk_size=10,
        workers=[
            "PYTHONPATH=$PYTHONPATH:$TRAINML_MODEL_PATH python -m official.vision.image_classification.resnet_cifar_main --num_gpus=1 --data_dir=$TRAINML_DATA_PATH --model_dir=$TRAINML_OUTPUT_PATH --enable_checkpoint_and_export=True --train_epochs=10 --batch_size=1024",
        ],
        data=dict(
            datasets=[dict(id=dataset.id, type="existing")],
            output_uri="s3://trainml-examples/output/resnet_cifar10",
            output_type="aws",
        ),
        model=dict(git_uri="git@github.com:trainML/test-private.git"),
    )
)
print(job)

# Watch the log output, attach will return when the training job stops
asyncio.run(job.attach())

# Cleanup job and dataset
asyncio.run(job.remove())
asyncio.run(dataset.remove())

See more examples in the examples folder.
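
The example above calls asyncio.run once per step. As an alternative sketch, the same steps can be awaited inside one coroutine so that they run in a single event loop. It uses only the methods shown above (datasets.create, jobs.create, attach, remove); the function name run_pipeline and its client, dataset_spec, and job_spec parameters are hypothetical.

```python
async def run_pipeline(client, dataset_spec: dict, job_spec: dict):
    """Create a dataset, train on it, then clean up, all in one event loop.

    Hypothetical helper: `client` is expected to be a TrainML instance, and the
    two dicts hold the keyword arguments for datasets.create and jobs.create.
    """
    dataset = await client.datasets.create(**dataset_spec)
    await dataset.attach()  # returns when the data transfer is complete

    # Point the job at the dataset that was just created.
    data = dict(job_spec.get("data", {}))
    data["datasets"] = [dict(id=dataset.id, type="existing")]
    job = await client.jobs.create(**{**job_spec, "data": data})
    try:
        await job.attach()  # returns when the training job stops
    finally:
        # Clean up even if attaching to the job fails part-way through.
        await job.remove()
        await dataset.remove()
    return job
```

With the SDK installed, this would be started with asyncio.run(run_pipeline(TrainML(), dataset_spec, job_spec)), where the two dictionaries carry the same keyword arguments as the example above.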

Command Line Interface

The command line interface is rooted in the trainml command. To see the available options, run:

trainml --help

To list all jobs:

trainml job list

To list all datasets:

trainml dataset list

To connect to a job that requires the connection capability:

trainml job connect <job ID or name>

To watch the realtime job logs:

trainml job attach <job ID or name>

To create and open a notebook job:

trainml job create notebook "My Notebook Job"

To create a multi-GPU notebook job on a specific GPU type with larger scratch directory space:

trainml job create notebook --gpu-type "RTX 3090" --gpu-count 4 --disk-size 50 "My Notebook Job"

To run the model training code in the train.py file in your local ~/model-code directory on the training data in your local ~/data directory:

trainml job create training --model-dir ~/model-code --data-dir ~/data "My Training Job" "python train.py"

Stop a job by job ID:

trainml job stop fe52527c-1f4b-468f-b57d-86db864cc089

Stop a job by name:

trainml job stop "My Notebook Job"

Restart a notebook job:

trainml job start "My Notebook Job"

Remove a job by job ID:

trainml job remove fe52527c-1f4b-468f-b57d-86db864cc089
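
When these commands are driven from a script, building the argument list in Python avoids shell quoting mistakes with names that contain spaces. The sketch below is illustrative: the helper name notebook_create_command is hypothetical, and it uses only the subcommand and flags shown in the notebook examples above.

```python
import shlex
from typing import List, Optional


def notebook_create_command(
    name: str,
    gpu_type: Optional[str] = None,
    gpu_count: Optional[int] = None,
    disk_size: Optional[int] = None,
) -> List[str]:
    """Build the argument list for `trainml job create notebook` (hypothetical helper)."""
    argv = ["trainml", "job", "create", "notebook"]
    if gpu_type is not None:
        argv += ["--gpu-type", gpu_type]
    if gpu_count is not None:
        argv += ["--gpu-count", str(gpu_count)]
    if disk_size is not None:
        argv += ["--disk-size", str(disk_size)]
    argv.append(name)  # the job name is the final positional argument
    return argv


command = notebook_create_command(
    "My Notebook Job", gpu_type="RTX 3090", gpu_count=4, disk_size=50
)
# Print a shell-safe rendering of the command for inspection.
print(shlex.join(command))
```

The resulting list can be passed to subprocess.run(command, check=True) on a machine where the trainml CLI is installed and authenticated.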

