
ClearML Serving - ML-Ops made easy

clearml-serving
Model-Serving Orchestration and Repository Solution


clearml-serving is a command-line utility for flexible orchestration of your model deployments.
clearml-serving can make use of a variety of serving engines (Nvidia Triton, OpenVINO Model Server, KFServing), setting them up for serving wherever you designate a ClearML Agent, or on your ClearML Kubernetes cluster.

Features:

  • Spin up serving engines on your Kubernetes cluster or a ClearML Agent machine from the CLI
  • Full usage & performance metrics integrated with the ClearML UI
  • Multi-model support in a single serving-engine container
  • Automatic deployment of new model versions
  • Support for canary model releases
  • Integration with the ClearML Model Repository
  • Deploy & upgrade endpoints directly from the ClearML UI
  • Programmatic interface for endpoint/version/metric control

Installing ClearML Serving

  1. Set up your ClearML Server, or use the free tier hosting
  2. Connect your ClearML worker(s) to your ClearML Server (see ClearML Agent / Kubernetes integration)
  3. Install clearml-serving (note: clearml-serving is merely a control utility; it does not require any resources for the actual serving)

    pip install clearml-serving

Using ClearML Serving

clearml-serving automatically serves published models from your ClearML model repository, so the first step is getting a model into the repository.
Background: when using clearml in your training code, any model stored by your Python code is automatically registered (and, optionally, uploaded) to the model repository. This auto-magic logging is key for continuous model deployment.
To learn more about training models and the ClearML model repository, see the ClearML documentation.
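
For example, here is a minimal sketch of this behavior (ours, not from the official docs; it assumes scikit-learn and joblib, which ClearML's auto-magic logging hooks out of the box):

    import joblib
    from sklearn.linear_model import LogisticRegression
    from clearml import Task

    # Initializing a Task enables auto-magic logging; output_uri=True also
    # uploads any stored model to the ClearML Server's default file storage
    task = Task.init(project_name="examples", task_name="toy sklearn model",
                     output_uri=True)

    model = LogisticRegression().fit([[0.0], [1.0]], [0, 1])

    # joblib.dump is hooked by ClearML, so this one line registers
    # (and uploads) the model in the ClearML model repository
    joblib.dump(model, "toy_model.pkl")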

Training a toy model with Keras (about 2 minutes on a laptop)

The main goal of clearml-serving is to integrate seamlessly with the development process and the model repository. It does so by combining ClearML's auto-magic logging, which creates and uploads models directly from the Python training code, with access to those models as they are automatically added to the model repository, via the ClearML Server's REST API and its pythonic interface.
Let's demonstrate this seamless integration by training a toy Keras model to classify images from the MNIST dataset. Once we have a trained model in the model repository, we will serve it using clearml-serving.

We'll also see how to retrain another version of the model and have the serving engine automatically upgrade to the new model version.

Keras MNIST toy training example (single-epoch mock training):

  1. Install tensorflow (and, of course, clearml)

    pip install "tensorflow>2" clearml
    
  2. Execute the training code

    cd examples/keras
    python keras_mnist.py
    

    Notice: the only ClearML integration code required is the following two lines:

    from clearml import Task
    task = Task.init(project_name="examples", task_name="Keras MNIST serve example", output_uri=True)
    

    This call ensures that all outputs are automatically logged to the ClearML Server: console output, TensorBoard, command-line arguments, git repository information, etc.
    It also means any model stored by the code is automatically uploaded and registered in the ClearML model repository (see the full training-script sketch right after this list).

  3. Review the models in the ClearML web UI:
    Go to the "Projects" section of your ClearML server (free hosted or self-deployed).
    In the "examples" project, go to the Models tab (the model repository).
    You should see a model named "Keras MNIST serve example - serving_model".
    Once a model-serving service is available, right-clicking the model and selecting "Publish" will trigger an upgrade of the model on the serving-engine container.
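
For reference, here is a minimal sketch of what such a training script might look like (a simplified stand-in for examples/keras/keras_mnist.py; the actual example script may differ in its details):

    import tensorflow as tf
    from clearml import Task

    # The only required ClearML integration: the two lines shown above
    task = Task.init(project_name="examples",
                     task_name="Keras MNIST serve example", output_uri=True)

    # Plain single-epoch Keras MNIST training
    (x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
    x_train, x_test = x_train / 255.0, x_test / 255.0

    model = tf.keras.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    model.fit(x_train, y_train, epochs=1, validation_data=(x_test, y_test))

    # model.save is hooked by ClearML's auto-magic logging, producing the
    # "Keras MNIST serve example - serving_model" entry in the repository
    model.save("serving_model")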

Next, we will spin up the Serving Service and the serving engine.

Serving your models

To serve your models, clearml-serving spawns a serving service, which stores multiple endpoints and their configuration, collects metric reports, and updates models when new versions are published in the model repository.
In addition, a serving engine is launched; this is the container actually running the inference engine.
(Nvidia Triton is currently supported; the Intel OpenVINO serving engine and KFServing are coming soon.)

Now that we have a published model in the ClearML model repository, we can spin up a serving service and a serving engine.

Starting a Serving Service:

  1. Create a new serving instance.
    This is the control-plane Task; its configuration, logs, and metrics all appear in the "serving" project. Multiple serving services can run in the same system.
    In this example we will use Nvidia Triton engines.

    clearml-serving triton --project "serving" --name "serving example"
  2. Add models to the serving engine with specific endpoints.
    Reminder: to view your model repository, log in to your ClearML account, go to the "examples" project, and review the "Models" tab.

    clearml-serving triton --endpoint "keras_mnist" --model-project "examples" --model-name "Keras MNIST serve example - serving_model"
  3. Launch the serving service.
    The service will be launched on your "services" queue, which by default runs services on the ClearML Server machine.
    (Read more on the services queue here)
    We set our serving engine to launch on the "default" queue:

    clearml-serving launch --queue default
  4. Optional: if you do not have a machine connected to your ClearML cluster, either read more on our Kubernetes integration, or spin up a bare-metal worker and connect it to your ClearML Server.
    clearml-serving leverages the orchestration capabilities of ClearML to launch the serving engine on the cluster.
    Read more on the ClearML Agent orchestration module here.
    If you have not yet set up a ClearML worker connected to your ClearML account, you can do so now using:

    pip install clearml-agent
    clearml-agent daemon --docker --queue default --detached
    

We are done! To test the newly served model, you can curl the new endpoint:

curl <serving-engine-ip>:8000/v2/models/keras_mnist/versions/1

Notice: if we re-run our Keras training example and publish a new model to the repository, the engine will automatically update to the new model version.
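
The endpoint follows the KFServing v2 REST protocol, so you can also query it programmatically. Below is a minimal sketch using Python's requests (not from the official docs; the input tensor name and datatype are read from the model metadata since they depend on the model, and the all-zeros image and the [1, 28, 28] shape are illustrative assumptions):

    import requests

    # Placeholder host; replace with your serving engine's address
    base = "http://<serving-engine-ip>:8000/v2/models/keras_mnist/versions/1"

    # Same call as the curl example above: fetch the model metadata
    meta = requests.get(base).json()
    first_input = meta["inputs"][0]

    # Build a v2 inference request for a single (dummy) 28x28 MNIST image
    payload = {
        "inputs": [{
            "name": first_input["name"],
            "datatype": first_input["datatype"],
            "shape": [1, 28, 28],          # assumed input shape: one image
            "data": [0.0] * (28 * 28),     # all-zeros dummy image
        }]
    }
    result = requests.post(base + "/infer", json=payload).json()
    print(result["outputs"])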

Further reading on advanced topics here
