DeGirum AI Inference Software Package

Project description

PySDK - DeGirum AI Inference Software Development Kit

Version 0.7.0

Copyright DeGirum Corp. 2022-2023

The DeGirum AI Inference Software Development Kit, PySDK, is a Python package which provides APIs to perform inference on ML models.

PySDK consists of the following components:

  1. PySDK library, which includes the following essential modules:
    • degirum.zoo_manager - AI model zoo management classes
    • degirum.model module - AI model management classes
    • degirum.postprocessor module - AI inference result handling classes
  2. The CoreClient extension module implementing APIs to work with DeGirum AI accelerator cards installed locally
  3. The aiclient extension module implementing APIs to work with DeGirum AI servers running on remote systems
  4. The degirum.server module - helper module that allows launching an AI server on local AI hardware to be used by remote clients.
  5. The degirum console script implementing a command-line interface to various PySDK management facilities.

The PySDK package provides the necessary tools to support the following use cases:

  • Local Inference: perform AI inferences on the local system using DeGirum AI accelerator hardware installed on the same system
  • AI Server Hosting: deploy the system with installed DeGirum AI accelerator hardware as AI server
  • AI Server Inference: connect to one or several AI servers and perform AI inferences on those servers remotely
  • Cloud Inference: connect to DeGirum Cloud Platform and perform AI inferences in the cloud

Installation

Basic Installation of PySDK Python Package

It is recommended to install PySDK in a virtual environment. Please see PySDK Supported Configurations page for a list of supported system configurations.

To install DeGirum PySDK from the DeGirum index server use the following command:

python -m pip install degirum --extra-index-url https://degirum.github.io/simple

To force reinstall the most recent PySDK version without reinstalling all dependencies use the following command:

python -m pip install degirum --upgrade --no-deps --force-reinstall --extra-index-url https://degirum.github.io/simple

Kernel Driver Installation

Linux Kernel Driver Installation

Hosts that have a DeGirum Orca card installed need a kernel driver to enable its functionality. The driver is distributed as a source package via the DeGirum APT repository and is built automatically after download.

  1. Add the following line to /etc/apt/sources.list to register the DeGirum APT repository:

    deb-src https://degirum.github.io/apt-repo ORCA main
    
  2. Download DeGirum public key by running the following command:

    wget -O - -q http://degirum.github.io/apt-repo/DeGirum.gpg.key | sudo apt-key add -
    
  3. Update package information from configured sources, and download prerequisites by running the following commands:

    sudo apt update
    sudo apt install dpkg-dev debhelper
    
  4. Finally, download and build DeGirum Linux driver package by running the following command:

    sudo apt-get --build source orca-driver
    

Kernel Driver Installation for Other Operating Systems

The current version of the DeGirum software package supports only a Linux kernel driver. Kernel driver support for other operating systems is under development. This document will be updated when kernel drivers for other operating systems are released.

Quick Start

Note: This quick start guide covers the local inference use case, in which you run AI inferences on the local host, optionally using DeGirum AI accelerator hardware installed on that host. See the System Configuration for Specific Use Cases section for other use cases.

To start working with PySDK, import the degirum package:

import degirum as dg

The main PySDK entry point is the degirum.connect function, which creates and returns a degirum.zoo_manager.ZooManager zoo manager object (for a detailed explanation of PySDK concepts, refer to the Model Zoo Manager section):

zoo = dg.connect(dg.CLOUD, "https://cs.degirum.com", "<your cloud API access token>")

When instantiated this way, the zoo manager automatically connects to the DeGirum Public cloud model zoo, and you have free access to all AI models in this public model zoo. However, to access the public cloud zoo you need a cloud API access token, which you can generate on the DeGirum Cloud Portal site https://cs.degirum.com under the Management | My Tokens main menu item (see more on that in the Configuration for Cloud Inference section).

To see the list of all AI models available for inference in the public model zoo, use degirum.zoo_manager.ZooManager.list_models method. It returns a list of strings, where each string is a model name:

model_list = zoo.list_models()
print(model_list)

If you want to perform AI inference using some model, you need to load it using the degirum.zoo_manager.ZooManager.load_model method, providing the model name as the method argument. The model name should be one of the model names returned by the list_models() method, for example:

model = zoo.load_model("mobilenet_v2_ssd_coco--300x300_quant_n2x_orca_1")

The load_model() method returns a degirum.model.Model object, which can be used to perform AI inferences.

Before performing AI inferences, you may want to adjust some model parameters. The model class has a list of parameters which can be modified during runtime. These model parameters affect how the input data is pre-processed and how the inference results are post-processed. All model parameters have reasonable default values, so in the beginning you may skip this step.

Some useful model parameters are:

Property Name | Description | Possible Values
degirum.model.Model.image_backend | image processing package to be used | "auto", "pil", or "opencv"; "auto" tries PIL first
degirum.model.Model.input_pad_method | how the input image will be padded or cropped when resized | "stretch", "letterbox", "crop-first", or "crop-last"
degirum.model.Model.input_crop_percentage | percentage of image dimensions to crop around if "input_pad_method" is set to "crop-first" or "crop-last" | Float value in [0..1] range
degirum.model.Model.output_confidence_threshold | confidence threshold to reject results with low scores | Float value in [0..1] range
degirum.model.Model.output_nms_threshold | rejection threshold for non-max suppression | Float value in [0..1] range
degirum.model.Model.overlay_color | color to draw AI results | Tuple in (R,G,B) format or list of tuples in (R,G,B) format
degirum.model.Model.overlay_font_scale | font scale to print AI results | Float value
degirum.model.Model.overlay_show_labels | True to show class labels when drawing AI results | True/False
degirum.model.Model.overlay_show_probabilities | True to show class probabilities when drawing AI results | True/False

For the complete list of model parameters see Model Parameters section.
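
For instance, a few of these properties can be adjusted on a loaded model as shown in this sketch (the property names come from the table above; the particular values are illustrative):

model.image_backend = "opencv"              # use OpenCV for image handling
model.input_pad_method = "letterbox"        # preserve aspect ratio when resizing
model.output_confidence_threshold = 0.5     # drop low-score results
model.overlay_show_probabilities = True     # print probabilities on the overlay image
model.overlay_color = (255, 0, 0)           # draw overlay details in red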

Now you are ready to perform AI inference. To run an inference, you either invoke the degirum.model.Model.predict method or simply call the model object, supplying the input image as an argument; the inference result is returned.

A model may accept input images in various formats:

  • as a string containing the file name of the image file on the local file system:

    result = model("./images/TwoCats.jpg")
    
  • as a string containing URL of the image file:

    result = model("https://degirum.github.io/images/samples/TwoCats.jpg")
    
  • as a PIL image object:

    from PIL import Image
    image = Image.open("./images/TwoCats.jpg")
    result = model(image)
    
  • as a numpy array (for example, returned by OpenCV):

    import cv2
    image = cv2.imread("./images/TwoCats.jpg")
    model.input_numpy_colorspace = "BGR" # set colorspace to match OpenCV-produced numpy array
    result = model(image)
    

The result object returned by the model (an object derived from degirum.postprocessor.InferenceResults class) contains the following information:

  • numeric inference results
  • graphical inference results
  • original image

Numeric results can be accessed by degirum.postprocessor.InferenceResults.results property. This property returns a list of result dictionaries, one dictionary per detected object or class. The format of this dictionary is model-dependent. For example, to iterate over all classification model inference results you may do this:

for r in result.results:
   print(f"Detected class: '{r['label']}' with probability {r['score']}")

Tip: if you just print your inference result object, all the numeric results will be pretty-printed in YAML format:

print(result)

Graphical results can be accessed by the degirum.postprocessor.InferenceResults.image_overlay property. This property returns a graphical object containing the original image with all inference results drawn over it. The graphical object type depends on the graphical package specified by the model image_backend property (if you omit it, PIL is used when installed, otherwise OpenCV). Once you have this object, you may display it, print it, or save it to a file using the graphical package of your choice. For example, for PIL:

result_image = result.image_overlay
result_image.save("./images/TwoCatsResults.jpg")
result_image.show()

The original image can be accessed by the degirum.postprocessor.InferenceResults.image property, which returns a graphical object whose type again depends on the graphical package specified by the model image_backend property.

System Configuration for Specific Use Cases

The PySDK package can be used in the following use cases:

  • Local inference: running AI inferences on the local host, optionally using DeGirum AI accelerator hardware installed on that host
  • AI Server inference: running AI inferences on remote AI server host with DeGirum AI accelerator hardware installed on that remote host
  • Cloud Inference: running AI inferences on DeGirum Cloud Platform, which automatically manages a set of AI computing nodes equipped with different types of AI accelerator hardware including DeGirum Orca, Google EdgeTPU, and Intel Myriad
  • AI Server hosting: configuring the host with DeGirum AI accelerator hardware as AI server to be used for remote AI inferences

The following sections provide step-by-step instructions on how to set up the system for each particular use case. For a detailed explanation of PySDK concepts, refer to the Model Zoo Manager section.

Configuration for Local Inference

  1. Install PySDK package as described in Basic Installation of PySDK Python Package section.

  2. If your system is equipped with DeGirum AI accelerator hardware, install the kernel driver as described in Kernel Driver Installation section.

    Note: If your system is not equipped with any AI accelerator hardware, the set of AI models available for local inference will be limited to CPU models only.

  3. For local inferences, you call degirum.connect passing the degirum.LOCAL constant or the "@local" string as the first argument.

  4. If you plan to work with a DeGirum-hosted cloud model zoo, call degirum.connect providing the cloud model zoo URL in the "https://cs.degirum.com[/<zoo URL>]" format and the cloud API access token as the second and third arguments:

    zoo = dg.connect(dg.LOCAL, "https://cs.degirum.com/my_organization/my_zoo", 
       token="<your cloud API access token>")
    

    You can obtain the model zoo URL on DeGirum Cloud Portal site https://cs.degirum.com under Management | Models main menu item. If <zoo URL> suffix is not specified, then DeGirum Public cloud model zoo will be used.

    You can generate your token on DeGirum Cloud Portal site https://cs.degirum.com under Management | My Tokens main menu item.

  5. If you plan to work with a locally deployed model, call degirum.connect providing the full path to the model JSON file as the second argument and omitting the third argument:

    zoo = dg.connect(dg.LOCAL, "full/path/to/model.json")
    

Configuration for AI Server Hosting

  1. Install the kernel driver as described in Kernel Driver Installation section.
  2. Follow instructions provided in Configuring and Launching AI Server section.

Configuration for AI Server Inference

Make sure your AI server host is already configured as described in the Configuration for AI Server Hosting section, and it is up and running.

  1. Install PySDK package as described in Basic Installation of PySDK Python Package section.

  2. If you plan to work with a model zoo locally deployed on the AI server host system, call degirum.connect providing just the hostname or IP address of the AI server host you want to use for AI inference:

    zoo = dg.connect("192.168.0.118")
    
  3. If you plan to have the AI server take models from a cloud model zoo of your choice, call degirum.connect providing the hostname or IP address of the AI server host as the first argument, the cloud zoo URL as the second argument, and the cloud API access token as the third argument. Specify the cloud model zoo URL in the "https://cs.degirum.com[/<zoo URL>]" format:

    zoo = dg.connect( "192.168.0.118", "https://cs.degirum.com/my_organization/my_zoo",
       token="<your cloud API access token>")
    

    You can obtain the model zoo URL on DeGirum Cloud Portal site https://cs.degirum.com under Management | Models main menu item. If <zoo URL> suffix is not specified, then DeGirum Public cloud model zoo will be used.

    You can generate your token on DeGirum Cloud Portal site https://cs.degirum.com under Management | My Tokens main menu item.

Configuration for Cloud Inference

Starting with version 0.3.0, PySDK supports inferences on the DeGirum Cloud Platform, and starting with version 0.5.0 it supports cloud model zoo access.

DeGirum Cloud Platform solves the Edge AI development problem by providing the toolchain to design, run, and evaluate ML models across different hardware platforms in the cloud.

DeGirum Cloud Platform includes the following components:

  1. DeGirum Cloud Device Farm accessible through the Cloud Application Server and PySDK
  2. DeGirum Cloud Portal Management site, cs.degirum.com
  3. Cloud model zoos hosted on DeGirum Cloud Platform.

The DeGirum Cloud Device Farm is a set of computing nodes with various AI hardware accelerators installed in those nodes, including: DeGirum Orca, Google Edge TPU, Intel® Movidius™ Myriad™ VPU. The farm nodes are hosted by DeGirum.

The Cloud Application Server provides a web API to perform AI inference on the Cloud Farm devices. Starting with version 0.3.0, PySDK supports the Application Server web API and provides the same level of functionality transparently to the end user: you may run on the Cloud Platform exactly the same code you designed for traditional use cases such as local inference or AI server inference.

The cloud model zoo is a collection of AI models stored on the DeGirum Cloud Platform. A registered DeGirum Cloud Platform user can create and maintain multiple cloud zoos with either private or public access. A model zoo with private access is visible to all users belonging to the same organization. A model zoo with public access is visible to all registered users of the DeGirum Cloud Platform. DeGirum maintains the Public Model Zoo with an extensive set of AI models available free of charge to all registered users.

The DeGirum Cloud Portal Management site provides GUI to get access to the various Cloud Platform assets such as:

  • cloud API access tokens, which are required to access the Cloud Application Server through PySDK;
  • cloud model zoos belonging to the user's organization;
  • JupyterLab container with per-user private storage to quickly run Python scripts using pre-installed PySDK;
  • PySDK documentation;
  • PySDK examples;
  • Graphit AI ™ visual AI model design tool;
  • Cloud Model Compiler and Model Parameters Wizard.

To get started with DeGirum Cloud Platform perform the following steps:

  1. Contact DeGirum Customer Support at support@degirum.com or click Request Access button on cs.degirum.com main page to request an invitation. Once the invitation request gets processed, you will receive the invitation e-mail.

  2. Follow the link in the invitation e-mail to activate your DeGirum Cloud Platform account. On the activation page you will be asked to create a new password.

  3. Once your account is activated, log in to the Cloud Management Portal by clicking the Login button on the cs.degirum.com main page. Use your e-mail as the login name.

  4. Once logged in, you will see the DeGirum Cloud Portal Management menu. Click on the Management | My Tokens menu item to open the Tokens Management page.

  5. Generate a new API token by clicking the Generate new token button. You need to provide the token name (an arbitrary string) and the token expiration date (or you may select the Unlimited Token option).

  6. Once the token is generated, copy it to the clipboard by clicking the Copy button. Save the token string in a secure place: you will need it in later steps to access the Cloud Application Server via PySDK.

  7. If you develop a new Python script with DeGirum PySDK, or if you already have a Python script that uses DeGirum PySDK and want to run it on the DeGirum Cloud Platform, find the line of code that invokes the degirum.connect method and change it the following way:

    zoo = dg.connect(dg.CLOUD, "https://cs.degirum.com", token = "<your cloud API access token>" )
    

Here "https://cs.degirum.com" is the URL of the DeGirum Cloud Application Server, and "<your cloud API access token>" is the token string you generated in the previous step.

All the rest of your code does not need any modifications compared to traditional use cases: PySDK support of the Cloud Platform is completely transparent to the end user.

If you specify just "https://cs.degirum.com" URL, then the DeGirum Public cloud model zoo will be used.

If you want to work with the cloud model zoo of your choice, then specify the URL in the "https://cs.degirum.com/<zoo URL>" format. Here <zoo URL> suffix is the name of the cloud model zoo in the form of <organization>/<zoo>. You can obtain the model zoo URL suffix on DeGirum Cloud Portal site https://cs.degirum.com under Management | Models main menu item: just select the model zoo you want to access to open the model zoo page and click copy button near the model zoo name to copy the model zoo URL suffix into the clipboard.
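
For example, a connection to such a cloud zoo might look like the following sketch (the organization and zoo names are placeholders):

zoo = dg.connect(dg.CLOUD, "https://cs.degirum.com/my_organization/my_zoo",
   token="<your cloud API access token>")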

Model Zoo Manager

There are three main concepts in PySDK: the AI inference engine, the AI model zoo, and the AI model. The AI inference engines perform inferences of AI models, while AI model zoos are places where these models are stored.

PySDK supports the following AI inference types:

  1. Local inference: when the client application uses PySDK to directly communicate with the AI hardware accelerator installed on the same computer where this application runs.

  2. AI Server inference: when the AI hardware accelerator is controlled by the DeGirum AI Server software stack, and the client application communicates with that AI Server to perform AI inferences. The client application and the AI server can run on two different computers connected to the same local network.

  3. Cloud inference: when the client application communicates with the DeGirum Cloud Platform software over the Internet to perform AI inferences on the DeGirum Cloud Farm devices.

PySDK supports the following AI model zoo types:

  1. Local model zoo: when the set of AI models is located in some local directory on the computer with AI hardware accelerator. In the case of the local inference, the local model zoo is located on the computer where you run your application. In the case of AI server inference the local model zoo is located on the computer where AI Server software is installed.

  2. Cloud model zoo: when the set of AI models is located on the DeGirum Cloud Platform. You create and maintain cloud model zoos using the DeGirum Cloud Platform web GUI. There are two types of cloud model zoos: public and private. A public model zoo is visible to all registered cloud users, while a private model zoo is visible only to the members of your organization.

Almost all combinations of AI inference type and model zoo type are supported by the PySDK:

  1. A cloud inference using a cloud model zoo
  2. An AI Server inference using a cloud model zoo
  3. An AI Server inference using a local model zoo
  4. A local inference using a cloud model zoo
  5. A local inference using a particular model from a local model zoo

Note: the combination of a cloud inference with a local model zoo is not supported.

The PySDK starting point is the degirum.connect function, which creates and returns a degirum.zoo_manager.ZooManager model zoo manager object. This function has the following parameters, which specify the inference type and the model zoo to use:

  • inference_host_address: inference engine designator; it defines which inference engine to use.

    For AI Server-based inference it can be either the hostname or IP address of the AI Server host, optionally followed by the port number in the form host:port.

    For DeGirum Cloud Platform-based inference it is the string "@cloud" or degirum.CLOUD constant.

    For local inference it is the string "@local" or degirum.LOCAL constant.

  • zoo_url: model zoo URL string which defines the model zoo to operate with.

    For a cloud model zoo, it is specified in the following format: <cloud server prefix>[/<zoo suffix>]. The <cloud server prefix> part is the cloud platform root URL, typically https://cs.degirum.com. The optional <zoo suffix> part is the cloud zoo URL suffix in the form <organization>/<model zoo name>. You can confirm zoo URL suffix by visiting your cloud user account and opening the model zoo management page. If <zoo suffix> is not specified, then DeGirum public model zoo degirum/public is used.

    For AI Server-based inferences, you may omit both zoo_url and token parameters. In this case locally-deployed model zoo of the AI Server will be used.

    For local AI hardware inferences, if you want to use a particular AI model from your local drive, set the zoo_url parameter to the path of that model's .json configuration file. The token parameter is not needed in this case.

  • token: cloud API access token used to access the cloud zoo. To obtain this token you need to open a user account on DeGirum cloud platform. Please login to your account and go to the token generation page to generate an API access token.

The function returns the model zoo manager object, which connects to the model zoo of your choice and provides the following functionality:

  • list and search models available in the connected model zoo;
  • create appropriate AI model handling objects to perform AI inferences;
  • request various AI model parameters.

Model Zoo URL Cheat Sheet

Inference Type | Model Zoo Type | connect() parameters
Cloud inference | Cloud zoo | zoo = dg.connect(dg.CLOUD, "https://cs.degirum.com[/<zoo URL>]", "<token>")
AI server inference | Cloud zoo | zoo = dg.connect("<hostname>", "https://cs.degirum.com[/<zoo URL>]", "<token>")
AI server inference | Local zoo | zoo = dg.connect("<hostname>")
Local inference | Cloud zoo | zoo = dg.connect(dg.LOCAL, "https://cs.degirum.com[/<zoo URL>]", "<token>")
Local inference | Local file | zoo = dg.connect(dg.LOCAL, "/path/to/model.json")

Cloud Model Caching

Each time you request a model for inference from a cloud model zoo, on either the local system or an AI server host, it gets downloaded to the local filesystem of that inference host into an internal cache directory.

A separate cache directory is maintained for each cloud zoo URL.

If the model already exists in the cache directory, its checksum is verified against the checksum of the model in the cloud zoo. If the checksums do not match, the model is downloaded from the cloud zoo and replaces the copy in the cache. This mechanism guarantees that each time you request a cloud model for AI inference, you always get the most up-to-date model. It greatly simplifies model deployment on a large number of distributed nodes: each node automatically downloads the updated model from the cloud on the first model inference request.

Cache directories' root location is operating system specific:

  • For Windows it is %APPDATA%/DeGirum
  • For Linux it is ~/.local/share/DeGirum
  • For MacOS it is Library/Application Support/DeGirum

The cache size is limited (currently to 1 GB) to avoid uncontrolled growth of the model cache directory. Once the limit is exceeded, the least recently used models are evicted from the cache.

Listing and Searching AI Models

An AI model is represented in a model zoo by a set of files stored in a model subdirectory, which is unique for each model. Each model subdirectory contains the following model files:

Model File | Description
<model name>.json | JSON file containing all model parameters. The name of this file is the name of the model. This file is mandatory.
<model name>.n2x | DeGirum Orca binary file containing the model. This file is mandatory for DeGirum Orca models.
<model name>.tflite | TensorFlow Lite binary file containing the model. This file is mandatory for TFLite models.
<model name>.blob | OpenVINO binary file containing the model. This file is mandatory for OpenVINO models.
<class dictionary>.json | JSON file containing class labels for classification, detection, or segmentation models. This file is optional.

To obtain the list of available AI models, you may use degirum.zoo_manager.ZooManager.list_models method. This method accepts arguments which specify the model filtering criteria. All the arguments are optional. If a certain argument is omitted, then the corresponding filtering criterion is not applied. The following filters are available:

Argument | Description | Possible Values
model_family | Model family name filter. Used as a search substring in the model name | Any valid substring like "yolo", "mobilenet"
device | Inference device filter: a string or a list of strings of device names | "orca": DeGirum Orca device; "cpu": host CPU; "edgetpu": Google EdgeTPU device
precision | Model calculation precision filter: a string or a list of strings of model precision labels | "quant": quantized model; "float": floating point model
pruned | Model density filter: a string or a list of strings of model density labels | "dense": dense model; "pruned": sparse/pruned model
runtime | Runtime agent type filter: a string or a list of strings of runtime agent types | "n2x": DeGirum N2X runtime; "tflite": Google TFLite runtime; "openvino": OpenVINO runtime

The method returns a list of model name strings. These model name strings are to be used later when you load AI models for inference.
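
For example, to list only quantized models compiled for the DeGirum Orca device (filter argument names as in the table above):

model_list = zoo.list_models(device="orca", precision="quant")
print(model_list)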

The degirum.zoo_manager.ZooManager.list_models method returns the list of models that was retrieved at the time you connected to the model zoo by calling degirum.connect. This list is stored inside the Model Zoo manager object, so subsequent calls to list_models return it quickly, without connecting to the remote model zoo again. If you suspect that the remote model zoo contents have changed, create a new Zoo Manager object by calling degirum.connect to refresh the model list.

Loading AI Models

Once you have obtained the AI model name string, you may load the model for inference by calling the degirum.zoo_manager.ZooManager.load_model method and supplying the model name string as its argument. For example:

model = zoo.load_model("mobilenet_v2_ssd_coco--300x300_quant_n2x_orca_1")

If a model with the supplied name string is found, the load_model() method returns a model handling object of the degirum.model.Model class. Otherwise, it throws an exception.

Models from Cloud Zoo

If you load the model from the cloud model zoo, this model will be downloaded first and stored in the cache directory associated with that cloud model zoo. If the model already exists in the cache, it will be loaded from that cache but only if the cached model checksum matches the model checksum in the cloud zoo. If checksums do not match, the model from the cloud zoo will be downloaded again into the cache.

Note: your inference host needs to have Internet access to work with cloud zoo models.

Note: The Model Zoo manager does not provide any explicit method to download a model from the cloud zoo: the model is downloaded automatically when needed, i.e. when the model is not in the cache or the cached model checksum does not match the model checksum in the cloud zoo. However, PySDK provides the degirum download-zoo command to explicitly download all or part of a cloud model zoo of your choice to a local directory (see more in the PySDK Command-Line Interface section).

Models from AI Server Local Zoo

If you load the model from the AI server local model zoo, the command to load the model will be sent to the AI server: the connected AI server will handle all model loading actions remotely.

Note: The AI Server lists the models that it serves, and you can only load those models: managing the remote AI server model zoo is not handled by the Model Zoo manager class and must be done a different way. Please refer to the Configuring and Launching AI Server section for details.

Models from Local Drive

In order to work with a locally deployed model, you need to download that model from a model zoo in advance, for example by using the PySDK degirum download-zoo command (see more in the PySDK Command-Line Interface section). This is the only option that allows you to perform AI inferences without any network connection.

Running AI Model Inference

Once you have loaded an AI model and obtained a model handling object, you can start running AI inferences with it. The following methods of the degirum.model.Model class are available to perform AI inferences:

  • degirum.model.Model.predict and degirum.model.Model.__call__ to run prediction of a single data frame
  • degirum.model.Model.predict_batch to run prediction of a batch of frames
  • degirum.model.Model.predict_dir to run prediction of multiple files in a directory

The predict() and __call__ methods behave exactly the same way (actually, __call__ just calls predict()). They accept a single argument, the input data frame, perform AI inference on that data frame, and return the inference result: an object derived from the degirum.postprocessor.InferenceResults superclass.

The batch prediction methods, predict_batch() and predict_dir(), perform predictions of multiple frames in a pipelined manner, which is more efficient than just calling the predict() method in a loop. These methods are described in detail in the Batch Inferences section.

Cloud Server Connection Issues

For cloud inferences, the connection to the cloud server is established at the beginning of each predict-style method call, and the disconnection is performed at the end of that call. This greatly reduces performance, since connecting to and disconnecting from the cloud server are relatively long activities. To overcome this problem, you may use the model object inside a with block. When a predict-style method is called inside a with block, no disconnection is performed at the end of the call, so consecutive predict calls do not need to reconnect, thus saving execution time.

The following code demonstrates the approach:

   # here model_name variable stores the model name
   # and data_list variable stores the list of input data frames to process
   
   # wrap the load_model() call in with block to avoid disconnections
   with zoo.load_model(model_name) as model:
      # perform prediction loop
      for data in data_list:
         result = model.predict(data)

Input Data Handling

PySDK model prediction methods support different types of input data. An exact input type depends on the model to be used. The following input data types are supported:

  • image input data
  • audio input data
  • raw tensor input data

The input data object you supply to model predict methods also depends on the number of inputs the model has. If the model has a single data input, then the data objects you pass to model predict methods are single objects. In rare cases the model may have multiple data inputs; in this case the data objects you pass to model predict methods are lists of objects: one object per corresponding input.

The number and the type of inputs of the model are described by the InputType property of the ModelInfo class returned by degirum.model.Model.model_info property (see Model Info section for details about model info properties). The InputType property returns the list of input data types, one type per model input. So the number of model inputs can be deduced by evaluating the length of the list returned by the InputType property.
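
For example, a quick way to inspect the model input types (a minimal sketch, assuming the model object from the previous examples):

info = model.model_info
print(info.InputType)        # e.g. ["Image"] for a single-input image model
print(len(info.InputType))   # number of model inputs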

The following sections describe details of input data handling for various model input types.

Image Input Data Handling

When dealing with model inputs of image type (InputType is equal to "Image"), the PySDK model prediction methods accept a wide variety of input data frame types:

  • the input frame can be the name of a file with frame data;
  • it can be the HTTP URL pointing to a file with frame data;
  • it can be a numpy array with frame data;
  • it can be a PIL Image object;
  • it can be a bytes object containing raw frame data.

An AI model requires particular input tensor dimensions and data type which, in most cases, do not match the dimensions of the input frame. In this case, PySDK performs automatic conversion of the input frame to a format compatible with the AI model input tensor, performing all the necessary transformations such as resizing, padding, colorspace conversion, and data type conversion.

PySDK performs input frame transformations using one of two graphical packages (called backends): PIL or OpenCV. The backend is selected by the degirum.model.Model.image_backend property. By default it is set to "auto", meaning that the PIL backend will be used first, and if it is not installed, the OpenCV backend will be used. You may explicitly select which backend to use by assigning either "pil" or "opencv" to the degirum.model.Model.image_backend property.

Note: In case of OpenCV backend, you cannot pass PIL Image objects to model predict methods.

If your input frame is stored in a file on a local filesystem, or is accessible via HTTP, pass the filename string or URL string directly to the model predict methods: PySDK will load (or download) the file, decode it, and convert it to the model input tensor format. The set of supported graphical file formats is defined solely by the graphical backend library you selected, PIL or OpenCV; PySDK does not perform any decoding of its own.

Sometimes, image conversion to AI model input tensor format requires image resizing. This resizing can be done in two possible ways:

  • preserving the aspect ratio;
  • not preserving the aspect ratio.

In addition, the image can be cropped or not cropped.

You control how the image is resized with the degirum.model.Model.input_pad_method property, which can have one of the following values: "stretch", "letterbox", "crop-first", or "crop-last".

When you select the "stretch" method, the input image is resized exactly to the AI model input tensor dimensions, possibly changing the aspect ratio.

When you select the "letterbox" method (the default), the image is resized to fit the AI model input tensor dimensions while preserving the aspect ratio. The voids which can appear on the image sides are filled with the color specified by the degirum.model.Model.input_letterbox_fill_color property (black by default).

When you select the "crop-first" method, the image is first cropped to match the AI model input tensor aspect ratio with respect to the degirum.model.Model.input_crop_percentage and then resized to match the AI model input tensor dimensions. For example: an input image with dimensions 640x480 going into a model with input tensor dimensions 224x224 with crop percentage of 0.875 will first be center cropped to 420x420 (420 = min(640, 480) * 0.875) and then resized to 224x224.

When you select "crop-last" method, if the AI model input tensor dimensions are equal (square), the image is resized with its smaller side equal to the model dimension with respect to degirum.model.Model.input_crop_percentage. If the AI model input tensor dimensions are not equal (rectangle), the image is resized with stretching to the input tensor dimensions with respect to degirum.model.Model.input_crop_percentage. The image is then cropped to fit the AI model input tensor dimensions and aspect ratio. For example: an input image with dimensions 640x480 going into a model with input tensor dimensions of 224x224 with crop percentage of 0.875 will first be resized to 341x256 (256 = 224 / 0.875 and 341 = 256 * 640 / 480) and then center cropped to 224x224. Alternatively an input image with dimensions 640x480 and a model with input tensor dimensions 280x224 will be resized to 320x256 (320 = 280 / 0.875 and 256 = 224 / 0.875) and then center cropped to 280x224.
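
The crop arithmetic from the two examples above can be written out as a small sketch (plain Python, no PySDK calls; it reproduces the numbers quoted for the square 224x224 input case):

def crop_first_square(width, height, crop_percentage):
   # "crop-first": center-crop side is the smaller image dimension scaled by the crop percentage
   return int(min(width, height) * crop_percentage)

def crop_last_resize(width, height, model_side, crop_percentage):
   # "crop-last" with a square model input: the smaller side becomes model_side / crop_percentage
   short = round(model_side / crop_percentage)
   if width < height:
      return short, round(short * height / width)
   return round(short * width / height), short

print(crop_first_square(640, 480, 0.875))      # 420: the 420x420 crop is then resized to 224x224
print(crop_last_resize(640, 480, 224, 0.875))  # (341, 256): the image is then center-cropped to 224x224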

You can specify the resize algorithm in the degirum.model.Model.input_resize_method property, which may have the following values: "nearest", "bilinear", "area", "bicubic", or "lanczos". These values specify various interpolation algorithms used for resizing.

In case your input frames are stored in numpy arrays, you may need to tell PySDK the order of colors in those numpy arrays: RGB or BGR. This order is called the colorspace. By default, PySDK treats numpy arrays as having the RGB colorspace. If your numpy arrays are in RGB order, no additional action is needed. But if your numpy arrays have the opposite color order (as OpenCV-produced arrays do), you need to change the degirum.model.Model.input_numpy_colorspace property.

Note: If a model has multiple image inputs, PySDK applies the same input_* image properties discussed above to every image input of the model.

Audio Input Data Handling

When dealing with model inputs of audio type (InputType is equal to "Audio"), PySDK does not perform any conversions of the input data: it expects a 1-D numpy array with audio waveform samples of the proper size and sampling rate. The waveform size should be equal to the InputWaveformSize model info property. The waveform sampling rate should be equal to the InputSamplingRate model info property. Finally, the data element type should be equal to the data type specified by the InputRawDataType model info property. All aforementioned model info properties are properties of the ModelInfo class returned by the degirum.model.Model.model_info property (see the Model Info section for details).
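
A minimal sketch of preparing an audio frame for such a model (assumes a single-input audio model; the zero-filled waveform and the DG_INT16 -> numpy.int16 dtype mapping are illustrative):

import numpy as np

info = model.model_info
samples = info.InputWaveformSize[0]           # required waveform size in samples
waveform = np.zeros(samples, dtype=np.int16)  # dtype must match InputRawDataType
result = model(waveform)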

Tensor Input Data Handling

When dealing with model inputs of raw tensor type (InputType is equal to "Tensor"), PySDK expects that you provide a 4-D numpy array of proper dimensions. The dimensions of that array should match model input dimensions as specified by the following model info properties:

  • InputN for dimension 0,
  • InputH for dimension 1,
  • InputW for dimension 2,
  • InputC for dimension 3.

The data element type should be equal to the data type specified by the InputRawDataType model info property (see Model Info section for details).
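
A minimal sketch of building such a tensor (assumes a single-input tensor model; the zero-filled data and the DG_FLT -> numpy.float32 dtype mapping are illustrative):

import numpy as np

info = model.model_info
shape = (info.InputN[0], info.InputH[0], info.InputW[0], info.InputC[0])
tensor = np.zeros(shape, dtype=np.float32)  # dtype must match InputRawDataType
result = model(tensor)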

Inference Results

All model predict methods return result objects derived from the degirum.postprocessor.InferenceResults class. The particular class type of the result object depends on the AI model type: classification, object detection, pose detection, segmentation, etc. But from the user's point of view, they all deliver identical functionality.

The result object contains the following data:

  • degirum.postprocessor.InferenceResults.image property keeps the original image;
  • degirum.postprocessor.InferenceResults.image_overlay property keeps the original image with inference results drawn on top; the type of such drawing is model-dependent:
    • for classification models, the list of class labels with probabilities is printed below the original image;
    • for object detection models, bounding boxes of detected objects are drawn on the original image;
    • for hand and pose detection models, detected keypoints and keypoint connections are drawn on the original image;
    • for segmentation models, detected segments are drawn on the original image;
  • degirum.postprocessor.InferenceResults.results property keeps a list of numeric results (follow the property link for detailed explanation of all result formats);
  • degirum.postprocessor.InferenceResults.image_model property keeps the binary array with image data converted to AI model input specifications. This property is assigned only if you set degirum.model.Model.save_model_image model property before performing predictions.

The results property is what you typically use for programmatic access to inference results. The type of results is always a list of dictionaries, but the format of those dictionaries is model-dependent. Also, if the result contains coordinates of objects, all such coordinates are recalculated from the model coordinates back to coordinates on the original image, so you can use them directly.
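
For example, for an object detection model the per-object dictionaries typically carry a bounding box, label, and score; the exact key names are model-dependent, so treat the ones below as an assumption and inspect result.results for your model:

for obj in result.results:
   x1, y1, x2, y2 = obj["bbox"]   # key names 'bbox', 'label', 'score' are an assumption
   print(f"{obj['label']} ({obj['score']:.2f}) at [{x1}, {y1}, {x2}, {y2}]")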

The image_overlay property is very handy for debugging and troubleshooting. It allows you to quickly assess the correctness of the inference results in graphical form.

There are result properties which affect how the overlay image is drawn:

  • degirum.postprocessor.InferenceResults.overlay_alpha: transparency value (alpha-blend weight) for all overlay details;
  • degirum.postprocessor.InferenceResults.overlay_font_scale: font scaling factor for overlay text;
  • degirum.postprocessor.InferenceResults.overlay_line_width: line width in pixels for overlay lines;
  • degirum.postprocessor.InferenceResults.overlay_color: RGB color tuple or list of RGB color tuples for drawing all overlay details;
  • degirum.postprocessor.InferenceResults.overlay_show_labels: flag to enable drawing class labels of detected objects;
  • degirum.postprocessor.InferenceResults.overlay_show_probabilities: flag to enable drawing probabilities of detected objects;
  • degirum.postprocessor.InferenceResults.overlay_fill_color: RGB color tuple for filling voids which appear due to letterboxing.

When each individual result object is created, all these overlay properties (except overlay_fill_color) are assigned the values of the similarly named properties taken from the model object (see the Model Parameters section for the list of model properties). This allows assigning overlay property values only once and applying them to all consecutive results. But if you want to adjust an individual result, you may reassign any of the overlay properties and then re-read the image_overlay property. Each time you read image_overlay, it returns a new image object freshly drawn according to the current values of the overlay properties.
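
For example, a sketch of tweaking an individual result and regenerating its overlay (the property values are illustrative, and the save() call assumes the PIL backend):

result.overlay_show_probabilities = True            # include probabilities in the drawing
result.overlay_color = [(255, 0, 0), (0, 255, 0)]   # per-class colors, cycled if needed
annotated = result.image_overlay                    # freshly drawn with the new settings
annotated.save("./images/AnnotatedResult.jpg")      # assuming PIL backend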

The overlay_color property is used to define the color to draw overlay details. In the case of a single RGB tuple, the corresponding color is used to draw all the overlay data: points, boxes, labels, segments, etc. In the case of a list of RGB tuples the behavior depends on the model type.

  • For classification models different colors from the list are used to draw labels of different classes.
  • For detection models different colors are used to draw labels and boxes of different classes.
  • For pose detection models different colors are used to draw keypoints of different persons.
  • For segmentation models different colors are used to highlight segments of different classes.

If the list size is less than the number of classes of the model, then the overlay_color values are used cyclically; for example, for a three-element list it will be overlay_color[0], then overlay_color[1], overlay_color[2], and again overlay_color[0].

The default value of overlay_color is a single RGB tuple of yellow color for all model types except segmentation models. For segmentation models it is a list of RGB tuples with the list size equal to the number of model classes. You can use the Model.label_dictionary property to obtain the list of model classes. Each color is automatically chosen to be visually distinct from the other colors in the list.

Note: overlay_fill_color is assigned the value of degirum.model.Model.input_letterbox_fill_color.

Batch Inferences

If you need to process multiple frames using the same model and the same settings, the most efficient way to do it is to use the batch prediction methods of the degirum.model.Model class:

  • degirum.model.Model.predict_batch method to run predictions on a list of frames;
  • degirum.model.Model.predict_dir method to run predictions on files in a directory.

Both methods perform predictions of multiple frames in a pipelined manner, which is more efficient than just calling predict() method in a loop.

Both methods return a generator object, so you can iterate over inference results. This allows you to directly use the result of batch prediction methods in for-loops, for example:

for result in model.predict_batch(['image1.jpg','image2.jpg']):
   print(result)

Note: Since batch prediction methods return a generator object, simply assigning the result of a batch prediction method to a variable does not start any inference. Only iterating over that generator object does.

The predict_batch method accepts a single parameter: an iterable object, for example, a list. You populate your iterable object with the same type of data you pass to the regular predict(), i.e. input image path strings, input image URL strings, numpy arrays, or PIL Image objects (in case of the PIL image backend).

The predict_dir method accepts a filepath to a directory containing graphical files for inference. You may supply optional extensions parameter passing the list of file extensions to process.
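
For example, a sketch of running predict_dir over a folder of images (the directory path is a placeholder, and the extension list format is an assumption):

for result in model.predict_dir("./images", extensions=["jpg", "png"]):
   print(result)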

The following minimal example demonstrates how to use batch predict to perform AI inference on a video file:

import degirum as dg
import cv2

# connect to cloud model zoo, load model, set model properties needed for OpenCV
zoo = dg.connect(dg.CLOUD, "https://cs.degirum.com", token="<your cloud API access token>")
model = zoo.load_model("mobilenet_v2_ssd_coco--300x300_quant_n2x_orca_1")
model.image_backend = "opencv"
model.input_numpy_colorspace = "BGR"
stream = cv2.VideoCapture("images/Traffic.mp4") # open video file

# define generator function to produce video frames
def frame_source(stream):
   while True:
      ret, frame = stream.read()
      if not ret:
         break # end of file
      yield frame

# run batch predict on stream of frames from video file
for result in model.predict_batch(frame_source(stream)):
   # show annotated frames
   cv2.imshow("Demo", result.image_overlay)
   cv2.waitKey(1) # needed for imshow to actually render the frame

stream.release()

Model Parameters

The model behavior can be controlled with various Model class properties, which define model parameters. They can be divided into the following categories:

  • parameters, which control how to handle input frames;
  • parameters, which control the inference;
  • parameters, which control how to display inference results;
  • parameters, which control model run-time behavior and provide access to model information

The following table provides complete summary of Model class properties arranged by categories.

Property Name | Description | Possible Values | Default Value
Input Handling Parameters
image_backend | package to be used for image processing | "auto", "pil", or "opencv"; "auto" tries PIL first | "auto"
input_letterbox_fill_color | image fill color in case of 'letterbox' padding | 3-element tuple of RGB color | (0,0,0)
input_numpy_colorspace | colorspace for numpy arrays | "RGB" or "BGR" | "RGB"
input_pad_method | how input image will be padded when resized | "stretch", "letterbox", "crop-first", or "crop-last" | "letterbox"
input_crop_percentage | percentage of input image dimension to retain when "input_pad_method" is set to "crop-first" or "crop-last" | Float value in [0..1] range | 1.0
input_resize_method | interpolation algorithm for image resizing | "nearest", "bilinear", "area", "bicubic", "lanczos" | "bilinear"
save_model_image | flag to enable/disable saving of model input image in inference results | Boolean value | False
Inference Parameters
output_confidence_threshold | confidence threshold to reject results with low scores | Float value in [0..1] range | 0.1
output_max_detections | maximum number of objects to report for detection models | Integer value | 20
output_max_detections_per_class | maximum number of objects to report for each class for detection models | Integer value | 100
output_max_classes_per_detection | maximum number of classes to report for detection models | Integer value | 30
output_nms_threshold | rejection threshold for non-max suppression | Float value in [0..1] range | 0.6
output_pose_threshold | rejection threshold for pose detection models | Float value in [0..1] range | 0.8
output_postprocess_type | inference result post-processing type; may be set to 'None' to bypass post-processing | String | Model-dependent
output_top_k | number of classes with biggest scores to report for classification models; if 0, report all classes above the confidence threshold | Integer value | 0
output_use_regular_nms | use regular (per-class) NMS algorithm as opposed to global (class-ignoring) NMS algorithm for detection models | Boolean value | False
Display Parameters
overlay_alpha | transparency value (alpha-blend weight) for all overlay details | Float value in [0..1] range | 0.5
overlay_color | color for drawing all overlay details | 3-element tuple of RGB color or list of 3-element tuples of RGB color | (255,255,128)
overlay_font_scale | font scaling factor for overlay text | Positive float value | 1.0
overlay_line_width | line width in pixels for overlay lines | Positive integer value | 3
overlay_show_labels | flag to enable drawing class labels of detected objects | Boolean value | True
overlay_show_probabilities | flag to enable drawing probabilities of detected objects | Boolean value | False
Control and Information Parameters
devices_available | list of inference device indices which can be used for model inference (read-only) | List of integer values | N/A
devices_selected | list of inference device indices selected for model inference | List of integer values | Equal to devices_available
label_dictionary | model class label dictionary (read-only) | Dictionary | N/A
measure_time | flag to enable measuring and collecting inference time statistics | Boolean value | False
model_info | model information object to provide read-only access to model parameters (read-only) | ModelParams object | N/A
non_blocking_batch_predict | flag to control the blocking behavior of predict_batch() method | Boolean value | False
eager_batch_size | the size of the batch to be used by the device scheduler when inferencing this model; the batch is the number of consecutive frames before this model is switched to another model during batch predict | Integer value in [1..80] range | 8
frame_queue_depth | the depth of the model prediction queue; when the queue size reaches this value, the next prediction call will block until there is space in the queue | Integer value in [1..160] range | 80 for cloud inference, 8 for other cases

Note: For segmentation models, the default value of overlay_color is a list of unique colors (RGB tuples), with list size equal to the number of model classes. Use the label_dictionary property to get the list of model classes.

Model Info

AI models have a lot of static attributes defining various model features and characteristics. Unlike model properties, these attributes in most cases cannot be changed: they come with the model.

To access all model attributes, you may query read-only model property degirum.model.Model.model_info.

Note: A new deep copy of the model info object is created each time you read this property, so any changes made to that copy will not affect model behavior.

Model attributes are divided into the following categories:

  • Device-related attributes
  • Pre-processing-related attributes
  • Inference-related attributes
  • Post-processing-related attributes

The following table provides a complete summary of model attributes arranged by categories. The Attribute Name column contains the name of the ModelInfo class member returned by the model_info property.

Note: Each attribute in the Pre-Processing-Related Attributes group is a list of values, one per model input.

Attribute Name | Description | Possible Values
Device-Related Attributes
DeviceType | Device type to be used for AI inference of this model | "ORCA": DeGirum Orca, "EDGETPU": Google EdgeTPU, "MYRIAD": Intel Myriad, "CPU": host CPU
RuntimeAgent | Type of runtime to be used for AI inference of this model | "N2X": DeGirum NNExpress runtime, "TFLITE": Google TFLite runtime
EagerBatchSize | The size of the batch to be used by device scheduler when inferencing this model. The batch is the number of consecutive frames before this model is switched to another model during batch predict. | Integer number
Pre-Processing-Related Attributes
InputType | Model input type | List of the following strings: "Image": image input type, "Audio": audio input type, "Tensor": raw tensor input type
InputN | Input frame dimension size | 1; other sizes to be supported
InputH | Input height dimension size | Integer number
InputW | Input width dimension size | Integer number
InputC | Input color dimension size | Integer number
InputQuantEn | Enable input frame quantization flag (set for quantized models) | Boolean value
InputRawDataType | Data element type for audio or tensor inputs | List of the following strings: "DG_UINT8": 8-bit unsigned integer, "DG_INT16": 16-bit signed integer, "DG_FLT": 32-bit floating point
InputTensorLayout | Input tensor shape and layout | List of the following strings: "NHWC": 4-D tensor frame-height-width-color; more layouts to be supported
InputColorSpace | Input image colorspace (sequence of colors in C dimension) | List of the following strings: "RGB", "BGR"
InputScaleEn | Enable global scaling of input data flag | List of boolean values
InputScaleCoeff | Scaling factor for input data global scaling; applied when InputScaleEn is enabled | List of float values
InputNormMean | Mean value for per-channel input data normalization; applied when both InputNormMean and InputNormStd are not empty | List of 3-element arrays of float values
InputNormStd | StDev value for per-channel input data normalization; applied when both InputNormMean and InputNormStd are not empty | List of 3-element arrays of float values
InputQuantOffset | Quantization offset for input image quantization | List of float values
InputQuantScale | Quantization scale for input image quantization | List of float values
InputWaveformSize | Input waveform size in samples for audio input types | List of positive integer values
InputSamplingRate | Input waveform sampling rate in Hz for audio input types | List of positive float values
InputResizeMethod | Interpolation algorithm used for image resizing during model training | List of the following strings: "nearest", "bilinear", "area", "bicubic", "lanczos"
InputPadMethod | How input image was padded when resized during model training | List of the following strings: "stretch", "letterbox"
InputCropPercentage | How much input image was cropped during model training | Float value in [0..1] range
ImageBackend | Graphical package used for image processing during model training | List of the following strings: "pil", "opencv"
Inference-Related Attributes
ModelPath | Path to the model JSON file | String with filepath
ModelInputN | Model frame dimension size | 1; other sizes to be supported
ModelInputH | Model height dimension size | Integer number
ModelInputW | Model width dimension size | Integer number
ModelInputC | Model color dimension size | Integer number
ModelQuantEn | Enable input frame quantization flag (set for quantized models) | Boolean value
Post-Processing-Related Attributes
OutputNumClasses | Number of classes model detects | Integer value
OutputSoftmaxEn | Enable softmax step in post-processing flag | Boolean value
OutputClassIDAdjustment | Class ID adjustment: number subtracted from the class ID reported by the model | Integer value
OutputPostprocessType | Post-processing type | "Classification", "Detection", "DetectionYolo", "PoseDetection", "HandDetection", "FaceDetect", "Segmentation", "BodyPix", "Python"; other types to be supported
OutputConfThreshold | Confidence threshold to reject results with low scores | Float value in [0..1] range
OutputNMSThreshold | Rejection threshold for non-max suppression | Float value in [0..1] range
OutputTopK | Number of classes with biggest scores to report for classification models | Integer number
MaxDetections | Maximum number of objects to report for detection models | Integer number
MaxDetectionsPerClass | Maximum number of objects to report for each class for detection models | Integer number
MaxClassesPerDetection | Maximum number of classes to report for detection models | Integer number
UseRegularNMS | Use regular (per-class) NMS algorithm as opposed to global (class-ignoring) NMS algorithm for detection models | Boolean value
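
For example, a short sketch of reading a few of these attributes (attribute names as listed above; which attributes are meaningful depends on the model):

info = model.model_info
print(info.DeviceType, info.RuntimeAgent)     # e.g. "ORCA" and "N2X"
print(info.InputH, info.InputW, info.InputC)  # input dimensions, one entry per model input
print(info.OutputPostprocessType)             # e.g. "Detection"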

Inference Advanced Topics

Selecting Devices for Inference

Every AI model in a model zoo is designed to work on particular hardware, either on AI accelerator hardware such as DeGirum Orca or on the host computer CPU. Imagine a situation where the host computer is equipped with multiple hardware devices of a given type and you run multiple inferences of a model designed for this device type. In this case, by default, all available hardware devices of this type will be used for inferences of this model. This guarantees top inference performance in the case of a single model running on all available devices.

To get information about available devices, query the degirum.model.Model.devices_available property. It returns the list of device indices of all available devices of the type this model is designed for. These indices are zero-based, so if your host computer has a single device of a given type, the returned list contains a single zero element: [0]. In the case of two devices it will be [0, 1], and so on.

In certain cases you may want to limit the model inference to a particular subset of available devices. For example, you have two devices and want to run concurrent inferences of two models. By default, both devices would be used for both model inferences, causing the models to be reloaded to the devices each time you run the inference of the other model. Even though model loading for DeGirum Orca devices is extremely fast, it may still cause performance degradation. In this case you may want to run the first model inference only on the first device, and the second model inference only on the second device. To do so, assign the degirum.model.Model.devices_selected property of each model object to the list of device indices you want that model to run on. In our example you assign the list [0] to the devices_selected property of the first model object and the list [1] to that of the second model object.

In general, the list you assign to the devices_selected property should contain only indices that occur in the list returned by the devices_available property.
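For example, the following sketch pins two concurrently used models to different devices. It is a minimal illustration, not a complete application: the host address and model names are placeholders, and both models are assumed to be designed for the same hardware type with two devices available.

import degirum as dg

model_zoo = dg.connect("localhost") # placeholder inference target
model1 = model_zoo.load_model("model-a") # placeholder model name
model2 = model_zoo.load_model("model-b") # placeholder model name

print(model1.devices_available) # e.g. [0, 1] on a host with two devices

model1.devices_selected = [0] # run the first model only on device #0
model2.devices_selected = [1] # run the second model only on device #1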

Note: since the inference device assignment for cloud inferences is performed dynamically, and the actual AI farm node configuration is unknown until the inference starts, the devices_available property for this use case always returns the full list of available devices.

Handling Multiple Streams of Frames

The Model class interface has a method, degirum.model.Model.predict_batch, which can run multiple predictions on a sequence of frames. To deliver the sequence of frames to predict_batch, you implement an iterable object that returns your frames one by one. One example of such an iterable is a regular Python list; another is a generator function that yields frame data using the yield statement. You then pass this iterable object as an argument to the predict_batch method. In turn, the predict_batch method returns a generator object that yields prediction results.

All the inference magic, with pipelining sequential inferences, asynchronously retrieving inference results, supporting various inference devices, and handling AI server vs. local operation modes, happens inside the implementation of the predict_batch method. All you need to do is wrap your sequence of frame data in an iterable object, pass this object to predict_batch, and iterate over the generator object returned by predict_batch, either with a for-loop or by repeatedly calling the next() built-in function on this generator object.

The following example runs the inference on an infinite sequence of frames captured from the camera:

import cv2 # OpenCV
stream = cv2.VideoCapture(0) # open video stream from local camera #0

def source(): # define generator function, which yields frames from the camera
   while True:
      ret, frame = stream.read()
      if not ret: # stop when the camera fails to return a frame
         break
      yield frame

for result in model.predict_batch(source()): # iterate over inference results
   cv2.imshow("AI camera", result.image_overlay) # show result image with AI overlay
   cv2.waitKey(1) # let OpenCV refresh the display window

But what if you need to run multiple concurrent inferences on multiple asynchronous data streams with different frame rates? The simple approach of combining two generators in one loop, either with the zip() built-in function or by manually calling the next() built-in function for each generator in the loop body, will not work effectively.

Non-working example 1. Using zip() built-in function:

batch1 = model1.predict_batch(source1()) # generator object for the first model
batch2 = model2.predict_batch(source2()) # generator object for the second model
for result1, result2 in zip(batch1, batch2):
   pass # process result1 and result2

Non-working example 2. Using next() built-in function:

batch1 = model1.predict_batch(source1()) # generator object for the first model
batch2 = model2.predict_batch(source2()) # generator object for the second model
while True:
   result1 = next(batch1)
   result2 = next(batch2)
   # process result1 and result2

The reason is that the Python runtime has the Global Interpreter Lock (GIL), which allows only one thread to run at a time, blocking the execution of other threads. So if the currently running thread is itself blocked waiting for the next frame or for the next inference result, all other threads are blocked as well.

For example, if the frame rate of source1() is lower than the frame rate of source2(), and assuming that the model inference frame rates are higher than the corresponding source frame rates, the code above will spend most of the time waiting for the next frame from source1(), not letting frames from source2() be retrieved. As a result, model2 will not get enough frames and will idle, losing performance.

Another example is when the inference latency of model1 is higher than the inference queue depth expressed in time (the product of the inference queue depth expressed in frames and the single-frame inference time). In this case, when the model1 inference queue is full but the inference result is not ready yet, the code above will block waiting for that inference result inside next(batch1), preventing any operations with model2.

To get around such blocking, a special non-blocking mode of the batch predict operation is provided. You turn this mode on by assigning True to the degirum.model.Model.non_blocking_batch_predict property.

When non-blocking mode is enabled, the generator object returned by the predict_batch() method accepts None from the input iterable object. This allows you to design non-blocking frame data source iterators: when no data is available, such an iterator simply yields None without waiting for the next frame. If None is returned from the input iterator, the model prediction step is skipped for this iteration.

Also, in non-blocking mode, when no inference results are available in the result queue at some iteration, the generator yields a None result. This allows the code that operates with another model to continue execution.

To operate in non-blocking mode, you need to modify your code in the following way (see the sketch after this list):

  1. Modify the frame data source iterator to yield None if no frame is available yet, instead of waiting for the next frame.
  2. Modify the inference loop body to deal with None results by simply skipping them.
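
The following sketch illustrates this pattern for two concurrent models. It is a minimal illustration, not a complete application: model1 and model2 are assumed to be already loaded, and frame_queue1 and frame_queue2 are hypothetical queue.Queue objects filled by separate capture threads.

import queue

model1.non_blocking_batch_predict = True # enable non-blocking batch predict mode
model2.non_blocking_batch_predict = True

def non_blocking_source(frame_queue):
   # yield a frame when one is available; otherwise yield None without waiting
   while True:
      try:
         yield frame_queue.get_nowait()
      except queue.Empty:
         yield None

batch1 = model1.predict_batch(non_blocking_source(frame_queue1))
batch2 = model2.predict_batch(non_blocking_source(frame_queue2))

while True:
   for result in (next(batch1), next(batch2)):
      if result is not None: # None means no result is ready for this model yet
         pass # process result here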

Measure Inference Timing

The degirum.model.Model class has a facility to measure and collect model inference time information. To enable inference time collection assign True to degirum.model.Model.measure_time property.

When inference timing collection is enabled, the durations of individual steps for each frame prediction are accumulated in internal statistic accumulators.

To reset time statistic accumulators you use degirum.model.Model.reset_time_stats method.

To retrieve the time statistic accumulators you use the degirum.model.Model.time_stats method. This method returns a dictionary with time statistic objects. Each time statistic object accumulates time statistics for a particular inference step over all frame predictions that happened since the timing collection was enabled or reset. The statistics include minimum, maximum, average, and count. Inference steps correspond to dictionary keys. The following dictionary keys are supported:

| Key | Description |
|---|---|
| FrameTotalDuration_ms | Frame total inference duration from the moment when you invoke the predict method to the moment when inference results are returned |
| PythonPreprocessDuration_ms | Duration of client-side pre-processing step including data loading time and data conversion time |
| CorePreprocessDuration_ms | Duration of server-side pre-processing step |
| CoreInferenceDuration_ms | Duration of server-side AI inference step |
| CoreLoadResultDuration_ms | Duration of server-side data movement step |
| CorePostprocessDuration_ms | Duration of server-side post-processing step |

For DeGirum Orca AI accelerator hardware additional dictionary keys are supported:

| Key | Description |
|---|---|
| DeviceInferenceDuration_ms | Duration of AI inference computations on the AI accelerator IC, excluding data transfers |
| DeviceTemperature_C | Internal temperature of the AI accelerator IC, in degrees Celsius |
| DeviceFrequency_MHz | Working frequency of the AI accelerator IC, in MHz |

The time statistics object supports pretty-printing so you can directly print it using regular print() statement. For example, the output of the following statement:

print(model.time_stats()["PythonPreprocessDuration_ms"])

... will look like this:

PythonPreprocessDuration_ms: [4.62..6.19..13.01]/335

It consists of the inference step name (PythonPreprocessDuration_ms in this case) followed by four statistic values presented in a format [minimum..average..maximum]/count.
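
Since degirum.model.Model.time_stats returns a regular dictionary of such statistic objects, you can also dump all accumulated statistics at once; a minimal sketch:

for stat in model.time_stats().values():
   print(stat) # each statistic object pretty-prints with its step name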

Note: In batch prediction mode many inference phases are pipelined so the pre- and post-processing steps of one frame may be executed in parallel with the AI inference step of another frame. Therefore actual frame rate may be higher than the frame rate calculated by FrameTotalDuration_ms statistic.

Note: PythonPreprocessDuration_ms statistic includes data loading time and data conversion time. This can give very different results for different ways of loading input frame data. For example, if you provide image URLs for inference, then the PythonPreprocessDuration_ms will include image downloading time, which can be much higher compared with the case when you provide the image as numpy array, which does not require any downloading.

The following example shows how to use the time statistics collection interface. It assumes that the model variable is a model created by load_model().

model.measure_time = True # enable accumulation of time statistics

# perform batch prediction
for result in model.predict_batch(source()):
   # process result
   pass

stats = model.time_stats() # query time statistics dictionary

# pretty-print frame total inference duration statistics
print(stats["FrameTotalDuration_ms"])

# print average duration of AI inference step
print(stats["CoreInferenceDuration_ms"].avg)

model.reset_time_stats() # reset time statistics accumulators

# perform one more batch prediction
for result in model.predict_batch(source()):
   # process result
   pass

# re-query statistics and print the maximum duration of the Python pre-processing step
stats = model.time_stats()
print(stats["PythonPreprocessDuration_ms"].max)

Configuring and Launching AI Server

PySDK can be used to configure and launch the DeGirum AI server on hosts equipped with DeGirum Orca AI accelerator card(s). This allows remote clients to initiate AI inferences that run on this AI server host.

You can start the AI server process in multiple ways:

  • running AI server from the shell directly on the host OS;
  • running AI server as Linux System Service;
  • running AI server inside the Docker container.

Running AI Server from Shell Directly on Host OS

To run PySDK AI server directly on the host OS, perform the following steps on the host:

  • Create or select a user name to be used for all the following configuration steps. This user should have administrative rights on this host. The user name ai-user is used in the instructions below, but it can be changed to any other user name of your choice.

  • For convenience of future maintenance, we recommend installing PySDK into a virtual environment, such as Miniconda.

  • Make sure you have activated your Python virtual environment with the appropriate Python version and that PySDK is installed into this virtual environment.

  • Create a directory for the AI server model zoo, and change your current working directory to this directory. For example:

    mkdir /home/ai-user/zoo
    cd /home/ai-user/zoo
    
  • If you wish to have your model zoo hosted locally, you need to download all models from some cloud zoo into the local model zoo directory created in the previous step. In this case, perform the following steps:

    • If you want to download models from DeGirum Public cloud model zoo into the current working directory, then execute the following command:

      degirum download-zoo --path . --token <your cloud API token>
      
    • If you want to download models from some other cloud zoo, pass the URL of that cloud zoo as --url parameter:

      degirum download-zoo --path . --token <your cloud API token> --url "https://cs.degirum.com/<zoo URL>"
      

      The <zoo URL> suffix is the cloud zoo URL in the form <organization>/<model zoo name>. You can confirm the zoo URL suffix by visiting your cloud user account and opening the model zoo management page via the Management | Models main menu item.

  • If you wish your AI server to operate only with cloud zoos, just leave the local zoo directory (/home/ai-user/zoo in our example) empty; it must still be created.

  • Start DeGirum AI server process by executing the following command:

    degirum server start --zoo /home/ai-user/zoo
    

The AI server is now up and running, and will run until you press ENTER in the same terminal where you started it.

By default, the AI server listens on TCP port 8778. If you want to change the TCP port, pass the --port command line argument when launching the server, for example:

python3 -m degirum.server --zoo /home/ai-user/zoo --port 8780

Running AI Server as Linux System Service

It is convenient to automate the AI server launch so that it starts automatically on each system startup. On Linux-based hosts you can achieve this by defining and configuring a system service that handles the AI server startup.

Please perform the following steps to create, configure, and start the system service:

  • Create the configuration file named degirum.service in the /etc/systemd/system directory. You will need administrative rights to create this file. You can use the following template as an example:

    [Unit]
    Description=DeGirum AI Service
    [Service]
    # You may want to adjust the working directory:
    WorkingDirectory=/home/ai-user/
    # You may want to adjust the path to your Python executable and --zoo model zoo path.
    # Also you may specify server TCP port other than default 8778 by adding --port <port> argument.
    ExecStart=/home/ai-user/miniconda3/bin/python -m degirum.server --zoo /home/ai-user/zoo
    Restart=always
    # You may want to adjust the restart time interval:
    RestartSec=10
    SyslogIdentifier=degirum-ai-server
    # You may want to change the user name under which this service will run.
    # This user should have rights to access model zoo directory
    User=ai-user
    [Install]
    WantedBy=multi-user.target
    
  • Please replace /home/ai-user/zoo path with the actual path to your local model zoo. Please refer to the Running AI Server from Shell Directly on Host OS section for discussions about model zoo hosting options.

  • Start the system service by executing the following command:

    sudo systemctl start degirum.service
    
  • Check the system service status by executing the following command:

    sudo systemctl status degirum.service
    
  • If the status is "Active", it means that the configuration is good and the service is up and running.

  • Then enable the service for automatic startup by executing the following command:

    sudo systemctl enable degirum.service
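
  • To inspect the server output after startup, you can use the standard systemd journal tools; for example, the following command follows the service log in real time:

    sudo journalctl -u degirum.service -f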
    

Running AI Server Inside Docker Container

To run PySDK AI server inside the Docker container, perform the following steps on the host:

  • Install Docker Desktop on a host computer where you want to build or run the docker container.

  • Follow the instructions provided on the DeGirum AI Server Docker Hub page. Or, in a few words, run the following command:

       docker run --name aiserver -d -p 8778:8778 -v /home/ai-user/zoo:/zoo --privileged degirum/aiserver:latest
    
    • Please replace /home/ai-user/zoo path with the actual path to your local model zoo. Please refer to the Running AI Server from Shell Directly on Host OS section for discussions about model zoo hosting options.

    • Use the --privileged option if you have AI hardware accelerator(s) installed on your system. For CPU-only inferences you may omit it.

    • The AI server inside the Docker container always listens on TCP port 8778. The -p 8778:8778 option maps the Docker container TCP port to a host TCP port. If you want to change the host TCP port, change the first number before the colon; for example, -p 9999:8778 maps host port 9999 to container port 8778.
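
      For example, to have clients connect via host TCP port 9999 while keeping all other options the same, the container can be started as follows:

        docker run --name aiserver -d -p 9999:8778 -v /home/ai-user/zoo:/zoo --privileged degirum/aiserver:latest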

Connecting to AI Server from Client Side

Now your AI server is up and running and you may connect to it from Python scripts using PySDK.

If you want to work with models from the local AI server model zoo, pass the AI server network hostname or its IP address to the degirum.connect PySDK function and do not specify any other parameters:

import degirum as dg
model_zoo = dg.connect(host_address)

If you run your PySDK script on the same host as the AI server, you may use the "localhost" string as a network hostname.

In local Linux networks with standard mDNS configuration, the network hostname is a concatenation of the local hostname, as returned by the hostname command, and the .local suffix. For example, if the hostname command returns ai-host, the network hostname will be ai-host.local.

If your server listens on a TCP port other than the default 8778, append the TCP port number to the hostname, separated by a colon, for example, "localhost:9999".
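
For example, to connect to such a server using the mDNS hostname from the previous paragraph and a non-default port:

import degirum as dg
model_zoo = dg.connect("ai-host.local:9999") # hostname and non-default TCP port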

If you want to work with models from a cloud model zoo of your choice, pass the cloud zoo URL and the cloud API access token as the second and third parameters. The cloud model zoo URL is specified in the "https://cs.degirum.com/<zoo URL>" format (see more on that in the Configuration for Cloud Inference section):

import degirum as dg
model_zoo = dg.connect(host_address, "https://cs.degirum.com/<zoo URL>", token="<your cloud API access token>")
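
Once connected, you can list the models available in the connected zoo and load one of them for inference. A minimal sketch (the model name is a placeholder, and frame stands for your input data, for example a numpy image array):

print(model_zoo.list_models()) # print names of models available in the zoo
model = model_zoo.load_model("some-model-name") # placeholder model name
result = model.predict(frame) # run inference on a single frame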

Updating AI Server Local Model Zoo

If you need to update the AI server local model zoo, perform the following steps:

  • Manage your model zoo directory:

    • Add new models by downloading them from the cloud zoo the way described in the beginning of this chapter.
    • Remove models by deleting model subdirectories.
  • If you started your AI server outside of a Docker container, tell the AI server to rescan the local model zoo directory by executing the following command on the same host where the AI server runs:

    degirum server rescan-zoo
    
  • If you started your AI server in the Docker container, then restart the container by executing the following command (assuming that the container was started with --name aiserver option):

    docker restart aiserver
    

PySDK Command-Line Interface

As part of the PySDK installation, the degirum executable console script is installed into the system path. It implements a command-line interface (CLI) to various PySDK management facilities. This console script extends PySDK functionality through the mechanism of entry points.

The PySDK CLI supports the following commands:

| Command | Description |
|---|---|
| download-zoo | Command to download AI models from the cloud model zoo |
| server | Command to control the operation of AI server |
| sys-info | Command to obtain system information dump |
| trace | Command to manage AI server tracing facility |

You invoke the console script by passing one of the commands from the table above as its first argument, followed by command-specific parameters described in the following sections:

degirum <command> <arguments>

Download Model Zoo Command

Using this command you can download ML models from the cloud model zoo of your choice specified by URL. The command has the following parameters:

| Parameter | Description | Possible Values | Default |
|---|---|---|---|
| --path | Local filesystem path to store models downloaded from a model zoo repo | Valid local directory path | Current directory |
| --url | Cloud model zoo URL | "https://cs.degirum.com[/<zoo URL>]" | "https://cs.degirum.com" |
| --token | Cloud API access token | Valid token obtained at cs.degirum.com | Empty |
| --model_family | Model family name filter: model name substring or regular expression | Any | Empty |
| --device | Target inference device filter | ORCA, CPU, EDGETPU, MYRIAD | Empty |
| --runtime | Runtime agent type filter | N2X, TFLITE, OPENVINO | Empty |
| --precision | Model calculation precision filter | QUANT, FLOAT | None |
| --pruned | Model density filter | PRUNED, DENSE | None |

The URL parameter is specified in the "https://cs.degirum.com/<zoo URL>" format. Here the <zoo URL> suffix is the name of the cloud model zoo in the form <organization>/<zoo>. You can obtain the model zoo URL suffix on the DeGirum Cloud Portal site https://cs.degirum.com under the Management | Models main menu item: just select the model zoo you want to access to open the model zoo page, then click the copy button near the model zoo name to copy the model zoo URL suffix into the clipboard.

Filter parameters work the same way as in the degirum.zoo_manager.ZooManager.list_models method: they allow you to download only the models that satisfy the filter conditions.

Once models are downloaded into the directory specified by --path parameter, you may use this directory as the model zoo to be served by AI server (see Server Control Command section).

Example.

Download models for ORCA device type from DeGirum Public cloud model zoo into ./my-zoo directory.

degirum download-zoo --path ./my-zoo --token <your cloud API access token> --device ORCA

Here <your cloud API access token> is your cloud API access token, which you can generate on DeGirum Cloud Portal site https://cs.degirum.com under Management | My Tokens main menu item.

Server Control Command

Using this command you can start AI server, shut it down, or request AI server to rescan its local model zoo.

You can control only an AI server running on the same host where you execute this command. Control of remote AI servers is disabled for security reasons.

This command has the following subcommands, which are passed just after the command:

| Sub-command | Description |
|---|---|
| start | Start AI server |
| rescan-zoo | Request AI server to rescan its model zoo |
| shutdown | Request AI server to shut down |

The command has the following parameters:

| Parameter | Applicable To | Description | Possible Values | Default |
|---|---|---|---|---|
| --zoo | start | Local model zoo directory to serve models from | Any valid path | Current directory |
| --quiet | start | Do not display any output | N/A | Disabled |
| --port | start | TCP port to bind AI server to | 1...65535 | 8778 |

Example.

Start AI server to serve models from ./my-zoo directory and bind it to default port:

degirum server start --zoo ./my-zoo
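
Similarly, you can request the AI server running on the same host to rescan its local model zoo, or request it to shut down:

degirum server rescan-zoo
degirum server shutdown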

System Info Command

Using this command you can query the system information either for the local system or for the remote AI server host. The system info dump is printed to the console.

The command has the following parameters:

| Parameter | Description | Possible Values | Default |
|---|---|---|---|
| --host | Remote AI server hostname or IP address; omit to query local system | Valid hostname, IP address, or empty | Empty |

Example.

Query system info from remote AI server at IP address 192.168.0.101:

degirum sys-info --host 192.168.0.101

Trace Management Command

Using this command you can manage AI server tracing facility.

AI server tracing facility is used for AI server debugging and time profiling. It is designed mostly for DeGirum customer support and not intended to be used by the end user directly.

This command has the following subcommands, which are passed just after the command:

| Sub-command | Description |
|---|---|
| list | List all available trace groups |
| configure | Configure trace levels for trace groups |
| read | Read trace data to file |

The command has the following parameters:

| Parameter | Applicable To | Description | Possible Values | Default |
|---|---|---|---|---|
| --host | All subcommands | Remote AI server hostname or IP address | Valid hostname or IP address | localhost |
| --file | read | Filename to save trace data into; omit to print to console | Valid local filename | Empty |
| --filesize | read | Maximum trace data size to read | Any integer number | 10000000 |
| --basic | configure | Set Basic trace level for a given list of trace groups | One or multiple trace group names as returned by list sub-command | Empty |
| --detailed | configure | Set Detailed trace level for a given list of trace groups | One or multiple trace group names as returned by list sub-command | Empty |
| --full | configure | Set Full trace level for a given list of trace groups | One or multiple trace group names as returned by list sub-command | Empty |

Examples.

Query AI server at 192.168.0.101 address for the list of available trace groups and print it to console:

degirum trace list --host 192.168.0.101

Configure tracing for the AI server on localhost by setting various tracing levels for various trace groups:

degirum trace configure --basic CoreTaskServer --detailed OrcaDMA OrcaRPC --full CoreRuntime

Read trace data from the AI server on localhost and save it to the file ./my-trace-1.txt:

degirum trace read --file ./my-trace-1.txt

API Documentation

