Skip to main content

FuriosaAI model server interacting Furiosa NPU.

Project description

Furiosa Server (Alpha)

Furiosa Model Server is a framework for serving Tflite/ONNX models through a REST API, using Furiosa NPUs.

Furiosa Model server API supoorts a REST and gRPC interface, compliant with KFServing's V2 Dataplane specification and Triton's Model Repository specification.

Features

  • HTTP REST API support
  • Multi-model support
  • GRPC support
  • OpenAPI specification support
  • Compiler configuration support
  • Input tensor adapter in Python (e.g., converting jpeg, png image files to tensors)
  • Authentication support

Building for Development

Requirements

  • Python >= 3.8
  • libnpu
  • libnux

Install apt depdencies.

sudo apt install furiosa-libnpu-sim # or furiosa-libnpu-xrt if you have Furiosa H/W
sudo apt install furiosa-libnux

Install Python dependencies.

pip install -e .

To build from source, generate required files from grpc tools and datamodel-codegen. Each step is needed to generate a GRPC stub and pydantic data class.

Generate GRPC API

for api in "predict" "model_repository"
do
    python -m grpc_tools.protoc \
        -I"./proto" \
        --python_out="./furiosa/server/api/grpc/generated" \
        --grpc_python_out="./furiosa/server/api/grpc/generated" \
        --mypy_out="./furiosa/server/api/grpc/generated" \
        "./proto/$api.proto"
done

Generate Pydantic data type

for api in "predict" "model_repository"
do
    datamodel-codegen \
    --output-model-type pydantic_v2.BaseModel \
    --input "./openapi/$api.yaml" \
    --output "./furiosa/server/types/$api.py"
done

Testing

furiosa-server$ pytest --capture=no
============================================================ test session starts =============================================================
platform linux -- Python 3.9.6, pytest-6.2.4, py-1.10.0, pluggy-0.13.1
rootdir: /home/ys/Furiosa/cloud/furiosa-server
plugins: asyncio-0.15.1
collected 10 items

tests/test_server.py [1/6] 🔍   Compiling from tflite to dfg
Done in 0.006840319s
[2/6] 🔍   Compiling from dfg to ldfg
▪▪▪▪▪ [1/3] Splitting graph...Done in 47.121174s
▪▪▪▪▪ [2/3] Lowering...Done in 19.422386s
▪▪▪▪▪ [3/3] Precalculating operators...Done in 0.27680752s
Done in 66.82971s
[3/6] 🔍   Compiling from ldfg to cdfg
Done in 0.000951856s
[4/6] 🔍   Compiling from cdfg to gir
Done in 0.028555028s
[5/6] 🔍   Compiling from gir to lir
Done in 0.01069514s
[6/6] 🔍   Compiling from lir to enf
Done in 0.05054388s
✨  Finished in 66.980644s
.........[1/6] 🔍   Compiling from tflite to dfg
Done in 0.005259287s
[2/6] 🔍   Compiling from dfg to ldfg
▪▪▪▪▪ [1/3] Splitting graph...Done in 0.003461787s
▪▪▪▪▪ [2/3] Lowering...Done in 7.16337s
▪▪▪▪▪ [3/3] Precalculating operators...Done in 0.31032142s
Done in 7.4865813s
[3/6] 🔍   Compiling from ldfg to cdfg
Done in 0.001077142s
[4/6] 🔍   Compiling from cdfg to gir
Done in 0.02613672s
[5/6] 🔍   Compiling from gir to lir
Done in 0.012959026s
[6/6] 🔍   Compiling from lir to enf
Done in 0.058442567s
✨  Finished in 7.642151s
.

======================================================= 10 passed in 76.17s (0:01:16) ========================================================

Installing

Requirements

  • Python >= 3.8

Download the latest release from https://github.com/furiosa-ai/furiosa-server/releases.

pip install furiosa_server-x.y.z-cp38-cp38-linux_x86_64.whl

Usages

Command lines

furiosa-server command has the following options. To print out the command line usage, you can run furiosa-server --help option.

Usage: furiosa-server [OPTIONS]

  Start serving models from FuriosaAI model server

Options
  --log-level                 [ERROR|INFO|WARN|DEBUG|TRACE]    [default: LogLevel.INFO]
  --model-name                TEXT                             Model name [default: None]
  --model-path                TEXT                             Path to a model file (tflite, onnx are supported)
                                                               [default: None]
  --model-version             TEXT                             Model version [default: default]
  --host                      TEXT                             IPv4 address to bind [default: 0.0.0.0]
  --http-port                 INTEGER                          HTTP port to bind [default: 8080]
  --model-config              FILENAME                         Path to a model config file [default: None]
  --server-config             FILENAME                         Path to a server config file [default: None]
  --install-completion        [bash|zsh|fish|powershell|pwsh]  Install completion for the specified shell. [default: None]
  --show-completion           [bash|zsh|fish|powershell|pwsh]  Show completion for the specified shell, to copy it or
                                                               customize the installation.
                                                               [default: None]
  --help                                                       Show this message and exit.

Serving a single model

To serve a single model, you will need only a couple of command line options. The following is an example to start a model server with the specific model name and the model image file:

$ furiosa-server --model-name mnist --model-path samples/data/MNIST_inception_v3_quant.tflite --model-version 1
find native library /home/ys/Furiosa/compiler/npu-tools/target/x86_64-unknown-linux-gnu/release/
INFO:furiosa.runtime._api.v1:loaded dynamic library /home/ys/Furiosa/compiler/npu-tools/target/x86_64-unknown-linux-gnu/release/libnux.so (0.4.0-dev bdde0748b)
[1/6] 🔍   Compiling from tflite to dfg
Done in 0.04330982s
[2/6] 🔍   Compiling from dfg to ldfg
▪▪▪▪▪ [1/3] Splitting graph...Done in 38.590836s
▪▪▪▪▪ [2/3] Lowering...Done in 26.293291s
▪▪▪▪▪ [3/3] Precalculating operators...Done in 2.2485964s
Done in 67.13952s
[3/6] 🔍   Compiling from ldfg to cdfg
Done in 0.000349475s
[4/6] 🔍   Compiling from cdfg to gir
Done in 0.07628228s
[5/6] 🔍   Compiling from gir to lir
Done in 0.002296112s
[6/6] 🔍   Compiling from lir to enf
Done in 0.06429358s
✨  Finished in 67.361084s
INFO:     Started server process [235857]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:8080 (Press CTRL+C to quit)

You can find and try APIs via openapi: http://localhost:8080/docs#/

Serving multiple models

To serve multiple models, you need to write a model configuration file. The following is an example file located at samples/model_config_example.yml:

model_config_list:
  - name: mnist
    path: "samples/data/MNISTnet_uint8_quant.tflite"
    version: 1
    npu_device: npu0pe0
    compiler_config:
      keep_unsignedness: true
      split_unit: 0
  - name: ssd
    path: "samples/data/tflite/SSD512_MOBILENET_V2_BDD_int_without_reshape.tflite"
    version: 1
    npu_device: npu1

In a model configuration file, you can also specify a NPU device name dedicated to serve a certain model, and a list of compiler configs as shown in the above example.

If you write a model config file, you can launch the model server with a specific model config file as follow:

$ furiosa-server --model-config samples/model_config_example.yaml
find native library /home/ys/Furiosa/compiler/npu-tools/target/x86_64-unknown-linux-gnu/release/
INFO:furiosa.runtime._api.v1:loaded dynamic library /home/ys/Furiosa/compiler/npu-tools/target/x86_64-unknown-linux-gnu/release/libnux.so (0.4.0-dev bdde0748b)
[1/6] 🔍   Compiling from tflite to dfg
Done in 0.000510351s
[2/6] 🔍   Compiling from dfg to ldfg
▪▪▪▪▪ [1/3] Splitting graph...Done in 1.5242418s
▪▪▪▪▪ [2/3] Lowering...Done in 0.41843188s
▪▪▪▪▪ [3/3] Precalculating operators...Done in 0.00754911s
Done in 1.9507353s
[3/6] 🔍   Compiling from ldfg to cdfg
Done in 0.000069757s
[4/6] 🔍   Compiling from cdfg to gir
Done in 0.005654631s
[5/6] 🔍   Compiling from gir to lir
Done in 0.000294499s
[6/6] 🔍   Compiling from lir to enf
Done in 0.003239762s
✨  Finished in 1.9631383s
[1/6] 🔍   Compiling from tflite to dfg
Done in 0.010595854s
[2/6] 🔍   Compiling from dfg to ldfg
▪▪▪▪▪ [1/3] Splitting graph...Done in 36.860104s
▪▪▪▪▪ [2/3] Lowering...Done in 8.500944s
▪▪▪▪▪ [3/3] Precalculating operators...Done in 1.2011535s
Done in 46.564877s
[3/6] 🔍   Compiling from ldfg to cdfg
Done in 0.000303809s
[4/6] 🔍   Compiling from cdfg to gir
Done in 0.07403221s
[5/6] 🔍   Compiling from gir to lir
Done in 0.001839668s
[6/6] 🔍   Compiling from lir to enf
Done in 0.07413657s
✨  Finished in 46.771423s
INFO:     Started server process [245257]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:8080 (Press CTRL+C to quit)

Submitting inference tasks

The following is an example of a request message. If you want to know the schema of the request message, please refer to openapi specication.

{"inputs": [{"name": "mnist", "datatype": "INT32", "shape": [1, 1, 28, 28], "data": ...}]}

You can test one of MNIST model with the following command:

$ curl -X POST -H "Content-Type: application/json" \
-d "@samples/mnist_input_sample_01.json" \
http://localhost:8080/v2/models/mnist/versions/1/infer

{"model_name":"mnist","model_version":"1","id":null,"parameters":null,"outputs":[{"name":"0","shape":[1,10],"datatype":"UINT8","parameters":null,"data":[0,0,0,1,0,255,0,0,0,0]}]}%

Also, you can run a simple Python code to request the prediction task to the furiosa-server. Here is an example:

import requests
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
url = 'http://localhost:8080/v2/models/mnist/versions/1/infer'
data = np.ndarray(x_train[0:1], dtype=np.uint8).flatten().tolist()
tensor = {
        'dataType': 'INT32',
        'shape': [1,1,28,28],
        'data': data
}
request = {'inputs': [tensor] }
response = requests.post(url, json=request)
print(response.json())

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release.See tutorial on generating distribution archives.

Built Distribution

furiosa_server-0.10.2-py3-none-any.whl (35.4 kB view details)

Uploaded Python 3

File details

Details for the file furiosa_server-0.10.2-py3-none-any.whl.

File metadata

File hashes

Hashes for furiosa_server-0.10.2-py3-none-any.whl
Algorithm Hash digest
SHA256 b6849070db975c64bff445075b49ba60051dcd9051b4c3c8c0c955e6863b8d9b
MD5 d6cb83d433484c83545ea2cb19e9515a
BLAKE2b-256 88345f15591ae9dd053484ffbbf07dff240c83939649bad97a8aaafa45f74cc2

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page