FuriosaAI model server for interacting with Furiosa NPUs.
Furiosa Server (Alpha)
Furiosa Model Server is a framework for serving TFLite/ONNX models through a REST API, using Furiosa NPUs.
The Furiosa Model Server API supports REST and gRPC interfaces, compliant with KFServing's V2 Dataplane specification and Triton's Model Repository specification.
Features
- HTTP REST API support
- Multi-model support
- gRPC support
- OpenAPI specification support
- Compiler configuration support
- Input tensor adapter in Python (e.g., converting jpeg, png image files to tensors)
- Authentication support
Building for Development
Requirements
- Python >= 3.8
- libnpu
- libnux
Install APT dependencies.
sudo apt install furiosa-libnpu-sim # or furiosa-libnpu-xrt if you have Furiosa H/W
sudo apt install furiosa-libnux
Install Python dependencies.
pip install -e .
To build from source, generate the required files using grpc_tools and datamodel-codegen. These steps generate the gRPC stubs and the Pydantic data classes, respectively.
Generate gRPC API
for api in "predict" "model_repository"
do
    python -m grpc_tools.protoc \
        -I"./proto" \
        --python_out="./furiosa/server/api/grpc/generated" \
        --grpc_python_out="./furiosa/server/api/grpc/generated" \
        --mypy_out="./furiosa/server/api/grpc/generated" \
        "./proto/$api.proto"
done
Generate Pydantic data types
for api in "predict" "model_repository"
do
    datamodel-codegen \
        --output-model-type pydantic_v2.BaseModel \
        --input "./openapi/$api.yaml" \
        --output "./furiosa/server/types/$api.py"
done
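As a quick sanity check, you can try importing the generated modules. The module names below are assumptions derived from the output paths above (protoc emits <api>_pb2.py and <api>_pb2_grpc.py, and datamodel-codegen writes furiosa/server/types/<api>.py); they are not documented names.
# Assumed module layout based on the generation commands above.
from furiosa.server.api.grpc.generated import predict_pb2, predict_pb2_grpc  # noqa: F401
from furiosa.server.types import model_repository, predict  # noqa: F401

print('generated modules import OK')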
Testing
furiosa-server$ pytest --capture=no
============================================================ test session starts =============================================================
platform linux -- Python 3.9.6, pytest-6.2.4, py-1.10.0, pluggy-0.13.1
rootdir: /home/ys/Furiosa/cloud/furiosa-server
plugins: asyncio-0.15.1
collected 10 items
tests/test_server.py [1/6] 🔍 Compiling from tflite to dfg
Done in 0.006840319s
[2/6] 🔍 Compiling from dfg to ldfg
▪▪▪▪▪ [1/3] Splitting graph...Done in 47.121174s
▪▪▪▪▪ [2/3] Lowering...Done in 19.422386s
▪▪▪▪▪ [3/3] Precalculating operators...Done in 0.27680752s
Done in 66.82971s
[3/6] 🔍 Compiling from ldfg to cdfg
Done in 0.000951856s
[4/6] 🔍 Compiling from cdfg to gir
Done in 0.028555028s
[5/6] 🔍 Compiling from gir to lir
Done in 0.01069514s
[6/6] 🔍 Compiling from lir to enf
Done in 0.05054388s
✨ Finished in 66.980644s
.........[1/6] 🔍 Compiling from tflite to dfg
Done in 0.005259287s
[2/6] 🔍 Compiling from dfg to ldfg
▪▪▪▪▪ [1/3] Splitting graph...Done in 0.003461787s
▪▪▪▪▪ [2/3] Lowering...Done in 7.16337s
▪▪▪▪▪ [3/3] Precalculating operators...Done in 0.31032142s
Done in 7.4865813s
[3/6] 🔍 Compiling from ldfg to cdfg
Done in 0.001077142s
[4/6] 🔍 Compiling from cdfg to gir
Done in 0.02613672s
[5/6] 🔍 Compiling from gir to lir
Done in 0.012959026s
[6/6] 🔍 Compiling from lir to enf
Done in 0.058442567s
✨ Finished in 7.642151s
.
======================================================= 10 passed in 76.17s (0:01:16) ========================================================
Installing
Requirements
- Python >= 3.8
Download the latest release from https://github.com/furiosa-ai/furiosa-server/releases.
pip install furiosa_server-x.y.z-cp38-cp38-linux_x86_64.whl
Usage
Command line options
The furiosa-server command has the following options. To print the command line usage, run furiosa-server with the --help option.
Usage: furiosa-server [OPTIONS]
Start serving models from FuriosaAI model server
Options
--log-level [ERROR|INFO|WARN|DEBUG|TRACE] [default: LogLevel.INFO]
--model-name TEXT Model name [default: None]
--model-path TEXT Path to a model file (tflite, onnx are supported)
[default: None]
--model-version TEXT Model version [default: default]
--host TEXT IPv4 address to bind [default: 0.0.0.0]
--http-port INTEGER HTTP port to bind [default: 8080]
--model-config FILENAME Path to a model config file [default: None]
--server-config FILENAME Path to a server config file [default: None]
--install-completion [bash|zsh|fish|powershell|pwsh] Install completion for the specified shell. [default: None]
--show-completion [bash|zsh|fish|powershell|pwsh] Show completion for the specified shell, to copy it or
customize the installation.
[default: None]
--help Show this message and exit.
Serving a single model
To serve a single model, you need only a couple of command line options. The following is an example of starting a model server with a specific model name and model file:
$ furiosa-server --model-name mnist --model-path samples/data/MNIST_inception_v3_quant.tflite --model-version 1
find native library /home/ys/Furiosa/compiler/npu-tools/target/x86_64-unknown-linux-gnu/release/
INFO:furiosa.runtime._api.v1:loaded dynamic library /home/ys/Furiosa/compiler/npu-tools/target/x86_64-unknown-linux-gnu/release/libnux.so (0.4.0-dev bdde0748b)
[1/6] 🔍 Compiling from tflite to dfg
Done in 0.04330982s
[2/6] 🔍 Compiling from dfg to ldfg
▪▪▪▪▪ [1/3] Splitting graph...Done in 38.590836s
▪▪▪▪▪ [2/3] Lowering...Done in 26.293291s
▪▪▪▪▪ [3/3] Precalculating operators...Done in 2.2485964s
Done in 67.13952s
[3/6] 🔍 Compiling from ldfg to cdfg
Done in 0.000349475s
[4/6] 🔍 Compiling from cdfg to gir
Done in 0.07628228s
[5/6] 🔍 Compiling from gir to lir
Done in 0.002296112s
[6/6] 🔍 Compiling from lir to enf
Done in 0.06429358s
✨ Finished in 67.361084s
INFO: Started server process [235857]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8080 (Press CTRL+C to quit)
You can explore and try out the APIs through the OpenAPI documentation at http://localhost:8080/docs#/.
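Because the server implements the KFServing V2 dataplane specification, you can also check server health and model readiness over plain HTTP. The snippet below is a minimal sketch that assumes the standard V2 health and per-model readiness endpoints; adjust the paths if your server version exposes them differently.
import requests

base = 'http://localhost:8080'

# Server liveness and readiness (standard KFServing/KServe V2 endpoints, assumed here).
print(requests.get(f'{base}/v2/health/live').status_code)   # 200 once the server is up
print(requests.get(f'{base}/v2/health/ready').status_code)  # 200 once models are loaded

# Readiness and metadata of the model served above.
print(requests.get(f'{base}/v2/models/mnist/versions/1/ready').status_code)
print(requests.get(f'{base}/v2/models/mnist/versions/1').json())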
Serving multiple models
To serve multiple models, you need to write a model configuration file. The following is an example located at samples/model_config_example.yml:
model_config_list:
  - name: mnist
    path: "samples/data/MNISTnet_uint8_quant.tflite"
    version: 1
    npu_device: npu0pe0
    compiler_config:
      keep_unsignedness: true
      split_unit: 0
  - name: ssd
    path: "samples/data/tflite/SSD512_MOBILENET_V2_BDD_int_without_reshape.tflite"
    version: 1
    npu_device: npu1
In a model configuration file, you can also specify the NPU device dedicated to serving a certain model, as well as compiler configuration options, as shown in the example above.
Once you have written a model config file, you can launch the model server with it as follows:
$ furiosa-server --model-config samples/model_config_example.yaml
find native library /home/ys/Furiosa/compiler/npu-tools/target/x86_64-unknown-linux-gnu/release/
INFO:furiosa.runtime._api.v1:loaded dynamic library /home/ys/Furiosa/compiler/npu-tools/target/x86_64-unknown-linux-gnu/release/libnux.so (0.4.0-dev bdde0748b)
[1/6] 🔍 Compiling from tflite to dfg
Done in 0.000510351s
[2/6] 🔍 Compiling from dfg to ldfg
▪▪▪▪▪ [1/3] Splitting graph...Done in 1.5242418s
▪▪▪▪▪ [2/3] Lowering...Done in 0.41843188s
▪▪▪▪▪ [3/3] Precalculating operators...Done in 0.00754911s
Done in 1.9507353s
[3/6] 🔍 Compiling from ldfg to cdfg
Done in 0.000069757s
[4/6] 🔍 Compiling from cdfg to gir
Done in 0.005654631s
[5/6] 🔍 Compiling from gir to lir
Done in 0.000294499s
[6/6] 🔍 Compiling from lir to enf
Done in 0.003239762s
✨ Finished in 1.9631383s
[1/6] 🔍 Compiling from tflite to dfg
Done in 0.010595854s
[2/6] 🔍 Compiling from dfg to ldfg
▪▪▪▪▪ [1/3] Splitting graph...Done in 36.860104s
▪▪▪▪▪ [2/3] Lowering...Done in 8.500944s
▪▪▪▪▪ [3/3] Precalculating operators...Done in 1.2011535s
Done in 46.564877s
[3/6] 🔍 Compiling from ldfg to cdfg
Done in 0.000303809s
[4/6] 🔍 Compiling from cdfg to gir
Done in 0.07403221s
[5/6] 🔍 Compiling from gir to lir
Done in 0.001839668s
[6/6] 🔍 Compiling from lir to enf
Done in 0.07413657s
✨ Finished in 46.771423s
INFO: Started server process [245257]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8080 (Press CTRL+C to quit)
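Because the server also follows Triton's model repository specification, you can list the models loaded from the configuration file. This is a hedged sketch assuming the Triton-style repository index endpoint; check the OpenAPI docs at /docs if the path differs in your server version.
import requests

# Triton-style repository index (endpoint path assumed from the model repository spec).
index = requests.post('http://localhost:8080/v2/repository/index').json()
print(index)  # expected to list the 'mnist' and 'ssd' models configured above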
Submitting inference tasks
The following is an example of a request message. For the schema of the request message, please refer to the OpenAPI specification.
{"inputs": [{"name": "mnist", "datatype": "INT32", "shape": [1, 1, 28, 28], "data": ...}]}
You can test the MNIST model with the following command:
$ curl -X POST -H "Content-Type: application/json" \
-d "@samples/mnist_input_sample_01.json" \
http://localhost:8080/v2/models/mnist/versions/1/infer
{"model_name":"mnist","model_version":"1","id":null,"parameters":null,"outputs":[{"name":"0","shape":[1,10],"datatype":"UINT8","parameters":null,"data":[0,0,0,1,0,255,0,0,0,0]}]}%
Also, you can run a simple Python script to request predictions from the furiosa-server. Here is an example:
import numpy as np
import requests
import tensorflow as tf

# Load a sample MNIST image to use as the input tensor.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

url = 'http://localhost:8080/v2/models/mnist/versions/1/infer'

data = np.array(x_train[0:1], dtype=np.uint8).flatten().tolist()
tensor = {
    'name': 'mnist',
    'datatype': 'INT32',
    'shape': [1, 1, 28, 28],
    'data': data,
}

request = {'inputs': [tensor]}
response = requests.post(url, json=request)
print(response.json())
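The response follows the same layout as the curl output above: a single 1x10 output tensor holding one score per digit. Assuming that layout, you can read off the predicted class from the response:
# Pick the digit with the highest score from the first (and only) output tensor.
scores = response.json()['outputs'][0]['data']
print('predicted digit:', scores.index(max(scores)))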