Furiosa model server
Furiosa Model Server (Alpha)
Furiosa Model Server is a framework for serving Tflite/ONNX models through a REST API, using Furiosa NPUs.
The Furiosa Model Server API supports REST and gRPC interfaces that are compliant with KFServing's V2 Dataplane specification and Triton's Model Repository specification.
Features
- HTTP REST API support
- Multi-model support
- gRPC support
- OpenAPI specification support
- Compiler configuration support
- Input tensor adapter in Python (e.g., converting jpeg, png image files to tensors)
- Authentication support
Building for Development
Requirements
- Python >= 3.7
- libnpu
- libnux
Install apt dependencies.
sudo apt install furiosa-libnpu-sim # or furiosa-libnpu-xrt if you have Furiosa H/W
sudo apt install furiosa-libnux
Install Python dependencies.
pip install -e .
To build from source, first generate the required files with grpc_tools and datamodel-codegen. The first step generates the gRPC stubs, and the second generates the pydantic data classes.
Generate gRPC API
for api in "predict" "model_repository"
do
python -m grpc_tools.protoc \
-I"./proto" \
--python_out="./furiosa/server/api/grpc/generated" \
--grpc_python_out="./furiosa/server/api/grpc/generated" \
--mypy_out="./furiosa/server/api/grpc/generated" \
"./proto/$api.proto"
done
Generate Pydantic data types
for api in "predict" "model_repository"
do
datamodel-codegen \
--input "./openapi/$api.yaml" \
--output "./furiosa/server/types/$api.py"
done
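The two shell loops above can also be driven from a single Python script, which may be convenient where a POSIX shell is not available. The following is a minimal sketch and not part of the project: the helper names `grpc_command`, `pydantic_command`, and `generate_all` are hypothetical, while the paths and flags are copied from the shell loops above.

```python
import subprocess
import sys

APIS = ("predict", "model_repository")
GRPC_OUT = "./furiosa/server/api/grpc/generated"


def grpc_command(api):
    """Return the grpc_tools.protoc invocation for one API as an argv list."""
    return [
        sys.executable, "-m", "grpc_tools.protoc",
        "-I./proto",
        f"--python_out={GRPC_OUT}",
        f"--grpc_python_out={GRPC_OUT}",
        f"--mypy_out={GRPC_OUT}",
        f"./proto/{api}.proto",
    ]


def pydantic_command(api):
    """Return the datamodel-codegen invocation for one API as an argv list."""
    return [
        "datamodel-codegen",
        "--input", f"./openapi/{api}.yaml",
        "--output", f"./furiosa/server/types/{api}.py",
    ]


def generate_all(run=subprocess.check_call):
    """Run both generators for every API; `run` can be swapped for a dry run."""
    for api in APIS:
        run(grpc_command(api))
        run(pydantic_command(api))
```

Calling `generate_all()` from the repository root runs the four commands in order; calling `generate_all(run=print)` only prints them.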
Testing
furiosa-server$ pytest --capture=no
============================================================ test session starts =============================================================
platform linux -- Python 3.9.6, pytest-6.2.4, py-1.10.0, pluggy-0.13.1
rootdir: /home/ys/Furiosa/cloud/furiosa-server
plugins: asyncio-0.15.1
collected 10 items
tests/test_server.py [1/6] 🔍 Compiling from tflite to dfg
Done in 0.006840319s
[2/6] 🔍 Compiling from dfg to ldfg
▪▪▪▪▪ [1/3] Splitting graph...Done in 47.121174s
▪▪▪▪▪ [2/3] Lowering...Done in 19.422386s
▪▪▪▪▪ [3/3] Precalculating operators...Done in 0.27680752s
Done in 66.82971s
[3/6] 🔍 Compiling from ldfg to cdfg
Done in 0.000951856s
[4/6] 🔍 Compiling from cdfg to gir
Done in 0.028555028s
[5/6] 🔍 Compiling from gir to lir
Done in 0.01069514s
[6/6] 🔍 Compiling from lir to enf
Done in 0.05054388s
✨ Finished in 66.980644s
.........[1/6] 🔍 Compiling from tflite to dfg
Done in 0.005259287s
[2/6] 🔍 Compiling from dfg to ldfg
▪▪▪▪▪ [1/3] Splitting graph...Done in 0.003461787s
▪▪▪▪▪ [2/3] Lowering...Done in 7.16337s
▪▪▪▪▪ [3/3] Precalculating operators...Done in 0.31032142s
Done in 7.4865813s
[3/6] 🔍 Compiling from ldfg to cdfg
Done in 0.001077142s
[4/6] 🔍 Compiling from cdfg to gir
Done in 0.02613672s
[5/6] 🔍 Compiling from gir to lir
Done in 0.012959026s
[6/6] 🔍 Compiling from lir to enf
Done in 0.058442567s
✨ Finished in 7.642151s
.
======================================================= 10 passed in 76.17s (0:01:16) ========================================================
Installing
Requirements
- Python >= 3.7
Download the latest release from https://github.com/furiosa-ai/furiosa-server/releases.
pip install furiosa_server-x.y.z-cp38-cp38-linux_x86_64.whl
Usage
Command-line options
The furiosa-server command accepts the following options.
To print the command-line usage, run furiosa-server --help.
Usage: furiosa-server [OPTIONS]
Start serving models from FuriosaAI model server
Options:
--log-level [ERROR|INFO|WARN|DEBUG|TRACE]
[default: INFO]
--model-path TEXT Path to Model file (tflite, onnx are
supported)
--model-name TEXT Model name used in URL path
--model-version INTEGER Model version used in URL path
--host TEXT IP address to bind [default: 0.0.0.0]
--http-port INTEGER HTTP port to listen to requests [default:
8080]
--model-config PATH Path to a config file about models with
specific configurations
--server-config PATH Path to Model file (tflite, onnx are
supported)
--install-completion Install completion for the current shell.
--show-completion Show completion for the current shell, to
copy it or customize the installation.
--help Show this message and exit.
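When the server is started from a script, the options above can be assembled programmatically. The following is a minimal sketch under that assumption: `server_command` is a hypothetical helper, not part of furiosa-server, and it only knows the flags listed in the help output above.

```python
import shlex

# Option names taken from the `furiosa-server --help` output above.
KNOWN_OPTIONS = {
    "log_level", "model_path", "model_name", "model_version",
    "host", "http_port", "model_config", "server_config",
}


def server_command(**options):
    """Build a furiosa-server argv list from keyword options (hypothetical helper)."""
    argv = ["furiosa-server"]
    for key, value in options.items():
        if key not in KNOWN_OPTIONS:
            raise ValueError(f"unknown option: {key}")
        # model_name -> --model-name
        argv += ["--" + key.replace("_", "-"), str(value)]
    return argv


cmd = server_command(
    model_name="mnist",
    model_path="samples/data/MNIST_inception_v3_quant.tflite",
    model_version=1,
)
# Quote each argument so the printed line can be pasted into a shell.
print(" ".join(shlex.quote(arg) for arg in cmd))
```

The resulting list can be passed directly to `subprocess.Popen` to start the server as a child process.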
Serving a single model
To serve a single model, you need only a few command-line options. The following example starts a model server with a specific model name and model file:
$ furiosa-server --model-name mnist --model-path samples/data/MNIST_inception_v3_quant.tflite --model-version 1
find native library /home/ys/Furiosa/compiler/npu-tools/target/x86_64-unknown-linux-gnu/release/
INFO:furiosa.runtime._api.v1:loaded dynamic library /home/ys/Furiosa/compiler/npu-tools/target/x86_64-unknown-linux-gnu/release/libnux.so (0.4.0-dev bdde0748b)
[1/6] 🔍 Compiling from tflite to dfg
Done in 0.04330982s
[2/6] 🔍 Compiling from dfg to ldfg
▪▪▪▪▪ [1/3] Splitting graph...Done in 38.590836s
▪▪▪▪▪ [2/3] Lowering...Done in 26.293291s
▪▪▪▪▪ [3/3] Precalculating operators...Done in 2.2485964s
Done in 67.13952s
[3/6] 🔍 Compiling from ldfg to cdfg
Done in 0.000349475s
[4/6] 🔍 Compiling from cdfg to gir
Done in 0.07628228s
[5/6] 🔍 Compiling from gir to lir
Done in 0.002296112s
[6/6] 🔍 Compiling from lir to enf
Done in 0.06429358s
✨ Finished in 67.361084s
INFO: Started server process [235857]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8080 (Press CTRL+C to quit)
You can browse and try out the APIs on the OpenAPI page: http://localhost:8080/docs#/
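For orientation, the KFServing V2 Dataplane specification defines health, metadata, and inference paths per model. The sketch below builds those URLs for the server started above. It assumes this server exposes all of them; only the `infer` path is confirmed elsewhere in this document. `v2_urls` is a hypothetical helper.

```python
def v2_urls(base, model, version=None):
    """Return V2 Dataplane URLs for one model (hypothetical helper)."""
    base = base.rstrip("/")
    model_path = f"{base}/v2/models/{model}"
    if version is not None:
        model_path += f"/versions/{version}"
    return {
        "server_live": f"{base}/v2/health/live",
        "server_ready": f"{base}/v2/health/ready",
        "model_metadata": model_path,
        "model_ready": f"{model_path}/ready",
        "infer": f"{model_path}/infer",
    }


urls = v2_urls("http://localhost:8080", "mnist", version=1)
for name, url in urls.items():
    print(f"{name}: {url}")
```

The health and metadata URLs are meant for GET requests, while `infer` expects a POST with a JSON body.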
Serving multiple models
To serve multiple models, you need to write a model configuration file.
The following is an example file located at samples/model_config_example.yml:
model_config_list:
- name: mnist
path: "samples/data/MNISTnet_uint8_quant.tflite"
version: 1
npu_device: npu0pe0
compiler_config:
keep_unsignedness: true
split_unit: 0
- name: ssd
path: "samples/data/tflite/SSD512_MOBILENET_V2_BDD_int_without_reshape.tflite"
version: 1
npu_device: npu1
In a model configuration file, you can also specify an NPU device name dedicated to serving a certain model, and a list of compiler configs, as shown in the example above.
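Because each entry combines a name, a path, and a version, a quick check before launching can catch simple mistakes. The following is a minimal sketch that works on a configuration already parsed into a Python dictionary (for example with a YAML library). The rules it enforces, that `name` and `path` are required and that name/version pairs are unique, are assumptions rather than documented behavior, and `check_model_config` is a hypothetical helper.

```python
def check_model_config(config):
    """Return a list of problems found in a parsed model configuration."""
    problems = []
    seen = set()
    for index, entry in enumerate(config.get("model_config_list", [])):
        # Assumed rule: every entry needs a name and a path.
        for key in ("name", "path"):
            if key not in entry:
                problems.append(f"entry {index}: missing '{key}'")
        # Assumed rule: a name/version pair identifies one model.
        ident = (entry.get("name"), entry.get("version"))
        if ident in seen:
            problems.append(f"entry {index}: duplicate name/version {ident}")
        seen.add(ident)
    return problems


# The example configuration above, written as a Python dictionary.
config = {
    "model_config_list": [
        {
            "name": "mnist",
            "path": "samples/data/MNISTnet_uint8_quant.tflite",
            "version": 1,
            "npu_device": "npu0pe0",
            "compiler_config": {"keep_unsignedness": True, "split_unit": 0},
        },
        {
            "name": "ssd",
            "path": "samples/data/tflite/SSD512_MOBILENET_V2_BDD_int_without_reshape.tflite",
            "version": 1,
            "npu_device": "npu1",
        },
    ]
}
print(check_model_config(config))
```

An empty list means no problems were found under these assumed rules.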
Once you have written a model config file, you can launch the model server with it as follows:
$ furiosa-server --model-config samples/model_config_example.yaml
find native library /home/ys/Furiosa/compiler/npu-tools/target/x86_64-unknown-linux-gnu/release/
INFO:furiosa.runtime._api.v1:loaded dynamic library /home/ys/Furiosa/compiler/npu-tools/target/x86_64-unknown-linux-gnu/release/libnux.so (0.4.0-dev bdde0748b)
[1/6] 🔍 Compiling from tflite to dfg
Done in 0.000510351s
[2/6] 🔍 Compiling from dfg to ldfg
▪▪▪▪▪ [1/3] Splitting graph...Done in 1.5242418s
▪▪▪▪▪ [2/3] Lowering...Done in 0.41843188s
▪▪▪▪▪ [3/3] Precalculating operators...Done in 0.00754911s
Done in 1.9507353s
[3/6] 🔍 Compiling from ldfg to cdfg
Done in 0.000069757s
[4/6] 🔍 Compiling from cdfg to gir
Done in 0.005654631s
[5/6] 🔍 Compiling from gir to lir
Done in 0.000294499s
[6/6] 🔍 Compiling from lir to enf
Done in 0.003239762s
✨ Finished in 1.9631383s
[1/6] 🔍 Compiling from tflite to dfg
Done in 0.010595854s
[2/6] 🔍 Compiling from dfg to ldfg
▪▪▪▪▪ [1/3] Splitting graph...Done in 36.860104s
▪▪▪▪▪ [2/3] Lowering...Done in 8.500944s
▪▪▪▪▪ [3/3] Precalculating operators...Done in 1.2011535s
Done in 46.564877s
[3/6] 🔍 Compiling from ldfg to cdfg
Done in 0.000303809s
[4/6] 🔍 Compiling from cdfg to gir
Done in 0.07403221s
[5/6] 🔍 Compiling from gir to lir
Done in 0.001839668s
[6/6] 🔍 Compiling from lir to enf
Done in 0.07413657s
✨ Finished in 46.771423s
INFO: Started server process [245257]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8080 (Press CTRL+C to quit)
Submitting inference tasks
The following is an example of a request message. For the full schema of the request message, please refer to the OpenAPI specification.
{"inputs": [{"name": "mnist", "datatype": "INT32", "shape": [1, 1, 28, 28], "data": ...}]}
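A request body of this form can be built and sanity-checked in Python before it is sent. The following is a minimal sketch: `make_infer_request` is a hypothetical helper, and the length check assumes the tensor data is passed as a flat list.

```python
def make_infer_request(name, datatype, shape, data):
    """Build a request body with a single input tensor (hypothetical helper)."""
    # The number of values must equal the product of the shape dimensions.
    expected = 1
    for dim in shape:
        expected *= dim
    if len(data) != expected:
        raise ValueError(f"shape {shape} needs {expected} values, got {len(data)}")
    tensor = {
        "name": name,
        "datatype": datatype,
        "shape": list(shape),
        "data": list(data),
    }
    return {"inputs": [tensor]}


# A blank 28x28 image, used only as placeholder data.
request = make_infer_request("mnist", "INT32", [1, 1, 28, 28], [0] * 784)
print(len(request["inputs"][0]["data"]))
```

Checking the length on the client side turns a shape mismatch into an immediate error instead of a failed request.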
You can test the MNIST model with the following command:
$ curl -X POST -H "Content-Type: application/json" \
-d "@samples/mnist_input_sample_01.json" \
http://localhost:8080/v2/models/mnist/versions/1/infer
{"model_name":"mnist","model_version":"1","id":null,"parameters":null,"outputs":[{"name":"0","shape":[1,10],"datatype":"UINT8","parameters":null,"data":[0,0,0,1,0,255,0,0,0,0]}]}
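The response carries the output tensor as a flat list in `outputs[0].data`. The sketch below picks the index with the highest value from the response shown above. It assumes the ten values are per-class scores for the digits 0 to 9, which this document does not state explicitly; `predicted_class` is a hypothetical helper.

```python
import json

# The response body from the curl example above.
response_text = (
    '{"model_name":"mnist","model_version":"1","id":null,"parameters":null,'
    '"outputs":[{"name":"0","shape":[1,10],"datatype":"UINT8",'
    '"parameters":null,"data":[0,0,0,1,0,255,0,0,0,0]}]}'
)


def predicted_class(response):
    """Return the index of the highest value in the first output tensor."""
    scores = response["outputs"][0]["data"]
    return max(range(len(scores)), key=scores.__getitem__)


print(predicted_class(json.loads(response_text)))  # prints 5
```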
You can also send the inference request to furiosa-server from Python. Here is an example:
import numpy as np
import requests
import tensorflow as tf

# Load the MNIST dataset; only the first training image is used below.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
url = 'http://localhost:8080/v2/models/mnist/versions/1/infer'
# Flatten the first image into a plain list of integers.
data = np.asarray(x_train[0:1], dtype=np.uint8).flatten().tolist()
# Field names follow the request message example above.
tensor = {
    'name': 'mnist',
    'datatype': 'INT32',
    'shape': [1, 1, 28, 28],
    'data': data
}
request = {'inputs': [tensor]}
response = requests.post(url, json=request)
print(response.json())