
Furiosa Serving

Furiosa Serving is a lightweight library based on FastAPI for building a model server that runs on a Furiosa NPU.

Dependency

Furiosa Serving depends on FastAPI and the Furiosa SDK (for the NPU runtime).

Installation

furiosa-serving can be installed from PyPI using pip (note that the package name is different from the importable name):

pip install 'furiosa-sdk[serving]'
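
Since the distribution name (furiosa-sdk[serving]) differs from the importable package (furiosa.serving), you can verify the installation with a quick import:

# Installed via furiosa-sdk[serving], imported as furiosa.serving
from furiosa.serving import ServeAPI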

Getting started

There is one main API called ServeAPI. You can think of ServeAPI as a kind of FastAPI wrapper.

Run server

# main.py
from fastapi import FastAPI
from furiosa.serving import ServeAPI

serve = ServeAPI()

# This is a FastAPI instance
app: FastAPI = serve.app

You can run a uvicorn server via the internal app variable of the ServeAPI instance, just like a normal FastAPI application:

$ uvicorn main:app # or uvicorn main:serve.app
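
If you prefer starting the server from Python rather than the command line, here is a minimal sketch using uvicorn's programmatic API:

# main.py (sketch): run the ServeAPI app programmatically with uvicorn
import uvicorn

if __name__ == "__main__":
    uvicorn.run("main:app", host="127.0.0.1", port=8000)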

Load model

From ServeAPI, you can load a model binary that will run on a Furiosa NPU. You should specify the model name and the URI from which to load the model. The URI can be one of the following:

  • Local file
  • HTTP
  • S3

Note that the model binary must be in a format supported by the Furiosa NPU; the examples below use ONNX models.

from furiosa.common.thread import synchronous
from furiosa.serving import ServeAPI, ServeModel


serve = ServeAPI()


# Load model from local disk
imagenet: ServeModel = synchronous(serve.model)(
    'imagenet',
    location='./examples/assets/models/image_classification.onnx'
)

# Load model from HTTP
resnet: ServeModel = synchronous(serve.model)(
    'resnet',
    location='https://raw.githubusercontent.com/onnx/models/main/vision/classification/resnet/model/resnet50-v1-12.onnx'
)

# Load model from S3 (auth environment variables for the aioboto library required)
densenet: ServeModel = synchronous(serve.model)(
    'densenet',
    location='s3://furiosa/models/93d63f654f0f192cc4ff5691be60fb9379e9d7fd'
)
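
Note that serve.model() is wrapped with the synchronous() helper from furiosa.common.thread, which runs it to completion so that models can be loaded at import time. Inside an async context you could await it directly; a sketch, assuming the coroutine form implied by the synchronous() wrapper:

# Sketch: awaiting serve.model() directly inside an async context,
# assuming the coroutine form implied by the synchronous() wrapper above
async def load_imagenet() -> ServeModel:
    return await serve.model(
        'imagenet',
        location='./examples/assets/models/image_classification.onnx'
    )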

Define API

On a model you just created, you can use FastAPI path operation decorators like post() and get() to expose API endpoints.

You should follow the FastAPI Request Body concept to correctly define the payload.

:warning: The example below does not work as-is: you have to define your own preprocess() and postprocess() functions first (a sketch follows the example).

from typing import Dict, List

from fastapi import File, UploadFile
from furiosa.common.thread import synchronous
from furiosa.serving import ServeAPI, ServeModel
import numpy as np


serve = ServeAPI()


model: ServeModel = synchronous(serve.model)(
    'imagenet',
    location='./examples/assets/models/image_classification.onnx'
)

@model.post("/models/imagenet/infer")
async def infer(image: UploadFile = File(...)) -> Dict:
    # Convert image to Numpy array with your preprocess() function
    tensors: List[np.ndarray] = preprocess(image)

    # Infer from ServeModel
    result: List[np.ndarray] = await model.predict(tensors)

    # Build the classification response from the numpy arrays with your postprocess() function
    response: Dict = postprocess(result)

    return response
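
For reference, here is a minimal sketch of what such preprocess() and postprocess() helpers could look like, assuming a single 224x224 RGB NHWC input tensor and a single classification output (both are assumptions; adjust them to your model):

from typing import Dict, List

import numpy as np
from fastapi import UploadFile
from PIL import Image


def preprocess(image: UploadFile) -> List[np.ndarray]:
    # Decode the uploaded file and resize to the model's expected input shape (assumed 224x224)
    data = Image.open(image.file).convert("RGB").resize((224, 224))
    # NHWC float32 tensor with a batch dimension of 1
    return [np.asarray(data, dtype=np.float32)[np.newaxis, ...]]


def postprocess(result: List[np.ndarray]) -> Dict:
    # Pick the highest scoring class from the first output tensor
    scores = result[0].squeeze()
    index = int(np.argmax(scores))
    return {"class_index": index, "score": float(scores[index])}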

After running the uvicorn server, you can find the documentation provided by FastAPI at localhost:8000/docs

Use sub applications

Furiosa Serving provides predefined FastAPI sub applications that give you additional functionality out of the box.

You can mount the sub applications using mount(). The following sub applications are provided:

  • Repository: model repository to list models and load/unload a model dynamically
  • Model: model metadata, model readiness
  • Health: server health, server readiness
from fastapi import FastAPI
from furiosa.serving import ServeAPI
from furiosa.serving.apps import health, model, repository


# Create ServeAPI with Repository instance. This repository maintains models
serve = ServeAPI(repository.repository)

app: FastAPI = serve.app

app.mount("/repository", repository.app)
app.mount("/models", model.app)
app.mount("/health", health.app)

You can also find documentation for the sub applications at localhost:8000/{application}/docs. Note that the model sub application serves its docs at localhost:8000/{application}/api/docs instead, since the default docs URL conflicts with the model API.
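
As a quick sanity check, a sketch using FastAPI's TestClient (assuming the docs routes described above) can confirm that the sub applications are mounted at the expected paths:

from fastapi.testclient import TestClient

client = TestClient(app)

# Docs pages of the mounted sub applications, per the URLs described above
assert client.get("/health/docs").status_code == 200
assert client.get("/repository/docs").status_code == 200
assert client.get("/models/api/docs").status_code == 200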

Use processors for pre/post processing

Furiosa Serving provides several processors, which are predefined pre/post processing functions that convert your data for each model.

You can call preprocess() and postprocess() directly on a Processor instance, or use the Processor as a decorator. When used as a decorator, the Processor calls preprocess() and postprocess() before and after your function, respectively.

import numpy as np
from furiosa.common.thread import synchronous
from furiosa.serving import ServeModel, ServeAPI
from furiosa.serving.processors import ImageNet


serve = ServeAPI()

model: ServeModel = synchronous(serve.model)(
    'imagenet',
    location='./examples/assets/models/image_classification.onnx'
)

@model.post("/models/imagenet/infer")
@ImageNet(model=model, label='./examples/assets/labels/ImageNetLabels.txt')  # This makes infer() Callable[[UploadFile], Dict]
async def infer(tensor: np.ndarray) -> np.ndarray:
    return await model.predict(tensor)

For better understanding, the following approximately describes how the infer() function works internally:

# Create the processor
processor = ImageNet(model=model, label='./examples/assets/labels/ImageNetLabels.txt')

# The API endpoint signature is replaced with that of ImageNet.preprocess()
def wrapped_infer(image: PIL.Image.Image) -> Dict:

    # Preprocess the image sent by the API client with the processor
    tensor: np.ndarray = processor.preprocess(image)

    # Call your original infer() with the tensor above
    output: np.ndarray = infer(tensor)

    # Postprocess the output above with the processor
    response: Dict = processor.postprocess(output)

    # Return the response as a Dict, whose shape is defined by ImageNet.postprocess()
    return response

Note that you must apply the processor decorator first (closest to the function) so that the correct function signature is passed to the FastAPI route decorator, which uses it for argument validation.

# Correct:
@model.post("/models/imagenet/infer")
@ImageNet(tensor=model.inputs[0], label='./examples/assets/labels/ImageNetLabels.txt')  # This makes infer() Callable[[UploadFile], Dict]
async def infer(tensor: np.ndarray) -> np.ndarray:
    ...

# Wrong:
@ImageNet(tensor=model.inputs[0], label='./examples/assets/labels/ImageNetLabels.txt')  # This makes infer() Callable[[UploadFile], Dict]
@model.post("/models/imagenet/infer")
async def infer(tensor: np.ndarray) -> np.ndarray:
    ...

Compose models

You can compose multiple models using FastAPI dependency injection.

:warning: The example below does not work as-is, since there is no SegmentNet in processors yet.

from typing import Dict, List

import numpy as np
from fastapi import Depends, File, UploadFile
from furiosa.common.thread import synchronous
from furiosa.serving import ServeModel, ServeAPI
from furiosa.serving.processors import ImageNet, SegmentNet


serve = ServeAPI()

imagenet: ServeModel = synchronous(serve.model)(
    'imagenet',
    location='./examples/assets/models/image_classification.onnx'
)

segmentnet: ServeModel = synchronous(serve.model)(
    'segmentnet',
    location='./examples/assets/models/image_segmentation.onnx'
)

# Note that no "imagenet.post()" here not to expose the endpoint
async def classify(image: UploadFile = File(...)) -> List[np.ndarray]:
    tensors: List[np.arrary] = ImageNet(tensor=imagenet.inputs[0]).preprocess(image)
    return await imagenet.predict(tensors)

@segmentnet.post("/models/composed/infer")
async def segment(tensors: List[np.ndarray] = Depends(classify)) -> Dict:
    tensors = await model.predict(tensors)
    return SegmentNet(tensor=segmentnet.inputs[0]).postprocess(tensors)

Example

You can find a complete example at examples/image_classify.py

cd examples

examples$ uvicorn image_classify:serve.app
INFO:furiosa_sdk_runtime._api.v1:loaded dynamic library /home/ys/Furiosa/compiler/npu-tools/target/x86_64-unknown-linux-gnu/debug/libnux.so (0.4.0-dev d1720b938)
INFO:     Started server process [984608]
INFO:uvicorn.error:Started server process [984608]
INFO:     Waiting for application startup.
INFO:uvicorn.error:Waiting for application startup.
[1/6] 🔍   Compiling from tflite to dfg
Done in 0.27935523s
[2/6] 🔍   Compiling from dfg to ldfg
▪▪▪▪▪ [1/3] Splitting graph...Done in 1079.9143s
▪▪▪▪▪ [2/3] Lowering...Done in 93.315895s
▪▪▪▪▪ [3/3] Precalculating operators...Done in 45.07178s
Done in 1218.3285s
[3/6] 🔍   Compiling from ldfg to cdfg
Done in 0.002127793s
[4/6] 🔍   Compiling from cdfg to gir
Done in 0.096237786s
[5/6] 🔍   Compiling from gir to lir
Done in 0.03271749s
[6/6] 🔍   Compiling from lir to enf
Done in 0.48739022s
✨  Finished in 1219.4524s
INFO:     Application startup complete.
INFO:     Uvicorn running on http://127.0.0.1:8000 (Press CTRL+C to quit)

You can find the available APIs at http://localhost:8000/docs#/

Send an image to the server you just launched to classify it:

examples$ curl -X 'POST' \
  'http://127.0.0.1:8000/imagenet/infer' \
  -H 'accept: application/json' \
  -H 'Content-Type: multipart/form-data' \
  -F 'image=@assets/images/car.jpg'

Code

The code and issue tracker are hosted on GitHub:
https://github.com/furiosa-ai/furiosa-sdk

Contributing

We welcome many types of contributions - bug reports, pull requests (code, infrastructure or documentation fixes). For more information about how to contribute to the project, see the CONTRIBUTING.md file in the repository.
