
Host your deep learning models easily.

Project description

Ventu


Serving deep learning models easily.

Install

pip install ventu

Features

  • only need to implement the model's preprocess, postprocess, and inference (or batch_inference) methods
  • request & response data validation with pydantic
  • API documentation generated by SpecTree (when running with run_http)
  • backend service built on falcon, supporting both JSON and msgpack
  • dynamic batching with the batching service over a Unix domain socket
    • errors in one request won't affect the others in the same batch
    • load balancing
  • supports any runtime
  • health check
  • monitoring metrics (Prometheus)
    • if you have multiple workers, remember to set the prometheus_multiproc_dir environment variable to a directory (see the example after this list)
  • inference warm-up
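
If you run multiple workers (for example with Gunicorn, as shown at the end of this page), set the prometheus_client multiprocess directory before starting them. A minimal sketch; the path is just a placeholder and should be an empty, writable directory:

export prometheus_multiproc_dir=/tmp/prometheus_metrics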

How to use

  • define your request and response data schemas with pydantic
    • add examples to schema.Config.schema_extra['examples'] for warm-up and health check (optional)
  • inherit from ventu.Ventu and implement the preprocess and postprocess methods
  • for a standalone HTTP service, implement the inference method and run with run_http
  • for a worker behind the dynamic batching service, implement the batch_inference method and run with run_socket (a minimal sketch follows this list)
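
A minimal sketch of this workflow for the standalone HTTP case (Req, Resp, and MyModel are placeholder names; full runnable demos follow below):

from pydantic import BaseModel
from ventu import Ventu


# request schema with an example used for warm-up and health check
class Req(BaseModel):
    num: int

    class Config:
        schema_extra = {'examples': [{'num': 3}]}


# response schema
class Resp(BaseModel):
    square: int


class MyModel(Ventu):
    def preprocess(self, data: Req):
        # validated request -> model input
        return data.num

    def inference(self, data):
        # single-request inference; implement batch_inference instead
        # when running behind the dynamic batching service
        return data ** 2

    def postprocess(self, data):
        # model output -> dict matching Resp
        return {'square': data}


if __name__ == '__main__':
    model = MyModel(Req, Resp)
    model.run_http(host='localhost', port=8000)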

Check the documentation for API details.

Example

Dynamic Batching Demo

Server

You need to run the batching service first; the worker below connects to it through a Unix domain socket.

The demo code can be found in the batching demo.

import logging
from pydantic import BaseModel
from ventu import Ventu


# request schema
class Req(BaseModel):
    num: int

    # request examples, used for health check and inference warm-up
    class Config:
        schema_extra = {
            'examples': [
                {'num': 23},
                {'num': 0},
            ]
        }


# response schema
class Resp(BaseModel):
    square: int

    # response examples, should be the true results for request examples
    class Config:
        schema_extra = {
            'examples': [
                {'square': 23 * 23},
                {'square': 0},
            ]
        }


class ModelInference(Ventu):
    def __init__(self, *args, **kwargs):
        # init parent class
        super().__init__(*args, **kwargs)

    def preprocess(self, data: Req):
        return data.num

    def batch_inference(self, data):
        return [num ** 2 for num in data]

    def postprocess(self, data):
        return {'square': data}


if __name__ == "__main__":
    logger = logging.getLogger()
    formatter = logging.Formatter(
        fmt='%(asctime)s - %(levelname)s - %(module)s - %(message)s')
    handler = logging.StreamHandler()
    handler.setFormatter(formatter)
    logger.setLevel(logging.DEBUG)
    logger.addHandler(handler)

    model = ModelInference(Req, Resp, use_msgpack=True)
    model.run_socket('batching.socket')
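
Start the batching service first, then run this worker with a plain python invocation; run_socket('batching.socket') connects the worker to the batching service through that Unix domain socket, so both sides must use the same socket path.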

Client

from concurrent import futures
import httpx
import msgpack


URL = 'http://localhost:8080'
packer = msgpack.Packer(
    autoreset=True,
    use_bin_type=True,
)


def request(text):
    # POST a msgpack-encoded body; schema validation happens server-side
    return httpx.post(URL, data=packer.pack({'num': text}))


if __name__ == "__main__":
    with futures.ThreadPoolExecutor() as executor:
        text = (0, 'test', -1, 233)
        results = executor.map(request, text)
        for i, resp in enumerate(results):
            print(
                f'>> {text[i]} -> [{resp.status_code}]\n'
                f'{msgpack.unpackb(resp.content, raw=False)}'
            )
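
Since Req.num is declared as an int, the 'test' request should fail pydantic validation and come back as an error response, while 0, -1, and 233 return their squares; this demonstrates that errors in one request don't affect the others in the same batch.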

Single Service Demo

The source code can be found in single_service_demo.py.

import logging
import pathlib
from typing import Tuple

import numpy
import onnxruntime
from pydantic import BaseModel

from ventu import Ventu


# define the input schema
class Input(BaseModel):
    text: Tuple[str, str, str]

    # provide an example for health check and inference warm-up
    class Config:
        schema_extra = {
            'examples': [
                {'text': ('hello', 'world', 'test')},
            ]
        }


# define the output schema
class Output(BaseModel):
    label: Tuple[bool, bool, bool]


class CustomModel(Ventu):
    def __init__(self, model_path, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # load model
        self.sess = onnxruntime.InferenceSession(model_path)
        self.input_name = self.sess.get_inputs()[0].name
        self.output_name = self.sess.get_outputs()[0].name

    def preprocess(self, data: Input):
        # data format is defined in ``Input``
        words = [sent.split(' ')[:4] for sent in data.text]
        # padding
        words = [word + [''] * (4 - len(word)) for word in words]
        # build embedding
        emb = [[
            numpy.random.random(5) if w else [0] * 5
            for w in word]
            for word in words]
        return numpy.array(emb, dtype=numpy.float32)

    def inference(self, data):
        # model inference
        return self.sess.run([self.output_name], {self.input_name: data})[0]

    def postprocess(self, data):
        # generate the same format as defined in ``Output``
        return {'label': [bool(numpy.mean(d) > 0.5) for d in data]}


def create_model():
    logger = logging.getLogger()
    formatter = logging.Formatter(fmt='%(asctime)s - %(levelname)s - %(module)s - %(message)s')
    handler = logging.StreamHandler()
    handler.setFormatter(formatter)
    logger.setLevel(logging.DEBUG)
    logger.addHandler(handler)

    model_path = pathlib.Path(__file__).absolute().parent / 'sigmoid.onnx'
    model = CustomModel(str(model_path), Input, Output)
    return model


def create_app():
    return create_model().app


if __name__ == "__main__":
    model = create_model()
    model.run_http(host='localhost', port=8000)

    """
    # try with `httpie`
    ## health check
        http :8000/health
    ## inference
        http POST :8000/inference text:='["hello", "world", "test"]'
    """

Try it with httpie

# health check
http :8000/health
# inference
http POST :8000/inference text:='["hello", "world", "test"]'

Open localhost:8000/apidoc/redoc in your browser to see the API documentation.
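
Since the falcon backend also accepts JSON, you can call the same endpoint from Python. A minimal sketch with httpx (same payload as the httpie call above; the response shape follows the Output schema):

import httpx

resp = httpx.post(
    'http://localhost:8000/inference',
    json={'text': ['hello', 'world', 'test']},
)
# expect something like {'label': [true, false, true]}
print(resp.status_code, resp.json())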

Run with Gunicorn

gunicorn -w 2 'example.single_service_demo:create_app()'
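
gunicorn -w 2 starts two worker processes, each of which calls create_app() and loads its own copy of the model; this is the multi-worker setup for which prometheus_multiproc_dir should be set (see Features).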

