
Host your deep learning models easily.


Ventu


Serve deep learning models easily.

Install

pip install ventu

Features

  • you only need to implement the Model methods (preprocess, postprocess, and either inference or batch_inference)
  • request & response data validation using pydantic
  • API documentation generated by SpecTree (when run with run_http)
  • backend service built on falcon, supporting both JSON and msgpack
  • dynamic batching through the batching service over a Unix domain socket
    • errors in one request won't affect others in the same batch
    • load balancing
  • supports all runtimes
  • health check
  • monitoring metrics (Prometheus)
    • if you have multiple workers, remember to set the prometheus_multiproc_dir environment variable to a directory
  • inference warm-up
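
The claim that errors in one request won't affect others in the same batch can be illustrated with a small sketch. This is not Ventu's implementation: `run_batch_isolated` and `to_length` are hypothetical names, and the sketch only assumes that failures are caught per item before the batch call.

```python
from typing import Any, Callable, List, Tuple


def run_batch_isolated(
    items: List[Any],
    preprocess: Callable[[Any], Any],
    batch_inference: Callable[[List[Any]], List[Any]],
) -> List[Tuple[bool, Any]]:
    """Return one (ok, payload) pair per input item, in input order."""
    results: List[Tuple[bool, Any]] = [(False, None)] * len(items)
    valid_index, valid_data = [], []
    for i, item in enumerate(items):
        try:
            valid_data.append(preprocess(item))
            valid_index.append(i)
        except Exception as err:
            # record the failure for this item only; the rest of the batch continues
            results[i] = (False, str(err))
    if valid_data:
        # only the items that passed preprocessing reach the model
        outputs = batch_inference(valid_data)
        for i, out in zip(valid_index, outputs):
            results[i] = (True, out)
    return results


def to_length(text: str) -> int:
    # hypothetical preprocess step that rejects very short inputs
    if len(text) < 2:
        raise ValueError('text too short')
    return len(text)


batch = run_batch_isolated(
    ['hello', 'x', 'ok'], to_length, lambda xs: [x * 2 for x in xs])
print(batch)  # [(True, 10), (False, 'text too short'), (True, 4)]
```

The invalid item gets its own error entry while the two valid items are still processed together.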

How to use

  • define your request data schema and response data schema with pydantic
    • add examples to schema.Config.schema_extra[examples] for warm-up and health check (optional)
  • inherit from ventu.Ventu and implement the preprocess and postprocess methods
  • for a standalone HTTP service, implement the inference method and run with run_http
  • for a worker behind the dynamic batching service, implement the batch_inference method and run with run_socket

Check the documentation for API details.
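
The steps above imply a call order that differs between the two modes. The following is a rough sketch of that order, not Ventu's code: `ToyModel`, `handle_single`, and `handle_batch` are hypothetical names, and only the four method names come from the list above.

```python
from typing import Dict, List


class ToyModel:
    """Hypothetical stand-in exposing the four method names used by Ventu."""

    def preprocess(self, data: Dict[str, str]) -> int:
        return len(data['text'])

    def inference(self, data: int) -> int:
        # single-item inference, as used by a standalone HTTP service
        return data * 2

    def batch_inference(self, data: List[int]) -> List[int]:
        # receives a list of preprocessed items, returns one output per item
        return [x * 2 for x in data]

    def postprocess(self, data: int) -> Dict[str, int]:
        return {'score': data}


def handle_single(model: ToyModel, request: Dict[str, str]) -> Dict[str, int]:
    # HTTP-style: one request flows through all three steps
    return model.postprocess(model.inference(model.preprocess(request)))


def handle_batch(model: ToyModel, requests: List[Dict[str, str]]) -> List[Dict[str, int]]:
    # batching-style: preprocess each request, run the model once, postprocess each output
    prepared = [model.preprocess(r) for r in requests]
    outputs = model.batch_inference(prepared)
    return [model.postprocess(o) for o in outputs]


print(handle_single(ToyModel(), {'text': 'abc'}))  # {'score': 6}
print(handle_batch(ToyModel(), [{'text': 'a'}, {'text': 'abcd'}]))
```

In both modes preprocess and postprocess work on a single item, which is why only the inference method changes between them.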

Example

The demo code can be found in examples.

Service

Install the requirements: pip install numpy torch transformers httpx

import argparse
import logging

import numpy as np
import torch
from pydantic import BaseModel, confloat, constr
from transformers import DistilBertTokenizer, DistilBertForSequenceClassification

from ventu import Ventu


# request schema used for validation
class Req(BaseModel):
    # the input sentence should be at least 2 characters
    text: constr(min_length=2)

    class Config:
        # examples used for health check and warm-up
        schema_extra = {
            'example': {'text': 'my cat is very cute'},
            'batch_size': 16,
        }


# response schema used for validation
class Resp(BaseModel):
    positive: confloat(ge=0, le=1)
    negative: confloat(ge=0, le=1)


class ModelInference(Ventu):
    def __init__(self, *args, **kwargs):
        # initialize super class with request & response schema, configs
        super().__init__(*args, **kwargs)
        # initialize model and other tools
        self.tokenizer = DistilBertTokenizer.from_pretrained(
            'distilbert-base-uncased')
        self.model = DistilBertForSequenceClassification.from_pretrained(
            'distilbert-base-uncased-finetuned-sst-2-english')

    def preprocess(self, data: Req):
        # preprocess a single request (validated against the request schema)
        tokens = self.tokenizer.encode(data.text, add_special_tokens=True)
        return tokens

    def batch_inference(self, data):
        # batch inference is used in `socket` mode
        data = [torch.tensor(token) for token in data]
        with torch.no_grad():
            result = self.model(torch.nn.utils.rnn.pad_sequence(data, batch_first=True))[0]
        return result.numpy()

    def inference(self, data):
        # inference is used in `http` mode
        with torch.no_grad():
            result = self.model(torch.tensor(data).unsqueeze(0))[0]
        return result.numpy()[0]

    def postprocess(self, data):
        # postprocess a single result (the returned dict must match the response schema)
        scores = (np.exp(data) / np.exp(data).sum(-1, keepdims=True)).tolist()
        return {'negative': scores[0], 'positive': scores[1]}


def create_model():
    logger = logging.getLogger()
    formatter = logging.Formatter(
        fmt='%(asctime)s - %(levelname)s - %(module)s - %(message)s')
    handler = logging.StreamHandler()
    handler.setFormatter(formatter)
    logger.setLevel(logging.DEBUG)
    logger.addHandler(handler)

    model = ModelInference(Req, Resp, use_msgpack=True)
    return model


def create_app():
    """for gunicorn"""
    return create_model().app


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description='Ventu service')
    parser.add_argument('--mode', '-m', default='http', choices=('http', 'socket'))
    parser.add_argument('--host', default='localhost')
    parser.add_argument('--port', '-p', default=8080, type=int)
    parser.add_argument('--socket', '-s', default='batching.socket')
    args = parser.parse_args()

    model = create_model()
    if args.mode == 'socket':
        model.run_socket(args.socket)
    else:
        model.run_http(args.host, args.port)

You can run this script as:

  • a single-threaded HTTP service: python examples/app.py
  • an HTTP service with multiple workers: gunicorn -w 2 -b localhost:8080 'examples.app:create_app()'
    • when running as an HTTP service, the following endpoints are available:
      • /metrics Prometheus metrics
      • /health health check
      • /inference inference
      • /apidoc/redoc or /apidoc/swagger OpenAPI document
  • an inference worker behind the batching service: python examples/app.py -m socket (start the batching service first)
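
The postprocess method in the service example converts the two model logits into probabilities with a softmax. A minimal standard-library sketch of that step follows; `softmax` and `to_response` are hypothetical helper names, and subtracting the maximum logit is an added numerical-stability measure that the example itself does not use.

```python
import math
from typing import Dict, List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    # shifting by the maximum keeps exp() from overflowing on large logits
    shift = max(logits)
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def to_response(logits: Sequence[float]) -> Dict[str, float]:
    # index 0 is the negative class and index 1 the positive class,
    # matching the order used in the service example
    negative, positive = softmax(logits)
    return {'negative': negative, 'positive': positive}


resp = to_response([0.0, 0.0])
print(resp)  # {'negative': 0.5, 'positive': 0.5}
```

Both values fall in the range 0 to 1, so the result satisfies the confloat(ge=0, le=1) constraints in the response schema.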

Client

from concurrent import futures

import httpx
import msgpack

URL = 'http://localhost:8080/inference'
HEADER = {'Content-Type': 'application/msgpack'}
packer = msgpack.Packer(
    autoreset=True,
    use_bin_type=True,
)


def request(text):
    # httpx >= 0.18 expects raw bytes via content= (data= is for form fields)
    return httpx.post(URL, content=packer.pack({'text': text}), headers=HEADER)


if __name__ == "__main__":
    with futures.ThreadPoolExecutor() as executor:
        text = [
            'They are smart',
            'what is your problem?',
            'I hate that!',
            'x',
        ]
        results = executor.map(request, text)
        for i, resp in enumerate(results):
            print(
                f'>> {text[i]} -> [{resp.status_code}]\n'
                f'{msgpack.unpackb(resp.content)}'
            )

