A Gunicorn and Flask based library for serving ML models, built by the ML-Ops and Science team at Aylien.
model-serving
A Flask based Python wrapper for deploying models as a REST service, built on:
💡 Flask
💡 Gunicorn
💡 Protobuf3 (optional for schema validation)
🥳 flask-caching
🥳 prometheus metrics
The repo also contains examples of registering endpoints and a Makefile to run the service.
Installation
pip install model-serving
Project Structure
```
aylien_model_serving
├── requirements.txt
├── Makefile
├── app
│   ├── app_factory.py
│   └── cached_app_factory.py
└── examples
    ├── example_schema.proto
    ├── example_schema_pb2.py   (autogenerated by protoc)
    ├── example_serving_handler.py
    └── example_serving_handler_cached.py
```
How it works
- It runs a web service on the given port (defaults to 8000).
- Any incoming request JSON will be passed to your ServingHandler.process_request (see the client sketch below).
- Your ServingHandler.process_request is expected to return JSON.
- The request and response will be validated with a protobuf schema (optional).
- This library wraps common service code, monitoring, exception handling, etc.
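As a sketch of that request/response cycle, assuming the example handler below is running locally on the default port and the requests package is installed, a client can POST JSON and read the JSON reply (the payload fields here are hypothetical and must match your handler's keyword arguments):

```python
import requests  # third-party HTTP client, assumed installed

# Hypothetical article payload; the field names must match the
# keyword arguments of the handler's callable (see predict() below).
payload = {"title": "Hello", "body": "This is an example article."}

# The service listens on port 8000 by default.
response = requests.post("http://localhost:8000/", json=payload)
response.raise_for_status()

print(response.json())  # e.g. {"language": "en", "confidence": 0.71, ...}
```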
Usage
- Install this library as a dependency for whatever model you want to serve.
- Create a ServingHandler (see below for interface details).
- Run the make target:

```
make COMMAND_UNCACHED='ServingHandler.run_app()' example-service
```
Interfaces
The main interface to Flask apps defined in app_factory is the process_json function. This function expects to receive JSON input, optionally performs schema validation, and then calls the callable_handler function using each of the fields in the JSON object as a keyword argument. The callable is expected to return an object that can be serialized to JSON and sent as the response. This design allows for a very simple but powerful interface that can easily make an endpoint out of just about any Python function.
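To make that field-to-keyword-argument mapping concrete, here is a minimal sketch (the function and payload are hypothetical, not part of the library):

```python
# A plain Python function that could be exposed through process_json.
# Each top-level field of the request JSON becomes a keyword argument.
def summarize(text, max_words=10):
    words = text.split()
    return {"summary": " ".join(words[:max_words])}

# A request body of {"text": "some long article"} results in the call
# summarize(text="some long article"); max_words keeps its default,
# and the returned dict is serialized back to JSON as the response.
```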
Example Serving Handler
The example serving handler defined here does the following:

1. Defines a method predict_lang. For the purposes of this example, it returns a static prediction; ideally this would be the prediction or classification from your model.
2. Imports a protobuf3-generated .py schema file (only needed if you require the JSON message to be schema validated).
3. Defines a function process_request that calls the wrapper function process_json with the callable from 1 and the schema from 2.
4. Registers process_request and its route mapping.

Repeat 1-4 for each (route, callable) pair if you have more than one service endpoint.
```python
import examples.example_schema_pb2 as schema
from aylien_model_serving.app_factory import FlaskAppWrapper, InvalidRequest


def predict_lang(text):
    # Static prediction for the purposes of this example; in a real
    # handler this would come from your model.
    return "en", 0.71


def predict(title=None, body=None, enrichments=None):
    # Fall back to the enrichments payload for the body, guarding
    # against enrichments also being absent.
    if body is None and enrichments is not None:
        body = enrichments["extracted"]["value"]["body"]
    if title is None and body is None:
        raise InvalidRequest("Missing text")
    article_text = f"{title} {body}"
    detected_lang, confidence = predict_lang(article_text)
    return {
        "language": detected_lang,
        "confidence": confidence,
        "error": "Not an error",
        "version": "0.0.1",
    }


def process_request():
    # process_json validates the request against the protobuf schema
    # and calls predict() with the JSON fields as keyword arguments.
    return FlaskAppWrapper.process_json(predict, schema=schema)


def run_app():
    routes = [
        {
            "endpoint": "/",
            "callable": process_request,
            "methods": ["POST"],
        }
    ]
    return FlaskAppWrapper.create_app(routes)
```
Note that FlaskAppWrapper accepts a callable in process_json, so if you'd like to load a classifier or a model.bin into memory once at startup, you can modify the handler as below 👇
```python
class ClassifyHandler:
    def __init__(self):
        # Load the classifier (or a binary in local file storage) once,
        # when the handler is constructed.
        self.classifier = Classifier()

    def __call__(self, text):
        return self.classifier.predict(text)


def run_app():
    classify_handler = ClassifyHandler()
    routes = [
        {
            "endpoint": "/classify",
            "callable": classify_handler,
            "methods": ["POST"],
        }
    ]
    return FlaskAppWrapper.create_app(routes)
```
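To exercise the endpoint without starting Gunicorn, and assuming FlaskAppWrapper.create_app returns a standard Flask app object (an assumption, not something this README confirms), Flask's built-in test client can drive the route directly:

```python
# A minimal smoke test, assuming run_app() returns a standard Flask app.
app = run_app()
client = app.test_client()

# {"text": ...} in the JSON body becomes classify_handler(text=...).
resp = client.post("/classify", json={"text": "An example document."})
print(resp.status_code, resp.get_json())
```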