A Gunicorn/Flask-based library for serving ML models, built by the ML-Ops and Science team at Aylien.
model-serving
A Flask-based Python wrapper for deploying models as a REST service, built on:

💡 Flask
💡 Gunicorn
💡 Protobuf3 (optional, for schema validation)
🥳 flask-caching
🥳 Prometheus metrics
The repo also contains examples of registering endpoints and a Makefile to run the service.
Installation
```
pip install model-serving
```
Project Structure
```
aylien_model_serving
│
├── requirements.txt
├── Makefile
│
├── app
│   ├── app_factory.py
│   └── cached_app_factory.py
│
└── examples
    ├── example_schema.proto
    ├── example_schema_pb2.py   (autogenerated by protoc)
    ├── example_serving_handler.py
    └── example_serving_handler_cached.py
```
How it works
- It runs a web service on the given port (defaults to `8000`).
- Any incoming JSON request is passed to your `ServingHandler.process_request` (see the client sketch below).
- Your `ServingHandler.process_request` is expected to return JSON.
- The request and response can optionally be validated against a protobuf schema.
- This library wraps common service code: monitoring, exception handling, etc.
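For instance, once the service is running, a client can POST JSON to it. Here is a minimal sketch using `requests`, assuming the default port and the example language-detection handler shown further below:

```python
import requests

# Assumes the service is running locally on the default port 8000,
# with the example handler registered at "/"
resp = requests.post(
    "http://localhost:8000/",
    json={"title": "Hello world", "body": "Just a plain greeting."},
)
print(resp.json())  # e.g. {"language": "en", "confidence": 0.71, ...}
```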
Usage
- Install this library as a dependency of whatever model you want to serve.
- Create a `ServingHandler` (see below for interface details).
- Run the make target: `make COMMAND_UNCACHED='ServingHandler.run_app()' example-service`
Interfaces
The main interface to flask apps defined in app_factory is the process_json
function.
This function expects to receive json input, optionally perform schema
validation, then call the callable_handler
function using each of the fields
in the json object as a keyword argument to the function. The function is expected to
return an object that can be parsed to json and sent as the response.
This design allows for a very simple but powerful interface that can easily make an endpoint out of just about any Python function.
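To illustrate the field-to-keyword-argument mapping, here is a hypothetical handler (the function and field names below are made up for this example, not part of the library):

```python
# Hypothetical handler used only to illustrate how JSON fields
# become keyword arguments
def greet(name, greeting="Hello"):
    return {"message": f"{greeting}, {name}!"}

# A POST body of {"name": "Ada"} leads process_json to call
# greet(name="Ada"), and the returned dict is serialized as the
# JSON response. A body of {"name": "Ada", "greeting": "Hi"}
# calls greet(name="Ada", greeting="Hi").
```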
Example Serving Handler
The example serving handler defined here does the following:

1. Defines a method `predict_lang`. For the purposes of this example it returns a static prediction; in practice this would be the prediction or classification from your model.
2. Imports a protobuf3-generated `.py` schema file (only needed if you require the JSON message to be schema validated).
3. Defines a function `process_request` that calls the wrapper function `process_json` with the callable from 1 and the schema from 2.
4. Registers `process_request` and its route mapping.

Repeat 1-4 for each (route, callable) pair if you have more than one service endpoint.
```python
import examples.example_schema_pb2 as schema
from aylien_model_serving.app_factory import FlaskAppWrapper, InvalidRequest


def predict_lang(text):
    # Static prediction for the purposes of this example
    return "en", 0.71


def predict(title=None, body=None, enrichments=None):
    if body is None and enrichments is not None:
        body = enrichments["extracted"]["value"]["body"]
    if title is None and body is None:
        raise InvalidRequest("Missing text")
    article_text = f"{title} {body}"
    detected_lang, confidence = predict_lang(article_text)
    return {
        'language': detected_lang,
        'confidence': confidence,
        'error': 'Not an error',
        'version': '0.0.1'
    }


def process_request():
    return FlaskAppWrapper.process_json(predict, schema=schema)


def run_app():
    routes = [
        {
            "endpoint": "/",
            "callable": process_request,
            "methods": ["POST"]
        }
    ]
    return FlaskAppWrapper.create_app(routes)
```
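If you opt into schema validation, the `example_schema_pb2.py` module is generated from `example_schema.proto` with the protobuf compiler, e.g. `protoc --python_out=. examples/example_schema.proto` (the exact paths here are assumed from the project structure above).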
Note that `FlaskAppWrapper.process_json` accepts any callable, so if you'd like to load a classifier or a `model.bin` into memory once at startup, you can use a class with `__call__`, like below 👇
```python
from aylien_model_serving.app_factory import FlaskAppWrapper


class ClassifyHandler:
    def __init__(self):
        # Classifier stands in for your own model class, or a binary
        # loaded from local file storage; loading it here keeps it in
        # memory for the lifetime of the service
        self.classifier = Classifier()

    def __call__(self, text):
        return self.classifier.predict(text)


def run_app():
    classify_handler = ClassifyHandler()

    def process_request():
        # Route requests through process_json so the JSON fields are
        # unpacked as keyword arguments, as in the example above
        return FlaskAppWrapper.process_json(classify_handler)

    routes = [
        {
            "endpoint": "/classify",
            "callable": process_request,
            "methods": ["POST"]
        }
    ]
    return FlaskAppWrapper.create_app(routes)
```
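Because `ClassifyHandler` is instantiated once in `run_app`, the model is loaded a single time at startup and reused across requests, rather than being re-loaded on every call.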