functions to make working with TorchServe quik-er
serve-quik
For creating torch archived transformer models and TorchServe containers (much quik-er)
Summary
Quick Start
The process of building a torch model archive, building a TorchServe container, determining the appropriate ports, and testing your container can be tedious, so I tried to automate all of it. Most of my automation is centered around MarianMT models, but it can be used for other models as well (I use it for a BERT model). For instance, to build and deploy a container with some MarianMT models that will translate from Brazilian Portuguese, Chinese, French, German, Japanese, Korean, and Spanish to English, you could run the following:
python serve-quik -p "text-translate" -mt marianmt -src ja de es fr bzs zh ko
Not only is everything set up, but the container is up and running on the server:
$ docker ps --filter name=local_text_translate --format "table {{.Names}}\t{{.Image}}\t{{.RunningFor}}"
NAMES                  IMAGE                         CREATED
local_text_translate   serve-text-translate:latest   3 seconds ago
$ docker ps --filter name=local_text_translate --format "table {{.Ports}}"
PORTS
7070-7071/tcp, 8081/tcp, 0.0.0.0:8180->8080/tcp, :::8180->8080/tcp, 0.0.0.0:8182->8082/tcp, :::8182->8082/tcp
Note: serve-quik detected that ports 8080 and 8082 were already in use on the host, so it built this container to use 8180 and 8182 instead. For more information, see Dockerfile automation.
Inference Testing
Now that the container is up and running, you can test it by running:
>>> import serve_quik as sq
>>> import pandas as pd
>>> import numpy as np
>>>
>>> PORT = 8180
>>>
>>> text_dict = {
... "opus-mt-ja-en": ["口は災いの元"],
... "opus-mt-de-en": ["Alles hat ein Ende, nur die Wurst hat zwei"],
... "opus-mt-es-en": ["Es tan corto el amor y tan largo el olvido"],
... "opus-mt-zh-en": [" 笑一笑,十年少"],
... "opus-mt-fr-en": ["Dans une grande âme tout est grand"],
... "opus-mt-bzs-en": ["Quando a alma fala, já não fala nada"],
... "opus-mt-ko-en": ["멈추지 말고 계속 해나가기만 한다면 늦어도 상관없다."],
... }
>>> res = pd.DataFrame()
>>> for key, value in text_dict.items():
... x = np.array(value, dtype='object')
... url = f"http://localhost:{PORT}/predictions/{key}"
... sr = sq.api.ServeRequest(x, 2, url)
... df = sr.batch_inference()
... res = pd.concat([res, df])
...
INFO:serve_quik.api:Batch 0, status_code: 200
INFO:serve_quik.api:Batch 0, status_code: 200
INFO:serve_quik.api:Batch 0, status_code: 200
INFO:serve_quik.api:Batch 0, status_code: 200
INFO:serve_quik.api:Batch 0, status_code: 200
INFO:serve_quik.api:Batch 0, status_code: 200
INFO:serve_quik.api:Batch 0, status_code: 200
>>> print(res)
translation
0 The mouth is a curse.
0 Everything has an end, only the sausage has two
0 Love is so short and forgetfulness so long
0 Smile. Ten years is short.
0 In a great soul everything is great
0 When the soul speaks, there is nothing else
0 I don't care if it's too late.
Process
Pretty cool, right? But what exactly is being automated? Hypothetically, any huggingface.co tokenizer and model could be placed into a torch archive and served with TorchServe. serve-quik completes the following four steps to make this happen:
- Builds a project directory
- Builds a model directory (or multiple directories)
- Builds a model-archive file
- Builds and runs the serving container
Set Directories
Build a project directory
Building a project directory keeps all of the files and config required for a serving container in one place. This step both creates the directory and lets the rest of the process know where your project is stored. It also adds a .env file to the directory. Both can be done manually, but serve-quik automates them:
from serve_quik import container, utils
serve_dir = utils.set_serve_dir(args.project_name)
container.build_dot_env(serve_dir)
Build a model directory
As multiple models can be served from one container, you can do this multiple times (e.g. different models for different translations). Here we'll just create the directory for later use. args.model_type can be something like "marianmt", and kwargs would be the source language (e.g. "es") and the target language (e.g. "en"):
model_dir = utils.set_model_dir(
    serve_dir,
    args.model_type,
    args.kwargs,
)
Pull and prepare tokenizer
I've only implemented BERT, RoBERTa, and MarianMT, but more are to come. The tokenizer functions do the following (a rough sketch with the transformers library follows the list):
- maps a string to a model name and tokenizer, such as:
  - bert to bert-base-uncased and BertTokenizer
  - roberta to roberta-base and RobertaTokenizer
  - marianmt (with a source and target like es and en) to Helsinki-NLP/opus-mt-es-en and MarianTokenizer
- pulls the appropriate tokenizer, then converts the cached tokenizer files to the input files config.json, tokenizer_config.json, and special_tokens_map.json, then does the same for the tokenizer-specific files, such as:
  - index_to_name.json, sample_text.txt, and vocab.txt for sequence classification models
  - vocab.json, source.spm, and target.spm for sequence-to-sequence models
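For a marianmt tokenizer with source es and target en, this step is roughly equivalent to the following minimal sketch using the transformers library directly (model_dir is assumed to be the directory returned by set_model_dir; serve-quik's own helpers handle the renaming and extra files):

from transformers import MarianTokenizer

# Pull the es-to-en tokenizer and write its files (vocab.json, source.spm,
# target.spm, tokenizer_config.json, special_tokens_map.json, ...) into the
# model directory so they can later be packed into the model archive.
tokenizer = MarianTokenizer.from_pretrained("Helsinki-NLP/opus-mt-es-en")
tokenizer.save_pretrained(model_dir)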
Pull and prepare a serialized model
To prepare a model, you pull it, add model weights, and then save it. If you're using the pretrained model as-is, you can just provide the weights already in the model (a rough transformers sketch follows the list). The steps are:
- mapping a string to a model name, such as:
  - bert to BertForSequenceClassification
  - roberta to RobertaForSequenceClassification
  - marianmt to MarianMTModel
- pulling the pretrained model
- building the model archive's setup_config.json file with defaults
Note: If you aren't providing your own trained weights (state_dict), you can just provide the pulled model's original weights back to it with model.state_dict().
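As a minimal sketch of this step (again using the es-to-en MarianMT model with the transformers library directly and model_dir from set_model_dir; serve-quik wraps this and also writes setup_config.json):

from transformers import MarianMTModel

# Pull the pretrained es-to-en model; since we aren't supplying our own
# trained weights, load its own state_dict back in (effectively a no-op)
# and save the serialized model (pytorch_model.bin) into the model directory.
model = MarianMTModel.from_pretrained("Helsinki-NLP/opus-mt-es-en")
model.load_state_dict(model.state_dict())
model.save_pretrained(model_dir)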
Choose a handler
Huggingface created a great example handler which I used to use, but it has some captum dependencies that I don't use, and there's no example for a sequence-to-sequence model like the ones I use, so I rebuilt it. Feel free to use mine, but if you want to use your own, make sure to copy it into the directory where your mar files will be, and add handler="yourhandlername.py" to create_mar in the next section.
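For example, with a custom handler copied next to your mar files, the create_mar call from the next section would look something like this (yourhandlername.py is just a placeholder):

import serve_quik as sq
from pathlib import Path

dir = Path.cwd().joinpath('opus-mt-en-es')
# point create_mar at the custom handler instead of the bundled one
sq.mar.create_mar(model_type="marianmt", model_dir=dir, handler="yourhandlername.py")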
Build a Model Archive
PyTorch has a helpful feature called Torch Model Archiver for TorchServe. It's a command-line tool that pulls the 6+ tokenizer files, setup_config.json, and the serialized model (i.e. pytorch_model.bin) into one model archive (mar) file.
Setup
If you install serve-quik, it installs this for you, but you should know it's a completely separate package you'd otherwise install yourself:
pip install torch-model-archiver
The command for creating the mar would be something like this:
torch-model-archiver \
    --model-name=text-translate \
    --version=1.0 \
    --serialized-file=<serialized_file> \
    --handler=<handler_file> \
    --extra-files "<file_1>,<file_2>,<file_3>,<file_4>,<file_5>,<file_6>" \
    --export-path=<export_dir>
It seems simple, but the directory structure can make it difficult, so serve-quik does this for you. If you've saved your tokenizer and model, then:
>>> import serve_quik as sq
>>> from pathlib import Path
>>>
>>> dir = Path.cwd().joinpath('opus-mt-en-es')
>>> sq.mar.create_mar(model_type="marianmt", model_dir=dir)
INFO:serve_quik.mar:torch archive opus-mt-en-es.mar created
Dockerfile automation
Usually a container is built with a Dockerfile, docker-compose, or both. Although most TorchServe API containers are similar, there will always be differences, such as port numbers and container name. serve-quik takes these steps:
- Determine ports: search for host ports matching the container's 8080 (Inference API) and 8082 (Metrics API) that aren't already in use, following a **80 and **82 pattern
- Build .env file: in order to use a common Dockerfile and docker-compose, a .env is built with CONTAINER_DIR, IMAGE_NAME, CONTAINER_NAME, DIR_NAME, API_PORT, and METRIC_PORT.
- Build and start container: using the model archive directory, the docker-compose directory, and the .env file, build a TorchServe container and start it on the determined ports. The basic process is to cd to the serve_quik/container directory and run:
$ docker-compose --project-directory=<yourprojectdirectory> up --detach
serve-quik will figure this out for you, and you can just run:
import serve_quik as sq
from pathlib import Path

dir = Path.cwd().joinpath('text-translation')
sq.container.start_container(dir)
To summarize, I have a main.py in serve_quik, but here is what you'd likely run:
import serve_quik as sq

LANGS = {"source": "es", "target": "en"}

serve_dir = sq.utils.set_serve_dir("text-translation")
model_dir = sq.utils.set_model_dir(serve_dir, "marianmt", LANGS)
sq.container.build_dot_env(serve_dir)
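From there, once the tokenizer and model files have been saved into model_dir, the archive and container steps shown above finish the job, roughly:

sq.mar.create_mar(model_type="marianmt", model_dir=model_dir)
sq.container.start_container(serve_dir)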