A containerized service for neural machine translation

sockeye-serving

sockeye-serving is a containerized service for neural machine translation that uses Amazon's Sockeye framework as the translation engine. The web server is built on mxnet-model-server, which provides a management API for loading models and a prediction API for requesting translations.

Any Sockeye model can be loaded via the management API. Text preprocessing is built into the request pipeline and supports a wide variety of languages. Specialized processing for specific languages can be implemented using custom handlers.

Quickstart

This example shows how to serve an existing model for Chinese to English translation. First, pull the latest Docker image:

docker pull jwoo11/sockeye-serving

Download the example model archive (MAR). This is a ZIP archive containing the parameter files and scripts needed to run translation:

Extract the MAR file to /tmp/models/zh. The parent directory, /tmp/models, will be the source of a bind mount for Docker:

unzip -d /tmp/models/zh zh.mar

Start the server:

docker run -itd --name sockeye_serving -p 8080:8080 -p 8081:8081 -v /tmp/models:/opt/ml/model jwoo11/sockeye-serving

Now, load the model using the management API. Note that the URL of the model is relative to the bind mount:

curl -X POST "http://localhost:8081/models?synchronous=true&initial_workers=1&url=zh"
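The same registration URL can be assembled programmatically. The following is a minimal sketch that only mirrors the query parameters of the curl command above; the function name register_model_url and its defaults are assumptions for illustration, not part of sockeye-serving.

```python
from urllib.parse import urlencode


def register_model_url(model_url, host="localhost", port=8081,
                       initial_workers=1, synchronous=True):
    """Build the management API URL used to load a model (hypothetical helper)."""
    query = urlencode({
        "synchronous": str(synchronous).lower(),  # "true" or "false"
        "initial_workers": initial_workers,
        "url": model_url,  # relative to the bind mount, e.g. "zh"
    })
    return f"http://{host}:{port}/models?{query}"


print(register_model_url("zh"))
```

The resulting URL would then be sent as a POST request, for example with urllib.request.Request(url, method="POST").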

Get the status of the model with the following:

curl -X GET "http://localhost:8081/models/zh"

The response should look like this:

{
  "modelName": "zh",
  "modelUrl": "zh",
  "runtime": "python3",
  "minWorkers": 1,
  "maxWorkers": 1,
  "batchSize": 1,
  "maxBatchDelay": 100,
  "workers": [
    {
      "id": "9000",
      "startTime": "2019-01-26T00:49:10.431Z",
      "status": "READY",
      "gpu": false,
      "memoryUsage": 601395200
    }
  ]
}
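A script that waits for the model to come up could inspect this response. The sketch below assumes the status response is a single JSON object shaped like the example above; model_is_ready is a hypothetical helper, and worker statuses other than READY are not documented here.

```python
import json


def model_is_ready(status_json):
    """Return True if the model has at least one worker and every worker is READY."""
    status = json.loads(status_json)
    workers = status.get("workers", [])
    return bool(workers) and all(w.get("status") == "READY" for w in workers)


# Trimmed copy of the example response shown above
sample = '{"modelName": "zh", "workers": [{"id": "9000", "status": "READY"}]}'
print(model_is_ready(sample))
```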

To translate text, use the inference API. Note that it listens on port 8080, whereas the management API above uses port 8081:

curl -X POST "http://localhost:8080/predictions/zh" -H "Content-Type: application/json" \
    -d '{ "text": "我的世界是一款開放世界遊戲,玩家沒有具體要完成的目標,即玩家有超高的自由度選擇如何玩遊戲" }'

The translation quality depends on the model. The provided model returns this translation:

{
  "translation": "in my life was a life of a life of a public public, and a public, a time, a video, a play, which, it was a time of a time of a time."
}

A better model trained on more data returns this response:

{
  "translation": "My world is an open world game, and players have no specific goal to accomplish, that is, players have a high degree of freedom to choose how to play."
}
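The translation request from this quickstart can also be issued from Python. This is a minimal sketch using only the standard library; the helper names build_translation_request and translate are assumptions, and the code relies on the response containing a "translation" field as in the examples above.

```python
import json
from urllib.request import Request, urlopen


def build_translation_request(model_name, text, host="localhost", port=8080):
    """Build a POST request for the prediction endpoint (hypothetical helper)."""
    # ensure_ascii=False keeps non-ASCII source text readable in the UTF-8 payload
    body = json.dumps({"text": text}, ensure_ascii=False).encode("utf-8")
    return Request(
        f"http://{host}:{port}/predictions/{model_name}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def translate(model_name, text, **kwargs):
    """Send the request and return the "translation" field of the JSON response."""
    with urlopen(build_translation_request(model_name, text, **kwargs)) as resp:
        return json.loads(resp.read().decode("utf-8"))["translation"]
```

With the server running and the zh model loaded, translate("zh", "some Chinese text") would return the translated string.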

Installation

To install the command line clients for sockeye-serving run the following in a virtual environment:

pip install sockeye-serving

If you want to install from source, a Pipfile is provided. Clone the repository and run pipenv install.

Installation places the command line interfaces sockeye-serving and sockeye-client on your virtual environment's path.

Command Line Interfaces

You can use sockeye-serving to start the Docker container and to make REST calls to both the management and prediction APIs. First, place a configuration file either in the current directory or at the location referenced by SOCKEYE_SERVING_CONF. Example properties are located in config/sockeye-serving.conf. Basic usage:

# start the Docker container
sockeye-serving start

# deploy a model
sockeye-serving deploy zh

# list available models
sockeye-serving list

# translate text
sockeye-serving translate zh "my text"

# upload a file for translation
sockeye-serving upload zh "my_file.txt"

Run sockeye-serving help for a full list of commands.

The Python client, sockeye-client, takes a YAML configuration file; an example is in config/sockeye-client.yml. This client does not support restarting Docker, but it does exercise the full API provided by mxnet-model-server. The commands that accept query parameters are shown below:

$ sockeye-client deploy -h
usage: sockeye-client deploy [-h] [-m MODEL_NAME] [-x HANDLER] [-r RUNTIME]
                             [-b BATCH_SIZE] [-d MAX_BATCH_DELAY]
                             [-i INITIAL_WORKERS] [-s] [-t RESPONSE_TIMEOUT]
                             url
...

$ sockeye-client list -h
usage: sockeye-client list [-h] [-l LIMIT] [-t NEXT_PAGE_TOKEN]
...

$ sockeye-client scale -h
usage: sockeye-client scale [-h] [-a MIN_WORKER] [-b MAX_WORKER]
                            [-n NUMBER_GPU] [-s] [-t TIMEOUT]
                            model_name
...

Run sockeye-client -h to show a full list of commands. For more information on the API, see additional documentation for mxnet-model-server.

Jupyter Notebook

If you want to translate text with Jupyter, you can use notebooks/machine_translation.ipynb. Make sure requests is installed in your Python environment.

Choosing between CPUs and GPUs

sockeye-serving provides different image tags for CPUs and GPUs. You can set the desired tag in your sockeye-serving.conf file. You'll also need to specify a Sockeye config file sockeye-args.txt. This file contains arguments passed to the Sockeye translation engine. Example files for both CPU and GPU configs are under config/sockeye.

To use GPUs, ensure nvidia-docker is installed on the host machine. In sockeye-serving.conf set the image tag to one with "gpu" in its name, such as latest-gpu, and set docker_exec="nvidia-docker". Then run sockeye-serving update MODEL_NAME config/sockeye/gpu/sockeye-args.txt.

For CPUs, use a tag without "gpu" in its name, such as latest, and use the CPU version of the Sockeye config file. The changes to sockeye-serving.conf will be picked up when you run sockeye-serving start.

Initializing Models

Each model must be initialized with a MANIFEST.json file in order for mxnet-model-server to deploy it. An easy way to initialize a model is to run sockeye-serving archive MODEL_NAME HANDLER, where HANDLER is the name of a Python handler module under src/sockeye_serving. The provided handlers include ko_handler (Korean), zh_handler (Chinese), and default_handler (generic). After running the archive command, your model directory should have a file MAR-INF/MANIFEST.json that looks like:

{
  "runtime": "python3",
  "model": {
    "modelName": "zho",
    "handler": "sockeye_serving.zh_handler:handle"
  },
  "modelServerVersion": "1.0",
  "implementationVersion": "1.0",
  "specificationVersion": "1.0"
}
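To make the structure of the manifest concrete, the sketch below builds the same document in Python. It is not the implementation of the archive command; build_manifest is a hypothetical helper that simply copies the field names and version strings from the example above.

```python
import json


def build_manifest(model_name, handler_module):
    """Return a manifest dict shaped like the MAR-INF/MANIFEST.json example."""
    return {
        "runtime": "python3",
        "model": {
            "modelName": model_name,
            # Handler modules live under src/sockeye_serving and expose handle()
            "handler": f"sockeye_serving.{handler_module}:handle",
        },
        "modelServerVersion": "1.0",
        "implementationVersion": "1.0",
        "specificationVersion": "1.0",
    }


print(json.dumps(build_manifest("zho", "zh_handler"), indent=2))
```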

Enabling TLS

The provided configuration instructs the server to use plain HTTP. To enable TLS, supply either a Java keystore or a private key and certificate in PEM format.

Using config/config.properties as a starting point, create a new config.properties file and save it under /tmp/models:

model_store=/opt/ml/model
inference_address=https://0.0.0.0:8443
management_address=https://0.0.0.0:8444

Suppose your key pair resides on the host at /path/to/certs. Set the properties for the keystore:

keystore=/path/to/certs/keystore.p12
keystore_pass=changeit
keystore_type=PKCS12

Or provide the path to the server's private key and certificate:

private_key_file=/path/to/certs/private.key
certificate_file=/path/to/certs/cert.pem

Then start the container:

docker run -itd --name sockeye_serving -p 8443:8443 -p 8444:8444 \
    -v /path/to/certs:/path/to/certs \
    -v /tmp/models:/opt/ml/model jwoo11/sockeye-serving \
    mxnet-model-server --start --mms-config /opt/ml/model/config.properties

To make requests using curl you should ensure that you set --cert, --key, and --cacert as needed.
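A Python client needs the equivalent settings. The following is a minimal sketch with the standard ssl module, where the arguments correspond to curl's --cacert, --cert, and --key; the helper names make_tls_context and get_model_status are assumptions, and the management port 8444 is taken from the example config.properties above.

```python
import ssl
from urllib.request import urlopen


def make_tls_context(cacert=None, cert=None, key=None):
    """Create a client-side TLS context (hypothetical helper).

    cacert: CA bundle used to verify the server; system defaults if None.
    cert/key: optional client certificate and private key in PEM format.
    """
    context = ssl.create_default_context(cafile=cacert)
    if cert:
        context.load_cert_chain(certfile=cert, keyfile=key)
    return context


def get_model_status(model_name, context, host="localhost", port=8444):
    """Fetch the model status over HTTPS from the management API."""
    url = f"https://{host}:{port}/models/{model_name}"
    with urlopen(url, context=context) as resp:
        return resp.read().decode("utf-8")
```

For a self-signed server certificate, passing its path as cacert lets verification succeed without disabling certificate checks.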

Additional Documentation

For more information on mxnet-model-server, see:
