BentoML: Package and Deploy Your Machine Learning Models


From a model in ipython notebook to production API service in 5 minutes.

BentoML is a python library for packaging and deploying machine learning models. It provides high-level APIs for defining an ML service and packaging its artifacts, source code, dependencies, and configurations into a production-system-friendly format that is ready for deployment.

Feature Highlights

  • Multiple Distribution Format - Easily package your Machine Learning models into a format that works best with your inference scenario:

    • Docker Image - deploy as containers running REST API Server
    • PyPI Package - integrate into your python applications seamlessly
    • CLI tool - put your model into Airflow DAG or CI/CD pipeline
    • Spark UDF - run batch serving on a large dataset with Spark
    • Serverless Function - host your model on serverless platforms such as AWS Lambda
  • Multiple Framework Support - BentoML supports a wide range of ML frameworks out-of-the-box, including TensorFlow, PyTorch, Scikit-Learn, and xgboost, and can be easily extended to work with new or custom frameworks.

  • Deploy Anywhere - A BentoML-bundled ML service can be easily deployed with platforms such as Docker, Kubernetes, Serverless, Airflow and Clipper, on cloud platforms including AWS, Google Cloud, and Azure.

  • Custom Runtime Backend - Easily integrate your python pre-processing code with a high-performance deep learning runtime backend, such as tensorflow-serving.


Installation

pip install bentoml

Verify installation:

bentoml --version

Getting Started

Let's get started with a simple scikit-learn model as an example:

from sklearn import svm
from sklearn import datasets

clf = svm.SVC(gamma='scale')
iris = datasets.load_iris()
X, y = iris.data, iris.target
clf.fit(X, y)

To package this model with BentoML, you don't need to change anything in your training code. Simply create a new BentoService by subclassing it:

from bentoml import BentoService, api, env, artifacts
from bentoml.artifact import PickleArtifact
from bentoml.handlers import DataframeHandler

# You can also import your own python module here and BentoML will automatically
# figure out the dependency chain and package all those python modules

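@artifacts([PickleArtifact('model')])
@env(conda_dependencies=["scikit-learn"])  # reconstructed; check the argument name against your BentoML version
class IrisClassifier(BentoService):

    @api(DataframeHandler)
    def predict(self, df):
        # arbitrary preprocessing or feature fetching code can be placed here
        return self.artifacts.model.predict(df)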

The @artifacts decorator here tells BentoML what artifacts are required when packaging this BentoService. Besides PickleArtifact, BentoML also provides TfKerasModelArtifact, PytorchModelArtifact, and TfSavedModelArtifact etc.
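As a rough mental model of a decorator that declares required artifacts, consider the toy sketch below. It is not BentoML's implementation: `artifacts`, `ToyBase`, and `ToyClassifier` are hypothetical stand-ins, and the toy decorator takes plain names rather than artifact objects such as PickleArtifact.

```python
def artifacts(names):
    """Hypothetical class decorator: record which artifacts a service needs."""
    def decorator(cls):
        cls.required_artifacts = list(names)
        return cls
    return decorator


class ToyBase:
    required_artifacts = []

    @classmethod
    def pack(cls, **provided):
        # Refuse to build a service if a declared artifact was not supplied.
        missing = [n for n in cls.required_artifacts if n not in provided]
        if missing:
            raise ValueError("missing artifacts: " + ", ".join(missing))
        svc = cls()
        svc.artifacts = dict(provided)
        return svc


@artifacts(["model"])
class ToyClassifier(ToyBase):
    pass


svc = ToyClassifier.pack(model="any picklable object")
```

Declaring the artifacts on the class lets the packing step check up front that everything the service needs has been supplied.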

@env specifies the system environment that this BentoService needs in order to load. The decorator can also be used in other ways:

  • If you already have a requirement.txt file listing all the python libraries you need, @env can take its dependencies from that file.
  • If you are running this code in a Conda environment that matches the desired production environment, @env can reuse that environment.

Lastly, @api adds an entry point for accessing this BentoService. Each api will be translated into a REST endpoint when deploying as an API server, or a CLI command when running as a CLI tool.

Each API also requires a Handler that defines the expected input format. In this case, DataframeHandler will transform either an HTTP request or CLI command arguments into a pandas DataFrame and pass it down to the user-defined API function. BentoML also supports JsonHandler, ImageHandler and TensorHandler.
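The handler idea can be illustrated without BentoML itself. The sketch below is a toy, standard-library-only model of the pattern, not BentoML's implementation: `csv_handler`, `api`, and `ToyService` are hypothetical names, and the rows are plain lists of floats rather than a pandas DataFrame.

```python
import csv
import io


def csv_handler(raw_text):
    """Hypothetical handler: parse CSV text into a list of float rows."""
    reader = csv.reader(io.StringIO(raw_text))
    return [[float(cell) for cell in row] for row in reader if row]


def api(handler):
    """Hypothetical decorator: convert raw input with `handler` before the call."""
    def decorator(func):
        def wrapper(self, raw_input):
            return func(self, handler(raw_input))
        wrapper.__name__ = func.__name__
        return wrapper
    return decorator


class ToyService:
    @api(csv_handler)
    def predict(self, rows):
        # Stand-in for a model: return the sum of each row's features.
        return [sum(row) for row in rows]


result = ToyService().predict("5.1,3.5,1.4,0.2\n6.2,2.9,4.3,1.3\n")
print(result)
```

The API function only ever sees parsed rows, so the same function body could sit behind an HTTP endpoint or a CLI command with a different raw input source.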

Next, to save your trained model for production use with this custom BentoService class:

# 1) import the custom BentoService defined above
from iris_classifier import IrisClassifier

# 2) `pack` it with required artifacts
svc = IrisClassifier.pack(model=clf)

# 3) save packed BentoService as archive
svc.save('./bento_archive', version='v0.0.1')
# archive will be saved to ./bento_archive/IrisClassifier/v0.0.1/

That's it. You've just created your first BentoArchive. It's a directory containing all the source code, data, and configuration files required to load and run a BentoService. You will also find three 'magic' files generated within the archive directory:

  • bentoml.yml - a YAML file containing all metadata related to this BentoArchive
  • Dockerfile - for building a Docker Image exposing this BentoService as REST API endpoint
  • setup.py - the config file that makes a BentoArchive 'pip' installable
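To picture the versioned archive directory described above, here is a standard-library sketch. It is not how BentoML writes archives: `save_toy_archive`, `load_toy_archive`, and the `model.pkl` file name are hypothetical, and only the `<base>/<service name>/<version>/` path pattern comes from the example.

```python
import pickle
import tempfile
from pathlib import Path


def save_toy_archive(base_dir, service_name, version, model):
    """Hypothetical helper: write a pickled model under base/name/version/."""
    archive_dir = Path(base_dir) / service_name / version
    archive_dir.mkdir(parents=True, exist_ok=True)
    with open(archive_dir / "model.pkl", "wb") as f:
        pickle.dump(model, f)
    return archive_dir


def load_toy_archive(archive_dir):
    """Hypothetical helper: read the pickled model back from the archive."""
    with open(Path(archive_dir) / "model.pkl", "rb") as f:
        return pickle.load(f)


with tempfile.TemporaryDirectory() as tmp:
    toy_model = {"kind": "placeholder", "weights": [0.1, 0.2]}
    path = save_toy_archive(tmp, "IrisClassifier", "v0.0.1", toy_model)
    relative_parts = path.relative_to(tmp).parts
    restored = load_toy_archive(path)
```

Keeping each version in its own directory means an older version stays loadable after a newer one is saved.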

Deployment & Inference Scenarios

Serving via REST API

To expose your model as an HTTP API endpoint, you can simply use the bentoml serve command:

bentoml serve ./bento_archive/IrisClassifier/v0.0.1/

Note that the pip and conda dependencies must be available in your python environment when using the bentoml serve command. More commonly, we recommend running the BentoML API server with Docker.
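Because bentoml serve runs in your current environment, it can help to confirm beforehand that the modules a service imports are actually available. The check below is a minimal sketch using only the standard library; `missing_packages` is a hypothetical helper and the module names passed to it are placeholders.

```python
import importlib.util


def missing_packages(module_names):
    """Return the module names that cannot be found in this environment."""
    return [name for name in module_names
            if importlib.util.find_spec(name) is None]


# "json" is in the standard library; the second name is deliberately bogus.
missing = missing_packages(["json", "surely_not_an_installed_module_xyz"])
print(missing)
```

Note that this checks import names, which can differ from pip package names (for example, scikit-learn is imported as sklearn).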

Run REST API server with Docker

You can build a Docker Image for running API server hosting your BentoML archive by using the archive folder as docker build context:

cd ./bento_archive/IrisClassifier/v0.0.1/

docker build -t iris-classifier .

Next, you can docker push the image to your choice of registry for deployment, or run it locally for development and testing:

docker run -p 5000:5000 iris-classifier
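Once the container is running, the service can be called over HTTP. The sketch below only builds the request and does not send it. The `/predict` path (guessed from the API function name), the JSON payload shape, and the `build_predict_request` helper are assumptions for illustration, so check the running server for the format it actually expects.

```python
import json
import urllib.request


def build_predict_request(rows, base_url="http://localhost:5000"):
    """Build (but do not send) a POST request carrying `rows` as JSON."""
    body = json.dumps(rows).encode("utf-8")
    return urllib.request.Request(
        url=base_url + "/predict",  # assumed endpoint path
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_predict_request([[5.1, 3.5, 1.4, 0.2]])

# To actually send it while the container is running:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```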

Loading BentoService in Python

bentoml.load is the essential API for loading a BentoArchive into your python application:

import bentoml

# yes it works with BentoArchive saved to s3 ;)
bento_svc = bentoml.load('s3://my-bento-svc/iris_classifier/')

Use as PyPI Package

BentoML also supports distributing a BentoService as a PyPI package, using the generated setup.py file. A BentoArchive can be installed with pip:

pip install ./bento_archive/IrisClassifier/v0.0.1/

import IrisClassifier

installed_svc = IrisClassifier.load()

With the setup.py config, a BentoArchive can also be uploaded to pypi.org as a public python package, or to your organization's private PyPI index for all developers in your organization to use:

cd ./bento_archive/IrisClassifier/v0.0.1/

# You will need a ".pypirc" config file before doing this:
python setup.py sdist upload

Use as CLI tool

When you pip install a BentoML archive, it also provides a CLI tool for accessing your BentoService's APIs from the command line:

pip install ./bento_archive/IrisClassifier/v0.0.1/

IrisClassifier info  # this will also print out all APIs available

IrisClassifier predict --input='./test.csv'

Alternatively, you can also use the bentoml cli to load and run a BentoArchive directly:

bentoml info ./bento_archive/IrisClassifier/v0.0.1/

bentoml predict ./bento_archive/IrisClassifier/v0.0.1/ --input='./test.csv'
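The commands above read their input from a CSV file. A file like `./test.csv` could be produced as follows. The column names and the presence of a header row are assumptions, since the expected layout is not specified here, and the sketch writes to a temporary directory rather than the working directory.

```python
import csv
import os
import tempfile

# Hypothetical column names for the four iris measurements.
FIELDS = ["sepal_length", "sepal_width", "petal_length", "petal_width"]
SAMPLES = [
    [5.1, 3.5, 1.4, 0.2],
    [6.2, 2.9, 4.3, 1.3],
]


def write_test_csv(path, rows, header=FIELDS):
    """Write a header row followed by one line per sample."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)


csv_path = os.path.join(tempfile.mkdtemp(), "test.csv")
write_test_csv(csv_path, SAMPLES)

# Read the file back to confirm what was written.
with open(csv_path, newline="") as f:
    written = list(csv.reader(f))
```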

More About BentoML

We build BentoML because we think there should be a much simpler way for machine learning teams to ship models for production. They should not have to wait for engineering teams to re-implement their models for the production environment or build complex feature pipelines for experimental models.

Our vision is to empower Machine Learning scientists to build and ship their own models end-to-end as production services, just like software engineers do. BentoML is essentially this missing 'build tool' for Machine Learning projects.


All examples can be found in the BentoML/examples directory.

Releases and Contributing

BentoML is under active development and is evolving rapidly. It is currently a Beta release, and APIs may change in future releases.

Want to help build BentoML? Check out our contributing documentation.

To make sure you have a pleasant experience, please read the code of conduct. It outlines core values and beliefs and will make working together a happier experience.


BentoML is GPL-3.0 licensed, as found in the COPYING file.

Files for BentoML, version 0.1.2

  • BentoML-0.1.2-py3-none-any.whl (81.7 kB) - Wheel, Python version py3
  • BentoML-0.1.2.tar.gz (34.2 kB) - Source