
BentoML: The Unified Model Serving Framework


BentoML simplifies ML model deployment and serves your models at production scale.

👉 Join us in our Slack community, where hundreds of ML practitioners are contributing to the project, helping other users, and discussing all things MLOps.

Why BentoML?

๐Ÿฑ Easily go from training to model serving in production

  • Support multiple ML frameworks natively: TensorFlow, PyTorch, XGBoost, Scikit-Learn and many more!
  • Define custom serving pipelines with pre-processing, post-processing and ensemble models
  • Standard .bento format for packaging code, models and dependencies for easy versioning and deployment
  • Integrate with any training pipeline or ML experimentation platform

✨ Model Serving the way you need it

  • Online serving via REST API or gRPC
  • Offline scoring on batch datasets with Apache Spark or Dask
  • Stream serving with Kafka, Beam, and Flink

🚢 Deployment workflow made for production

๐Ÿ Python-first, scales with powerful optimizations

  • Parallelize compute-intense model inference workloads to scale separately from the serving logic
  • Adaptive batching dynamically groups inference requests for optimal performance
  • Orchestrate a distributed inference graph with multiple models via Yatai on Kubernetes
  • Easily configure CUDA dependencies for running inference with GPU
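
To make the batching bullet above more concrete, here is a minimal sketch of the general idea: requests that arrive close together are grouped until a size or latency limit is reached. This is not BentoML's implementation. It uses a fixed window and omits the adaptive part (tuning the window from observed latency), and `group_requests`, `max_batch_size`, and `max_latency_ms` are hypothetical names chosen for illustration.

```python
from typing import List, Tuple


def group_requests(
    arrivals: List[Tuple[float, str]],
    max_batch_size: int = 4,
    max_latency_ms: float = 10.0,
) -> List[List[str]]:
    """Group (arrival_time_ms, request_id) pairs into batches.

    A batch is closed when it is full, or when the next request arrives
    more than max_latency_ms after the first request in that batch.
    """
    batches: List[List[str]] = []
    current: List[str] = []
    window_start = 0.0
    for arrival_ms, request_id in sorted(arrivals):
        if current and (
            len(current) >= max_batch_size
            or arrival_ms - window_start > max_latency_ms
        ):
            batches.append(current)
            current = []
        if not current:
            # The first request in a batch opens a new latency window.
            window_start = arrival_ms
        current.append(request_id)
    if current:
        batches.append(current)
    return batches


arrivals = [(0.0, "a"), (2.0, "b"), (3.0, "c"), (30.0, "d"), (31.0, "e")]
batches = group_requests(arrivals, max_batch_size=2, max_latency_ms=10.0)
print(batches)
```

Larger batches generally improve throughput on vectorized hardware, while the latency limit bounds how long any single request waits for companions.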

Getting Started

  • Documentation - Overview of the BentoML docs and related resources
  • Tutorial: Intro to BentoML - Learn by doing! In under 10 minutes, you'll serve a model via REST API and generate a Docker image for deployment.
  • Main Concepts - A step-by-step tour for learning main concepts in BentoML
  • Examples - Gallery of sample projects using BentoML
  • ML Framework Specific Guides - Best practices and example usages by the ML framework of your choice
  • Advanced Guides - Learn about BentoML's internals, architecture and advanced features

Installation

pip install bentoml

Quick Tour

Step 1: At the end of your model training pipeline, save your trained model instance with BentoML:

import bentoml

model = train(...)

saved_model = bentoml.pytorch.save_model("fraud_detect", model)
print(f"Model saved: {saved_model}")

# Model saved: Model(tag="fraud_detect:3qee3zd7lc4avuqj", path="~/bentoml/models/fraud_detect/3qee3zd7lc4avuqj/")

BentoML saves the model artifact files in a local model store, along with necessary metadata. A new version tag is automatically generated for the model.
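
The printed tag above has the form name:version. As a rough sketch of how such a tag maps to a location in the local store, the snippet below splits a tag and rebuilds the path seen in the sample output. `split_tag` and `model_dir` are hypothetical helpers, not BentoML functions, and the directory layout is only inferred from the sample output shown above.

```python
from pathlib import PurePosixPath
from typing import Tuple


def split_tag(tag: str) -> Tuple[str, str]:
    """Split a "name:version" tag; a bare name is treated as "latest" here."""
    name, _, version = tag.partition(":")
    return name, version or "latest"


def model_dir(tag: str, store_root: str = "~/bentoml/models") -> str:
    """Build the store path for a tag, assuming a <root>/<name>/<version> layout."""
    name, version = split_tag(tag)
    return str(PurePosixPath(store_root) / name / version)


name, version = split_tag("fraud_detect:3qee3zd7lc4avuqj")
path = model_dir("fraud_detect:3qee3zd7lc4avuqj")
print(name, version, path)
```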

Optionally, you may provide the signatures of your model for running inference with dynamic batching enabled, and attach labels, metadata, or custom_objects to be saved together with your model, e.g.:

bentoml.pytorch.save_model(
    "demo_mnist",  # model name in the local model store
    trained_model,  # model instance being saved
    signatures={  # model signatures for runner inference
        "predict": {
            "batchable": True,
            "batch_dim": 0,
        }
    },
    metadata={   # user-defined additional metadata
        "acc": acc,
        "cv_stats": cv_stats,
    },
)
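
A signature with batchable: True and batch_dim: 0 indicates that inputs from separate calls may be stacked along the first dimension and the combined output split back per call. The sketch below illustrates that contract with plain lists. `merge_batches`, `split_batch`, and `fake_predict` are hypothetical stand-ins, not part of BentoML.

```python
from typing import List, Sequence, Tuple

Row = List[float]


def merge_batches(inputs: Sequence[List[Row]]) -> Tuple[List[Row], List[int]]:
    """Concatenate per-request inputs along dimension 0 and record their sizes."""
    sizes = [len(rows) for rows in inputs]
    merged = [row for rows in inputs for row in rows]
    return merged, sizes


def split_batch(outputs: List[int], sizes: List[int]) -> List[List[int]]:
    """Split a batched output back into one chunk per original request."""
    chunks: List[List[int]] = []
    start = 0
    for size in sizes:
        chunks.append(outputs[start:start + size])
        start += size
    return chunks


def fake_predict(batch: List[Row]) -> List[int]:
    """Stand-in model: returns the index of the largest value in each row."""
    return [row.index(max(row)) for row in batch]


request_a = [[0.1, 0.9]]              # one row
request_b = [[0.8, 0.2], [0.3, 0.7]]  # two rows
merged, sizes = merge_batches([request_a, request_b])
per_request = split_batch(fake_predict(merged), sizes)
print(per_request)
```

The model runs once on three rows, and each caller still receives only the results for its own rows.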

Step 2: Create a prediction service with the saved model:

Create a service.py file with:

import numpy as np
import bentoml
from bentoml.io import NumpyNdarray, Image
from PIL.Image import Image as PILImage

mnist_runner = bentoml.pytorch.get("demo_mnist:latest").to_runner()

svc = bentoml.Service("pytorch_mnist", runners=[mnist_runner])

@svc.api(input=Image(), output=NumpyNdarray(dtype="int64"))
def predict(input_img: PILImage):
    img_arr = np.array(input_img)/255.0
    input_arr = np.expand_dims(img_arr, 0).astype("float32")
    output_tensor = mnist_runner.predict.run(input_arr)
    return output_tensor.numpy()

Start an HTTP server locally:

bentoml serve service.py:svc

And send a test request to it:

curl -F 'image=@samples/1.png' http://127.0.0.1:3000/predict

You can also open http://127.0.0.1:3000 in a browser and debug the endpoint by sending requests directly from the web UI.
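
The curl call above can also be expressed in Python. The sketch below uses only the standard library and builds the request without sending it. It rests on two assumptions: that the endpoint accepts raw image bytes with an image/png Content-Type, and that the route matches the API function name `predict` defined in service.py. `build_predict_request` is a hypothetical helper.

```python
import urllib.request


def build_predict_request(image_bytes: bytes, url: str) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying raw PNG bytes."""
    return urllib.request.Request(
        url,
        data=image_bytes,
        headers={"Content-Type": "image/png"},
        method="POST",
    )


# Placeholder bytes; in practice read them with open("samples/1.png", "rb").
request = build_predict_request(
    b"\x89PNG\r\n\x1a\n", "http://127.0.0.1:3000/predict"
)

# To actually send it while `bentoml serve` is running:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode())
```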

Note that the saved model is converted into a Runner, which in BentoML represents a unit of computation that can be scaled separately. In local deployment mode, this means the model runs in its own worker processes. Because the model was saved with a batchable: True signature, BentoML applies dynamic batching to all mnist_runner.predict.run calls under the hood for optimal performance.

Step 3: Build a Bento for deployment:

Define a bentofile.yaml build file for your ML project:

service: "service:svc"  # where the bentoml.Service instance is defined
include:
- "*.py"
exclude:
- "tests/"
python:
  packages:
    - numpy
    - torch
    - Pillow
docker:
  distro: debian
  cuda_version: 11.6.2
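
As a rough illustration of what the include and exclude fields express, the sketch below filters a list of paths with the standard library's fnmatch. BentoML's actual pattern matching may differ from this approximation. `select_files` is a hypothetical helper, and treating a pattern that ends in "/" as a directory prefix is an assumption of this sketch.

```python
from fnmatch import fnmatch
from typing import List, Sequence


def select_files(
    paths: Sequence[str], include: Sequence[str], exclude: Sequence[str]
) -> List[str]:
    """Keep paths that match an include pattern and no exclude pattern."""
    selected = []
    for path in paths:
        included = any(fnmatch(path, pattern) for pattern in include)
        excluded = any(
            # A trailing "/" is treated as a directory prefix in this sketch.
            path.startswith(pattern) if pattern.endswith("/") else fnmatch(path, pattern)
            for pattern in exclude
        )
        if included and not excluded:
            selected.append(path)
    return selected


paths = ["service.py", "train.py", "tests/test_service.py", "README.md"]
selected = select_files(paths, include=["*.py"], exclude=["tests/"])
print(selected)
```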

Build a Bento using the bentofile.yaml specification from the current directory:

$ bentoml build

Building BentoML service "pytorch_mnist:4mymorgurocxjuqj" from build context "~/workspace/gallery/pytorch_mnist"
Packing model "demo_mnist:7drxqvwsu6zq5uqj" from "~/bentoml/models/demo_mnist/7drxqvwsu6zq5uqj"
Locking PyPI package versions..

██████╗░███████╗███╗░░██╗████████╗░█████╗░███╗░░░███╗██╗░░░░░
██╔══██╗██╔════╝████╗░██║╚══██╔══╝██╔══██╗████╗░████║██║░░░░░
██████╦╝█████╗░░██╔██╗██║░░░██║░░░██║░░██║██╔████╔██║██║░░░░░
██╔══██╗██╔══╝░░██║╚████║░░░██║░░░██║░░██║██║╚██╔╝██║██║░░░░░
██████╦╝███████╗██║░╚███║░░░██║░░░╚█████╔╝██║░╚═╝░██║███████╗
╚═════╝░╚══════╝╚═╝░░╚══╝░░░╚═╝░░░░╚════╝░╚═╝░░░░░╚═╝╚══════╝

Successfully built Bento(tag="pytorch_mnist:4mymorgurocxjuqj") at "~/bentoml/bentos/pytorch_mnist/4mymorgurocxjuqj/"

The Bento with tag="pytorch_mnist:4mymorgurocxjuqj" is now created in the local Bento store. It is an archive containing all the source code, model files, and dependency specs - anything that is required for reproducing the model in an identical environment for serving in production.

Step 4: Deploy the Bento:

Generate a Docker image from the Bento and run a Docker container locally for serving:

$ bentoml containerize pytorch_mnist:4mymorgurocxjuqj

Successfully built docker image "pytorch_mnist:4mymorgurocxjuqj"

$ docker run --gpus all -p 3000:3000 pytorch_mnist:4mymorgurocxjuqj

Learn more about other deployment options here.

Community

  • For general questions and support, join the community slack.
  • To receive release notifications, star & watch the BentoML project on GitHub.
  • To report a bug or suggest a feature request, use GitHub Issues.
  • For long-form discussions, use GitHub Discussions.
  • To stay informed with community updates, follow the BentoML Blog and @bentomlai on Twitter.

Contributing

There are many ways to contribute to the project:

  • If you have any feedback on the project, share it in GitHub Discussions or the #bentoml-contributors channel in the community slack.
  • Report issues you're facing and "Thumbs up" issues and feature requests that are relevant to you.
  • Investigate bugs and review other developers' pull requests.
  • Contribute code or documentation to the project by submitting a GitHub pull request. Check out the Development Guide.
  • Learn more in the contributing guide.

Contributors!

Thanks to all of our amazing contributors!


Usage Reporting

BentoML collects usage data that helps our team improve the product. Only BentoML's internal API calls are reported. We strip out as much potentially sensitive information as possible, and we never collect user code, model data, model names, or stack traces. Here's the code for usage tracking. You can opt out of usage tracking with the --do-not-track CLI option:

bentoml [command] --do-not-track

Or by setting the environment variable BENTOML_DO_NOT_TRACK=True:

export BENTOML_DO_NOT_TRACK=True
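
When BentoML is driven from a Python script or notebook rather than a shell, the same variable can be set in-process. This is a minimal sketch that assumes the setting is read from the process environment, so it is set before the library is imported.

```python
import os

# Set the variable before importing bentoml so it is already in place
# when the library reads the environment (an assumption of this sketch).
os.environ["BENTOML_DO_NOT_TRACK"] = "True"

# import bentoml  # uncomment in a project where BentoML is installed
```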

License

Apache License 2.0
