BentoML: The Unified Model Serving Framework

Project description



BentoML makes it easy to create Machine Learning services that are ready to deploy and scale.

👉 Join our Slack community today!

✨ Looking to deploy your ML service quickly? Check out BentoML Cloud for the easiest and fastest way to deploy your Bento.

Getting Started


Highlights

🍭 Unified Model Serving API

  • Framework-agnostic model packaging for TensorFlow, PyTorch, XGBoost, Scikit-Learn, ONNX, and many more!
  • Write custom Python code alongside model inference for pre/post-processing and business logic
  • Apply the same code for online (REST API or gRPC), offline batch, and streaming inference
  • Simple abstractions for building multi-model inference pipelines or graphs (see the sketch after this list)
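
A minimal sketch of such a pipeline, assuming two models named "encoder" and "classifier" already in the local model store (the names, frameworks, and the encoder's "transform" signature are illustrative, not part of BentoML's defaults):

import numpy as np
import bentoml
from bentoml.io import NumpyNdarray

# Hypothetical runners; both models are assumed to exist in the local store.
encoder_runner = bentoml.sklearn.get("encoder:latest").to_runner()
classifier_runner = bentoml.xgboost.get("classifier:latest").to_runner()

# A single Service can host multiple runners.
svc = bentoml.Service("pipeline_demo", runners=[encoder_runner, classifier_runner])

@svc.api(input=NumpyNdarray(), output=NumpyNdarray())
def predict(input_arr: np.ndarray) -> np.ndarray:
    # Stage 1: feature extraction (assumes the encoder was saved with a
    # "transform" signature).
    features = encoder_runner.transform.run(input_arr)
    # Stage 2: classify the extracted features.
    return classifier_runner.predict.run(features)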

🚂 Standardized process for a frictionless transition to production

  • Build Bento as the standard deployable artifact for ML services
  • Automatically generate Docker images with the desired dependencies
  • Easy CUDA setup for inference on GPUs
  • Rich integration with the MLOps ecosystem, including Kubeflow, Airflow, MLflow, and Triton (see the sketch after this list)
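
For example, an existing MLflow model can be imported into the BentoML model store (a sketch; the model name and MLflow URI below are hypothetical placeholders):

import bentoml

# Import an MLflow-packaged model into BentoML's local model store.
bentoml.mlflow.import_model(
    "my_mlflow_model",        # name to use in the BentoML model store
    "runs:/<run_id>/model",   # hypothetical MLflow model URI
)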

🏹 Scalable with powerful performance optimizations

  • Adaptive batching dynamically groups inference requests on the server side for optimal performance (see the sketch after this list)
  • Runner abstraction scales model inference separately from your custom code
  • Maximize your GPU and multi-core CPU utilization with automatic provisioning
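
A minimal sketch of enabling adaptive batching, assuming a trained PyTorch model object named model (the signature options follow BentoML 1.0's save-time API):

import bentoml

# Marking a method as batchable lets the BentoServer group concurrent
# requests into a single batched inference call.
bentoml.pytorch.save_model(
    "demo_mnist",
    model,  # assumed: a trained torch.nn.Module
    signatures={
        "__call__": {
            "batchable": True,  # enable adaptive batching for this method
            "batch_dim": 0,     # inputs are concatenated along dimension 0
        }
    },
)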

🎯 Deploy anywhere in a DevOps-friendly way

  • Streamline production deployment workflow via:
    • ☁️ BentoML Cloud: the fastest way to deploy your bento, simple and at scale
    • 🦄️ Yatai: Model Deployment at scale on Kubernetes
    • 🚀 bentoctl: Fast model deployment on AWS SageMaker, Lambda, EC2, GCP, Azure, Heroku, and more!
  • Run offline batch inference jobs with Spark or Dask
  • Built-in support for Prometheus metrics and OpenTelemetry (see the example after this list)
  • Flexible APIs for advanced CI/CD workflows
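
As a small illustration of the built-in Prometheus support, a running BentoServer exposes its metrics over HTTP (the /metrics path and port below assume BentoML 1.0's defaults):

curl http://127.0.0.1:3000/metrics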

How it works

Save your trained model with BentoML:

import bentoml

saved_model = bentoml.pytorch.save_model(
    "demo_mnist", # model name in the local model store
    model, # model instance being saved
)

print(f"Model saved: {saved_model}")
# Model saved: Model(tag="demo_mnist:3qee3zd7lc4avuqj", path="~/bentoml/models/demo_mnist/3qee3zd7lc4avuqj/")
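
Models in the local store can then be listed from the command line (output omitted here):

bentoml models list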

Define a prediction service in a service.py file:

import numpy as np
import bentoml
from bentoml.io import NumpyNdarray, Image
from PIL.Image import Image as PILImage

mnist_runner = bentoml.pytorch.get("demo_mnist:latest").to_runner()

svc = bentoml.Service("pytorch_mnist", runners=[mnist_runner])

@svc.api(input=Image(), output=NumpyNdarray(dtype="int64"))
def predict(input_img: PILImage):
    # Normalize pixel values to [0, 1] and add a batch dimension.
    img_arr = np.array(input_img) / 255.0
    input_arr = np.expand_dims(img_arr, 0).astype("float32")
    # Run inference through the runner and return a NumPy array.
    output_tensor = mnist_runner.predict.run(input_arr)
    return output_tensor.numpy()
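
For quick debugging without starting a server, the runner can also be exercised in-process (a hypothetical smoke test, assuming the service.py above is importable and using a random array in place of a real image):

import numpy as np
from service import mnist_runner

# init_local() runs the model in-process; meant for debugging and testing only.
mnist_runner.init_local()
sample = np.random.rand(1, 28, 28).astype("float32")  # stand-in MNIST batch of one
print(mnist_runner.predict.run(sample))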

Create a bentofile.yaml build file for your ML service:

service: "service:svc"
include:
  - "*.py"
python:
  packages:
    - numpy
    - torch
    - Pillow

Now, run the prediction service:

bentoml serve
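
During development, the service can also be pointed at explicitly and reloaded on code changes:

bentoml serve service:svc --reload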

Send a prediction request:

curl -F 'image=@samples/1.png' http://127.0.0.1:3000/predict

Build a Bento and generate a docker image:

$ bentoml build
Successfully built Bento(tag="pytorch_mnist:4mymorgurocxjuqj") at "~/bentoml/bentos/pytorch_mnist/4mymorgurocxjuqj/"

$ bentoml containerize pytorch_mnist:4mymorgurocxjuqj
Successfully built docker image "pytorch_mnist:4mymorgurocxjuqj"

$ docker run -p 3000:3000 pytorch_mnist:4mymorgurocxjuqj
Starting production BentoServer from "pytorch_mnist:4mymorgurocxjuqj" running on http://0.0.0.0:3000

For a more detailed user guide, check out the BentoML Tutorial.


Community

  • For general questions and support, join the community Slack.
  • To receive release notifications, star & watch the BentoML project on GitHub.
  • To report a bug or suggest a feature, use GitHub Issues.
  • To stay informed of community updates, follow the BentoML Blog and @bentomlai on Twitter.

Contributing

There are many ways to contribute to the project:

  • If you have feedback on the project, share it in the #bentoml-contributors channel on the community Slack.
  • Report issues you're facing and give a "thumbs up" to issues and feature requests that are relevant to you.
  • Investigate bugs and review other developers' pull requests.
  • Contribute code or documentation by submitting a GitHub pull request; check out the Development Guide.
  • Learn more in the contributing guide.

Contributors

Thanks to all of our amazing contributors!


Usage Reporting

BentoML collects usage data that helps our team improve the product. Only BentoML's internal API calls are reported. We strip out as much potentially sensitive information as possible, and we will never collect user code, model data, model names, or stack traces. Here's the code for usage tracking. You can opt out of usage tracking with the --do-not-track CLI option:

bentoml [command] --do-not-track

Or by setting the environment variable BENTOML_DO_NOT_TRACK=True:

export BENTOML_DO_NOT_TRACK=True

License

Apache License 2.0


Project details


Release history

This version

1.0.8

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

bentoml-1.0.8.tar.gz (14.4 MB)

Uploaded Source

Built Distribution

bentoml-1.0.8-py3-none-any.whl (879.7 kB)

Uploaded Python 3

File details

Details for the file bentoml-1.0.8.tar.gz.

File metadata

  • Download URL: bentoml-1.0.8.tar.gz
  • Upload date:
  • Size: 14.4 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.9.12

File hashes

Hashes for bentoml-1.0.8.tar.gz:

  • SHA256: 13a7b21cbfbc8ad6ca25e8464952ca266e3da9ac591652b3aff1e5ca65aef704
  • MD5: e7c3dd8cf7165c7d27af6d9d481d923c
  • BLAKE2b-256: 03ddfcdee2ea476e419e1baa92d9614d8de2d843cd53fc82c4ccb1b5a86d7b46


File details

Details for the file bentoml-1.0.8-py3-none-any.whl.

File metadata

  • Download URL: bentoml-1.0.8-py3-none-any.whl
  • Upload date:
  • Size: 879.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.9.12

File hashes

Hashes for bentoml-1.0.8-py3-none-any.whl:

  • SHA256: 6edf424f4c3577ef6d35af4f1e4d78f4ed3ea3acb568604eb59c278a99884598
  • MD5: 53e7af64b960b039aec54b485489d4ad
  • BLAKE2b-256: 94459d589a30cdb4c26aa3e8a998bf5913bed302a4932818431f807fc3123564

