BentoML: Build Production-Grade AI Applications

BentoML: Unified Model Serving Framework

🍱 Build model inference APIs and multi-model serving systems with any open-source or custom AI models. 👉 Join our Slack community!

What is BentoML?

BentoML is an open-source model serving framework that simplifies how AI/ML models get into production:

  • 🍱 Easily build APIs for Any AI/ML Model. Turn any model inference script into a REST API server with just a few lines of code and standard Python type hints.
  • 🐳 Docker Containers made simple. No more dependency hell! Manage your environments, dependencies and models with a simple config file. BentoML automatically generates Docker images, ensures reproducibility, and simplifies how you run inference across different environments.
  • 🧭 Maximize CPU/GPU utilization. Improve your API throughput and latency by leveraging built-in serving optimization features such as dynamic batching, model parallelism, multi-stage pipelines, and multi-model inference-graph orchestration.
  • 👩‍💻 Build Custom AI Applications. BentoML is highly flexible for advanced customizations. Implement your own API specifications and asynchronous inference tasks, customize pre/post-processing and model inference logic, and define model composition, all in Python code. BentoML supports any ML framework, modality, and inference runtime.
  • 🚀 Build for Production. Develop, run and debug locally. Seamlessly deploy to production with Docker containers or BentoCloud.
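
To make the "dynamic batching" feature in the list above more concrete, here is a toy sketch in plain Python. It is not BentoML's implementation: the helper names `split_into_batches` and `run_batched` are hypothetical, and a real serving layer would also bound how long requests wait before a batch is dispatched. It only shows the core idea of calling a model once per group of requests and returning results in request order.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def split_into_batches(items: Sequence[T], max_batch_size: int) -> List[List[T]]:
    """Group pending inputs into consecutive batches of at most max_batch_size."""
    if max_batch_size < 1:
        raise ValueError("max_batch_size must be at least 1")
    return [
        list(items[start:start + max_batch_size])
        for start in range(0, len(items), max_batch_size)
    ]


def run_batched(
    items: Sequence[T],
    batch_fn: Callable[[List[T]], List[R]],
    max_batch_size: int,
) -> List[R]:
    """Call batch_fn once per batch and return outputs in the original order."""
    results: List[R] = []
    for batch in split_into_batches(items, max_batch_size):
        outputs = batch_fn(batch)
        if len(outputs) != len(batch):
            raise ValueError("batch_fn must return one output per input")
        results.extend(outputs)
    return results
```

With five pending inputs and `max_batch_size=2`, the model function is called three times instead of five, which is where the throughput gain comes from when a single batched call is cheaper than many individual ones.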

Getting started

Install BentoML:

# Requires Python≥3.8
pip install bentoml torch transformers

Define APIs in a service.py file.

import bentoml
from transformers import pipeline
from typing import List

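@bentoml.service
class Summarization:
    def __init__(self):
        # Load the Hugging Face summarization pipeline once, at service start-up
        self.pipeline = pipeline('summarization')

    # batchable=True marks this API as accepting a batch (list) of inputs
    @bentoml.api(batchable=True)
    def summarize(self, texts: List[str]) -> List[str]:
        results = self.pipeline(texts)
        # Each result is a dict; keep only the generated summary text
        return [res['summary_text'] for res in results]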
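
Because the API method is ordinary Python, its logic can be exercised without downloading a model. The sketch below is one possible way to do that, not part of BentoML: `fake_pipeline` and `SummarizationLogic` are hypothetical names, and the fake simply returns the first sentence of each input in the same list-of-dicts shape (with a `summary_text` key) that the service code reads.

```python
from typing import Callable, Dict, List

PipelineFn = Callable[[List[str]], List[Dict[str, str]]]


def fake_pipeline(texts: List[str]) -> List[Dict[str, str]]:
    """Hypothetical stand-in for the real pipeline: 'summarize' to the first sentence."""
    return [{"summary_text": text.split(".")[0].strip() + "."} for text in texts]


class SummarizationLogic:
    """Plain-Python mirror of the service's summarize method, without decorators."""

    def __init__(self, pipeline_fn: PipelineFn):
        self.pipeline = pipeline_fn

    def summarize(self, texts: List[str]) -> List[str]:
        results = self.pipeline(texts)
        return [res["summary_text"] for res in results]


if __name__ == "__main__":
    logic = SummarizationLogic(fake_pipeline)
    print(logic.summarize(["BentoML serves models. It also builds images."]))
```

Injecting the pipeline as a constructor argument is a design choice made for this sketch; the service above constructs its pipeline directly in `__init__`.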

Run the service code locally (serving at http://localhost:3000 by default):

bentoml serve service.py:Summarization

Now you can run inference from your browser at http://localhost:3000 or with a Python script:

import bentoml

with bentoml.SyncHTTPClient('http://localhost:3000') as client:
    text_to_summarize: str = input("Enter text to summarize: ")
    summarized_text: str = client.summarize([text_to_summarize])[0]
    print(f"Summarized text: {summarized_text}")
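
Since the service is exposed as a REST API, it can also be called without the BentoML client. The following is a minimal sketch using only the standard library. It assumes the endpoint path matches the API method name (`/summarize`) and that the JSON body is keyed by the parameter name (`texts`); check the page served at http://localhost:3000 to confirm both for your service. The function names are illustrative.

```python
import json
import urllib.request
from typing import Any, List


def build_summarize_request(base_url: str, texts: List[str]) -> urllib.request.Request:
    """Build a POST request for the summarize endpoint (path and body are assumptions)."""
    body = json.dumps({"texts": texts}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/summarize",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def summarize_over_http(base_url: str, texts: List[str], timeout: float = 60.0) -> Any:
    """Send the request and decode the JSON response; needs a running server."""
    request = build_summarize_request(base_url, texts)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    print(summarize_over_http("http://localhost:3000", ["Text to summarize."]))
```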

Deploying your first Bento

To deploy your BentoML Service code, first create a bentofile.yaml file to define its dependencies and environments. Find the full list of bentofile options here.

service: "service:Summarization" # Entry service import path
include:
  - "*.py" # Include all .py files in current directory
python:
  packages: # Python dependencies to include
  - torch
  - transformers
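
The `service` value above reads as `module:attribute`, here the `service.py` module and the `Summarization` class defined earlier. As a small illustration of that convention, the hypothetical helper below (not a BentoML function) splits such a path and rejects malformed values before a build is attempted.

```python
from typing import Tuple


def split_service_path(service: str) -> Tuple[str, str]:
    """Split 'module:attribute' into its two parts, rejecting malformed values."""
    module_name, separator, attribute = service.partition(":")
    if not separator or not module_name or not attribute:
        raise ValueError(f"expected 'module:attribute', got {service!r}")
    return module_name, attribute


if __name__ == "__main__":
    module_name, class_name = split_service_path("service:Summarization")
    print(f"module={module_name} class={class_name}")
```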

Then, choose one of the following ways for deployment:

🐳 Docker Container

Run bentoml build to package the necessary code, models, and dependency configs into a Bento - the standardized deployable artifact in BentoML:

bentoml build

Ensure Docker is running. Generate a Docker container image for deployment:

bentoml containerize summarization:latest

Run the generated image:

docker run --rm -p 3000:3000 summarization:latest
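
A freshly started container may need some time to load the model before it can answer requests. The sketch below polls until the server responds, using only the standard library. It assumes the server exposes a readiness endpoint at `/readyz` on the published port; that path is an assumption here, so adjust it if your version differs. `http_probe` and `wait_until_ready` are illustrative names.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_probe(url: str, timeout: float = 2.0) -> bool:
    """Return True if a GET request to url is answered with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status: not ready yet
        return False


def wait_until_ready(probe: Callable[[], bool], attempts: int = 30, delay: float = 1.0) -> bool:
    """Call probe up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False


if __name__ == "__main__":
    # Assumed readiness path; see the lead-in above
    ready = wait_until_ready(lambda: http_probe("http://localhost:3000/readyz"))
    print("server ready" if ready else "server did not become ready")
```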

☁️ BentoCloud

BentoCloud is the AI inference platform for fast-moving AI teams. It lets you deploy your BentoML code on fast-scaling infrastructure. Sign up for BentoCloud for personal access; for enterprise use cases, contact our team.

# After signup, follow login instructions upon API token creation:
bentoml cloud login --api-token <your-api-token>

# Deploy from current directory:
bentoml deploy .

For detailed explanations, read Quickstart.

Use cases

Check out the examples folder for more sample code and usage.

Advanced topics

See Documentation for more tutorials and guides.

Community

Get involved and join our Community Slack 💬, where thousands of AI/ML engineers help each other, contribute to the project, and talk about building AI products.

To report a bug or request a feature, use GitHub Issues.

Contributing

There are many ways to contribute to the project:

Thanks to all of our amazing contributors!

Usage tracking and feedback

The BentoML framework collects anonymous usage data that helps our community improve the product. Only BentoML's internal API calls are reported. This excludes any sensitive information, such as user code, model data, model names, or stack traces. Here's the code used for usage tracking. You can opt out of usage tracking with the --do-not-track CLI option:

bentoml [command] --do-not-track

Or by setting the environment variable:

export BENTOML_DO_NOT_TRACK=True
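
When BentoML commands are launched from a Python script rather than a shell, the same variable can be passed through the child process environment. This is a minimal sketch: `with_tracking_disabled` is a hypothetical helper, and it assumes the variable is read by the launched process exactly as it is when exported in a shell.

```python
import os
from typing import Dict, Mapping, Optional


def with_tracking_disabled(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of env (default: os.environ) with BENTOML_DO_NOT_TRACK=True."""
    new_env = dict(os.environ if env is None else env)
    new_env["BENTOML_DO_NOT_TRACK"] = "True"
    return new_env


# Example use (requires bentoml on PATH, so it is left commented out):
# import subprocess
# subprocess.run(
#     ["bentoml", "serve", "service.py:Summarization"],
#     env=with_tracking_disabled(),
#     check=True,
# )
```

Returning a copy keeps the calling process's own environment unchanged.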

License

Apache License 2.0
