BentoML: The easiest way to serve AI apps and models

Unified Model Serving Framework

🍱 Build model inference APIs and multi-model serving systems with any open-source or custom AI models. 👉 Join our forum!

What is BentoML?

BentoML is a Python library for building online serving systems optimized for AI apps and model inference.

  • 🍱 Easily build APIs for Any AI/ML Model. Turn any model inference script into a REST API server with just a few lines of code and standard Python type hints.
  • 🐳 Docker Containers made simple. No more dependency hell! Manage your environments, dependencies and model versions with a simple config file. BentoML automatically generates Docker images, ensures reproducibility, and simplifies how you deploy to different environments.
  • 🧭 Maximize CPU/GPU utilization. Build high-performance inference APIs with built-in serving optimizations such as dynamic batching, model parallelism, multi-stage pipelines, and multi-model inference-graph orchestration.
  • 👩‍💻 Fully customizable. Easily implement your own APIs or task queues, with custom business logic, model inference and multi-model composition. Supports any ML framework, modality, and inference runtime.
  • 🚀 Ready for Production. Develop, run and debug locally. Seamlessly deploy to production with Docker containers or BentoCloud.

Getting started

Install BentoML:

# Requires Python≥3.9
pip install -U bentoml

Define APIs in a service.py file.

import bentoml

@bentoml.service(
    image=bentoml.images.Image(python_version="3.11").python_packages("torch", "transformers"),
)
class Summarization:
    def __init__(self) -> None:
        import torch
        from transformers import pipeline

        device = "cuda" if torch.cuda.is_available() else "cpu"
        self.pipeline = pipeline('summarization', device=device)

    @bentoml.api(batchable=True)
    def summarize(self, texts: list[str]) -> list[str]:
        results = self.pipeline(texts)
        return [item['summary_text'] for item in results]
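
The `summarize` method above is marked `batchable=True` and maps a list of inputs to a list of outputs of the same length. The sketch below illustrates only that list-in, list-out contract with fixed-size grouping. It is not BentoML's batching implementation, and `run_in_batches`, `batch_fn` and `max_batch_size` are hypothetical names chosen for illustration.

```python
from typing import Callable, Sequence


def run_in_batches(
    inputs: Sequence[str],
    batch_fn: Callable[[list[str]], list[str]],
    max_batch_size: int = 4,
) -> list[str]:
    """Split inputs into fixed-size groups and call batch_fn once per group.

    Illustration only: a real serving layer would also decide when to flush
    a partially filled batch, which this sketch does not model.
    """
    if max_batch_size < 1:
        raise ValueError("max_batch_size must be at least 1")

    outputs: list[str] = []
    for start in range(0, len(inputs), max_batch_size):
        batch = list(inputs[start:start + max_batch_size])
        results = batch_fn(batch)
        # A batchable function returns exactly one result per input, in order.
        if len(results) != len(batch):
            raise ValueError("batch_fn must return one result per input")
        outputs.extend(results)
    return outputs
```

Because results are appended in input order, each caller can be handed back the output at the same index as its input, which is why the method signature uses `list[str]` on both sides.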

💻 Run locally

Install the PyTorch and Transformers packages in your Python virtual environment.

pip install torch transformers  # additional dependencies for local run

Run the service code locally (serving at http://localhost:3000 by default):

bentoml serve

You should see output similar to the following.

[INFO] [cli] Starting production HTTP BentoServer from "service:Summarization" listening on http://localhost:3000 (Press CTRL+C to quit)
[INFO] [entry_service:Summarization:1] Service Summarization initialized

Now you can run inference from your browser at http://localhost:3000 or with a Python script:

import bentoml

with bentoml.SyncHTTPClient('http://localhost:3000') as client:
    summarized_text: str = client.summarize([bentoml.__doc__])[0]
    print(f"Result: {summarized_text}")
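
The same call can also be made without the BentoML client. The following is a minimal sketch using only the standard library. It assumes that the endpoint path matches the method name (`/summarize`) and that the JSON body is keyed by the parameter name `texts`. Neither detail is stated in this README, so treat both as assumptions. `build_summarize_request` and `summarize_over_http` are hypothetical helper names.

```python
import json
import urllib.request


def build_summarize_request(base_url: str, texts: list[str]) -> urllib.request.Request:
    """Build a POST request for the summarize endpoint.

    Assumptions: the route is /summarize and the body is {"texts": [...]}.
    """
    payload = json.dumps({"texts": texts}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/summarize",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def summarize_over_http(base_url: str, texts: list[str], timeout: float = 60.0):
    """Send the request and return the decoded JSON response body."""
    request = build_summarize_request(base_url, texts)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())
```

With the service running locally, `summarize_over_http("http://localhost:3000", ["Some long text."])` would send one request. Building the request in a separate function lets you inspect it without a server.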

🐳 Deploy using Docker

Run bentoml build to package the necessary code, models, and dependency configs into a Bento, the standardized deployable artifact in BentoML:

bentoml build

Ensure Docker is running. Generate a Docker container image for deployment:

bentoml containerize summarization:latest

Run the generated image:

docker run --rm -p 3000:3000 summarization:latest
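
A container can take a while to load the model before it accepts traffic. The sketch below polls an HTTP endpoint until it answers with status 200. The `/readyz` path in the usage note is an assumption about the server's health routes and is not taken from this README. `http_ready` and `wait_until_ready` are hypothetical helper names.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_ready(url: str) -> bool:
    """Return True if a GET request to url answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=5) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeouts and non-2xx responses all count as "not ready".
        return False


def wait_until_ready(
    probe: Callable[[], bool],
    attempts: int = 30,
    delay_seconds: float = 2.0,
) -> bool:
    """Call probe until it returns True or the attempts run out."""
    for attempt in range(attempts):
        if probe():
            return True
        # Sleep between attempts, but not after the last one.
        if attempt < attempts - 1:
            time.sleep(delay_seconds)
    return False
```

For example, `wait_until_ready(lambda: http_ready("http://localhost:3000/readyz"))` would block until the container responds or roughly one minute has passed. Passing the probe as a callable keeps the retry loop independent of the network.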

☁️ Deploy on BentoCloud

BentoCloud provides compute infrastructure for rapid and reliable GenAI adoption. It speeds up your BentoML development process by leveraging cloud compute resources, and simplifies how you deploy, scale, and operate BentoML in production.

Sign up for BentoCloud for personal access; for enterprise use cases, contact our team.

# After signup, run the following command to create an API token:
bentoml cloud login

# Deploy from current directory:
bentoml deploy

For detailed explanations, read the Hello World example.

Examples

Check out the full list for more sample code and usage.

Advanced topics

See Documentation for more tutorials and guides.

Community

Get involved and join our Community Forum 💬, where thousands of AI/ML engineers help each other, contribute to the project, and talk about building AI products.

To report a bug or suggest a feature request, use GitHub Issues.

Contributing

There are many ways to contribute to the project:

Thanks to all of our amazing contributors!

Usage tracking and feedback

The BentoML framework collects anonymous usage data that helps our community improve the product. Only BentoML's internal API calls are reported. This excludes any sensitive information, such as user code, model data, model names, or stack traces. Here's the code used for usage tracking. You can opt out of usage tracking with the --do-not-track CLI option:

bentoml [command] --do-not-track

Or by setting the environment variable:

export BENTOML_DO_NOT_TRACK=True
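
When BentoML is launched from a Python script rather than a shell, the same variable can be passed through the child process environment. This is a minimal sketch, and `env_with_tracking_disabled` is a hypothetical helper name.

```python
import os
from typing import Mapping, Optional


def env_with_tracking_disabled(
    base_env: Optional[Mapping[str, str]] = None,
) -> dict[str, str]:
    """Return a copy of the environment with BENTOML_DO_NOT_TRACK set to True.

    The input mapping is not modified. With no argument, the current
    process environment is copied.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["BENTOML_DO_NOT_TRACK"] = "True"
    return env
```

The returned dictionary can be handed to `subprocess.run(["bentoml", "serve"], env=env_with_tracking_disabled())`, so the opt-out applies to that process only and leaves the parent environment untouched.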

License

Apache License 2.0
