Benchmark the performance of **any Foundation Model (FM)** deployed on **any AWS Generative AI service**: **Amazon SageMaker**, **Amazon Bedrock**, **Amazon EKS**, or **Amazon EC2**. FMs can be deployed on these platforms directly through `FMBench`; models that are already deployed can be benchmarked through the **Bring your own endpoint** mode that `FMBench` supports.
FMBench
Benchmark any Foundation Model (FM) on any AWS Generative AI service [Amazon SageMaker, Amazon Bedrock, Amazon EKS, Amazon EC2, or Bring your own endpoint.]
Amazon Bedrock | Amazon SageMaker | Amazon EKS | Amazon EC2
`FMBench` is a Python package for running performance benchmarks for any Foundation Model (FM) deployed on any AWS Generative AI service, be it Amazon SageMaker, Amazon Bedrock, Amazon EKS, or Amazon EC2. FMs can be deployed on these platforms directly through `FMBench`; models that are already deployed can be benchmarked through the Bring your own endpoint mode that `FMBench` supports.
Here are some salient features of `FMBench`:
- Highly flexible: it allows any combination of instance types (`g5`, `p4d`, `p5`, `Inf2`), inference containers (`DeepSpeed`, `TensorRT`, `HuggingFace TGI` and others) and parameters such as tensor parallelism and rolling batch, as long as these are supported by the underlying platform.
- Benchmark any model: it can be used to benchmark open-source models, third-party models, and proprietary models trained by enterprises on their own data.
- Run anywhere: it can be run on any AWS platform where Python can run, such as Amazon EC2, Amazon SageMaker, or even AWS CloudShell. It is important to run this tool on an AWS platform so that internet round-trip time is not included in the end-to-end response latency.
Use `FMBench` to benchmark an LLM on any AWS generative AI service for price and performance (inference latency, transactions/minute). Here is one of the plots generated by `FMBench` to help answer the price-performance question for the `Llama2-13b` model when hosted on Amazon SageMaker (the instance types in the legend have been blurred out on purpose; you can find them in the actual plot generated by running `FMBench`).
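To make the price-performance question concrete, here is a minimal sketch of how an hourly instance price and a measured transactions/minute figure can be combined into a cost per 10,000 transactions. The function name, the instance labels and all prices and throughput numbers are hypothetical placeholders, and this is not necessarily the formula `FMBench` uses in its own report.

```python
def cost_per_n_transactions(hourly_price_usd: float,
                            transactions_per_minute: float,
                            n: int = 10_000) -> float:
    """Cost in USD of serving n transactions at a steady throughput."""
    if transactions_per_minute <= 0:
        raise ValueError("transactions_per_minute must be positive")
    transactions_per_hour = transactions_per_minute * 60
    return hourly_price_usd / transactions_per_hour * n


# Hypothetical (hourly price in USD, transactions/minute) pairs.
# These are illustrative numbers only, not benchmark results.
candidates = {
    "instance-a": (1.50, 30.0),
    "instance-b": (6.00, 150.0),
}

costs = {name: cost_per_n_transactions(price, tpm)
         for name, (price, tpm) in candidates.items()}
cheapest = min(costs, key=costs.get)

if __name__ == "__main__":
    for name, cost in sorted(costs.items(), key=lambda kv: kv[1]):
        print(f"{name}: ${cost:.2f} per 10,000 transactions")
    print(f"lowest cost per transaction: {cheapest}")
```

A more expensive instance can still be the cheaper option per transaction if its throughput is proportionally higher, which is why price and throughput need to be looked at together.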
Models benchmarked
Configuration files are available in the configs folder for the following models in this repo.
Llama3 on Amazon SageMaker
Llama3 is now available on SageMaker (read blog post), and you can now benchmark it using `FMBench`. Here are the config files for benchmarking `Llama3-8b-instruct` and `Llama3-70b-instruct` on `ml.p4d.24xlarge`, `ml.inf2.24xlarge` and `ml.g5.12xlarge` instances.
- Config file for `Llama3-8b-instruct` on `ml.p4d.24xlarge` and `ml.g5.12xlarge`.
- Config file for `Llama3-70b-instruct` on `ml.p4d.24xlarge` and `ml.g5.48xlarge`.
- Config file for `Llama3-8b-instruct` on `ml.inf2.24xlarge` and `ml.g5.12xlarge`.
Full list of benchmarked models
Model | EC2 g5 | EC2 Inf2/Trn1 | SageMaker g4dn/g5/p3 | SageMaker Inf2 | SageMaker P4 | SageMaker P5 | Bedrock On-demand throughput | Bedrock provisioned throughput |
---|---|---|---|---|---|---|---|---|
Anthropic Claude-3 Sonnet | ✅ | ✅ | ||||||
Anthropic Claude-3 Haiku | ✅ | |||||||
Mistral-7b-instruct | ✅ | ✅ | ✅ | ✅ | ||||
Mistral-7b-AWQ | ✅ | |||||||
Mixtral-8x7b-instruct | ✅ | |||||||
Llama3.1-8b instruct | ✅ | |||||||
Llama3.1-70b instruct | ✅ | |||||||
Llama3-8b instruct | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | |
Llama3-70b instruct | ✅ | ✅ | ✅ | ✅ | ✅ | |||
Llama2-13b chat | ✅ | ✅ | ✅ | ✅ | ||||
Llama2-70b chat | ✅ | ✅ | ✅ | ✅ | ||||
Amazon Titan text lite | ✅ | |||||||
Amazon Titan text express | ✅ | |||||||
Cohere Command text | ✅ | |||||||
Cohere Command light text | ✅ | |||||||
AI21 J2 Mid | ✅ | |||||||
AI21 J2 Ultra | ✅ | |||||||
Gemma-2b | ✅ | |||||||
Phi-3-mini-4k-instruct | ✅ | |||||||
distilbert-base-uncased | ✅ |
New in this release
v1.0.51
- `FMBench` has a website now. Rework the README file to make it lightweight.
- `Llama3.1` config files for Bedrock.
v1.0.50
- `Llama3-8b` on Amazon EC2 `inf2.48xlarge` config file.
- Update to new version of DJL LMI (0.28.0).
v1.0.49
- Streaming support for Amazon SageMaker and Amazon Bedrock.
- Per-token latency metrics such as time to first token (TTFT) and mean time per-output token (TPOT).
- Misc. bug fixes.
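The per-token latency metrics mentioned for v1.0.49 can be illustrated with a small sketch. It assumes the commonly used definitions: TTFT is the time from sending the request to receiving the first token, and TPOT is the average gap between consecutive output tokens after the first one. How `FMBench` computes these internally is not shown in this document, and `StreamTiming`, `ttft` and `tpot` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StreamTiming:
    """Timestamps (in seconds) recorded while reading a streamed response."""
    request_sent: float
    token_times: List[float]  # arrival time of each output token, in order


def ttft(timing: StreamTiming) -> Optional[float]:
    """Time to first token; None if no token was received."""
    if not timing.token_times:
        return None
    return timing.token_times[0] - timing.request_sent


def tpot(timing: StreamTiming) -> Optional[float]:
    """Mean time per output token after the first (assumed definition)."""
    n = len(timing.token_times)
    if n < 2:
        return None  # undefined with fewer than two tokens
    return (timing.token_times[-1] - timing.token_times[0]) / (n - 1)


if __name__ == "__main__":
    # Made-up timestamps for illustration only.
    sample = StreamTiming(request_sent=0.0, token_times=[0.5, 0.6, 0.7, 0.8, 0.9])
    print(f"TTFT: {ttft(sample):.3f}s, TPOT: {tpot(sample):.3f}s")
```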
Getting started
`FMBench` is available as a Python package on PyPI and is run as a command-line tool once it is installed. All data, including metrics, reports and results, is stored in an Amazon S3 bucket.
[!IMPORTANT] 💡 All documentation for `FMBench` is available on the `FMBench` website.
You can run `FMBench` on either a SageMaker notebook or an EC2 VM. Both options are described here as part of the documentation. You can even run `FMBench` as a Docker container. A Quickstart guide for SageMaker is provided below as well.
👉 The following sections discuss where to run `FMBench` the tool, as distinct from where the FM is actually deployed. For example, we could run `FMBench` on EC2 while the model is deployed on SageMaker or even Bedrock.
Quickstart
FMBench on a SageMaker Notebook
- Each `FMBench` run works with a configuration file that contains the information about the model, the deployment steps, and the tests to run. A typical `FMBench` workflow involves either directly using an already provided config file from the `configs` folder in the `FMBench` GitHub repo or editing an already provided config file as per your own requirements (say you want to try benchmarking on a different instance type, or a different inference container etc.).

  👉 A simple config file with key parameters annotated is included in this repo, see `config-llama2-7b-g5-quick.yml`. This file benchmarks performance of Llama2-7b on an `ml.g5.xlarge` instance and an `ml.g5.2xlarge` instance. You can use this config file as it is for this Quickstart.

- Launch the AWS CloudFormation template included in this repository using one of the launch buttons provided for the following AWS Regions: `us-east-1` (N. Virginia), `us-west-2` (Oregon) and `us-gov-west-1` (GovCloud West). The CloudFormation template creates the following resources within your AWS account: Amazon S3 buckets, an Amazon IAM role and an Amazon SageMaker Notebook with this repository cloned. A read S3 bucket is created which contains all the files (configuration files, datasets) required to run `FMBench`, and a write S3 bucket is created which will hold the metrics and reports generated by `FMBench`. The CloudFormation stack takes about 5 minutes to create.

- Once the CloudFormation stack is created, navigate to SageMaker Notebooks and open the `fmbench-notebook`.

- On the `fmbench-notebook` open a Terminal and run the following commands.

      conda create --name fmbench_python311 -y python=3.11 ipykernel
      source activate fmbench_python311;
      pip install -U fmbench

- Now you are ready to run `fmbench` with the following command line. We will use a sample config file placed in the S3 bucket by the CloudFormation stack for a quick first run.

  - We benchmark performance for the `Llama2-7b` model on an `ml.g5.xlarge` and an `ml.g5.2xlarge` instance type, using the `huggingface-pytorch-tgi-inference` inference container. This test takes about 30 minutes to complete and costs about $0.20.

  - It uses a simple relationship of 750 words equals 1000 tokens; to get a more accurate representation of token counts, use the `Llama2 tokenizer` (instructions are provided in the next section). For more accurate results on token throughput, it is strongly recommended that you use a tokenizer specific to the model you are testing rather than the default tokenizer. See instructions provided later in this document on how to use a custom tokenizer.

        account=`aws sts get-caller-identity | jq .Account | tr -d '"'`
        region=`aws configure get region`
        fmbench --config-file s3://sagemaker-fmbench-read-${region}-${account}/configs/llama2/7b/config-llama2-7b-g5-quick.yml > fmbench.log 2>&1

  - Open another terminal window and run `tail -f` on the `fmbench.log` file to see all the traces being generated at runtime.

        tail -f fmbench.log
-
👉 For streaming support on SageMaker and Bedrock, check out these config files:
-
-
The generated reports and metrics are available in the `sagemaker-fmbench-write-<replace_w_your_aws_region>-<replace_w_your_aws_account_id>` bucket. The metrics and report files are also downloaded locally into the `results` directory (created by `FMBench`), and the benchmarking report is available as a markdown file called `report.md` in the `results` directory. You can view the rendered Markdown report in the SageMaker notebook itself or download the metrics and report files to your machine for offline analysis.
If you would like to understand what is being done under the hood by the CloudFormation template, see the DIY version with gory details.
FMBench on SageMaker in GovCloud
No special steps are required for running `FMBench` on GovCloud. The CloudFormation link for `us-gov-west-1` has been provided in the section above.
- Not all models available via Bedrock or other services may be available in GovCloud. The following commands show how to run `FMBench` to benchmark the Amazon Titan Text Express model in GovCloud. See the Amazon Bedrock GovCloud page for more details.

      account=`aws sts get-caller-identity | jq .Account | tr -d '"'`
      region=`aws configure get region`
      fmbench --config-file s3://sagemaker-fmbench-read-${region}-${account}/configs/bedrock/config-bedrock-titan-text-express.yml > fmbench.log 2>&1
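The commands in this document build the read bucket name and the config file location from the AWS region and account ID. The same naming pattern can be sketched in Python as below. The helper names are hypothetical, the account ID `123456789012` is a placeholder, and in practice the region and account would come from your AWS credentials rather than being hard-coded.

```python
def read_bucket(region: str, account: str) -> str:
    """Name of the read bucket, following the pattern used in the commands."""
    return f"sagemaker-fmbench-read-{region}-{account}"


def write_bucket(region: str, account: str) -> str:
    """Name of the write bucket that holds the generated metrics and reports."""
    return f"sagemaker-fmbench-write-{region}-{account}"


def config_uri(region: str, account: str, config_key: str) -> str:
    """S3 URI of a config file stored in the read bucket."""
    return f"s3://{read_bucket(region, account)}/{config_key.lstrip('/')}"


if __name__ == "__main__":
    # Placeholder values for illustration only.
    region, account = "us-gov-west-1", "123456789012"
    uri = config_uri(region, account,
                     "configs/bedrock/config-bedrock-titan-text-express.yml")
    print(f"fmbench --config-file {uri} > fmbench.log 2>&1")
    print(f"results are written to: {write_bucket(region, account)}")
```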
Results
Depending upon the experiments in the config file, the `FMBench` run may take a few minutes to several hours. Once the run completes, you can find the report and metrics in the local `results-*` folder in the directory from which `FMBench` was run. The report and metrics are also written to the write S3 bucket set in the config file.
Here is a screenshot of the `report.md` file generated by `FMBench`.
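If several runs have left more than one `results-*` folder in the working directory, a short script can help locate the most recent report. This is a minimal sketch that assumes each results folder contains a `report.md` at its top level, as described above; the helper name `latest_report` is hypothetical and not part of `FMBench`.

```python
from pathlib import Path
from typing import Optional


def latest_report(base_dir: str = ".") -> Optional[Path]:
    """Return the most recently modified results-*/report.md, if any."""
    reports = [
        folder / "report.md"
        for folder in Path(base_dir).glob("results-*")
        if (folder / "report.md").is_file()
    ]
    if not reports:
        return None
    # Pick the report with the newest modification time.
    return max(reports, key=lambda path: path.stat().st_mtime)


if __name__ == "__main__":
    report = latest_report(".")
    if report is None:
        print("No results-*/report.md found in the current directory.")
    else:
        print(f"Latest report: {report}")
```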
Benchmark models deployed on different AWS Generative AI services (Docs)
`FMBench` comes packaged with configuration files for benchmarking models on different AWS Generative AI services, i.e. Bedrock, SageMaker, EKS and EC2, or even your own endpoint.
Enhancements
View the ISSUES on GitHub and add any that you think would be a beneficial iteration to this benchmarking harness.
Security
See CONTRIBUTING for more information.
License
This library is licensed under the MIT-0 License. See the LICENSE file.
Support
- Schedule Demo 👋 - send us an email 🙂
- Community Discord 💭
- Our emails ✉️ aroraai@amazon.com / madhurpt@amazon.com