An LLM inference cluster simulator

Vidur: LLM Inference Simulator

Vidur is a high-fidelity and extensible LLM inference simulator. It can help you with:

  1. Capacity planning and finding the best deployment configuration for your LLM deployments.
  2. Testing new research ideas, such as new scheduling algorithms or optimizations like speculative decoding.
  3. Studying the system performance of models under different workloads and configurations.

... all without access to GPUs except for a quick initial profiling phase.

Please refer to our MLSys'24 paper for more details. We also have a talk with a live demo that shows the capabilities of the system.

Supported Models

Model / Device | A100 80GB DGX | H100 DGX | 4xA100 80GB Pairwise NVLink Node | 8xA40 Pairwise NVLink Node
meta-llama/Llama-3-8B
meta-llama/Llama-3-70B
  • Instructions on adding a new model to existing or new SKUs can be found here.
  • All models support a maximum context length of 2M.
  • Pipeline parallelism is supported for all models. The PP dimension should divide the number of layers in the model.
  • In DGX nodes, there are 8 GPUs, fully connected via NVLink. So TP1, TP2, TP4 and TP8 are supported.
  • In 4x pairwise NVLink nodes, there are 4 GPUs, so TP1, TP2 and TP4 are supported. TP4 here is less performant than TP4 in DGX nodes because only (GPU1, GPU2) and (GPU3, GPU4) are connected via NVLink; between these pairs, the interconnect is slower.
  • You can use any combination of TP and PP. For example, you can run LLaMA2-70B on TP2-PP2 on a 4xA100 80GB Pairwise NVLink Node.
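The constraints in the notes above can be checked before launching a simulation. The following is a minimal sketch, not part of Vidur: the helper name, the node-type keys and the dictionaries are hypothetical, the TP sets are copied from the notes above, and the layer counts come from the public model configurations.

```python
# Hypothetical helper, not part of Vidur: checks a TP/PP choice against the
# constraints described in the notes above.

# TP sizes per node type, as stated in the notes above.
SUPPORTED_TP = {
    "dgx": {1, 2, 4, 8},              # 8 GPUs, fully connected via NVLink
    "4x_pairwise_nvlink": {1, 2, 4},  # 4 GPUs, NVLink within each pair only
}

# Transformer layer counts from the public model configurations.
NUM_LAYERS = {
    "meta-llama/Llama-3-8B": 32,
    "meta-llama/Llama-3-70B": 80,
}


def check_parallel_config(model: str, node: str, tp: int, pp: int) -> list[str]:
    """Return a list of problems; an empty list means the choice looks valid."""
    problems = []
    if tp not in SUPPORTED_TP[node]:
        problems.append(f"TP{tp} is not supported on node type {node!r}")
    layers = NUM_LAYERS[model]
    # The PP dimension should divide the number of layers in the model.
    if pp < 1 or layers % pp != 0:
        problems.append(f"PP{pp} does not divide the {layers} layers of {model}")
    return problems


print(check_parallel_config("meta-llama/Llama-3-70B", "4x_pairwise_nvlink", tp=2, pp=2))
print(check_parallel_config("meta-llama/Llama-3-8B", "4x_pairwise_nvlink", tp=8, pp=3))
```

The first call returns an empty list; the second reports both an unsupported TP size and a PP size that does not divide the layer count.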

Chrome Trace

Vidur exports a Chrome trace for each simulation. The trace is written to the simulator_output directory and can be opened by navigating to chrome://tracing/ or edge://tracing/ and loading the file.
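A trace can also be inspected programmatically. The sketch below assumes the file follows the standard Chrome Trace Event Format and that the events of interest are complete events (phase "X"); the event names Vidur actually emits are not listed here, so the example runs on a small made-up trace, and the file path in the final comment is a placeholder.

```python
import json
from collections import defaultdict


def summarize_trace(trace) -> dict[str, float]:
    """Sum the duration (in microseconds) of complete events, grouped by name."""
    # The Trace Event Format allows either a bare list of events or an
    # object that holds the list under "traceEvents".
    events = trace["traceEvents"] if isinstance(trace, dict) else trace
    totals = defaultdict(float)
    for event in events:
        # "X" marks a complete event, which carries its duration in "dur".
        if event.get("ph") == "X":
            totals[event.get("name", "<unnamed>")] += event.get("dur", 0)
    return dict(totals)


# Tiny in-memory trace so the sketch runs without a simulator output file.
example = {"traceEvents": [
    {"name": "batch", "ph": "X", "ts": 0, "dur": 120},
    {"name": "batch", "ph": "X", "ts": 150, "dur": 80},
    {"name": "meta", "ph": "M"},
]}
print(summarize_trace(example))

# For a real run, load the file first (the path below is a placeholder):
# with open("simulator_output/<run>/chrome_trace.json") as f:
#     print(summarize_trace(json.load(f)))
```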

Setup

Using mamba

To run the simulator, create a mamba environment with the given dependency file.

mamba env create -p ./env -f ./environment-dev.yml

Using venv

  1. Ensure that you have Python 3.12 installed on your system. Refer to https://www.bitecode.dev/p/installing-python-the-bare-minimum
  2. cd into the repository root
  3. Create a virtual environment with the venv module: python3.12 -m venv .venv
  4. Activate the virtual environment using source .venv/bin/activate
  5. Install the dependencies using python -m pip install -r requirements.txt
  6. When you are finished, run deactivate to leave the virtual environment

Using conda (Least recommended)

To run the simulator, create a conda environment with the given dependency file.

conda env create -p ./env -f ./environment.yml
conda env update -f environment-dev.yml

Setting up wandb (Optional)

First, set up your account on https://<your-org>.wandb.io/ or public wandb, obtain the API key, and then run the following command:

wandb login --host https://<your-org>.wandb.io

To opt out of wandb, pick any one of the following methods:

  1. export WANDB_MODE=disabled in your shell or add this in ~/.zshrc or ~/.bashrc. Remember to reload using source ~/.zshrc.
  2. Set wandb_project and wandb_group as "" in vidur/config/default.yml. Also, remove these CLI params from the shell command with which the simulator is invoked.
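When the simulator is launched from a script rather than an interactive shell, the first method can be applied to the child process only. This is a minimal sketch: the helper name is hypothetical, and the launch itself is left as a comment because it requires Vidur to be installed.

```python
import os
import sys


def env_without_wandb(base=None) -> dict:
    """Copy an environment mapping and disable wandb in the copy."""
    env = dict(os.environ if base is None else base)
    env["WANDB_MODE"] = "disabled"
    return env


# Same entry point as the command-line invocation of the simulator.
command = [sys.executable, "-m", "vidur.main"]
env = env_without_wandb()
print(env["WANDB_MODE"])

# To launch the simulator with this environment (requires Vidur installed):
# import subprocess
# subprocess.run(command, env=env, check=True)
```

Because only a copy of the environment is modified, other wandb runs started from the same shell are unaffected.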

Running the simulator

To run the simulator, execute the following command from the repository root:

python -m vidur.main

or, for a larger example that sets many parameters explicitly (type-specific flags are prefixed with the selected type: trace, zipf, uniform or fixed for the request length generator, and vllm, lightllm, orca, faster_transformer or sarathi for the scheduler; max_tokens_in_batch applies to vllm and lightllm only):

python -m vidur.main \
--replica_config_device a100 \
--replica_config_model_name meta-llama/Llama-2-7b-hf \
--cluster_config_num_replicas 1 \
--replica_config_tensor_parallel_size 1 \
--replica_config_num_pipeline_stages 1 \
--request_generator_config_type synthetic \
--length_generator_config_type trace \
--interval_generator_config_type static \
--trace_request_length_generator_config_max_tokens 4096 \
--trace_request_length_generator_config_trace_file ./data/processed_traces/arxiv_summarization_stats_llama2_tokenizer_filtered_v2.csv \
--synthetic_request_generator_config_num_requests 128 \
--replica_scheduler_config_type vllm \
--vllm_scheduler_config_batch_size_cap 256 \
--vllm_scheduler_config_max_tokens_in_batch 4096

The simulator supports many parameters for describing the simulation; the full list can be found here.
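For parameter sweeps it can be convenient to assemble the command line in code. The sketch below is illustrative only: build_args is a hypothetical helper, and the prefix convention for type-specific flags is inferred from the example command above rather than from the parameter reference.

```python
def build_args(params: dict) -> list[str]:
    """Turn {"name": value} pairs into ["--name", "value", ...]."""
    args = []
    for name, value in params.items():
        args += [f"--{name}", str(value)]
    return args


scheduler = "vllm"          # value for --replica_scheduler_config_type
length_generator = "trace"  # value for --length_generator_config_type

params = {
    "replica_config_device": "a100",
    "replica_scheduler_config_type": scheduler,
    "length_generator_config_type": length_generator,
    # Type-specific flags take the selected type as a prefix.
    f"{scheduler}_scheduler_config_batch_size_cap": 256,
    f"{length_generator}_request_length_generator_config_max_tokens": 4096,
}

print(" ".join(["python", "-m", "vidur.main", *build_args(params)]))
```

Deriving the prefixed flag names from the selected types keeps the two in sync when the scheduler or length generator is changed.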

The metrics will be logged to wandb directly and a copy will be stored in the simulator_output directory along with the chrome trace. A description of all the logged metrics can be found here.

Formatting Code

To format code, execute the following command:

make format
