

Project description

vLLM TPU

| Documentation | Blog | User Forum | Developer Slack |


Upcoming Events 🔥

Latest News 🔥

  • [2025/10] vLLM TPU: A New Unified Backend Supporting PyTorch and JAX on TPU
Previous News 🔥

About

vLLM TPU is now powered by tpu-inference, an expressive and powerful new hardware plugin that unifies JAX and PyTorch under a single lowering path within the vLLM project. The new backend provides a framework for developers to:

  • Push the limits of TPU hardware performance in open source.
  • Give JAX and PyTorch users more flexibility by running PyTorch model definitions on TPU with high performance and no additional code changes, while also extending native support to JAX.
  • Retain vLLM standardization: keep the same user experience, telemetry, and interface.

Recommended models and features

Although vLLM TPU’s new unified backend makes out-of-the-box, high-performance serving possible for any model supported in vLLM, a few core components are still being implemented.

For this reason, we’ve provided a list of recommended models and features that are validated for accuracy and stress-tested for performance.

Get started

Get started with vLLM on TPUs by following the quickstart guide.

Visit our documentation to learn more.
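To illustrate, here is a minimal offline-inference sketch using vLLM's standard Python API. It assumes a TPU VM with vllm and the tpu-inference plugin already installed; the model name, prompt, and sampling settings are illustrative placeholders, not a recommended configuration.

    # Minimal sketch: the standard vLLM Python API, unchanged on TPU.
    # Assumption: tpu-inference is installed, so vLLM picks up the TPU
    # backend through its platform plugin mechanism.
    from vllm import LLM, SamplingParams

    # Illustrative model choice; see the recommended-models list for
    # configurations validated on TPU.
    llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")

    params = SamplingParams(temperature=0.7, max_tokens=64)
    outputs = llm.generate(["What makes TPUs well suited to LLM serving?"], params)

    for out in outputs:
        print(out.outputs[0].text)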

Contribute

We're always looking for ways to partner with the community to accelerate vLLM TPU development. If you're interested in contributing to this effort, check out the Contributing guide and Issues to start. We recommend filtering Issues on the good first issue tag if it's your first time contributing.

Contact us

Project details


Release history

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

tpu_inference-0.11.1rc2.tar.gz (229.9 kB)

Uploaded: Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

tpu_inference-0.11.1rc2-py3-none-any.whl (269.0 kB)

Uploaded: Python 3

File details

Details for the file tpu_inference-0.11.1rc2.tar.gz.

File metadata

  • Download URL: tpu_inference-0.11.1rc2.tar.gz
  • Upload date:
  • Size: 229.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for tpu_inference-0.11.1rc2.tar.gz

  • SHA256: 14f3ab835228eb573546bcc0f802c68169bbfd005a40bc69ed6a02bff43f4f55
  • MD5: 20b2a3cceea893c0934a9681d2165b9c
  • BLAKE2b-256: 7f31aabce6445768aebf814092bf82b1be46137444c880c59b5db32410d3d778

See more details on using hashes here.
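As a quick check, a downloaded file can be compared against the SHA256 digest published above with a few lines of Python; the filename and expected digest are taken from the listing, and the chunked read is just one reasonable way to do it.

    # Verify the downloaded sdist against the SHA256 digest listed above.
    import hashlib

    expected = "14f3ab835228eb573546bcc0f802c68169bbfd005a40bc69ed6a02bff43f4f55"

    h = hashlib.sha256()
    with open("tpu_inference-0.11.1rc2.tar.gz", "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)

    print("OK" if h.hexdigest() == expected else "hash mismatch")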

File details

Details for the file tpu_inference-0.11.1rc2-py3-none-any.whl.

File metadata

File hashes

Hashes for tpu_inference-0.11.1rc2-py3-none-any.whl

  • SHA256: 74b17624dd7d4e40c99d66a82e2b857dc5fda5223e7d768ba888f641359228c4
  • MD5: 3f9d63b6d45675e3d407fac7d4270f2c
  • BLAKE2b-256: cc715dbf5692173464973d5ed31da91a8f9a99607ff211807cffafa468913fcd

See more details on using hashes here.
