vLLM platform plugin for Moore Threads MUSA GPUs

vLLM MUSA Platform Plugin

A vLLM platform plugin that enables running vLLM on Moore Threads MUSA GPUs.

Overview

This plugin provides MUSA (Moore Threads Unified Software Architecture) support for vLLM through:

torchada: CUDA→MUSA compatibility layer for PyTorch
pymtml: Moore Threads Management Library for device queries
Triton patches: Compatibility fixes for MUSA's Triton compiler

Requirements

Python 3.9+
vLLM
Moore Threads GPU with MUSA toolkit installed
torchada (CUDA→MUSA compatibility)
mthreads-ml-py (pymtml - MTML bindings)

Installation

From Source (Development)

# Clone the repository
git clone https://github.com/vllm-project/vllm-musa.git
cd vllm-musa

# Install in development mode
pip install -e .

# Or with development dependencies
pip install -e ".[dev]"

From PyPI

pip install vllm-musa

Verification

After installation, verify the plugin is registered:

python -c "from vllm_musa_platform import musa_platform_plugin; print('Plugin loaded successfully')"

Check if MTML (device management) is available:

python -c "from vllm_musa_platform import mtml; print(f'MTML available: {mtml.is_mtml_available()}')"
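
vLLM discovers out-of-tree platform plugins through Python entry points, so listing them is a complementary check to the two commands above. The sketch below assumes the entry-point group is named `vllm.platform_plugins`, which is taken from vLLM's plugin system and not from this README. The helper name `list_platform_plugins` is hypothetical, and the entry name this plugin registers is not stated here.

```python
from importlib.metadata import entry_points

# Assumption: vLLM looks up platform plugins in this entry-point group.
PLATFORM_PLUGIN_GROUP = "vllm.platform_plugins"


def list_platform_plugins(group=PLATFORM_PLUGIN_GROUP):
    """Return sorted (name, target) pairs for the entry points in `group`."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+ exposes a selection API.
        selected = eps.select(group=group)
    else:
        # Python 3.9: entry_points() returns a dict keyed by group name.
        selected = eps.get(group, [])
    return sorted((ep.name, ep.value) for ep in selected)


if __name__ == "__main__":
    for name, target in list_platform_plugins():
        print(f"{name} -> {target}")
```

If the plugin is installed correctly, one of the printed targets should point into the `vllm_musa_platform` package.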

Usage

Once installed, the plugin is automatically detected by vLLM. Simply run vLLM as usual:

from vllm import LLM, SamplingParams

# vLLM will automatically use the MUSA platform
llm = LLM(model="your-model-path", trust_remote_code=True)

sampling_params = SamplingParams(temperature=0.7, top_p=0.9, max_tokens=100)
outputs = llm.generate(["Hello, how are you?"], sampling_params)

for output in outputs:
    print(output.outputs[0].text)

Environment Variables

MUSA_VISIBLE_DEVICES: Control which MUSA devices are visible (similar to CUDA_VISIBLE_DEVICES)
VLLM_WORKER_MULTIPROC_METHOD=spawn: Recommended for multi-process workers
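
When vLLM is launched from a Python script rather than a shell, the same variables can be prepared as a dictionary and passed to a subprocess. This is a minimal sketch: the helper `musa_env` is hypothetical, and the comma-separated device index format is an assumption based on the stated similarity to CUDA_VISIBLE_DEVICES.

```python
import os


def musa_env(device_ids, base_env=None):
    """Return a copy of the environment with the MUSA settings applied."""
    env = dict(os.environ if base_env is None else base_env)
    # Assumption: devices are given as comma-separated indices, e.g. "0,1".
    env["MUSA_VISIBLE_DEVICES"] = ",".join(str(i) for i in device_ids)
    # Keep an existing choice of start method; otherwise use the recommended one.
    env.setdefault("VLLM_WORKER_MULTIPROC_METHOD", "spawn")
    return env


# Example use (not executed here):
#   import subprocess
#   subprocess.run(["vllm", "serve", "/path/to/model/"], env=musa_env([0, 1]))
```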

Example

VLLM_WORKER_MULTIPROC_METHOD=spawn python -c "
from vllm import LLM, SamplingParams

llm = LLM(model='/path/to/model', trust_remote_code=True, enforce_eager=True)
outputs = llm.generate(['Hello!'], SamplingParams(max_tokens=20))
print(outputs[0].outputs[0].text)
"

Testing

Unit Tests

Run the test suite:

# Run all tests
pytest tests/ -v

# Run specific test file
pytest tests/test_mtml.py -v
pytest tests/test_musa.py -v
pytest tests/test_patches.py -v

# Run with coverage
pytest tests/ -v --cov=vllm_musa_platform --cov-report=term-missing

Supported vLLM Versions

This plugin supports multiple vLLM versions:

vLLM Version	PyTorch Version	Engine	Status
0.10.1.1	2.7.1	V0/V1	✅ Supported
0.13.0	2.7.1	V1 only	✅ Supported
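
To check an environment against the table above, the installed vLLM version can be read from package metadata. This is a sketch only: the `SUPPORTED` mapping copies the two rows of the table, and the helper names are hypothetical. Stripping a local version suffix such as `+something` is an assumption about how custom builds might be labeled.

```python
from importlib.metadata import PackageNotFoundError, version

# Copied from the support table: vLLM version -> available engines.
SUPPORTED = {"0.10.1.1": "V0/V1", "0.13.0": "V1 only"}


def supported_engines(vllm_version):
    """Return the engine support string, or None if the version is not listed."""
    public_version = vllm_version.split("+")[0]  # drop any local build suffix
    return SUPPORTED.get(public_version)


def installed_vllm_version():
    """Return the installed vLLM version string, or None if vLLM is absent."""
    try:
        return version("vllm")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_vllm_version()
    engines = supported_engines(found) if found else None
    print(f"vLLM {found}: {engines or 'not in the supported list'}")
```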

Testing with Different vLLM Versions

vLLM 0.10.1.1 (with torch 2.7.1)

# Install the plugin (vLLM 0.10.1.1 is installed automatically as a dependency)
pip install -e .

# Start the server
vllm serve /path/to/model/

# In another terminal, test inference
curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "/path/to/model/", "prompt": "Hello!", "max_tokens": 50}'

vLLM 0.13.0 (with torch 2.7.1)

Important: Use --no-deps when upgrading vLLM to prevent torch from being replaced. The MUSA container includes a pre-configured torch 2.7.1 that must not be overwritten.

# Install the plugin (vLLM 0.10.1.1 is installed automatically as a dependency)
pip install -e .

# Upgrade to vLLM 0.13.0 without reinstalling dependencies
pip install vllm==0.13.0 --no-deps --upgrade

# Install additional dependencies required by vLLM 0.13.0
pip install 'depyf==0.20.0' 'llguidance>=1.3.0,<1.4.0' \
            'lm-format-enforcer==0.11.3' 'outlines_core==0.2.11' \
            'xgrammar==0.1.27' 'compressed-tensors==0.12.2' \
            'model-hosting-container-standards<1.0.0,>=0.1.9' \
            ijson anthropic mcp

# Start the server
vllm serve /path/to/model/

# In another terminal, test inference
curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "/path/to/model/", "prompt": "Hello!", "max_tokens": 50}'

# Test chat completions
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "/path/to/model/", "messages": [{"role": "user", "content": "What is 2+2?"}], "max_tokens": 50}'
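
The completion request can also be sent from Python using only the standard library. The sketch below mirrors the first curl command. The function names are hypothetical, and the response parsing assumes the OpenAI-style `choices[0].text` layout that the `/v1/completions` endpoint is expected to return.

```python
import json
import urllib.request


def build_completion_request(base_url, model, prompt, max_tokens=50):
    """Build a POST request equivalent to the curl completion call."""
    payload = {"model": model, "prompt": prompt, "max_tokens": max_tokens}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(base_url, model, prompt, max_tokens=50, timeout=60):
    """Send the request and return the generated text (needs a running server)."""
    request = build_completion_request(base_url, model, prompt, max_tokens)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # Assumption: OpenAI-compatible response layout.
    return body["choices"][0]["text"]


# Example use (requires `vllm serve` to be running):
#   print(complete("http://localhost:8000", "/path/to/model/", "Hello!"))
```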

Version-Specific Notes

vLLM 0.10.x

Supports both the V0 and V1 engines
Set the VLLM_USE_V1=1 environment variable to enable the V1 engine
The vllm.worker.worker module exists for V0 engine support

vLLM 0.13.x

V1 is the default (and only) engine
The vllm.worker module was removed (V0 engine deprecated)
Requires additional dependencies, including depyf, llguidance, lm-format-enforcer, outlines_core, xgrammar, and compressed-tensors (the install command above lists the full set with version pins)
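
The module difference described in these notes can be detected at runtime without hard-coding version numbers. This is a sketch under the assumption that the presence of `vllm.worker.worker` is a usable signal for V0 support. The helper names are hypothetical, and this is not necessarily how the plugin itself decides which patches to apply. Note that `find_spec` on a dotted name imports the parent packages as a side effect.

```python
import importlib.util


def has_module(name):
    """Return True if `name` can be located, without importing the leaf module."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # A missing parent package raises ModuleNotFoundError (an ImportError).
        return False


def v0_worker_available():
    """True when the V0 worker module described in the notes above is present."""
    return has_module("vllm.worker.worker")


if __name__ == "__main__":
    print("V0 worker module present:", v0_worker_available())
```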

Docker Testing

For containerized testing with MUSA GPUs:

# Start a container with MUSA support
docker run -d --net host --privileged --pid=host --shm-size 500g \
  -v "$PWD":/ws -w /ws \
  -v /data/vllm:/home/dist \
  --name musa-test \
  sh-harbor.mthreads.com/mcctest/musa-compile:rc4.3.3-torch2.7-20251120 \
  sleep infinity

# Enter the container
docker exec -it musa-test bash

# Inside the container, install and test
pip install -e /ws
vllm serve /home/dist/your-model/

Project Structure

vllm-musa/
├── pyproject.toml              # Project configuration
├── README.md                   # This file
├── vllm_musa_platform/         # Main package
│   ├── __init__.py             # Plugin entry point
│   ├── mtml.py                 # MTML wrapper (device management)
│   ├── musa.py                 # MUSA platform implementation
│   └── patches/                # Compatibility patches
│       ├── __init__.py         # Patch application logic
│       ├── README.md           # Patch documentation
│       ├── vllm__attention__ops__triton_unified_attention.patch.py
│       ├── vllm__v1__worker__gpu_worker.patch.py
│       └── vllm__worker__worker.patch.py
└── tests/                      # Test suite
    ├── conftest.py             # Pytest fixtures
    ├── test_mtml.py            # MTML wrapper tests
    ├── test_musa.py            # Platform tests
    └── test_patches.py         # Patch system tests

Patches

The plugin includes runtime patches that keep vLLM compatible with MUSA's Triton compiler. See vllm_musa_platform/patches/README.md for details.
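
The patch file names in the project structure appear to encode the vLLM module each patch targets, with double underscores standing in for dots. The sketch below shows that reading. It is an inference from the file names only: the function `patch_target` is hypothetical, and the real lookup logic lives in patches/__init__.py and may differ.

```python
from pathlib import Path

PATCH_SUFFIX = ".patch.py"


def patch_target(filename):
    """Map a patch file name to the dotted module name it appears to target."""
    name = Path(filename).name
    if not name.endswith(PATCH_SUFFIX):
        raise ValueError(f"not a patch file: {filename}")
    stem = name[: -len(PATCH_SUFFIX)]
    # Assumption: "__" separates package levels; single "_" is part of a name.
    return stem.replace("__", ".")


if __name__ == "__main__":
    print(patch_target("vllm__v1__worker__gpu_worker.patch.py"))
```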

License

Apache-2.0

Contributing

Contributions are welcome! Please ensure all tests pass before submitting:

# Run tests
pytest tests/ -v

# Run linter (if ruff is installed)
ruff check .

# Run type checker (if mypy is installed)
mypy vllm_musa_platform/

Release history

Version	Release date
0.1.1	Dec 26, 2025
0.1.0	Dec 26, 2025 (this version)
0.0.0.1	Nov 4, 2025

Download files

Source Distribution

vllm_musa-0.1.0.tar.gz (20.0 kB), uploaded Dec 26, 2025 (Source)

Built Distribution

vllm_musa-0.1.0-py3-none-any.whl (16.2 kB), uploaded Dec 26, 2025 (Python 3)

File details

Details for the file vllm_musa-0.1.0.tar.gz.

File metadata

Download URL: vllm_musa-0.1.0.tar.gz
Upload date: Dec 26, 2025
Size: 20.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.10.12

File hashes

Hashes for vllm_musa-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`6b7c39a9b040dd650eb0fd08b88ae97835fe046b1ea87bfcc6ce9476acf981b2`
MD5	`176a004047d3abc78ed63ce3b795e6d6`
BLAKE2b-256	`b1ce04f351632caabb74d7cac6829b1c565fbebb255d42474aeee6e659c9c442`

File details

Details for the file vllm_musa-0.1.0-py3-none-any.whl.

File metadata

Download URL: vllm_musa-0.1.0-py3-none-any.whl
Upload date: Dec 26, 2025
Size: 16.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.10.12

File hashes

Hashes for vllm_musa-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`c5ea84f3a0ca0cb925aa9b7bd94f32558950fb8c5a9b5ae66a85c615525b7443`
MD5	`116c4d8bc71c3af11d992799c7163a34`
BLAKE2b-256	`791f04f5584b81595c19846bf988d51584a9782db38dda7de7068c51a42b2e5a`
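
A downloaded file can be checked against the SHA256 digests listed above with the standard library. This is a minimal sketch; the function names are hypothetical and the file path in the usage comment is a placeholder for wherever the download was saved.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_hex):
    """Compare a file's digest with the published value, ignoring case."""
    return sha256_of(path) == expected_hex.strip().lower()


# Example use (not executed here):
#   verify("vllm_musa-0.1.0.tar.gz",
#          "6b7c39a9b040dd650eb0fd08b88ae97835fe046b1ea87bfcc6ce9476acf981b2")
```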
