Adapter package for torch_musa to act exactly like PyTorch CUDA


torchada


Run your CUDA code on Moore Threads GPUs — zero code changes required

torchada is an adapter that makes torch_musa (Moore Threads GPU support for PyTorch) compatible with standard PyTorch CUDA APIs. Import it once, and your existing torch.cuda.* code works on MUSA hardware.

Why torchada?

Many PyTorch projects are written for NVIDIA GPUs using torch.cuda.* APIs. To run these on Moore Threads GPUs, you would normally need to change every cuda reference to musa. torchada eliminates this by automatically translating CUDA API calls to MUSA equivalents at runtime.

Prerequisites

  • torch_musa: You must have torch_musa installed (this provides MUSA support for PyTorch)
  • Moore Threads GPU: A Moore Threads GPU with the proper driver installed

Installation

pip install torchada

# Or install from source
git clone https://github.com/MooreThreads/torchada.git
cd torchada
pip install -e .

Quick Start

import torchada  # ← Add this one line at the top
import torch

# Your existing CUDA code works unchanged:
x = torch.randn(10, 10).cuda()
print(torch.cuda.device_count())
torch.cuda.synchronize()

That's it! torch.cuda.* APIs are automatically redirected to their torch.musa.* equivalents. (The one deliberate exception is torch.cuda.is_available(); see the note in the API Reference.)

What Works

| Feature | Example |
|---|---|
| Device operations | tensor.cuda(), model.cuda(), torch.device("cuda") |
| Memory management | torch.cuda.memory_allocated(), empty_cache() |
| Synchronization | torch.cuda.synchronize(), Stream, Event |
| Mixed precision | torch.cuda.amp.autocast(), GradScaler() |
| CUDA Graphs | torch.cuda.CUDAGraph, torch.cuda.graph() |
| CUDA Runtime | torch.cuda.cudart() → uses the MUSA runtime |
| Profiler | ProfilerActivity.CUDA → uses PrivateUse1 |
| Custom Ops | Library.impl(..., "CUDA") → uses PrivateUse1 |
| Distributed | dist.init_process_group(backend='nccl') → uses MCCL |
| torch.compile | torch.compile(model) with all backends |
| C++ Extensions | CUDAExtension, BuildExtension, load() |
| FlexAttention | torch.nn.attention.flex_attention works on MUSA |
| ctypes Libraries | ctypes.CDLL with CUDA function names → MUSA equivalents |

Examples

Mixed Precision Training

import torchada
import torch

model = MyModel().cuda()
scaler = torch.cuda.amp.GradScaler()

with torch.cuda.amp.autocast():
    output = model(data.cuda())
    loss = criterion(output, target.cuda())

scaler.scale(loss).backward()
scaler.step(optimizer)
scaler.update()

Distributed Training

import torchada
import torch.distributed as dist

# 'nccl' is automatically mapped to 'mccl' on MUSA
dist.init_process_group(backend='nccl')

CUDA Graphs

import torchada
import torch

g = torch.cuda.CUDAGraph()
with torch.cuda.graph(cuda_graph=g):  # cuda_graph= keyword works on MUSA
    y = model(x)

torch.compile

import torchada
import torch

compiled_model = torch.compile(model.cuda(), backend='inductor')

Building C++ Extensions

import torchada  # Must import before torch.utils.cpp_extension
from torch.utils.cpp_extension import CUDAExtension, BuildExtension

# Standard CUDAExtension works — torchada handles CUDA→MUSA translation
ext = CUDAExtension("my_ext", sources=["kernel.cu"])

Custom Ops

import torchada
import torch

my_lib = torch.library.Library("my_lib", "DEF")
my_lib.define("my_op(Tensor x) -> Tensor")
my_lib.impl("my_op", my_func, "CUDA")  # Works on MUSA!

Profiler

import torchada
import torch

# ProfilerActivity.CUDA works on MUSA
with torch.profiler.profile(
    activities=[torch.profiler.ProfilerActivity.CPU, torch.profiler.ProfilerActivity.CUDA]
) as prof:
    model(x)

ctypes Library Loading

import torchada
import ctypes

# Load MUSA runtime library with CUDA function names
lib = ctypes.CDLL("libmusart.so")
func = lib.cudaMalloc  # Automatically translates to musaMalloc

# Works with MCCL too
nccl_lib = ctypes.CDLL("libmccl.so")
func = nccl_lib.ncclAllReduce  # Automatically translates to mcclAllReduce

Platform Detection

import torchada
from torchada import detect_platform, Platform

platform = detect_platform()
if platform == Platform.MUSA:
    print("Running on Moore Threads GPU")
elif platform == Platform.CUDA:
    print("Running on NVIDIA GPU")

# Or use torch.version-based detection
def is_musa():
    import torch
    return hasattr(torch.version, 'musa') and torch.version.musa is not None

Performance

torchada uses aggressive caching to minimize runtime overhead. All frequently-called operations complete in under 200 nanoseconds:

| Operation | Overhead |
|---|---|
| torch.cuda.device_count() | ~140 ns |
| torch.cuda.Stream (attribute access) | ~130 ns |
| torch.cuda.Event (attribute access) | ~130 ns |
| _translate_device('cuda') | ~140 ns |
| torch.backends.cuda.is_built() | ~155 ns |

For comparison, a typical GPU kernel launch takes 5,000-20,000 ns, so the patching overhead is negligible in real-world applications.

Operations with inherent costs (runtime calls, object creation) take 300-600ns but cannot be optimized further without changing behavior.
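Per-call overheads at this scale are easiest to verify with timeit. The sketch below shows the methodology using a stand-in callable (the _FakeCuda class is illustrative, not part of torchada); on MUSA hardware you would time the real calls, e.g. torch.cuda.device_count() after import torchada.

```python
import timeit

# Stand-in for a patched, cached torch.cuda call; illustrative only.
class _FakeCuda:
    def device_count(self):
        return 1

cuda = _FakeCuda()
n = 1_000_000
total = timeit.timeit(cuda.device_count, number=n)
print(f"~{total / n * 1e9:.0f} ns per call")
```

Dividing total wall time by the call count averages out timer resolution, which matters when each call is only a few hundred nanoseconds.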

Known Limitation

Device type string comparisons fail on MUSA:

device = torch.device("cuda:0")  # On MUSA, this becomes musa:0
device.type == "cuda"  # Returns False!

Solution: Use torchada.is_gpu_device():

import torchada

if torchada.is_gpu_device(device):  # Works on both CUDA and MUSA
    ...
# Or: device.type in ("cuda", "musa")

API Reference

| Name | Description |
|---|---|
| detect_platform() | Returns Platform.CUDA, Platform.MUSA, or Platform.CPU |
| is_musa_platform() | Returns True if running on MUSA |
| is_cuda_platform() | Returns True if running on CUDA |
| is_gpu_device(device) | Returns True if device is CUDA or MUSA |
| CUDA_HOME | Path to the CUDA/MUSA installation |
| cuda_to_musa_name(name) | Convert cudaXxx → musaXxx |
| nccl_to_mccl_name(name) | Convert ncclXxx → mcclXxx |
| cublas_to_mublas_name(name) | Convert cublasXxx → mublasXxx |
| curand_to_murand_name(name) | Convert curandXxx → murandXxx |

Note: torch.cuda.is_available() is intentionally NOT redirected — it returns False on MUSA. This allows proper platform detection. For GPU availability checks, see the has_gpu() pattern in examples/migrate_existing_project.md.

Note: The name conversion utilities are exported for manual use, but ctypes.CDLL is automatically patched to translate function names when loading MUSA libraries.
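At its core this kind of translation is a prefix rewrite. The sketch below illustrates the idea in plain Python (translate_symbol and _PREFIXES are hypothetical names, not torchada's API; the real mapping in src/torchada/_mapping.py is far larger and handles cases a naive prefix swap would miss).

```python
# Illustrative prefix map; torchada's real table has 380+ entries.
_PREFIXES = {"cuda": "musa", "nccl": "mccl", "cublas": "mublas", "curand": "murand"}

def translate_symbol(name: str) -> str:
    """Rewrite a CUDA-family symbol name to its MUSA-family equivalent."""
    for cuda_prefix, musa_prefix in _PREFIXES.items():
        if name.startswith(cuda_prefix):
            return musa_prefix + name[len(cuda_prefix):]
    return name  # non-CUDA names pass through unchanged

print(translate_symbol("cudaMalloc"))     # musaMalloc
print(translate_symbol("ncclAllReduce"))  # mcclAllReduce
```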

C++ Extension Symbol Mapping

When building C++ extensions, torchada automatically translates CUDA symbols to MUSA:

| CUDA | MUSA |
|---|---|
| cudaMalloc | musaMalloc |
| cudaStream_t | musaStream_t |
| cublasHandle_t | mublasHandle_t |
| at::cuda | at::musa |
| c10::cuda | c10::musa |
| #include <cuda/*> | #include <musa/*> |

See src/torchada/_mapping.py for the complete mapping table (380+ mappings).
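Conceptually, source-level translation amounts to applying the symbol map to the extension's source text before compilation. The sketch below is a simplified illustration using a handful of entries from the table above (translate_source and SYMBOL_MAP are hypothetical names; the real implementation handles tokenization edge cases a plain string replace would get wrong).

```python
# Partial CUDA→MUSA symbol map, taken from the table above.
SYMBOL_MAP = {
    "cudaMalloc": "musaMalloc",
    "cudaStream_t": "musaStream_t",
    "cublasHandle_t": "mublasHandle_t",
    "at::cuda": "at::musa",
    "c10::cuda": "c10::musa",
}

def translate_source(src: str) -> str:
    """Rewrite CUDA symbols in C++ source text to their MUSA equivalents."""
    for cuda_sym, musa_sym in SYMBOL_MAP.items():
        src = src.replace(cuda_sym, musa_sym)
    return src

print(translate_source("cudaStream_t s; cudaMalloc(&p, n);"))
# musaStream_t s; musaMalloc(&p, n);
```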

Integrating torchada into Your Project

Step 1: Add Dependency

# pyproject.toml or requirements.txt
torchada>=0.1.45

Step 2: Conditional Import

# At your application entry point
def is_musa():
    import torch
    return hasattr(torch.version, "musa") and torch.version.musa is not None

if is_musa():
    import torchada  # noqa: F401

# Rest of your code uses torch.cuda.* as normal

Step 3: Extend Feature Flags (if applicable)

# Include MUSA in GPU capability checks
if is_nvidia() or is_musa():
    ENABLE_FLASH_ATTENTION = True

Step 4: Fix Device Type Checks (if applicable)

# Instead of: device.type == "cuda"
# Use: device.type in ("cuda", "musa")
# Or: torchada.is_gpu_device(device)
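As a concrete version of the check above, a small helper keeps the device-type comparison in one place (is_gpu_device_type is a hypothetical helper for your own codebase, not part of torchada; torchada.is_gpu_device provides the same check for torch.device objects).

```python
# Hypothetical project-local helper: treat both CUDA and MUSA as GPU devices.
def is_gpu_device_type(device_type: str) -> bool:
    return device_type in ("cuda", "musa")

print(is_gpu_device_type("musa"))  # True
print(is_gpu_device_type("cpu"))   # False
```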

Projects Using torchada

| Project | Category | Status |
|---|---|---|
| Xinference | Model Serving | ✅ Merged |
| LightLLM | Model Serving | ✅ Merged |
| LightX2V | Image/Video Generation | ✅ Merged |
| Chitu | Model Serving | ✅ Merged |
| SGLang | Model Serving | In Progress |
| ComfyUI | Image/Video Generation | In Progress |

License

MIT License
