Python client for ModelExpress P2P GPU transfer service


Project description

ModelExpress Python Client

Python client for ModelExpress -- high-performance GPU-to-GPU model weight transfers using NVIDIA NIXL over RDMA/InfiniBand.

Instead of each vLLM instance loading model weights from storage, one "source" instance loads the model and transfers the weights directly to "target" instances via GPUDirect RDMA, so the weight data never passes through the CPU or host memory.

Installation

# From PyPI
pip install modelexpress

# Editable install from source
pip install -e .

# With dev dependencies (pytest, grpcio-tools)
pip install -e ".[dev]"

Requirements

Python >= 3.10
NVIDIA GPUs with RDMA/InfiniBand support
NIXL (NVIDIA Inference Xfer Library)
A running ModelExpress server (Rust gRPC service backed by Redis)
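A quick way to check the prerequisites that are visible from Python is a small preflight script. This is a sketch, not part of the package: it assumes the NIXL Python bindings import as `nixl`, and it cannot verify the GPUs, the RDMA/InfiniBand fabric, or that the ModelExpress server is reachable.

```python
import importlib.util
import sys


def preflight() -> dict:
    """Report which Python-visible prerequisites are present.

    Hardware (GPUs, RDMA NICs) and the server must be checked separately.
    """
    return {
        # The client requires Python >= 3.10
        "python>=3.10": sys.version_info >= (3, 10),
        # Assumed import name of the NIXL Python bindings
        "nixl importable": importlib.util.find_spec("nixl") is not None,
        # The client package itself
        "modelexpress importable": importlib.util.find_spec("modelexpress") is not None,
    }


if __name__ == "__main__":
    for check, ok in preflight().items():
        status = "ok" if ok else "MISSING"
        print(f"{status:8} {check}")
```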

Quick Start with vLLM

ModelExpress integrates with vLLM through custom model loaders. vLLM can discover the package through its vllm.general_plugins entry point; if your vLLM deployment requires explicit plugin selection, set VLLM_PLUGINS=modelexpress. To register the loaders manually, call register_modelexpress_loaders() in your code.

export MX_SERVER_ADDRESS="modelexpress-server:8001"

vllm serve deepseek-ai/DeepSeek-V3 \
    --load-format modelexpress \
    --tensor-parallel-size 8

On the source worker, starting the vLLM engine with the modelexpress load format loads the weights from disk, registers them with NIXL, and publishes the NIXL and tensor metadata to the MX server. On the target worker, the same load format retrieves that metadata from the MX server and streams the weights GPU-to-GPU over RDMA. The older mx load format is kept as a backward-compatible alias.

Programmatic Usage

MxClient

MxClient is a lightweight gRPC client for communicating with the ModelExpress server:

from modelexpress import MxClient

client = MxClient(server_url="modelexpress-server:8001")
try:
    # Query for a source model
    response = client.get_metadata("deepseek-ai/DeepSeek-V3")
    if response.found:
        for worker in response.workers:
            print(f"Worker rank {worker.worker_rank}: {len(worker.tensors)} tensors")

    # Wait for source readiness (blocks until ready or timeout)
    success, session_id, metadata_hash = client.wait_for_ready(
        model_name="deepseek-ai/DeepSeek-V3",
        worker_id=0,
        timeout_seconds=7200,
    )
    if not success:
        print("Source did not become ready before the timeout")
finally:
    # Close the gRPC connection even if a call above raises
    client.close()
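wait_for_ready is the supported way to block on a source. If you only need to wait for metadata to appear (for example, polling get_metadata(...).found from a script), a generic timeout-bounded polling loop looks like the sketch below. `poll_until` is a hypothetical helper, not the way wait_for_ready is implemented; the clock and sleep functions are injectable so the loop can be tested without real waiting.

```python
import time
from typing import Callable


def poll_until(
    predicate: Callable[[], bool],
    timeout_s: float,
    interval_s: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call predicate() until it returns True or timeout_s elapses.

    Returns True on success and False on timeout.
    """
    deadline = clock() + timeout_s
    while True:
        if predicate():
            return True
        remaining = deadline - clock()
        if remaining <= 0:
            return False
        # Never sleep past the deadline
        sleep(min(interval_s, remaining))
```

Usage with the client above would be along the lines of `poll_until(lambda: client.get_metadata("deepseek-ai/DeepSeek-V3").found, timeout_s=600, interval_s=5)`. Using a monotonic clock keeps the deadline correct even if the wall clock is adjusted.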

Registering Loaders Manually

from modelexpress import register_modelexpress_loaders

register_modelexpress_loaders()
# Now vLLM recognizes --load-format modelexpress and mx

Environment Variables

Variable	Default	Description
`MX_SERVER_ADDRESS`	`localhost:8001`	ModelExpress gRPC server address (recommended)
`MODEL_EXPRESS_URL`	`localhost:8001`	Deprecated and scheduled for removal in a future release. Still read by all client paths and takes precedence over `MX_SERVER_ADDRESS` when both are set; keep setting it during the transition.
`MX_EXPECTED_WORKERS`	Auto-detected from tensor-parallel size	Number of GPU workers to coordinate
`MX_SYNC_PUBLISH`	`0`	Source: wait for all workers before publishing metadata
`MX_SYNC_START`	`1`	Target: wait for all source workers before transferring
`MX_POOL_REG`	`0`	Allocation-level NIXL registration (registers cudaMalloc blocks instead of individual tensors)
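The table's defaults and precedence rules can be captured in a small settings loader. This is a sketch, not the client's actual configuration code: `MxSettings` and `load_settings` are hypothetical names, the tensor-parallel size is passed in rather than read from vLLM, and treating "1", "true", or "yes" as enabled (and an empty string as unset) is an assumption about flag parsing.

```python
import os
from dataclasses import dataclass
from typing import Mapping

DEFAULT_ADDRESS = "localhost:8001"


def _flag(env: Mapping[str, str], name: str, default: str) -> bool:
    # Assumed parsing: "1", "true", "yes" (any case) mean enabled
    return env.get(name, default).strip().lower() in {"1", "true", "yes"}


@dataclass(frozen=True)
class MxSettings:
    server_address: str
    expected_workers: int
    sync_publish: bool
    sync_start: bool
    pool_reg: bool


def load_settings(env: Mapping[str, str], tp_size: int) -> MxSettings:
    # MODEL_EXPRESS_URL is deprecated but still wins when both are set
    address = env.get("MODEL_EXPRESS_URL") or env.get("MX_SERVER_ADDRESS") or DEFAULT_ADDRESS
    # Fall back to the tensor-parallel size when MX_EXPECTED_WORKERS is unset
    raw_workers = env.get("MX_EXPECTED_WORKERS")
    expected = int(raw_workers) if raw_workers else tp_size
    return MxSettings(
        server_address=address,
        expected_workers=expected,
        sync_publish=_flag(env, "MX_SYNC_PUBLISH", "0"),
        sync_start=_flag(env, "MX_SYNC_START", "1"),
        pool_reg=_flag(env, "MX_POOL_REG", "0"),
    )


if __name__ == "__main__":
    print(load_settings(os.environ, tp_size=8))
```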

UCX/NIXL Tuning

Variable	Recommended	Description
`UCX_RNDV_SCHEME`	`get_zcopy`	Zero-copy RDMA reads
`UCX_RNDV_THRESH`	`0`	Force rendezvous for all transfers
`NIXL_LOG_LEVEL`	`INFO`	NIXL logging level
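Exporting these in the shell before launching vLLM is the simplest option. If you launch from Python instead, a sketch like the one below applies the recommended values without overriding anything the operator already set. It assumes the variables must be in the environment before UCX/NIXL is initialized in the process, since UCX reads its `UCX_*` configuration when its context is created; `apply_recommended` is a hypothetical helper.

```python
import os
from typing import MutableMapping

# Recommended values from the UCX/NIXL tuning table
UCX_NIXL_RECOMMENDED = {
    "UCX_RNDV_SCHEME": "get_zcopy",
    "UCX_RNDV_THRESH": "0",
    "NIXL_LOG_LEVEL": "INFO",
}


def apply_recommended(env: MutableMapping[str, str]) -> dict:
    """Set each recommended variable unless it is already set.

    Returns only the variables that were newly set. Call this before
    anything in the process initializes UCX or creates a NIXL agent.
    """
    applied = {}
    for key, value in UCX_NIXL_RECOMMENDED.items():
        if key not in env:
            env[key] = value
            applied[key] = value
    return applied


if __name__ == "__main__":
    print(apply_recommended(os.environ))
```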

Package Structure

Module	Description
`modelexpress.client`	`MxClient` -- gRPC client for the ModelExpress server
`modelexpress.metadata`	Metadata clients, source identity, heartbeat, and worker manifest serving
`modelexpress.engines.vllm.loader`	`MxModelLoader` -- vLLM integration
`modelexpress.vllm_loader`	Compatibility shim for the vLLM loader
`modelexpress.nixl_transfer`	`NixlTransferManager` -- NIXL agent lifecycle and RDMA transfers
`modelexpress.types`	`TensorDescriptor`, `WorkerMetadata` -- core data types
`modelexpress.vllm_worker`	Compatibility worker extension for older manual-registration workflows
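`MX_POOL_REG=1` (see the environment table) switches NIXL registration from individual tensors to the `cudaMalloc` blocks that back them, which presumably cuts the number of registrations when many tensors share a block. The bookkeeping behind that idea can be sketched with plain integers: find each tensor's enclosing allocation block and register each needed block once. `blocks_to_register` is a hypothetical helper, not code from `modelexpress.nixl_transfer`, and real block boundaries would come from the CUDA allocator, which this sketch does not model.

```python
from bisect import bisect_right
from typing import List, Tuple

Range = Tuple[int, int]  # (start address, size in bytes)


def blocks_to_register(tensors: List[Range], blocks: List[Range]) -> List[Range]:
    """Return the allocation blocks that back at least one tensor.

    Assumes blocks do not overlap and each tensor lies inside one block.
    """
    blocks = sorted(blocks)
    starts = [base for base, _ in blocks]
    needed = set()
    for addr, size in tensors:
        # Rightmost block that starts at or before this tensor
        i = bisect_right(starts, addr) - 1
        if i < 0:
            raise ValueError(f"tensor at {addr:#x} precedes every block")
        base, length = blocks[i]
        if addr + size > base + length:
            raise ValueError(f"tensor at {addr:#x} is not inside a single block")
        needed.add(i)
    return [blocks[i] for i in sorted(needed)]
```

For example, four tensors packed into two blocks need two registrations instead of four.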

How It Works

Source loads weights from disk, registers raw tensors with NIXL before FP8 processing, and publishes metadata to the ModelExpress server.
Target creates dummy weights, waits for the source ready flag, then pulls raw tensors via RDMA read.
Both source and target run process_weights_after_loading() independently, producing identical FP8-transformed weights.
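The source/target handshake in the steps above can be modeled in a single process: threads stand in for the source and target workers, a `threading.Event` for the ready flag, and a bytearray slice copy for the RDMA read. `FakeMxServer`, `run_source`, and `run_target` are hypothetical names for illustration only; none of this is the modelexpress API, and real metadata would also carry NIXL descriptors.

```python
import threading


class FakeMxServer:
    """In-memory stand-in for the ModelExpress server (hypothetical)."""

    def __init__(self):
        self._metadata = {}
        self._ready = threading.Event()

    def publish(self, metadata):
        self._metadata = dict(metadata)
        self._ready.set()  # plays the role of the source ready flag

    def wait_for_ready(self, timeout):
        if not self._ready.wait(timeout):
            raise TimeoutError("source never became ready")
        return self._metadata


def run_source(server, weights_on_disk, source_gpu):
    # Load raw (unprocessed) weights into "GPU memory"
    for name, data in weights_on_disk.items():
        source_gpu[name] = bytearray(data)
    # Publish name -> size for every registered tensor
    server.publish({name: len(buf) for name, buf in source_gpu.items()})


def run_target(server, model_sizes, source_gpu, timeout=5.0):
    # Allocate dummy (zeroed) weights from the target's own model definition
    target_gpu = {name: bytearray(size) for name, size in model_sizes.items()}
    # Block until the source publishes its metadata
    metadata = server.wait_for_ready(timeout)
    if metadata != model_sizes:
        raise ValueError("source and target tensor layouts differ")
    # Pull each tensor; a slice copy stands in for a one-sided RDMA read
    for name, buf in target_gpu.items():
        buf[:] = source_gpu[name]
    return target_gpu


if __name__ == "__main__":
    server, source_gpu = FakeMxServer(), {}
    disk = {"layer0.weight": b"\x01\x02\x03\x04", "layer0.weight_scale_inv": b"\x05\x06"}
    sizes = {name: len(data) for name, data in disk.items()}
    worker = threading.Thread(target=run_source, args=(server, disk, source_gpu))
    worker.start()
    target_gpu = run_target(server, sizes, source_gpu)
    worker.join()
    print(target_gpu == source_gpu)
```

The final step, running process_weights_after_loading() on both sides, is not modeled here.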

This pre-processing transfer strategy is critical for FP8 models (e.g., DeepSeek-V3) where weight_scale_inv tensors are renamed and transformed during processing.
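Why the transfer happens before processing can be illustrated with a toy model that uses plain Python lists in place of tensors. The target's dummy weights carry checkpoint names, so a name-keyed pull of raw tensors succeeds, and running the same deterministic processing on both sides then gives identical results; pulling already-processed tensors fails because a scale tensor was renamed. Both the `.weight_scale` target name and the doubling transform are invented placeholders, and `pull_by_name` is a hypothetical helper; vLLM's real FP8 processing differs.

```python
from typing import Dict, List

Weights = Dict[str, List[float]]


def process_weights_after_loading(raw: Weights) -> Weights:
    """Toy stand-in for post-load FP8 processing (hypothetical transform).

    Renames every *.weight_scale_inv tensor to *.weight_scale and doubles it.
    """
    processed: Weights = {}
    for name, values in raw.items():
        if name.endswith(".weight_scale_inv"):
            processed[name[: -len("_inv")]] = [2.0 * v for v in values]
        else:
            processed[name] = list(values)
    return processed


def pull_by_name(source: Weights, target_names: List[str]) -> Weights:
    """Copy tensors into the target by name, as a name-keyed transfer would."""
    missing = [n for n in target_names if n not in source]
    if missing:
        raise KeyError(f"source has no tensor named {missing}")
    return {n: list(source[n]) for n in target_names}


if __name__ == "__main__":
    raw = {"layer.weight": [1.0, 2.0], "layer.weight_scale_inv": [0.5, 0.25]}
    target_names = list(raw)  # the target's dummy weights use checkpoint names

    # Raw transfer works, and independent processing agrees on both sides
    pulled = pull_by_name(raw, target_names)
    print(process_weights_after_loading(pulled) == process_weights_after_loading(raw))

    # Transferring processed tensors fails: the scale tensor was renamed
    try:
        pull_by_name(process_weights_after_loading(raw), target_names)
    except KeyError as err:
        print("mismatch:", err)
```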

License

Apache-2.0

Project details


Release history

This version

0.4.0

Jun 10, 2026

0.3.0

Apr 17, 2026

Download files


Source Distribution

modelexpress-0.4.0.tar.gz (136.1 kB)

Uploaded Jun 10, 2026 Source

Built Distributions


modelexpress-0.4.0-py3-none-any.whl (116.2 kB)

Uploaded Jun 10, 2026 Python 3

modelexpress-0.4.0-cp312-cp312-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl (199.4 kB)

Uploaded Jun 10, 2026; CPython 3.12, manylinux: glibc 2.24+ x86-64, manylinux: glibc 2.28+ x86-64

modelexpress-0.4.0-cp312-cp312-manylinux_2_24_aarch64.manylinux_2_28_aarch64.whl (199.8 kB)

Uploaded Jun 10, 2026; CPython 3.12, manylinux: glibc 2.24+ ARM64, manylinux: glibc 2.28+ ARM64

File details

Details for the file modelexpress-0.4.0.tar.gz.

File metadata

Download URL: modelexpress-0.4.0.tar.gz
Upload date: Jun 10, 2026
Size: 136.1 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.4

File hashes

Hashes for modelexpress-0.4.0.tar.gz
Algorithm	Hash digest
SHA256	`4fb9436bf184c3e0a35d7eff0e2997772d10049f43b9cf08ce0392cfad53cd4a`
MD5	`eb670ff8170cd6786a177663aa62e3de`
BLAKE2b-256	`f95b64b8c621ecb7176d0478ab85cfea198361aa56aca13f46e1a752944689a6`


File details

Details for the file modelexpress-0.4.0-py3-none-any.whl.

File metadata

Download URL: modelexpress-0.4.0-py3-none-any.whl
Upload date: Jun 10, 2026
Size: 116.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.4

File hashes

Hashes for modelexpress-0.4.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`6edf8c9437f55fa0227a170a0927cedbb21f1f37fc513733c1b86fdfb069285a`
MD5	`bf5043da4978a22dd99f5ae40dea6c6a`
BLAKE2b-256	`7fae11bcb8084809d580e89c28be1a91219ecac76bff55b87742edc94af3e950`


File details

Details for the file modelexpress-0.4.0-cp312-cp312-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl.

File metadata

Download URL: modelexpress-0.4.0-cp312-cp312-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl
Upload date: Jun 10, 2026
Size: 199.4 kB
Tags: CPython 3.12, manylinux: glibc 2.24+ x86-64, manylinux: glibc 2.28+ x86-64
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.4

File hashes

Hashes for modelexpress-0.4.0-cp312-cp312-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl
Algorithm	Hash digest
SHA256	`0189ccad66ee625eb73cecb541bbcbd57ce057491cb065e48e2f17c89b65380f`
MD5	`17a80a3a4da12af1149df739d2ba0c05`
BLAKE2b-256	`e136ff15fc6c8026c9a49760105ee126e26b4dbffba83907f2b2913b9b8744ea`


File details

Details for the file modelexpress-0.4.0-cp312-cp312-manylinux_2_24_aarch64.manylinux_2_28_aarch64.whl.

File metadata

Download URL: modelexpress-0.4.0-cp312-cp312-manylinux_2_24_aarch64.manylinux_2_28_aarch64.whl
Upload date: Jun 10, 2026
Size: 199.8 kB
Tags: CPython 3.12, manylinux: glibc 2.24+ ARM64, manylinux: glibc 2.28+ ARM64
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.4

File hashes

Hashes for modelexpress-0.4.0-cp312-cp312-manylinux_2_24_aarch64.manylinux_2_28_aarch64.whl
Algorithm	Hash digest
SHA256	`22a9092fdff85581e62b5a2f595a4385fb0f1305b4d1212a11f4dc9b2781d2bb`
MD5	`afe60880b4a720d61054c75bc5908189`
BLAKE2b-256	`67d3e2343f18e696d0aa3c8eab7261519cbefb7beaaa9b718bd71732c284dcc9`

