
An ultra-fast, distributed Safetensors loader

Reason this release was yanked:

If you need this version, install libboost-dev first: 'sudo apt install libboost-dev'.

Project description

InstantTensor

InstantTensor is an ultra-fast, distributed Safetensors loader designed to maximize I/O throughput when moving model weights from Safetensors files to GPU memory.

Model loading benchmark on inference engines:

Model          GPU    Backend        Load Time (s)  Throughput (GB/s)  Speedup
Qwen3-30B-A3B  1*H20  Safetensors    57.4           1.2                1x
Qwen3-30B-A3B  1*H20  InstantTensor  1.77           39                 32.5x
DeepSeek-R1    8*H20  Safetensors    160            4.3                1x
DeepSeek-R1    8*H20  InstantTensor  15.3           45                 10.5x

Quickstart

from instanttensor import safe_open

tensors = {}
with safe_open("model.safetensors", framework="pt", device=0) as f:
    for name, tensor in f.tensors():
        tensors[name] = tensor.clone()

NOTE: tensor points to an internal buffer of InstantTensor and must be copied immediately (e.g., via clone() or copy_()); otherwise its data may be overwritten when the buffer is reused.
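Why the immediate copy matters can be illustrated with a pure-Python analogy (this is not the InstantTensor API; ReusingLoader is a made-up stand-in): a loader that reuses one internal buffer and yields views into it clobbers every earlier value unless the caller copies right away, which is what clone() does for the returned tensors.

```python
# Analogy only: a loader that reuses a single internal buffer and yields
# memoryviews into it, mimicking InstantTensor's buffer-reuse behavior.
class ReusingLoader:
    def __init__(self):
        self._buf = bytearray(4)  # one shared scratch buffer

    def tensors(self):
        for fill in (b"AAAA", b"BBBB"):
            self._buf[:] = fill            # overwrite the shared buffer
            yield fill.decode(), memoryview(self._buf)

# Without copying, every saved view ends up pointing at the last payload:
views = {name: mv for name, mv in ReusingLoader().tensors()}
print(bytes(views["AAAA"]))  # b'BBBB' -- the "AAAA" data was clobbered

# Copying immediately (the analogue of tensor.clone()) keeps each value:
copies = {name: bytes(mv) for name, mv in ReusingLoader().tensors()}
print(copies["AAAA"])  # b'AAAA'
```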

See Usage for more details (multi-file and distributed usage).

Why InstantTensor?

  • Fast weight loading:
    • Direct I/O: Bypasses slow page-cache allocation on cold starts, which helps with large models and tight memory budgets.
    • Tuned I/O size and concurrency: Maximizes hardware throughput.
    • Pipelining and prefetching: Parallelizes and overlaps the stages of the transfer.
  • Distributed loading: Use torch.distributed (NCCL) to speed up loading under any parallelism policy (TP/PP/EP/CP/DP).
  • Minimal device buffer: ≤ ~3× largest-tensor size; far below single-file size.
  • Multiple I/O backends:
    • GPUDirect Storage
    • Legacy Storage
    • Memory-based Storage
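The largest-tensor size that bounds the device buffer can be read from a file's header alone. As a hedged sketch (not InstantTensor's actual implementation), the Safetensors format is an 8-byte little-endian header length followed by that many bytes of JSON metadata, so finding the biggest tensor needs only the stdlib:

```python
# Sketch: estimating the "<= ~3x largest-tensor" device-buffer bound by
# parsing a Safetensors header directly (8-byte LE length + JSON metadata).
import json
import struct

def largest_tensor_nbytes(blob: bytes) -> int:
    """Return the byte size of the largest tensor in a Safetensors blob."""
    (header_len,) = struct.unpack("<Q", blob[:8])
    header = json.loads(blob[8 : 8 + header_len])
    return max(
        entry["data_offsets"][1] - entry["data_offsets"][0]
        for name, entry in header.items()
        if name != "__metadata__"  # optional free-form metadata key
    )

# Build a minimal two-tensor blob in memory for demonstration.
meta = {
    "a": {"dtype": "F32", "shape": [2, 2], "data_offsets": [0, 16]},
    "b": {"dtype": "F32", "shape": [4, 4], "data_offsets": [16, 80]},
}
raw = json.dumps(meta).encode()
blob = struct.pack("<Q", len(raw)) + raw + bytes(80)

print(largest_tensor_nbytes(blob))      # 64
print(3 * largest_tensor_nbytes(blob))  # 192: rough device-buffer bound
```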

Installation

InstantTensor requires a Linux environment with the CUDA driver installed. The typical installation steps are as follows:

Method 1: Install from pip

pip install instanttensor

Method 2: Build from source

cd ./instanttensor
pip install .
# For a debug build, prefix the command: DEBUG=1 pip install .

Usage

Multi-file mode (recommended)

Passing a list of files lets the backend plan reads across all of them, which yields higher throughput than loading each file with a separate call:

from instanttensor import safe_open

files = ["model-00001-of-00002.safetensors", "model-00002-of-00002.safetensors"]
tensors = {}
with safe_open(files, framework="pt", device=0) as f:
    for name, tensor in f.tensors():
        tensors[name] = tensor.clone()

Distributed loading

InstantTensor can use a torch.distributed NCCL process group to coordinate loading and achieve higher throughput compared to running safe_open independently on each GPU.

import torch
import torch.distributed as dist
from instanttensor import safe_open

dist.init_process_group(backend="nccl")
process_group = dist.GroupMember.WORLD

files = ["model-00001-of-00002.safetensors", "model-00002-of-00002.safetensors"]
tensors = {}
with safe_open(files, framework="pt", device=torch.cuda.current_device(), process_group=process_group) as f:
    for name, tensor in f.tensors():
        tensors[name] = tensor.clone()

NOTE: You can also load weights using a subgroup created via dist.new_group, which allows multiple subgroups to load weights independently. For example, if you have TP=8 and PP=2 (i.e., two TP groups), you can create two subgroups and load weights independently on each TP group. In cross-node (multi-machine) scenarios, loading using per-node subgroups can sometimes be faster than loading on the world group. However, for most cases, the world group is a good default choice.
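The rank arithmetic behind per-group loading can be sketched as follows. This is a hypothetical helper (tp_group_ranks is not part of InstantTensor), assuming consecutive ranks belong to the same TP group; the torch.distributed calls are shown in comments since they require a running NCCL job.

```python
# Hypothetical sketch: computing per-TP-group rank lists for dist.new_group.
# With TP=8 and PP=2, the 16 ranks split into two TP groups, and each group
# can call safe_open with its own process_group.

def tp_group_ranks(world_size: int, tp_size: int) -> list[list[int]]:
    """Partition consecutive ranks into TP groups of size tp_size."""
    assert world_size % tp_size == 0
    return [list(range(start, start + tp_size))
            for start in range(0, world_size, tp_size)]

groups = tp_group_ranks(world_size=16, tp_size=8)  # ranks 0-7 and 8-15

# In a real job (note: every rank must call dist.new_group for every group):
#   subgroups = [dist.new_group(ranks) for ranks in groups]
#   my_group = subgroups[dist.get_rank() // 8]
#   with safe_open(files, framework="pt",
#                  device=torch.cuda.current_device(),
#                  process_group=my_group) as f:
#       ...
```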

See tests/test.py for a full benchmark harness (TP/PP grouping, checksums, etc.).

API reference

See the project's API reference documentation.

Thanks

Thanks to the AI Systems and Optimization team at ScitiX AI and the Wenfei Wu Lab at Peking University.

Project details


Download files

Download the file for your platform.

Source Distribution

instanttensor-0.1.1.tar.gz (11.4 MB)

Uploaded Source

File details

Details for the file instanttensor-0.1.1.tar.gz.

File metadata

  • Download URL: instanttensor-0.1.1.tar.gz
  • Upload date:
  • Size: 11.4 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.18

File hashes

Hashes for instanttensor-0.1.1.tar.gz
Algorithm Hash digest
SHA256 131a9b80a2bc796b21aed0ab2df1d0101a1b96a45a1e1631fe06b57358c5b614
MD5 1170969d71871f30294791c58af5c5ee
BLAKE2b-256 1e9a070f0df4f8e4de060ab2026bebdcf8c0fdf3eedbb9c2fc79c5bcdb3d76ed

