High-performance safetensors model loader

Project description

fastsafetensors is an efficient safetensors model loader. We introduced three major features to optimize model loading performance:

  1. Batched, lazy tensor instantiation
  2. GPU offloading for sharding, type conversions, and device pointer alignment
  3. GPUDirect Storage enablement for loading files directly from storage into GPU memory

A major design difference from the original safetensors file loader is that fastsafetensors does NOT use mmap. The original loader instantiates tensors on demand from mmap'ed files, but that approach unfortunately cannot fully utilize high-throughput storage such as NVMe SSDs. Instead, we transfer files asynchronously and in parallel to saturate storage throughput, then lazily instantiate tensors in GPU device memory via DLPack.
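
To see why lazy, batched instantiation is possible, here is a minimal pure-Python sketch of parsing a safetensors header: the header alone tells a loader where every tensor's bytes live in the file, so byte ranges can be scheduled for parallel transfer first and materialized as tensors later. The function name and load-plan layout below are illustrative, not part of the fastsafetensors API.

```python
import json
import struct

def parse_safetensors_header(blob: bytes) -> dict:
    """Build an on-demand load plan from a safetensors blob's header."""
    # A safetensors file begins with an 8-byte little-endian header length,
    # followed by a JSON table mapping tensor names to dtype, shape, and
    # byte offsets within the data section that follows the header.
    (header_len,) = struct.unpack_from("<Q", blob, 0)
    table = json.loads(blob[8:8 + header_len].decode("utf-8"))
    data_start = 8 + header_len
    plan = {}
    for name, meta in table.items():
        if name == "__metadata__":  # optional free-form metadata entry
            continue
        begin, end = meta["data_offsets"]  # offsets relative to data section
        plan[name] = {
            "dtype": meta["dtype"],
            "shape": meta["shape"],
            "file_range": (data_start + begin, data_start + end),
        }
    return plan

# Build a tiny in-memory "file" holding one 2x2 float32 tensor of zeros.
header = json.dumps(
    {"w": {"dtype": "F32", "shape": [2, 2], "data_offsets": [0, 16]}}
).encode("utf-8")
blob = struct.pack("<Q", len(header)) + header + bytes(16)
print(parse_safetensors_header(blob)["w"])
```

With such a plan in hand, a loader can copy the raw byte ranges to device memory in large parallel batches and defer tensor creation until each tensor is first requested.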

Another design change is to offload sharding and other tensor manipulations to GPUs. The original loader slices tensors for sharding in user programs before copying them to device memory, which incurs high CPU usage for host-memory accesses. We therefore introduce special APIs that run sharding with torch.distributed collective operations such as broadcast and scatter. The same offloading is applied to other tensor manipulations such as type conversions.
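
As a rough illustration of the partitioning behind scatter-based sharding, the hypothetical helper below (not part of the fastsafetensors API) computes the slice of one tensor dimension that each rank would receive when a weight is split as evenly as possible across the group:

```python
def shard_ranges(dim_size: int, world_size: int) -> list:
    """Per-rank (start, end) slice boundaries for splitting one tensor
    dimension of `dim_size` as evenly as possible across `world_size` ranks."""
    base, extra = divmod(dim_size, world_size)
    ranges, start = [], 0
    for rank in range(world_size):
        # The first `extra` ranks take one extra row when the split is uneven.
        length = base + (1 if rank < extra else 0)
        ranges.append((start, start + length))
        start += length
    return ranges

# Row-shard a hypothetical [10, 4096] weight across 4 ranks.
print(shard_ranges(10, 4))  # → [(0, 3), (3, 6), (6, 8), (8, 10)]
```

In a collective-based design, ranges like these would parameterize a scatter so that each GPU receives only its shard directly, keeping the slicing work off the host CPU.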

The above two design changes extend naturally to device-to-device data transfers with GPUDirect Storage, which minimizes copy overheads from NVMe SSDs to GPU memory by bypassing the host CPU and host memory.

Dependencies

We currently test fastsafetensors only with Python 3.11, PyTorch 2.1, and CUDA 12. Note: other versions of PyTorch may require changes to the libtorch build environment, since its ABI appears to change slightly between releases.

Install from PyPI (TBD)

pip install fastsafetensors

Local installation

Prerequisites: Install the torch, CUDA, and NUMA headers

make install

Package build

Prerequisites: Install Docker (libtorch 2.1, CUDA, and NUMA are pulled automatically)

make dist

Unit tests

make install-test # install a stubbed fastsafetensors without torch, CUDA, and NUMA
make unittest

Sample code

See example/load.py.


Download files

Download the file for your platform.

Source Distribution

fastsafetensors-0.1.2.tar.gz (28.4 kB)

Uploaded Source

Built Distributions

fastsafetensors-0.1.2-cp311-cp311-manylinux_2_34_x86_64.whl (123.9 kB)

Uploaded CPython 3.11 manylinux: glibc 2.34+ x86-64

fastsafetensors-0.1.2-cp310-cp310-manylinux_2_34_x86_64.whl (1.3 MB)

Uploaded CPython 3.10 manylinux: glibc 2.34+ x86-64

File details

Details for the file fastsafetensors-0.1.2.tar.gz.

File metadata

  • Download URL: fastsafetensors-0.1.2.tar.gz
  • Size: 28.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.0 CPython/3.11.4

File hashes

Hashes for fastsafetensors-0.1.2.tar.gz

  • SHA256: f20e1eec294eb90af5781883d16e470f91b263169eee4f4583ee2b4646bb37be
  • MD5: 0108d8e017ced8d1dd24bdb95f9db2ab
  • BLAKE2b-256: 7478078915e61a80e70564cffaac9d889886ed7c7b2c8888e7af05ae6294231c


File details

Details for the file fastsafetensors-0.1.2-cp311-cp311-manylinux_2_34_x86_64.whl.

File hashes

Hashes for fastsafetensors-0.1.2-cp311-cp311-manylinux_2_34_x86_64.whl

  • SHA256: 562d287885e13fca8a09beabb47560bd64c85f569563d552516f0253b7dc238f
  • MD5: 9053891eb54c6a9e3bfa99ea6f07a582
  • BLAKE2b-256: 25800b69da6aa2fd590082cf6f635fbb54b15fd01210a5cd81a4f33b792e0187


File details

Details for the file fastsafetensors-0.1.2-cp310-cp310-manylinux_2_34_x86_64.whl.

File hashes

Hashes for fastsafetensors-0.1.2-cp310-cp310-manylinux_2_34_x86_64.whl

  • SHA256: 8da50c8ba053a791bea1e5ec9ea16cf210001f7ad48426f8fb07bb9ff508d6b5
  • MD5: c276b003d5e4ba25e85c7184dcce15c5
  • BLAKE2b-256: 629467db6a07b367624b742998b3814afa9b7caf8a7a2fa2074f3edc1e7fda60

