High-performance safetensors model loader

Project description

fastsafetensors is a reimplementation of the safetensors model loader that improves loading efficiency. It introduces three major features to optimize model loading performance:

  1. Batched, lazy tensor instantiation
  2. GPU offloading for sharding, type conversions, and device pointer alignment
  3. GPUDirect Storage enablement for loading files from storage directly into GPU memory
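
The first feature builds on the safetensors file layout: an 8-byte little-endian header length, followed by a JSON header that maps each tensor name to its dtype, shape, and byte range, so a loader can plan every copy from the header alone and materialize tensors later. A minimal pure-Python sketch (the layout follows the published safetensors format; the helper names are illustrative, not the fastsafetensors API):

```python
# Sketch of the safetensors layout that enables lazy instantiation:
# [8-byte LE header length][JSON header][raw tensor bytes].
import json
import struct

def build_safetensors_blob(tensors: dict[str, bytes]) -> bytes:
    """Serialize raw buffers into the safetensors layout (F32, 1-D, for brevity)."""
    header, payload, offset = {}, b"", 0
    for name, raw in tensors.items():
        header[name] = {
            "dtype": "F32",
            "shape": [len(raw) // 4],
            "data_offsets": [offset, offset + len(raw)],
        }
        payload += raw
        offset += len(raw)
    hjson = json.dumps(header).encode("utf-8")
    return struct.pack("<Q", len(hjson)) + hjson + payload

def read_header(blob: bytes) -> dict:
    """Read only the metadata needed to plan tensor instantiation lazily."""
    (hlen,) = struct.unpack_from("<Q", blob, 0)
    return json.loads(blob[8 : 8 + hlen])

blob = build_safetensors_blob({"w": b"\x00" * 16, "b": b"\x00" * 4})
meta = read_header(blob)
print(meta["w"]["shape"], meta["w"]["data_offsets"])  # [4] [0, 16]
```

Because the header is tiny relative to the tensor data, many files can be planned in a batch before any tensor bytes are moved.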

A major design difference from the original safetensors file loader is that fastsafetensors does NOT use mmap. The original loader instantiates tensors on demand from mmap'ed files, but demand paging cannot fully utilize high-throughput storage such as NVMe SSDs. Instead, fastsafetensors transfers files asynchronously and in parallel to saturate storage throughput, then lazily instantiates the tensors in GPU device memory with DLPack.
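
The throughput argument can be illustrated with a stdlib-only sketch (this is not the fastsafetensors implementation): rather than letting mmap fault pages in one at a time, issue many large reads concurrently so a fast device's request queue stays full.

```python
# Illustrative sketch: saturate storage by issuing parallel chunked reads
# instead of relying on mmap's sequential demand paging.
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

CHUNK = 1 << 20  # 1 MiB per read; a real loader would tune this per device

def read_chunk(path: str, offset: int, length: int) -> tuple[int, bytes]:
    with open(path, "rb") as f:  # independent file handle per request
        f.seek(offset)
        return offset, f.read(length)

def parallel_read(path: str, workers: int = 8) -> bytes:
    size = os.path.getsize(path)
    offsets = range(0, size, CHUNK)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(lambda o: read_chunk(path, o, CHUNK), offsets)
        # Reassemble chunks in offset order into one contiguous buffer.
        return b"".join(data for _, data in sorted(parts))

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(os.urandom(3 * CHUNK + 123))
data = parallel_read(tmp.name)
assert data == open(tmp.name, "rb").read()
os.remove(tmp.name)
```

In fastsafetensors the destination of such transfers is GPU device memory rather than a host buffer, with tensors materialized afterwards via DLPack.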

Another design change is to offload sharding and other tensor manipulations to GPUs. The original loader performs slicing for sharding in user programs before copying tensors to device memory, which incurs high CPU usage for host-memory accesses. Instead, fastsafetensors provides special APIs that run sharding with torch.distributed collective operations such as broadcast and scatter. The same offloading is applied to other tensor manipulations such as type conversions.
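
The partition arithmetic behind a scatter-based shard can be shown in plain Python (a sketch only; the actual torch.distributed collectives and GPU transfers are omitted, and the helper name is illustrative):

```python
# Illustrative sketch of the partitioning a scatter-based shard performs:
# one rank holds the full buffer and each rank receives a contiguous slice.
# torch.distributed.scatter would move these slices device-to-device.
def shard_bounds(total: int, world_size: int, rank: int) -> tuple[int, int]:
    """Range [start, end) owned by `rank`; the remainder goes to early ranks."""
    base, extra = divmod(total, world_size)
    start = rank * base + min(rank, extra)
    end = start + base + (1 if rank < extra else 0)
    return start, end

full = bytes(range(10))
world = 4
slices = [full[slice(*shard_bounds(len(full), world, r))] for r in range(world)]
print([len(s) for s in slices])  # [3, 3, 2, 2]
assert b"".join(slices) == full  # slices cover the buffer exactly once
```

Doing this slicing on the GPU via collectives avoids the host-memory traffic that CPU-side slicing incurs.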

These two designs extend naturally to device-to-device data transfers with NVIDIA GPUDirect Storage, which minimizes copy overheads from NVMe SSDs to GPU memory by bypassing the host CPU and host memory.

Dependencies

We currently test fastsafetensors only with Python 3.11, PyTorch 2.1, and CUDA 12. Note: other PyTorch versions may require changes to the build environment for libtorch, since its ABI appears to change slightly between releases.

Install from PyPI (TBD)

pip install fastsafetensors

Local installation

Prerequisites: Install torch, cuda, and numa headers

make install

Package build

Prerequisites: Install Docker (libtorch 2.1, cuda, and numa are automatically pulled)

make dist

Unit tests

make install-test # install stubbed fastsafetensors without torch, cuda, and numa
make unittest

Sample code

See example/load.py.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

fastsafetensors-0.1.0.tar.gz (28.4 kB)

Uploaded Source

Built Distribution

fastsafetensors-0.1.0-cp311-cp311-manylinux_2_34_x86_64.whl (1.4 MB)

Uploaded CPython 3.11 manylinux: glibc 2.34+ x86-64

File details

Details for the file fastsafetensors-0.1.0.tar.gz.

File metadata

  • Download URL: fastsafetensors-0.1.0.tar.gz
  • Upload date:
  • Size: 28.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.0 CPython/3.11.4

File hashes

Hashes for fastsafetensors-0.1.0.tar.gz
Algorithm Hash digest
SHA256 7620c8ba0e271aaabd9cc3e522ea9fba7a854e9c073618f31c7df5074c4a2260
MD5 b2211592d235e6ea2e96d87d03280711
BLAKE2b-256 90d85be2279634d37ce2beaf1ea2c6b0f9f2dcf7459443d62b803aa77e937fdb

See more details on using hashes here.

File details

Details for the file fastsafetensors-0.1.0-cp311-cp311-manylinux_2_34_x86_64.whl.

File metadata

File hashes

Hashes for fastsafetensors-0.1.0-cp311-cp311-manylinux_2_34_x86_64.whl
Algorithm Hash digest
SHA256 1b1bcad65b1595078538ec2c5af9b7e17874910092a3f54234ee50a3f917440b
MD5 9df1e8f154423372f4ee8c4e81a15d4c
BLAKE2b-256 dd18ee9b4a52342c583de3163405be9701bde8bb3c77533ffccea2ecda3da990

