
High-performance safetensors model loader

Project description

fastsafetensors is an efficient safetensors model loader. It introduces three major features to optimize model loading performance:

  1. Batched, lazy tensor instantiation
  2. GPU offloading for sharding, type conversions, and device pointer alignment
  3. GPU Direct Storage enablement for file loading from storage to GPU memory

A major design difference from the original safetensors file loader is that fastsafetensors does NOT use mmap. The original loader instantiates tensors on demand from mmap'ed files, but mmap'ed reads unfortunately cannot fully utilize high-throughput I/O devices such as NVMe SSDs. Instead, fastsafetensors transfers files asynchronously and in parallel to saturate storage throughput, and then lazily instantiates tensors in GPU device memory with DLPack.
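The difference between mmap-based on-demand paging and explicit parallel reads can be sketched as follows. This is a conceptual stand-in, not the actual fastsafetensors implementation; the point is that many explicit reads in flight keep the storage queue full, which is what saturates NVMe throughput:

```python
# Conceptual sketch: read a file with explicit parallel preads instead of
# relying on mmap page faults, which issue reads one page miss at a time.
import os
from concurrent.futures import ThreadPoolExecutor

def read_chunk(path: str, offset: int, length: int) -> bytes:
    # Each worker opens its own descriptor and issues an explicit pread,
    # so requests from different workers overlap at the device.
    fd = os.open(path, os.O_RDONLY)
    try:
        return os.pread(fd, length, offset)
    finally:
        os.close(fd)

def read_parallel(path: str, chunk_size: int = 1 << 20, workers: int = 8) -> bytes:
    """Read the whole file as fixed-size chunks submitted concurrently."""
    size = os.path.getsize(path)
    offsets = range(0, size, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        chunks = pool.map(
            lambda off: read_chunk(path, off, min(chunk_size, size - off)),
            offsets,
        )
        return b"".join(chunks)
```

In fastsafetensors the destination is GPU device memory rather than host bytes, but the scheduling idea is the same: issue many transfers concurrently, then instantiate tensors lazily over the already-resident buffers.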

Another design change is to offload sharding and other tensor manipulations to GPUs. The original loader slices tensors for sharding in the user program before copying them to device memory, which incurs high CPU usage for host memory accesses. So, we introduce special APIs that run sharding with torch.distributed collective operations such as broadcast and scatter. The same offloading applies to other tensor manipulations such as type conversions.
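The scatter-style sharding pattern can be illustrated with a minimal pure-Python stand-in (the function names here are illustrative, not the fastsafetensors API; the real code uses torch.distributed collectives on device memory). The source rank splits the full weight once, and every other rank receives only its shard, so no rank pays host-CPU slicing costs over the whole tensor:

```python
# Minimal sketch of scatter-based sharding semantics.
def shard_for_scatter(weight, world_size):
    """Source-rank side: split a flat weight into world_size equal shards."""
    n = len(weight)
    assert n % world_size == 0, "pad the weight so it divides evenly"
    step = n // world_size
    return [weight[r * step:(r + 1) * step] for r in range(world_size)]

def scatter(shards, rank):
    """What each rank holds after the collective: only its own shard."""
    return shards[rank]

shards = shard_for_scatter(list(range(8)), world_size=4)
# rank 1 ends up with [2, 3]; no rank ever materializes another rank's slice
```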

The above two designs extend naturally to device-to-device data transfers with GPU Direct Storage. The technology minimizes copy overheads from NVMe SSDs to GPU memory by bypassing the host CPU and host memory.

Dependencies

We currently test fastsafetensors only with Python 3.11, PyTorch 2.1, and CUDA 12. Note: with other PyTorch versions, you may need to adjust the build environment for libtorch, since its ABI appears to change slightly between releases.

Install from PyPI (TBD)

pip install fastsafetensors

Local installation

Prerequisites: Install torch, cuda, and numa headers

make install

Package build

Prerequisites: Install Docker (libtorch 2.1, cuda, and numa are automatically pulled)

make dist

Unit tests

make install-test # install stub'ed fastsafetensors without torch, cuda, and numa
make unittest

Sample code

See example/load.py.

Download files

Download the file for your platform.

Source Distribution

fastsafetensors-0.1.1.tar.gz (28.4 kB)

Uploaded Source

Built Distributions

fastsafetensors-0.1.1-cp311-cp311-manylinux_2_34_x86_64.whl (1.4 MB)

Uploaded CPython 3.11 manylinux: glibc 2.34+ x86-64

fastsafetensors-0.1.1-cp310-cp310-manylinux_2_34_x86_64.whl (1.3 MB)

Uploaded CPython 3.10 manylinux: glibc 2.34+ x86-64

File details

Details for the file fastsafetensors-0.1.1.tar.gz.

File metadata

  • Download URL: fastsafetensors-0.1.1.tar.gz
  • Upload date:
  • Size: 28.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.0 CPython/3.10.12

File hashes

Hashes for fastsafetensors-0.1.1.tar.gz:

  • SHA256: 1fb730884aeb4ce05d6d5ca64794f16de45a4d3cc18e7fd99b48a0db02338de4
  • MD5: a3b765e54228e53f07706435466d029c
  • BLAKE2b-256: 28ce474be16485ffcae08f09628a83cc121109136158d35f1448e44a5db3e79d

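A downloaded distribution can be checked against the published SHA256 digest with Python's standard library. The file path below is an assumption (wherever your browser or pip saved the sdist):

```python
# Verify a downloaded file against a published SHA256 digest.
import hashlib

def sha256_of(path: str) -> str:
    """Stream the file in 64 KiB blocks to avoid loading it all at once."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(1 << 16), b""):
            h.update(block)
    return h.hexdigest()

expected = "1fb730884aeb4ce05d6d5ca64794f16de45a4d3cc18e7fd99b48a0db02338de4"
# assert sha256_of("fastsafetensors-0.1.1.tar.gz") == expected
```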

File details

Details for the file fastsafetensors-0.1.1-cp311-cp311-manylinux_2_34_x86_64.whl.

File hashes

Hashes for fastsafetensors-0.1.1-cp311-cp311-manylinux_2_34_x86_64.whl:

  • SHA256: e36776b896a477e93fcec7a03d9cab27cfaa70317984e3c72e261d2716ca4737
  • MD5: bb46d9cd69a8299e7b0981dda4eee93f
  • BLAKE2b-256: 4ea0693698a6e812f32c5fd05aa5adc048e723cb3d6b2fc9324f44d2649361c7


File details

Details for the file fastsafetensors-0.1.1-cp310-cp310-manylinux_2_34_x86_64.whl.

File hashes

Hashes for fastsafetensors-0.1.1-cp310-cp310-manylinux_2_34_x86_64.whl:

  • SHA256: f3726f4c0b98f891bb482565713aae5f9f418921f7fab4ed710d0f4ecfd87384
  • MD5: 50c8ead469a6dd0172f3e60bec46d690
  • BLAKE2b-256: 34272343a9f8bf843a801dfa90fae3a3d457ee9c56e1d6ae0f1015685aa03ea2

