High-performance safetensors model loader
Project description
fastsafetensors is an efficient safetensors model loader. We introduce three major features to optimize model-loading performance:
- Batched, lazy tensor instantiation
- GPU offloading for sharding, type conversions, and device pointer alignment
- GPU Direct Storage (GDS) enablement for loading files directly from storage into GPU memory
A major design difference from the original safetensors file loader is that fastsafetensors does NOT use mmap. The original loader instantiates tensors on demand from mmap'ed files, but that approach cannot fully utilize high-throughput I/O devices such as NVMe SSDs. Instead, fastsafetensors transfers files asynchronously and in parallel to saturate storage throughput, and then lazily instantiates tensors in GPU device memory with DLPack.
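For illustration, here is a minimal single-GPU loading sketch. The names used (`SafeTensorsFileLoader`, `SingleGroup`, `add_filenames`, `copy_files_to_device`, `get_tensor`) are modeled on the shipped sample code and should be treated as assumptions; exact signatures may differ between versions.

```python
import torch
from fastsafetensors import SafeTensorsFileLoader, SingleGroup  # assumed API

# Single-process loading: SingleGroup() stands in for a torch.distributed group.
device = torch.device("cuda:0")
loader = SafeTensorsFileLoader(SingleGroup(), device)

# Map rank 0 to the files it should read (hypothetical file names);
# the loader transfers them asynchronously and in parallel.
loader.add_filenames({0: ["model-00001-of-00002.safetensors",
                          "model-00002-of-00002.safetensors"]})
fb = loader.copy_files_to_device()

# Tensors are instantiated lazily in GPU memory via DLPack
# (hypothetical tensor name).
weight = fb.get_tensor("transformer.wte.weight")
print(weight.shape, weight.device)

loader.close()
```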
Another design change is to offload sharding and other tensor manipulations to GPUs. The original loader has user programs slice tensors for sharding before copying them to device memory, which incurs high CPU usage for host memory accesses. fastsafetensors instead provides APIs that perform sharding with torch.distributed collective operations such as broadcast and scatter. The same offloading is applied to other tensor manipulations such as type conversions.
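As a rough sketch of what this offloaded sharding can look like in user code, the snippet below splits a weight row-wise across two ranks. The `get_sharded` method and the way the process group is passed are assumptions based on the sample code, not a documented contract.

```python
import torch
import torch.distributed as dist
from fastsafetensors import SafeTensorsFileLoader  # assumed API

# Launched with e.g. `torchrun --nproc_per_node=2 shard_load.py`.
dist.init_process_group(backend="nccl")
rank = dist.get_rank()
device = torch.device(f"cuda:{rank}")

loader = SafeTensorsFileLoader(dist.group.WORLD, device)
# Each rank reads a distinct file (hypothetical names); the loader uses
# broadcast/scatter on the GPUs to redistribute tensors as needed.
loader.add_filenames({0: ["model-00001-of-00002.safetensors"],
                      1: ["model-00002-of-00002.safetensors"]})
fb = loader.copy_files_to_device()

# Request this rank's shard of a weight along dim 0 (hypothetical tensor name).
local_shard = fb.get_sharded("lm_head.weight", dim=0)

loader.close()
dist.destroy_process_group()
```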
The above two designs can be naturally extended to utilize device-to-device data transfers with GPU Direct Storage, which minimizes copy overheads from NVMe SSDs to GPU memory by bypassing the host CPU and memory.
Dependencies
We currently test fastsafetensors only with Python 3.11, PyTorch 2.1, and CUDA 12. Note: when using a different version of PyTorch, you may need to adjust the build environment for libtorch, since its ABI appears to change slightly between releases.
Install from PyPI (TBD)
pip install fastsafetensors
Local installation
Prerequisites: Install torch, cuda, and numa headers
make install
Package build
Prerequisites: Install Docker (libtorch 2.1, cuda, and numa are automatically pulled)
make dist
Unit tests
make install-test # install stub'ed fastsafetensors without torch, cuda, and numa
make unittest
Sample code
See example/load.py.
Download files
File details
Details for the file fastsafetensors-0.1.1.tar.gz.
File metadata
- Download URL: fastsafetensors-0.1.1.tar.gz
- Upload date:
- Size: 28.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.0 CPython/3.10.12
File hashes
Algorithm | Hash digest
---|---
SHA256 | 1fb730884aeb4ce05d6d5ca64794f16de45a4d3cc18e7fd99b48a0db02338de4
MD5 | a3b765e54228e53f07706435466d029c
BLAKE2b-256 | 28ce474be16485ffcae08f09628a83cc121109136158d35f1448e44a5db3e79d
File details
Details for the file fastsafetensors-0.1.1-cp311-cp311-manylinux_2_34_x86_64.whl.
File metadata
- Download URL: fastsafetensors-0.1.1-cp311-cp311-manylinux_2_34_x86_64.whl
- Upload date:
- Size: 1.4 MB
- Tags: CPython 3.11, manylinux: glibc 2.34+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.0 CPython/3.10.12
File hashes
Algorithm | Hash digest
---|---
SHA256 | e36776b896a477e93fcec7a03d9cab27cfaa70317984e3c72e261d2716ca4737
MD5 | bb46d9cd69a8299e7b0981dda4eee93f
BLAKE2b-256 | 4ea0693698a6e812f32c5fd05aa5adc048e723cb3d6b2fc9324f44d2649361c7
File details
Details for the file fastsafetensors-0.1.1-cp310-cp310-manylinux_2_34_x86_64.whl.
File metadata
- Download URL: fastsafetensors-0.1.1-cp310-cp310-manylinux_2_34_x86_64.whl
- Upload date:
- Size: 1.3 MB
- Tags: CPython 3.10, manylinux: glibc 2.34+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.0 CPython/3.10.12
File hashes
Algorithm | Hash digest
---|---
SHA256 | f3726f4c0b98f891bb482565713aae5f9f418921f7fab4ed710d0f4ecfd87384
MD5 | 50c8ead469a6dd0172f3e60bec46d690
BLAKE2b-256 | 34272343a9f8bf843a801dfa90fae3a3d457ee9c56e1d6ae0f1015685aa03ea2