A unified interface for memory-efficient, per-tensor loading of safetensors files (raw bytes read directly from file offsets), pinned-memory CPU/GPU transfers, and conversion between tensors and dicts.
unifiedefficientloader
A unified interface for loading safetensors, handling CPU/GPU pinned transfers, and converting between tensors and dicts.
Installation
Install the package with pip. It relies heavily on torch and safetensors at runtime but does not declare them as hard dependencies for building or installing the package, so make sure they (and tqdm) are installed in your environment:
pip install unifiedefficientloader
pip install torch safetensors tqdm
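Because the runtime dependencies are not enforced at install time, it can help to check for them before importing the loader. The following is a minimal sketch using only the standard library; the helper name `missing_packages` is hypothetical and not part of unifiedefficientloader.

```python
import importlib.util

# Runtime dependencies named in the install instructions above.
REQUIRED = ("torch", "safetensors", "tqdm")


def missing_packages(names=REQUIRED):
    """Return the top-level package names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages()
if missing:
    print("Missing packages, install with: pip install " + " ".join(missing))
else:
    print("All runtime dependencies are importable.")
```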
Usage
Unified Safetensors Loader
from unifiedefficientloader import UnifiedSafetensorsLoader
# Standard mode (preload all)
with UnifiedSafetensorsLoader("model.safetensors", low_memory=False) as loader:
    tensor = loader.get_tensor("weight_name")

# Low memory mode (streaming)
with UnifiedSafetensorsLoader("model.safetensors", low_memory=True) as loader:
    for key in loader.keys():
        tensor = loader.get_tensor(key)
        # Process tensor...
        loader.mark_processed(key)  # Frees memory
Loading Specific Tensors Dynamically (Header Analysis)
You can analyze the file's header without loading the entire multi-gigabyte safetensors file into memory. This allows you to locate specific data (like embedded JSON dictionaries stored as uint8 tensors) and load only those specific tensors directly from their file offsets.
from unifiedefficientloader import UnifiedSafetensorsLoader, tensor_to_dict
with UnifiedSafetensorsLoader("model.safetensors", low_memory=True) as loader:
    # 1. Analyze the header metadata without loading any tensors
    # loader._header contains the full safetensors header directory
    uint8_tensor_keys = [
        key for key, info in loader._header.items()
        if isinstance(info, dict) and info.get("dtype") == "U8"
    ]

    # 2. Load ONLY those specific tensors using their keys
    for key in uint8_tensor_keys:
        # get_tensor dynamically reads only the bytes for this tensor
        # based on the offsets found in the header
        loaded_tensor = loader.get_tensor(key)

        # 3. Decode the uint8 tensor back into a Python dictionary
        extracted_dict = tensor_to_dict(loaded_tensor)
        print(f"Decoded {key}:", extracted_dict)
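To illustrate why header analysis is cheap, here is a minimal standard-library sketch of the safetensors file layout: an 8-byte little-endian header length, a JSON header mapping each tensor name to its `dtype`, `shape`, and `data_offsets`, then the raw tensor bytes. The helper names (`write_demo_file`, `read_header`, `read_tensor_bytes`) are hypothetical and this is not the library's implementation; the demo file is a tiny stand-in written by the sketch itself.

```python
import json
import os
import struct
import tempfile


def write_demo_file(path, tensors):
    """Write raw byte payloads as U8 tensors using the safetensors layout."""
    header, blobs, offset = {}, [], 0
    for name, payload in tensors.items():
        header[name] = {
            "dtype": "U8",
            "shape": [len(payload)],
            # Offsets are relative to the start of the data section.
            "data_offsets": [offset, offset + len(payload)],
        }
        blobs.append(payload)
        offset += len(payload)
    header_bytes = json.dumps(header).encode("utf-8")
    with open(path, "wb") as f:
        f.write(struct.pack("<Q", len(header_bytes)))  # 8-byte little-endian length
        f.write(header_bytes)
        f.write(b"".join(blobs))


def read_header(path):
    """Return (header dict, absolute file offset where tensor data begins)."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    return header, 8 + header_len


def read_tensor_bytes(path, header, data_start, name):
    """Seek to one tensor's byte range and read only that range."""
    begin, end = header[name]["data_offsets"]
    with open(path, "rb") as f:
        f.seek(data_start + begin)
        return f.read(end - begin)


with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.safetensors")
    write_demo_file(demo_path, {
        "config": json.dumps({"rank": 4}).encode("utf-8"),
        "padding": bytes(16),
    })
    header, data_start = read_header(demo_path)
    raw = read_tensor_bytes(demo_path, header, data_start, "config")
    decoded = json.loads(raw)
    print(decoded)
```

Only the header and the bytes of the requested tensor are read; the `padding` tensor is never touched.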
Optimized Asynchronous Streaming via ThreadPoolExecutor
For maximum I/O throughput under strict memory backpressure, use async_stream. It reads from disk in the background with a ThreadPoolExecutor and buffers results in a bounded queue to prevent memory exhaustion. With pin_memory=True, memory pinning is performed sequentially in the main thread to avoid OS-level lock contention and preserve high DMA transfer speeds.
from unifiedefficientloader import UnifiedSafetensorsLoader, transfer_to_gpu_pinned
with UnifiedSafetensorsLoader("model.safetensors", low_memory=True) as loader:
    keys_to_load = loader.keys()

    # Create the continuous streaming generator
    # prefetch_batches controls how many batches to buffer in memory
    stream = loader.async_stream(
        keys_to_load,
        batch_size=8,
        prefetch_batches=2,
        pin_memory=True
    )

    # Iterate directly over the generator
    for batch in stream:
        for key, pinned_tensor in batch:
            # Transfer directly to GPU via DMA (pinning is already done)
            gpu_tensor = transfer_to_gpu_pinned(pinned_tensor, device="cuda")
            # ... process gpu_tensor ...
            loader.mark_processed(key)
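The general pattern behind this kind of streaming (background reads plus a bounded queue for backpressure) can be sketched with the standard library alone. This is an illustrative sketch under stated assumptions, not the library's actual async_stream: `stream_batches` and `fake_read` are hypothetical names, and `read_fn` stands in for whatever reads one tensor from disk.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor

_DONE = object()  # sentinel marking the end of the stream


def stream_batches(keys, read_fn, batch_size=8, prefetch_batches=2, max_workers=4):
    """Yield lists of (key, value) pairs while reads run in background threads."""
    keys = list(keys)
    batches = [keys[i:i + batch_size] for i in range(0, len(keys), batch_size)]
    # Bounded queue: put() blocks once prefetch_batches finished batches are
    # waiting, which stalls the producer until the consumer catches up.
    buffer = queue.Queue(maxsize=prefetch_batches)

    def producer():
        try:
            with ThreadPoolExecutor(max_workers=max_workers) as pool:
                for batch in batches:
                    # Read one batch in parallel; map() preserves key order.
                    values = list(pool.map(read_fn, batch))
                    buffer.put(list(zip(batch, values)))
        except Exception as exc:
            buffer.put(exc)  # forward read errors to the consumer
        finally:
            buffer.put(_DONE)

    threading.Thread(target=producer, daemon=True).start()

    while True:
        item = buffer.get()
        if item is _DONE:
            return
        if isinstance(item, Exception):
            raise item
        yield item


def fake_read(key):
    """Hypothetical stand-in for reading one tensor from disk."""
    return key.upper()


result = list(stream_batches(["a", "b", "c", "d", "e"], fake_read,
                             batch_size=2, prefetch_batches=1))
print(result)
```

The consumer loop runs in the calling thread, which is where per-tensor work such as pinning would take place in a design like the one described above. One limitation of this sketch: if the consumer stops iterating early, the daemon producer thread stays blocked on the queue until the process exits.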
Direct-to-GPU Streaming (Zero-Copy)
For the fastest loading times on CUDA devices, use the direct_gpu=True flag. This creates a pipeline that pre-allocates pinned memory pools and GPU memory slabs. Tensors are read from disk directly into pinned buffers and immediately copied to the GPU asynchronously on CUDA streams, hiding the PCIe transfer latency behind the disk I/O.
from unifiedefficientloader import UnifiedSafetensorsLoader
with UnifiedSafetensorsLoader("model.safetensors", low_memory=True, direct_gpu=True) as loader:
    keys_to_load = loader.keys()

    # async_stream will automatically coordinate disk -> pinned buffer -> GPU slab -> tensor header
    stream = loader.async_stream(
        keys_to_load,
        batch_size=8,
        prefetch_batches=2,
        direct_gpu=True  # optional here since we passed it in __init__
    )

    for batch in stream:
        for key, gpu_tensor in batch:
            # gpu_tensor is already on the GPU!
            assert gpu_tensor.device.type == "cuda"
            # ... process gpu_tensor ...
            loader.mark_processed(key)
Tensor/Dict Conversion
from unifiedefficientloader import dict_to_tensor, tensor_to_dict
my_dict = {"param": 1.0, "name": "test"}
tensor = dict_to_tensor(my_dict)
recovered_dict = tensor_to_dict(tensor)
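One plausible way such a round trip can work is to serialize the dict as JSON and store the UTF-8 bytes as uint8 values. The sketch below assumes that encoding; the library's exact on-disk format is not documented here, and `dict_to_uint8` and `uint8_to_dict` are hypothetical names. A `bytearray` stands in for the uint8 tensor so the sketch runs without torch.

```python
import json


def dict_to_uint8(data):
    """Serialize a JSON-compatible dict to a buffer of uint8 values."""
    return bytearray(json.dumps(data).encode("utf-8"))


def uint8_to_dict(buffer):
    """Decode a buffer of uint8 values back into a dict."""
    return json.loads(bytes(buffer).decode("utf-8"))


original = {"param": 1.0, "name": "test"}
encoded = dict_to_uint8(original)
restored = uint8_to_dict(encoded)
print(len(encoded), restored == original)
```

With torch available, `torch.frombuffer(encoded, dtype=torch.uint8)` would wrap the same bytes as a one-dimensional uint8 tensor.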
Pinned Memory Transfers
import torch
from unifiedefficientloader import transfer_to_gpu_pinned
tensor = torch.randn(100, 100)
# Transfers using pinned memory if CUDA is available, otherwise falls back gracefully
gpu_tensor = transfer_to_gpu_pinned(tensor, device="cuda:0")
File details
Details for the file unifiedefficientloader-0.2.3.tar.gz.
File metadata
- Download URL: unifiedefficientloader-0.2.3.tar.gz
- Upload date:
- Size: 17.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 8cf530b8dbd97cc99742b3a0a22ad562083b542045aeba587f2041899019b6d3 |
| MD5 | 9ff9d8c022a9c1c22777ce96e6baaddf |
| BLAKE2b-256 | ca32b5d69d5906682b3836236c469697cda2df6bc70c1d7096f2a03194fc6546 |
File details
Details for the file unifiedefficientloader-0.2.3-py3-none-any.whl.
File metadata
- Download URL: unifiedefficientloader-0.2.3-py3-none-any.whl
- Upload date:
- Size: 15.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1e4c8fc0461266b041111aa830b630fc7f87ccdaeff0c1739e8ca6eeb3e50ce3 |
| MD5 | 641b176fa0aa3fb03c890a53854e8ec6 |
| BLAKE2b-256 | 4af7d31e40abb22290d716cab8b6960a651cbe6f16751824c845b1ec830c3d24 |