A unified interface for memory-efficient per-tensor loading of safetensors files as raw bytes from offset, handling CPU/GPU pinned transfers, and converting between tensors and dicts.

Project description

unifiedefficientloader

Installation

You can install this package via pip. It relies heavily on torch and safetensors, but they are not declared as hard dependencies of the package itself, so make sure they are installed in your environment:

pip install unifiedefficientloader
pip install torch safetensors tqdm

Usage

Unified Safetensors Loader

from unifiedefficientloader import UnifiedSafetensorsLoader

# Standard mode (preload all)
with UnifiedSafetensorsLoader("model.safetensors", low_memory=False) as loader:
    tensor = loader.get_tensor("weight_name")

# Low memory mode (streaming)
with UnifiedSafetensorsLoader("model.safetensors", low_memory=True) as loader:
    for key in loader.keys():
        tensor = loader.get_tensor(key)
        # Process tensor...
        loader.mark_processed(key) # Frees memory

Loading Specific Tensors Dynamically (Header Analysis)

You can analyze a file's header without loading the entire multi-gigabyte safetensors file into memory. This lets you locate specific data (such as embedded JSON dictionaries stored as uint8 tensors) and load only those tensors directly from their file offsets.

from unifiedefficientloader import UnifiedSafetensorsLoader, tensor_to_dict

with UnifiedSafetensorsLoader("model.safetensors", low_memory=True) as loader:
    # 1. Analyze the header metadata without loading any tensors
    # loader._header contains the full safetensors header directory
    uint8_tensor_keys = [
        key for key, info in loader._header.items()
        if isinstance(info, dict) and info.get("dtype") == "U8"
    ]
    
    # 2. Load ONLY those specific tensors using their keys
    for key in uint8_tensor_keys:
        # get_tensor dynamically reads only the bytes for this tensor 
        # based on the offsets found in the header
        loaded_tensor = loader.get_tensor(key)
        
        # 3. Decode the uint8 tensor back into a Python dictionary
        extracted_dict = tensor_to_dict(loaded_tensor)
        print(f"Decoded {key}:", extracted_dict)
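
For reference, the on-disk safetensors layout that makes this possible is simple: an 8-byte little-endian length, followed by that many bytes of JSON mapping tensor names to dtype, shape, and data offsets. A stdlib-only sketch, independent of this package and using a made-up tensor name:

```python
import io
import json
import struct

def read_safetensors_header(f):
    # safetensors layout: 8-byte little-endian length N, then N bytes of
    # JSON mapping tensor names to {"dtype", "shape", "data_offsets"}.
    (n,) = struct.unpack("<Q", f.read(8))
    return json.loads(f.read(n).decode("utf-8"))

# Build a minimal in-memory file containing one 4-byte uint8 tensor.
header = {"meta": {"dtype": "U8", "shape": [4], "data_offsets": [0, 4]}}
header_bytes = json.dumps(header).encode("utf-8")
buf = io.BytesIO(struct.pack("<Q", len(header_bytes)) + header_bytes + b"\x00\x01\x02\x03")

parsed = read_safetensors_header(buf)
start, end = parsed["meta"]["data_offsets"]
buf.seek(8 + len(header_bytes) + start)  # offsets are relative to the data section
raw = buf.read(end - start)
```

Because offsets are relative to the start of the data section, any single tensor can be fetched with one seek and one bounded read; this is what makes per-tensor streaming cheap.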

Optimized Asynchronous Streaming via ThreadPoolExecutor

For maximum I/O throughput with strict memory backpressure, use async_stream. It uses a ThreadPoolExecutor for background disk reads and a bounded queue to prevent memory exhaustion. With pin_memory=True, memory pinning is performed sequentially in the main thread to avoid OS-level lock contention and preserve high DMA transfer speeds.

from unifiedefficientloader import UnifiedSafetensorsLoader, transfer_to_gpu_pinned

with UnifiedSafetensorsLoader("model.safetensors", low_memory=True) as loader:
    keys_to_load = loader.keys()
    
    # Create the continuous streaming generator
    # prefetch_batches controls how many batches to buffer in memory
    stream = loader.async_stream(
        keys_to_load, 
        batch_size=8, 
        prefetch_batches=2, 
        pin_memory=True
    )
    
    # Iterate directly over the generator
    for batch in stream:
        for key, pinned_tensor in batch:
            # Transfer directly to GPU via DMA (pinning is already done)
            gpu_tensor = transfer_to_gpu_pinned(pinned_tensor, device="cuda")
            
            # ... process gpu_tensor ...
            loader.mark_processed(key)
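
The producer/consumer pattern described above is not specific to this package; a stdlib-only sketch of bounded prefetching with a blocking queue (read_fn here is a stand-in for the actual disk read):

```python
import queue
import threading

def bounded_stream(keys, read_fn, batch_size=2, prefetch_batches=2):
    # A bounded queue provides backpressure: the producer thread blocks
    # once prefetch_batches batches are waiting to be consumed.
    q = queue.Queue(maxsize=prefetch_batches)
    SENTINEL = object()

    def producer():
        batch = []
        for k in keys:
            batch.append((k, read_fn(k)))
            if len(batch) == batch_size:
                q.put(batch)  # blocks when the queue is full
                batch = []
        if batch:
            q.put(batch)  # flush the final partial batch
        q.put(SENTINEL)

    threading.Thread(target=producer, daemon=True).start()
    while (item := q.get()) is not SENTINEL:
        yield item

batches = list(bounded_stream(["a", "b", "c"], read_fn=str.upper, batch_size=2))
```

The queue's maxsize is what caps memory use: once prefetch_batches batches are buffered, the reader thread blocks until the consumer drains one.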

Tensor/Dict Conversion

from unifiedefficientloader import dict_to_tensor, tensor_to_dict

my_dict = {"param": 1.0, "name": "test"}
tensor = dict_to_tensor(my_dict)
recovered_dict = tensor_to_dict(tensor)
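
Presumably these helpers serialize the dict to JSON bytes carried by a uint8 tensor, consistent with the U8 header analysis above, though that is an assumption about the implementation. The byte-level roundtrip can be sketched without torch:

```python
import json

def dict_to_bytes(d):
    # Serialize the dict to UTF-8 JSON bytes, i.e. the payload that a
    # uint8 tensor representation would carry.
    return json.dumps(d).encode("utf-8")

def bytes_to_dict(b):
    return json.loads(b.decode("utf-8"))

roundtrip = bytes_to_dict(dict_to_bytes({"param": 1.0, "name": "test"}))
```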

Pinned Memory Transfers

import torch
from unifiedefficientloader import transfer_to_gpu_pinned

tensor = torch.randn(100, 100)
# Transfers using pinned memory if CUDA is available, otherwise falls back gracefully
gpu_tensor = transfer_to_gpu_pinned(tensor, device="cuda:0")
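
One plausible shape for that fallback, sketched here as an assumption rather than the package's actual code:

```python
import torch

def to_device_pinned(t, device="cuda:0"):
    # Hypothetical fallback logic: pin host memory and do an async DMA
    # copy when CUDA is available; otherwise return the CPU tensor as-is.
    if torch.cuda.is_available() and str(device).startswith("cuda"):
        return t.pin_memory().to(device, non_blocking=True)
    return t

x = torch.randn(100, 100)
y = to_device_pinned(x)  # on a CPU-only machine, y is just x
```

pin_memory() allocates page-locked host memory so the subsequent non_blocking copy can use DMA; on a CPU-only machine the tensor is returned unchanged.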


Download files

Source Distribution

unifiedefficientloader-0.2.1.tar.gz (12.1 kB)

Uploaded Source

Built Distribution

unifiedefficientloader-0.2.1-py3-none-any.whl (10.7 kB)

Uploaded Python 3

File details

Details for the file unifiedefficientloader-0.2.1.tar.gz.

File metadata

  • Download URL: unifiedefficientloader-0.2.1.tar.gz
  • Upload date:
  • Size: 12.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for unifiedefficientloader-0.2.1.tar.gz:

  • SHA256: ad80751a3b4c6697dec76721cdcee53c57cf0761e80b79d825918ec958f64fb4
  • MD5: a807c8362094e9b66fdcabd97886c9a0
  • BLAKE2b-256: 2b3dce6f00c09416637ad6807ba41234edf7ab7510688dca6e94fc35ea41cac9

File details

Details for the file unifiedefficientloader-0.2.1-py3-none-any.whl.

File hashes

Hashes for unifiedefficientloader-0.2.1-py3-none-any.whl:

  • SHA256: c3b7108eff7cacecdc03d0377878da5a3061a7781a5c4aeba87dbed862fc5cd0
  • MD5: 9c6e2d70b038b42a9e06045464c5abf1
  • BLAKE2b-256: e313f45d75d0465be4420bc67d3f96a9d57ade53aec167a9162f348eeb3c7671
