2 projects
unifiedefficientloader
A unified interface for memory-efficient, per-tensor loading of safetensors files as raw bytes read from their file offsets. It also handles pinned-memory CPU/GPU transfers and converts between tensors and dicts.
convert-to-quant
Convert safetensors weights to quantized formats (FP8, INT8) with learned rounding optimization.
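
To illustrate what per-tensor loading from an offset means for the first project, here is a minimal sketch that uses only the Python standard library. It relies on the published safetensors layout: an 8-byte little-endian header length, a JSON header whose entries carry `data_offsets`, then the raw data buffer. The function names `read_header`, `read_tensor_bytes`, and `write_demo_file` are hypothetical and are not the API of unifiedefficientloader. Pinned transfers and tensor conversion are left out.

```python
import json
import struct
import tempfile
from pathlib import Path


def read_header(path):
    """Return (header dict, absolute offset where the data buffer starts)."""
    with open(path, "rb") as f:
        # First 8 bytes: unsigned 64-bit little-endian length of the JSON header.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    return header, 8 + header_len


def read_tensor_bytes(path, name):
    """Read only one tensor's raw bytes by seeking straight to its offset."""
    header, data_start = read_header(path)
    # data_offsets are relative to the start of the data buffer, not the file.
    begin, end = header[name]["data_offsets"]
    with open(path, "rb") as f:
        f.seek(data_start + begin)
        return f.read(end - begin)


def write_demo_file(path):
    """Hand-write a tiny two-tensor safetensors file (demo data only)."""
    a = struct.pack("<2f", 1.0, 2.0)  # two float32 values, 8 bytes
    b = struct.pack("<3B", 7, 8, 9)   # three uint8 values, 3 bytes
    header = {
        "a": {"dtype": "F32", "shape": [2], "data_offsets": [0, 8]},
        "b": {"dtype": "U8", "shape": [3], "data_offsets": [8, 11]},
    }
    header_bytes = json.dumps(header).encode("utf-8")
    with open(path, "wb") as f:
        f.write(struct.pack("<Q", len(header_bytes)))
        f.write(header_bytes)
        f.write(a + b)


demo_path = Path(tempfile.mkdtemp()) / "demo.safetensors"
write_demo_file(demo_path)

# Each call touches only the header plus the bytes of the requested tensor.
raw_a = read_tensor_bytes(demo_path, "a")
raw_b = read_tensor_bytes(demo_path, "b")
```

Because each read seeks directly to `data_start + begin`, peak memory stays close to the size of the largest single tensor instead of the whole file. A real loader would presumably parse the header once and reuse it across tensors.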