HF Torch Cache
Efficient caching layer for Hugging Face models using PyTorch serialisation. Accelerate model initialisation while reducing disk redundancy by converting native Hugging Face checkpoints to optimised PyTorch format.
Features
- 🚀 Faster Initialisation: Skip Hugging Face's config reloading on subsequent loads
- 💾 Disk Efficiency: Eliminate duplicate storage of model artifacts
- 🔍 Auto Model Detection: Dynamically selects appropriate model class from config
- 🧹 Cache Management: Optional cleanup of original Hugging Face cache artifacts
- 🔒 Safety Controls: Configurable weights-only loading for untrusted sources
Installation
pip install hftorchcache
Usage
Simply pass a model name (the Hugging Face repo ID) and the model will be loaded and saved as a torch .pt file in ~/.cache/hftc/.
from hftc import HFTorchCache
MODEL_NAME = "unsloth/DeepSeek-R1-Distill-Qwen-7B-bnb-4bit"
# Initialise cache manager
cache = HFTorchCache()
# Load model with automatic class detection
model, tokenizer = cache.load(MODEL_NAME)
print(model.device) # "cuda" if GPU available
If the model has already been cached, it will load almost instantly.
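To confirm that a model was cached, you can list the cache directory. This is a small stdlib sketch assuming the default location described above; the exact file-naming scheme is internal to the package, so it simply lists whatever .pt files are present.

```python
# List cached artefacts under the default cache directory (~/.cache/hftc).
# The file naming is an internal detail of the package, so we just glob.
from pathlib import Path

cache_dir = Path.home() / ".cache" / "hftc"
cached = sorted(cache_dir.rglob("*.pt")) if cache_dir.is_dir() else []
for path in cached:
    print(path.name)
```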
There are also options to:
- Delete the original HuggingFace model cache directory (to avoid duplicates)
- Only load from the local HuggingFace cache
- Specify a value to pass as the device (`torch.load` defaults to using the GPU if available)
- Load weights only (though this defeats the purpose of this approach, which is to load the entire object quickly, like pickle)
- Specify a particular model class by name or by the type itself (it should detect the model automatically)
cache = HFTorchCache(
cache_dir="/custom/cache/path", # Default: ~/.cache/hftc
cleanup_original=True # Auto-delete original HF cache
)
# Load with explicit device placement and safety controls
model, tokenizer = cache.load(
"unsloth/DeepSeek-R1-Distill-Qwen-7B-bnb-4bit",
model_cls="AutoModelForCausalLM", # Explicit class specification
tokenizer_cls="AutoTokenizer",
map_location=torch.device("cuda:0"),
weights_only=False, # Enable for untrusted sources
local_only=True # Prevent HF Hub fallback
# **model_kwargs # Would be passed to `from_pretrained`
)
Note that you may need additional packages (e.g. bitsandbytes) to load cached models. Accelerate
is a dependency of this package, and low_cpu_mem_usage is always passed as True to from_pretrained.
Cleanup utilities
You can also use the internal _cleanup_hf_cache method to delete the entire cache directory of a model you're done with, without trying to load it (as long as Hugging Face can find a snapshot).
cache._cleanup_hf_cache("Qwen/Qwen2.5-7B-Instruct-GPTQ-Int8")
API Reference
HFTorchCache
| Parameter | Type | Default | Description |
|---|---|---|---|
| cache_dir | str | ~/.cache/hftc | Custom cache directory |
| cleanup_original | bool | True | Remove original HF cache after conversion |
load()
| Parameter | Type | Default | Description |
|---|---|---|---|
| model_name | str | Required | HF model identifier |
| model_cls | str/type | "auto" | Model class specification |
| tokenizer_cls | str/type | "auto" | Tokenizer class specification |
| map_location | str/device | None | Torch device placement |
| weights_only | bool | False | Safe loading for untrusted sources |
| local_only | bool | False | Disable HF Hub fallback |
Implementation Notes
- First-Run Behavior: Initial load converts HF checkpoint to optimized PyTorch format
- Subsequent Loads: Directly loads serialised PyTorch artifacts (3-5x faster)
- Device Management: Specify `map_location` to control device placement
- Security: Use `weights_only=True` when loading untrusted models
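The first-run/subsequent-load split above is the classic serialise-once pattern. Here is a minimal sketch of that pattern using pickle and a plain Python object, since the package itself compares the fast path to pickle-style loading; all names are illustrative and no Hugging Face code is involved.

```python
# Minimal sketch of the convert-once, load-fast pattern: build an object
# on the first call, serialise it to disk, and deserialise the saved copy
# on every later call instead of rebuilding it.
import pickle
import tempfile
from pathlib import Path

CACHE_DIR = Path(tempfile.gettempdir()) / "hftc-demo"
CACHE_DIR.mkdir(parents=True, exist_ok=True)

def load_cached(name, build):
    """Return the cached object for `name`, building and saving it if absent."""
    path = CACHE_DIR / f"{name}.pkl"
    if path.exists():
        return pickle.loads(path.read_bytes())  # fast path: one deserialisation
    obj = build()                               # slow path: full construction
    path.write_bytes(pickle.dumps(obj))
    return obj

weights = load_cached("toy", lambda: {"layer": list(range(1000))})
```

HF Torch Cache applies the same idea with torch serialisation in place of pickle, so the expensive from_pretrained construction only happens once per model.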
License
MIT License - See LICENSE for details
Note: This project is not affiliated with Hugging Face. Use with caution in production environments. Always verify model sources when using weights_only=False.
Project details
Download files
Source Distribution
Built Distribution
File details
Details for the file hftorchcache-0.0.2.tar.gz.
File metadata
- Download URL: hftorchcache-0.0.2.tar.gz
- Upload date:
- Size: 5.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: pdm/2.22.3 CPython/3.10.16 Linux/6.8.0-51-generic
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 63630060fd1856c2bc8e98982ed1f5cc7c0e09af1f8392adcee496d9d286c032 |
| MD5 | 0dc56dbd985d9c4b911c0f1bd6c3a613 |
| BLAKE2b-256 | 98dd6c6e7d3a59c10f3f1c546750bc939447528de4b4c4599af63d97f995dc39 |
File details
Details for the file hftorchcache-0.0.2-py3-none-any.whl.
File metadata
- Download URL: hftorchcache-0.0.2-py3-none-any.whl
- Upload date:
- Size: 6.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: pdm/2.22.3 CPython/3.10.16 Linux/6.8.0-51-generic
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 539f9a20aad138014b45f39a1cb0ec48e98a5e713fabae2bf4a0b6b405aa3f7f |
| MD5 | 73c3f36e128f3b9ce3ce2c612c5fc3db |
| BLAKE2b-256 | 4ba47eeea1a06885936294cb9d2f4d7ff8a4bd5529d81dece0282e5df0afa3d3 |