
Efficient caching of Hugging Face models using PyTorch serialisation

Project description

HF Torch Cache

License: MIT Python 3.10+

Efficient caching layer for Hugging Face models using PyTorch serialisation. Accelerate model initialisation while reducing disk redundancy by converting native Hugging Face checkpoints to optimised PyTorch format.

Features

  • 🚀 Faster Initialisation: Skip Hugging Face's config reloading on subsequent loads
  • 💾 Disk Efficiency: Eliminate duplicate storage of model artifacts
  • 🔍 Auto Model Detection: Dynamically selects appropriate model class from config
  • 🧹 Cache Management: Optional cleanup of original Hugging Face cache artifacts
  • 🔒 Safety Controls: Configurable weights-only loading for untrusted sources

Installation

pip install hftorchcache

Usage

Simply pass a model name (the Hugging Face repo ID) and the model will be loaded and saved as a torch .pt file in ~/.cache/hftc/.

from hftc import HFTorchCache

MODEL_NAME = "unsloth/DeepSeek-R1-Distill-Qwen-7B-bnb-4bit"

# Initialise cache manager
cache = HFTorchCache()

# Load model with automatic class detection
model, tokenizer = cache.load(MODEL_NAME)

print(model.device)  # e.g. "cuda:0" if a GPU is available

If the model has already been cached, it loads straight from the serialised .pt file.

There are also options to:

  • Delete the original Hugging Face model cache directory (to avoid storing duplicates)
  • Load only from the local Hugging Face cache, with no Hub fallback
  • Specify a value to pass as the device (by default, loading uses the GPU if one is available)
  • Load weights only (though this defeats the purpose of this approach, which is to deserialise the entire object quickly, as with pickle)
  • Specify a particular model class by name or by the type itself (by default the class is detected from the config)
import torch

cache = HFTorchCache(
    cache_dir="/custom/cache/path",  # Default: ~/.cache/hftc
    cleanup_original=True            # Auto-delete original HF cache
)

# Load with explicit device placement and safety controls
model, tokenizer = cache.load(
    "unsloth/DeepSeek-R1-Distill-Qwen-7B-bnb-4bit",
    model_cls="AutoModelForCausalLM",    # Explicit class specification
    tokenizer_cls="AutoTokenizer",
    map_location=torch.device("cuda:0"),
    weights_only=False,                  # Set True for untrusted sources
    local_only=True                      # Prevent HF Hub fallback
    # **model_kwargs                     # Would be passed to `from_pretrained`
)

Note that you may need additional packages (e.g. bitsandbytes) to load cached models. accelerate is a dependency of this package, and low_cpu_mem_usage=True is always passed to from_pretrained.
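The "auto" class detection described above can be pictured as a lookup from a config's architectures field to a registered class. The sketch below is illustrative only — resolve_model_cls and the registry are hypothetical names, not the package's actual API, which resolves transformers classes:

```python
# Hypothetical sketch of "auto" model class detection: pick the first
# architecture name from the config that a class registry recognises.
def resolve_model_cls(architectures, registry):
    """Return the registered class for the first known architecture name."""
    for name in architectures:
        if name in registry:
            return registry[name]
    raise ValueError(f"no registered class among {architectures!r}")
```

For example, a config whose architectures list contains "Qwen2ForCausalLM" would resolve to whatever class is registered under that name.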

Cleanup utilities

You can also use the internal _cleanup_hf_cache method to delete the entire cache directory of a model you're done with, without trying to load it (as long as Hugging Face can find a local snapshot).

cache._cleanup_hf_cache("Qwen/Qwen2.5-7B-Instruct-GPTQ-Int8")

API Reference

HFTorchCache

Parameter         Type  Default        Description
cache_dir         str   ~/.cache/hftc  Custom cache directory
cleanup_original  bool  True           Remove original HF cache after conversion

load()

Parameter      Type        Default   Description
model_name     str         Required  HF model identifier
model_cls      str/type    "auto"    Model class specification
tokenizer_cls  str/type    "auto"    Tokenizer class specification
map_location   str/device  None      Torch device placement
weights_only   bool        False     Weights-only (safe) loading; enable for untrusted sources
local_only     bool        False     Disable HF Hub fallback

Implementation Notes

  1. First-Run Behavior: Initial load converts HF checkpoint to optimized PyTorch format
  2. Subsequent Loads: Directly loads serialised PyTorch artifacts (3-5x faster)
  3. Device Management: Specify map_location to control device placement
  4. Security: Use weights_only=True when loading untrusted models

License

MIT License - See LICENSE for details


Note: This project is not affiliated with Hugging Face. Use with caution in production environments. Always verify model sources when using weights_only=False.

Download files

Download the file for your platform.

Source Distribution

hftorchcache-0.0.2.tar.gz (5.9 kB)

Uploaded Source

Built Distribution


hftorchcache-0.0.2-py3-none-any.whl (6.9 kB)

Uploaded Python 3

File details

Details for the file hftorchcache-0.0.2.tar.gz.

File metadata

  • Download URL: hftorchcache-0.0.2.tar.gz
  • Upload date:
  • Size: 5.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: pdm/2.22.3 CPython/3.10.16 Linux/6.8.0-51-generic

File hashes

Hashes for hftorchcache-0.0.2.tar.gz
Algorithm Hash digest
SHA256 63630060fd1856c2bc8e98982ed1f5cc7c0e09af1f8392adcee496d9d286c032
MD5 0dc56dbd985d9c4b911c0f1bd6c3a613
BLAKE2b-256 98dd6c6e7d3a59c10f3f1c546750bc939447528de4b4c4599af63d97f995dc39


File details

Details for the file hftorchcache-0.0.2-py3-none-any.whl.

File metadata

  • Download URL: hftorchcache-0.0.2-py3-none-any.whl
  • Upload date:
  • Size: 6.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: pdm/2.22.3 CPython/3.10.16 Linux/6.8.0-51-generic

File hashes

Hashes for hftorchcache-0.0.2-py3-none-any.whl
Algorithm Hash digest
SHA256 539f9a20aad138014b45f39a1cb0ec48e98a5e713fabae2bf4a0b6b405aa3f7f
MD5 73c3f36e128f3b9ce3ce2c612c5fc3db
BLAKE2b-256 4ba47eeea1a06885936294cb9d2f4d7ff8a4bd5529d81dece0282e5df0afa3d3

