
A CLI to estimate inference memory requirements for Hugging Face models, written in Python.

Project description


[!WARNING] hf-mem is still experimental and therefore subject to major changes across releases, so please keep in mind that breaking changes may occur until v1.0.0.

hf-mem is a CLI to estimate inference memory requirements for Hugging Face models, written in Python. hf-mem is lightweight and depends only on httpx, since it pulls the Safetensors metadata via HTTP Range requests rather than downloading the weights. It's recommended to run it with uv for a better experience.

hf-mem lets you estimate the inference requirements for any model on the Hugging Face Hub, including Transformers, Diffusers, and Sentence Transformers models, as well as any other model that ships Safetensors-compatible weights.
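The Safetensors format makes this possible: each file starts with an 8-byte little-endian header length followed by a JSON header listing every tensor's dtype and shape, so parameter memory can be estimated from just that prefix (which is why an HTTP Range request suffices). A minimal sketch of the idea, not hf-mem's actual implementation; the dtype table and function name below are illustrative:

```python
import json
import struct

# Bytes per element for common safetensors dtypes (illustrative subset).
DTYPE_BYTES = {"F64": 8, "F32": 4, "F16": 2, "BF16": 2,
               "I64": 8, "I32": 4, "I8": 1, "U8": 1}

def estimate_weight_bytes(prefix: bytes) -> int:
    """Estimate total weight size from a .safetensors file prefix.

    The file begins with an 8-byte little-endian header length N,
    followed by N bytes of JSON describing each tensor's dtype/shape.
    """
    (n,) = struct.unpack("<Q", prefix[:8])
    header = json.loads(prefix[8:8 + n])
    total = 0
    for name, info in header.items():
        if name == "__metadata__":  # optional extra-metadata key, no tensor
            continue
        elems = 1
        for dim in info["shape"]:
            elems *= dim
        total += elems * DTYPE_BYTES[info["dtype"]]
    return total
```

Only the header bytes need to be fetched, so a single small Range request (e.g. `Range: bytes=0-...`) per shard is enough to sum up the model's weight memory.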

Read more about hf-mem in this short-form post.

Usage

Transformers

uvx hf-mem --model-id MiniMaxAI/MiniMax-M2

Diffusers

uvx hf-mem --model-id Qwen/Qwen-Image

Sentence Transformers

uvx hf-mem --model-id google/embeddinggemma-300m

Experimental

The --experimental flag enables KV cache memory estimation for LLMs (...ForCausalLM) and VLMs (...ForConditionalGeneration). It can be combined with the following options:

  • --max-model-len: maximum sequence length to budget for (defaults to the value in config.json)
  • --batch-size: number of concurrent sequences (defaults to 1)
  • --kv-cache-dtype: data type of the KV cache (defaults to auto, which uses the data type set in config.json under torch_dtype or dtype, or from quantization_config when applicable)

uvx hf-mem --model-id MiniMaxAI/MiniMax-M2 --experimental
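For reference, a KV cache estimate for standard multi-head or grouped-query attention follows the usual formula: two tensors per layer (keys and values), each of shape [batch, kv_heads, seq_len, head_dim]. A hedged sketch under those assumptions; the function name and defaults are illustrative, not hf-mem's API:

```python
def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   max_model_len: int, batch_size: int = 1,
                   dtype_bytes: int = 2) -> int:
    """KV cache size in bytes: one K and one V tensor per layer.

    Each of K and V has shape [batch, kv_heads, seq_len, head_dim];
    dtype_bytes defaults to 2 (fp16/bf16).
    """
    per_tensor = batch_size * num_kv_heads * max_model_len * head_dim * dtype_bytes
    return 2 * num_layers * per_tensor
```

For example, a 32-layer model with 8 KV heads of dimension 128 at a 4096-token context needs 2 × 32 × 8 × 4096 × 128 × 2 bytes = 512 MiB per sequence in fp16.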


Project details


Download files

Download the file for your platform.

Source Distribution

hf_mem-0.4.1.tar.gz (11.0 kB)

Uploaded Source

Built Distribution


hf_mem-0.4.1-py3-none-any.whl (13.3 kB)

Uploaded Python 3

File details

Details for the file hf_mem-0.4.1.tar.gz.

File metadata

  • Download URL: hf_mem-0.4.1.tar.gz
  • Upload date:
  • Size: 11.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv/0.9.27 {"installer":{"name":"uv","version":"0.9.27","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for hf_mem-0.4.1.tar.gz
  • SHA256: 4afdbb6d5c6618db325f5303bfd4cee06ef5f4a8bdc34840d42f4edb98e33f7a
  • MD5: fd60956633d394d6693d7608eac7a28e
  • BLAKE2b-256: 7381ddf3c148bca6008bcbd8f50ecfdccee2d65e2270d715a1fd13b1b4a60d49


File details

Details for the file hf_mem-0.4.1-py3-none-any.whl.

File metadata

  • Download URL: hf_mem-0.4.1-py3-none-any.whl
  • Upload date:
  • Size: 13.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv/0.9.27 {"installer":{"name":"uv","version":"0.9.27","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for hf_mem-0.4.1-py3-none-any.whl
  • SHA256: 36d625ef90d3b6df951231f87668415b9c59446b46fcf157275818493251f6ac
  • MD5: 1ac8eb40caa47e1c49285cb2a038119d
  • BLAKE2b-256: 56c25f73e77e6a5f179e57f566aac3486f737a96ba8f46d330fb9ffa9860a10e

