
A CLI to estimate inference memory requirements for Hugging Face models, written in Python.


Warning: hf-mem is still experimental and subject to major changes across releases. Breaking changes may occur until v1.0.0.

hf-mem is a CLI to estimate inference memory requirements for Hugging Face models, written in Python. It is lightweight: its only dependency is httpx, which it uses to pull the Safetensors metadata via HTTP Range requests. Running it with uv is recommended for a better experience.
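To illustrate why Range requests are enough, here is a minimal sketch of reading only a Safetensors header and summing the tensor sizes it declares. It relies on the published Safetensors layout (an 8-byte little-endian header length, followed by a JSON header whose tensor entries carry data_offsets). The function names header_length, weights_bytes and fetch_header are hypothetical and chosen for this example; this is not hf-mem's actual implementation.

```python
import json
import struct


def header_length(prefix: bytes) -> int:
    """Decode the 8-byte little-endian unsigned length that starts a Safetensors file."""
    (length,) = struct.unpack("<Q", prefix[:8])
    return length


def weights_bytes(header: dict) -> int:
    """Sum the byte span of every tensor listed in a parsed Safetensors header."""
    total = 0
    for name, entry in header.items():
        if name == "__metadata__":  # free-form metadata entry, not a tensor
            continue
        begin, end = entry["data_offsets"]
        total += end - begin
    return total


def fetch_header(url: str) -> dict:
    """Fetch only the JSON header of a remote Safetensors file via two Range requests.

    Assumption: `url` points directly at a single .safetensors file and the
    server honours Range requests (answers with 206 Partial Content).
    """
    import httpx  # imported here so the helpers above work without httpx installed

    def ranged_get(client: "httpx.Client", first: int, last: int) -> bytes:
        response = client.get(url, headers={"Range": f"bytes={first}-{last}"})
        response.raise_for_status()
        if response.status_code != 206:
            raise RuntimeError("server ignored the Range header")
        return response.content

    with httpx.Client(follow_redirects=True) as client:
        size = header_length(ranged_get(client, 0, 7))
        # The JSON header occupies bytes 8 .. 8 + size - 1 of the file.
        return json.loads(ranged_get(client, 8, 7 + size))
```

For a single-file checkpoint, weights_bytes(fetch_header(url)) would then give the size of the stored weights in bytes. A sharded checkpoint would need one call per shard, and the result covers the weights only, not activations or any KV cache.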

hf-mem lets you estimate the inference memory requirements to run any model from the Hugging Face Hub, including Transformers, Diffusers and Sentence Transformers models, as well as any other model that ships Safetensors-compatible weights.

Read more information about hf-mem in this short-form post.

Usage

Transformers

uvx hf-mem --model-id MiniMaxAI/MiniMax-M2

Diffusers

uvx hf-mem --model-id Qwen/Qwen-Image

Sentence Transformers

uvx hf-mem --model-id google/embeddinggemma-300m

Experimental

The --experimental flag enables KV cache memory estimation for LLMs (...ForCausalLM) and VLMs (...ForConditionalGeneration). With it, you can also set a custom --max-model-len (defaults to the value from config.json), --batch-size (defaults to 1), and --kv-cache-dtype. The latter defaults to auto, which uses the data type set in config.json under torch_dtype or dtype, or the one from quantization_config when applicable.

uvx hf-mem --model-id MiniMaxAI/MiniMax-M2 --experimental
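As a rough illustration of how the three options interact, the sketch below applies the commonly used KV cache size formula for standard multi-head or grouped-query attention: one key and one value tensor per layer, each of shape batch x KV heads x sequence length x head dimension. The function kv_cache_bytes and the example values are hypothetical; hf-mem's own estimate may differ, for instance for models with other attention variants.

```python
def kv_cache_bytes(
    num_hidden_layers: int,
    num_key_value_heads: int,
    head_dim: int,
    max_model_len: int,
    batch_size: int = 1,
    dtype_bytes: int = 2,
) -> int:
    """Estimate KV cache size in bytes for standard MHA/GQA attention.

    The leading factor of 2 accounts for storing both keys and values.
    dtype_bytes is the size of one cache element (2 for a 16-bit type).
    """
    return (
        2
        * num_hidden_layers
        * num_key_value_heads
        * head_dim
        * max_model_len
        * batch_size
        * dtype_bytes
    )


# Hypothetical model: 32 layers, 8 KV heads of dimension 128,
# a 4096-token context, batch size 1, 16-bit cache entries.
size = kv_cache_bytes(32, 8, 128, max_model_len=4096)
print(f"{size / 2**30:.2f} GiB")  # prints "0.50 GiB"
```

The estimate grows linearly with both the model length and the batch size, so doubling either one doubles the cache, while a 1-byte cache data type would halve it.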

(Optional) Agent Skills

You can also add hf-mem as an agent skill: when it is provided as a SKILL.md, the underlying coding agent can discover and use it.

For more information on skills and how to use them, see Anthropic Agent Skills.

