Skip to main content

A CLI to estimate inference memory requirements for Hugging Face models, written in Python.

Project description


[!WARNING] hf-mem is still experimental and therefore subject to major changes across releases, so please keep in mind that breaking changes may occur until v1.0.0.

hf-mem is a CLI to estimate inference memory requirements for Hugging Face models, written in Python. hf-mem is lightweight, only depends on httpx, as it pulls the Safetensors metadata via HTTP Range requests. It's recommended to run with uv for a better experience.

hf-mem lets you estimate the inference requirements to run any model from the Hugging Face Hub, including Transformers, Diffusers and Sentence Transformers models, as well as any model that contains Safetensors compatible weights.

Read more information about hf-mem in this short-form post.

Usage

Transformers

uvx hf-mem --model-id MiniMaxAI/MiniMax-M2

Diffusers

uvx hf-mem --model-id Qwen/Qwen-Image

Sentence Transformers

uvx hf-mem --model-id google/embeddinggemma-300m

Experimental

By enabling the --experimental flag, you can enable the KV Cache memory estimation for LLMs (...ForCausalLM) and VLMs (...ForConditionalGeneration), even including a custom --max-model-len (defaults to the config.json default), --batch-size (defaults to 1), and the --kv-cache-dtype (defaults to auto which means it uses the default data type set in config.json under torch_dtype or dtype, or rather from quantization_config when applicable).

uvx hf-mem --model-id MiniMaxAI/MiniMax-M2 --experimental

References

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

hf_mem-0.4.2.tar.gz (11.0 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

hf_mem-0.4.2-py3-none-any.whl (13.4 kB view details)

Uploaded Python 3

File details

Details for the file hf_mem-0.4.2.tar.gz.

File metadata

  • Download URL: hf_mem-0.4.2.tar.gz
  • Upload date:
  • Size: 11.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv/0.9.28 {"installer":{"name":"uv","version":"0.9.28","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for hf_mem-0.4.2.tar.gz
Algorithm Hash digest
SHA256 4110cfc13e951a1d6d07124556c94140299714fc22c572796a678c4d5838a633
MD5 54172b89f655169a68d96ee1ee03dcc1
BLAKE2b-256 71334409a0d3408ea13786a5b4d081d0fbf7118006e24f499fb94b7aec08f8de

See more details on using hashes here.

File details

Details for the file hf_mem-0.4.2-py3-none-any.whl.

File metadata

  • Download URL: hf_mem-0.4.2-py3-none-any.whl
  • Upload date:
  • Size: 13.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv/0.9.28 {"installer":{"name":"uv","version":"0.9.28","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for hf_mem-0.4.2-py3-none-any.whl
Algorithm Hash digest
SHA256 8ec7097e4665e522f30fd604ea8cb655a9418da7c58eba743d824e536be62a0d
MD5 cdd7374184c303dd07f438f4324bef76
BLAKE2b-256 e835aaf050694bda4ad75f5a805c9af0a4b39f21299adfb1d69b4d757618affa

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page