LLM model advisor for NVIDIA Jetson and DGX Spark unified-memory devices

jetfit

Detects your Jetson hardware, scores LLM models across quality, speed, and memory fit, and tells you exactly which quantization level will run well on your device.

Ships with an interactive TUI (default) and a CLI mode. Supports hardware simulation, calibration, compare view, and plan mode.


Install

pip install jetfit

or with uv:

uv tool install jetfit   # install globally
uvx jetfit               # run without installing

Usage

TUI (default)

jetfit

Launches the interactive terminal UI. The top bar shows your detected platform, available RAM, accelerator type, and minimum JetPack version. Models are listed in a scrollable table sorted by params, with composite score, estimated tok/s, best quantization, memory %, and fit grade per row.

Normal mode

Key      Action
j / k    Navigate models
g        Jump to top / bottom (toggle)
Enter    Open detail view
p        Open plan mode
m        Mark / unmark model for compare
c        Open compare view (marked vs selected)
x        Clear all marks
v        Enter visual select mode
/        Focus search bar
r        Cycle provider (family) filter
b        Cycle size filter
f        Cycle fit filter
s        Cycle sort column
-        Flip sort direction
F        Open advanced filter popup
S        Open hardware simulation
A        Open advanced config (tune efficiency)
t        Cycle theme
h        Open help
q        Quit

Visual mode (v)

Select a contiguous range of models for bulk comparison.

Key      Action
j / k    Extend selection
m        Mark selected model
c        Open compare view for selection
v / Esc  Exit visual mode

Detail view (Enter)

Shows the full quant ladder for the selected model: size, KV cache, total memory, memory %, estimated tok/s, and fit grade for every quantization level. Navigate rows with j/k; the left panel updates to show specs for the highlighted quant.

Plan mode (p)

Estimates hardware requirements for a model config. Edit Context, Quant, and Target TPS fields. Shows minimum and recommended RAM, feasibility per run path, and upgrade deltas.

Key          Action
Tab / j / k  Move between fields
Type         Edit current field
Backspace    Remove characters
Esc / q      Exit plan mode

Compare view (c)

Side-by-side comparison of marked models. Rows are attributes (Score, tok/s, Fit, Mem%, Params, Quant, Context); columns are models. Best values are highlighted.

Hardware simulation (S)

Override the active hardware profile to preview recommendations for any supported Jetson or DGX Spark device without leaving the TUI. The system bar shows (sim) when active.

Advanced config (A)

Tune the efficiency factor used for tok/s estimation. Changes apply immediately and all scores are recalculated.
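
To make the effect of the efficiency factor concrete, the tok/s formula given under How it works can be rearranged to back out the efficiency implied by a measured speed. The sketch below is illustrative only: the function name, the 4.0 GB model size, and the measured figure are hypothetical, and it is not necessarily how jetfit calibrate derives its value.

```python
def implied_efficiency(measured_tps, bandwidth_gb_s, effective_size_gb,
                       quant_multiplier=1.0):
    """Solve tok/s = (bandwidth / size) * efficiency * multiplier for efficiency."""
    # Speed the formula would predict at efficiency 1.0
    ceiling_tps = (bandwidth_gb_s / effective_size_gb) * quant_multiplier
    return measured_tps / ceiling_tps


# Hypothetical numbers: a 4.0 GB model on a 204.8 GB/s device,
# measured at 23.04 tok/s with a quant multiplier of 1.0.
eff = implied_efficiency(23.04, 204.8, 4.0)
print(round(eff, 2))  # 0.45
```

A value below the 0.50 to 0.55 default would suggest lowering the factor for that device; a value above it would suggest raising it.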

Advanced filter (F)

Set numeric bounds on parameter count and memory utilization %.


CLI

# Detect hardware
jetfit system

# Detect hardware (JSON)
jetfit system --json

# Recommend models for current hardware
jetfit recommend

# Filter by model name
jetfit recommend --model llama

# Fix a specific quant level
jetfit recommend --quant Q4_K_M

# Show all quant levels per model
jetfit recommend --all-quants

# Override available memory
jetfit recommend --available-gb 12.0

# Target a specific hardware profile
jetfit recommend --profile jetson_agx_orin_64gb

# Minimum tok/s threshold
jetfit recommend --min-tps 5.0

# JSON output
jetfit recommend --json
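
For scripting, the JSON output can be consumed from Python. The wrapper below is a minimal sketch: it uses only the flags listed above, assumes they can be combined in one invocation, and makes no assumption about the structure of the JSON that comes back. The function names are hypothetical and not part of jetfit.

```python
import json
import subprocess


def build_recommend_args(model=None, quant=None, profile=None,
                         available_gb=None, min_tps=None, all_quants=False):
    """Build an argv list for `jetfit recommend --json` from the flags shown above."""
    args = ["jetfit", "recommend", "--json"]
    if model is not None:
        args += ["--model", model]
    if quant is not None:
        args += ["--quant", quant]
    if profile is not None:
        args += ["--profile", profile]
    if available_gb is not None:
        args += ["--available-gb", str(available_gb)]
    if min_tps is not None:
        args += ["--min-tps", str(min_tps)]
    if all_quants:
        args.append("--all-quants")
    return args


def run_recommend(**kwargs):
    """Run jetfit and return the parsed JSON; raises if the command fails."""
    result = subprocess.run(build_recommend_args(**kwargs),
                            capture_output=True, text=True, check=True)
    return json.loads(result.stdout)
```

For example, `run_recommend(profile="jetson_agx_orin_64gb", min_tps=5.0)` would run the equivalent of the two corresponding commands above in a single call, provided jetfit is installed and on the PATH.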

Supported Hardware

Device                        RAM     Bandwidth   Accelerator  JetPack
Jetson Nano                   4 GB    25.6 GB/s   DLA+CUDA     4.x
Jetson TX2 NX                 4 GB    51.2 GB/s   CUDA         5.x
Jetson TX2 4GB                4 GB    51.2 GB/s   CUDA         4.x
Jetson TX2                    8 GB    59.7 GB/s   CUDA         4.x
Jetson TX2i                   8 GB    51.2 GB/s   CUDA         4.x
Jetson Xavier NX 8GB          8 GB    59.7 GB/s   DLA+CUDA     5.x
Jetson Xavier NX 16GB         16 GB   59.7 GB/s   DLA+CUDA     5.x
Jetson AGX Xavier 16GB        16 GB   136.5 GB/s  DLA+CUDA     5.x
Jetson AGX Xavier 32GB        32 GB   136.5 GB/s  DLA+CUDA     5.x
Jetson AGX Xavier 64GB        64 GB   136.5 GB/s  DLA+CUDA     5.x
Jetson AGX Xavier Industrial  64 GB   136.5 GB/s  DLA+CUDA     5.x
Jetson Orin Nano 4GB          4 GB    51.2 GB/s   CUDA         6.x
Jetson Orin Nano 8GB          8 GB    102.4 GB/s  CUDA         6.x
Jetson Orin NX 8GB            8 GB    102.4 GB/s  DLA+CUDA     6.x
Jetson Orin NX 16GB           16 GB   102.4 GB/s  DLA+CUDA     6.x
Jetson AGX Orin 32GB          32 GB   204.8 GB/s  DLA+CUDA     6.x
Jetson AGX Orin 64GB          64 GB   204.8 GB/s  DLA+CUDA     6.x
Jetson AGX Orin Industrial    64 GB   204.8 GB/s  DLA+CUDA     6.x
Jetson AGX Thor T4000         64 GB   273 GB/s    FP4+CUDA     6.x
Jetson AGX Thor T5000         128 GB  273 GB/s    FP4+CUDA     6.x
DGX Spark (GB10)              128 GB  273 GB/s    FP4+CUDA

On macOS or Linux dev machines, jetfit runs in simulation mode — pick any profile with S to preview recommendations.
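
The split between real hardware and simulation mode can be illustrated with a short sketch that reads the same Linux files named under How it works. This is not jetfit's detection code: the function names and the "contains Jetson" rule are assumptions made for illustration, and the rule as written would not recognize a DGX Spark.

```python
from pathlib import Path


def read_device_model(path="/proc/device-tree/model"):
    """Return the device-tree model string, or None if the file is absent."""
    try:
        raw = Path(path).read_bytes()
    except OSError:
        return None
    # Device-tree strings are NUL-terminated.
    return raw.rstrip(b"\x00").decode("utf-8", errors="replace").strip()


def parse_mem_available_gib(meminfo_text):
    """Extract MemAvailable from /proc/meminfo text and convert kB to GiB."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            kib = int(line.split()[1])
            return kib / (1024 ** 2)
    return None


def detect_mode(model):
    """Hypothetical rule: any model string mentioning Jetson counts as real hardware."""
    if model and "jetson" in model.lower():
        return "hardware"
    return "simulation"


print(detect_mode(read_device_model()))
```

On a macOS or typical Linux dev machine the device-tree file does not exist, so the sketch prints "simulation".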


How it works

  1. Hardware detection — Reads device-tree model and compatible strings (/proc/device-tree/), tegra release (/etc/nv_tegra_release), and available RAM via tegrastats, jtop, or /proc/meminfo (priority order). On non-Jetson machines, falls back to simulation mode with a selectable profile.

  2. Model database — 67 models embedded directly in fit.py. Each entry has a parameter count and real context length sourced from HuggingFace. Memory requirements are computed across a 6-level quantization ladder (Q8_0 through Q2_K) using per-quant bytes-per-parameter values that account for k-quant codebook overhead.

  3. KV cache accounting — Memory estimates include a fp16 KV cache (0.000008 × params_b × 4096 GB) and 0.5 GB runtime overhead, so "fits" means the model will actually load at a typical 4K inference context.

  4. FP4 halving — On devices with FP4 support (Thor, DGX Spark), effective model size is halved before all memory and speed calculations.

  5. Fit levels — Based on (weights + KV cache + overhead) / available_memory:

    Level     Utilization
    Perfect   ≤ 70%
    Good      71–90%
    Marginal  91–100%
    TooTight  > 100%
  6. Speed estimation — Token generation is memory-bandwidth-bound. Estimated tok/s:

    (bandwidth_GB_s / effective_size_GB) × efficiency × quant_speed_multiplier

    Default efficiency is 0.50–0.55 per profile, tunable via A. Quant multipliers range from 1.00× (Q8_0) to 1.80× (Q2_K).

  7. Composite score — Each model gets a 0–100 score combining normalized speed (45%), fit level (35%), and quantization quality (20%). Used for sorting and the score column.

  8. Calibration — Run jetfit calibrate to measure real tok/s on your device and save a per-profile efficiency factor to ~/.config/jetfit/calibration.json. Calibrated profiles show a ✓ cal badge in the system bar.
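
Steps 3 through 7 can be sketched as a few small functions. This is a minimal illustration under stated assumptions, not the code in fit.py: the function names are hypothetical, weights_gb is passed in because the per-quant bytes-per-parameter values are not listed here, params_b is assumed to mean parameters in billions, FP4 halving is applied to the weights only, utilization values that fall between the table's bands (for example 70.5%) are assigned to the higher band, and the composite score takes inputs already normalized to the range 0 to 1 because the normalization is not specified.

```python
KV_GB_PER_BPARAM_TOKEN = 0.000008   # step 3 coefficient
CONTEXT_TOKENS = 4096               # step 3 context length
RUNTIME_OVERHEAD_GB = 0.5           # step 3 runtime overhead


def effective_size_gb(weights_gb, fp4=False):
    """Step 4: halve the model size on FP4-capable devices (assumed weights only)."""
    return weights_gb / 2 if fp4 else weights_gb


def total_memory_gb(weights_gb, params_b, fp4=False):
    """Step 3: weights + fp16 KV cache at 4K context + runtime overhead."""
    kv_cache_gb = KV_GB_PER_BPARAM_TOKEN * params_b * CONTEXT_TOKENS
    return effective_size_gb(weights_gb, fp4) + kv_cache_gb + RUNTIME_OVERHEAD_GB


def fit_level(total_gb, available_gb):
    """Step 5: map memory utilization onto the four fit levels."""
    utilization = total_gb / available_gb
    if utilization <= 0.70:
        return "Perfect"
    if utilization <= 0.90:
        return "Good"
    if utilization <= 1.00:
        return "Marginal"
    return "TooTight"


def estimated_tps(bandwidth_gb_s, size_gb, efficiency, quant_multiplier):
    """Step 6: bandwidth-bound token rate."""
    return (bandwidth_gb_s / size_gb) * efficiency * quant_multiplier


def composite_score(speed_norm, fit_norm, quality_norm):
    """Step 7: weighted 0-100 score from inputs already scaled to 0..1."""
    return 100 * (0.45 * speed_norm + 0.35 * fit_norm + 0.20 * quality_norm)


# Hypothetical example: 8B parameters, 8.0 GB of weights at Q8_0 (multiplier 1.00),
# 16 GB available, 102.4 GB/s bandwidth, efficiency 0.50.
total = total_memory_gb(8.0, 8)
print(round(total, 3), fit_level(total, 16.0))
print(round(estimated_tps(102.4, effective_size_gb(8.0), 0.50, 1.00), 1))
```

With these example numbers the footprint is about 8.76 GB, or roughly 55% of 16 GB, and the estimated rate is 6.4 tok/s.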


Project structure

jetfit/
  __init__.py      -- version
  cli.py           -- Click CLI entry point, TUI launch
  hardware.py      -- Jetson/DGX hardware detection
  profiles.py      -- Hardware profile database (22 devices)
  fit.py           -- Scoring engine, quantization ladder, model catalog
  tui.py           -- Textual TUI (app state, rendering, keyboard events)
tests/
  test_hardware.py -- Hardware detection and TUI markup regression tests
  test_fit.py      -- Scoring engine unit tests
  test_calibration.py
  test_ros2.py
pyproject.toml
LICENSE

Dependencies

Package  Purpose
click    CLI argument parsing
rich     CLI table and colored output
textual  Terminal UI framework

License

MIT
