jetfit
LLM model advisor for NVIDIA Jetson and DGX Spark unified-memory devices.
Detects your Jetson hardware, scores LLM models across quality, speed, and memory fit, and tells you exactly which quantization level will run well on your device.
Ships with an interactive TUI (default) and a CLI mode. Supports hardware simulation, calibration, compare view, and plan mode.
Install
pip install jetfit
or with uv:
uv tool install jetfit # install globally
uvx jetfit # run without installing
Usage
TUI (default)
jetfit
Launches the interactive terminal UI. The top bar shows your detected platform, available RAM, accelerator type, and minimum JetPack version. Models are listed in a scrollable table sorted by params, with composite score, estimated tok/s, best quantization, memory %, and fit grade per row.
Normal mode
| Key | Action |
|---|---|
| `j` / `k` | Navigate models |
| `g` | Jump to top / bottom (toggle) |
| `Enter` | Open detail view |
| `p` | Open plan mode |
| `m` | Mark / unmark model for compare |
| `c` | Open compare view (marked vs selected) |
| `x` | Clear all marks |
| `v` | Enter visual select mode |
| `/` | Focus search bar |
| `r` | Cycle provider (family) filter |
| `b` | Cycle size filter |
| `f` | Cycle fit filter |
| `s` | Cycle sort column |
| `-` | Flip sort direction |
| `F` | Open advanced filter popup |
| `S` | Open hardware simulation |
| `A` | Open advanced config (tune efficiency) |
| `t` | Cycle theme |
| `h` | Open help |
| `q` | Quit |
Visual mode (v)
Select a contiguous range of models for bulk comparison.
| Key | Action |
|---|---|
| `j` / `k` | Extend selection |
| `m` | Mark selected model |
| `c` | Open compare view for selection |
| `v` / `Esc` | Exit visual mode |
Detail view (Enter)
Shows full quant ladder for the selected model — size, KV cache, total memory, memory %, estimated tok/s, and fit grade for every quantization level. Navigate rows with j/k; the left panel updates to show specs for the highlighted quant.
Plan mode (p)
Estimates hardware requirements for a model config. Edit Context, Quant, and Target TPS fields. Shows minimum and recommended RAM, feasibility per run path, and upgrade deltas.
| Key | Action |
|---|---|
| `Tab` / `j` / `k` | Move between fields |
| Type | Edit current field |
| `Backspace` | Remove characters |
| `Esc` / `q` | Exit plan mode |
Compare view (c)
Side-by-side comparison of marked models. Rows are attributes (Score, tok/s, Fit, Mem%, Params, Quant, Context); columns are models. Best values are highlighted.
Hardware simulation (S)
Override the active hardware profile to preview recommendations for any supported Jetson or DGX Spark device without leaving the TUI. The system bar shows (sim) when active.
Advanced config (A)
Tune the efficiency factor used for tok/s estimation. Changes apply immediately and all scores are recalculated.
Advanced filter (F)
Set numeric bounds on parameter count and memory utilization %.
CLI
# Detect hardware
jetfit system
# Detect hardware (JSON)
jetfit system --json
# Recommend models for current hardware
jetfit recommend
# Filter by model name
jetfit recommend --model llama
# Fix a specific quant level
jetfit recommend --quant Q4_K_M
# Show all quant levels per model
jetfit recommend --all-quants
# Override available memory
jetfit recommend --available-gb 12.0
# Target a specific hardware profile
jetfit recommend --profile jetson_agx_orin_64gb
# Minimum tok/s threshold
jetfit recommend --min-tps 5.0
# JSON output
jetfit recommend --json
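The `--json` flag makes the CLI easy to drive from a script. The sketch below is one hedged way to do that from Python: `jetfit_json` is a hypothetical helper name, it assumes `--json` can be combined with the other flags shown above, and it makes no assumption about the keys in the returned JSON because the output schema is not documented here.

```python
import json
import subprocess


def jetfit_json(*args, run=subprocess.run):
    """Run `jetfit <args> --json` and return the parsed output.

    `run` is injectable so the helper can be exercised without jetfit installed.
    """
    cmd = ["jetfit", *args, "--json"]
    # check=True raises CalledProcessError if jetfit exits with a non-zero status.
    proc = run(cmd, capture_output=True, text=True, check=True)
    return json.loads(proc.stdout)
```

For example, `jetfit_json("system")` would return the parsed result of `jetfit system --json`, and `jetfit_json("recommend", "--min-tps", "5.0")` would do the same for a filtered recommendation.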
Supported Hardware
| Device | RAM | Bandwidth | Accelerator | JetPack |
|---|---|---|---|---|
| Jetson Nano | 4 GB | 25.6 GB/s | DLA+CUDA | 4.x |
| Jetson TX2 NX | 4 GB | 51.2 GB/s | CUDA | 5.x |
| Jetson TX2 4GB | 4 GB | 51.2 GB/s | CUDA | 4.x |
| Jetson TX2 | 8 GB | 59.7 GB/s | CUDA | 4.x |
| Jetson TX2i | 8 GB | 51.2 GB/s | CUDA | 4.x |
| Jetson Xavier NX 8GB | 8 GB | 59.7 GB/s | DLA+CUDA | 5.x |
| Jetson Xavier NX 16GB | 16 GB | 59.7 GB/s | DLA+CUDA | 5.x |
| Jetson AGX Xavier 16GB | 16 GB | 136.5 GB/s | DLA+CUDA | 5.x |
| Jetson AGX Xavier 32GB | 32 GB | 136.5 GB/s | DLA+CUDA | 5.x |
| Jetson AGX Xavier 64GB | 64 GB | 136.5 GB/s | DLA+CUDA | 5.x |
| Jetson AGX Xavier Industrial | 64 GB | 136.5 GB/s | DLA+CUDA | 5.x |
| Jetson Orin Nano 4GB | 4 GB | 51.2 GB/s | CUDA | 6.x |
| Jetson Orin Nano 8GB | 8 GB | 102.4 GB/s | CUDA | 6.x |
| Jetson Orin NX 8GB | 8 GB | 102.4 GB/s | DLA+CUDA | 6.x |
| Jetson Orin NX 16GB | 16 GB | 102.4 GB/s | DLA+CUDA | 6.x |
| Jetson AGX Orin 32GB | 32 GB | 204.8 GB/s | DLA+CUDA | 6.x |
| Jetson AGX Orin 64GB | 64 GB | 204.8 GB/s | DLA+CUDA | 6.x |
| Jetson AGX Orin Industrial | 64 GB | 204.8 GB/s | DLA+CUDA | 6.x |
| Jetson AGX Thor T4000 | 64 GB | 273 GB/s | FP4+CUDA | 6.x |
| Jetson AGX Thor T5000 | 128 GB | 273 GB/s | FP4+CUDA | 6.x |
| DGX Spark (GB10) | 128 GB | 273 GB/s | FP4+CUDA | — |
On macOS or Linux dev machines, jetfit runs in simulation mode — pick any profile with S to preview recommendations.
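As a rough illustration of how a tool can choose between real detection and simulation mode, the sketch below reads the device-tree model string and falls back to simulation when it is missing or does not name a Jetson. It is not jetfit's actual code: the function names and the substring check are assumptions, and it covers only Jetson boards, not DGX Spark.

```python
from pathlib import Path
from typing import Optional

DEVICE_TREE_MODEL = Path("/proc/device-tree/model")


def read_device_model(path: Path = DEVICE_TREE_MODEL) -> Optional[str]:
    """Return the device-tree model string, or None if it cannot be read."""
    try:
        raw = path.read_bytes()
    except OSError:
        # No device tree here (e.g. macOS or an x86 Linux machine).
        return None
    # Device-tree strings are NUL-terminated, so strip the trailing byte.
    text = raw.rstrip(b"\x00").decode("utf-8", errors="replace").strip()
    return text or None


def pick_mode(model: Optional[str]) -> str:
    """Return 'detected' if the model string names a Jetson, else 'simulation'."""
    # Assumption: a plain substring match is enough for this illustration.
    if model and "jetson" in model.lower():
        return "detected"
    return "simulation"
```

Calling `pick_mode(read_device_model())` on a laptop returns `"simulation"`, because the device-tree file does not exist there.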
How it works
- Hardware detection: reads the device-tree model and compatible strings (`/proc/device-tree/`), the Tegra release file (`/etc/nv_tegra_release`), and available RAM via `tegrastats`, `jtop`, or `/proc/meminfo` (in that priority order). On non-Jetson machines, it falls back to simulation mode with a selectable profile.
- Model database: 67 models embedded directly in `fit.py`. Each entry has a parameter count and real context length sourced from HuggingFace. Memory requirements are computed across a 6-level quantization ladder (Q8_0 through Q2_K) using per-quant bytes-per-parameter values that account for k-quant codebook overhead.
- KV cache accounting: memory estimates include an fp16 KV cache (`0.000008 × params_b × 4096` GB) and 0.5 GB of runtime overhead, so "fits" means the model will actually load at a typical 4K inference context.
- FP4 halving: on devices with FP4 support (Thor, DGX Spark), effective model size is halved before all memory and speed calculations.
- Fit levels: based on `(weights + KV cache + overhead) / available_memory`:

  | Level | Utilization |
  |---|---|
  | Perfect | ≤ 70% |
  | Good | 71–90% |
  | Marginal | 91–100% |
  | TooTight | > 100% |

- Speed estimation: token generation is memory-bandwidth-bound. Estimated tok/s is `(bandwidth_GB_s / effective_size_GB) × efficiency × quant_speed_multiplier`. Default efficiency is 0.50–0.55 per profile, tunable via `A`. Quant multipliers range from 1.00× (Q8_0) to 1.80× (Q2_K).
- Composite score: each model gets a 0–100 score combining normalized speed (45%), fit level (35%), and quantization quality (20%). It is used for sorting and for the score column.
- Calibration: a per-profile efficiency factor can be saved to `~/.config/jetfit/calibration.json`. When present, the system bar shows a `✓ cal` badge and all speed estimates use the measured value instead of the profile default.
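The memory, fit-level, and speed rules above can be combined in a few lines. The sketch below is not jetfit's actual code. The function names are hypothetical, the bytes-per-parameter value is supplied by the caller because the real per-quant table is not listed here, and it assumes that FP4 halving applies to the weights only and that `effective_size_GB` in the speed formula means the weight size.

```python
KV_GB_PER_BPARAM_PER_TOKEN = 0.000008  # fp16 KV cache factor from the formula above
CONTEXT_TOKENS = 4096                  # typical 4K inference context
RUNTIME_OVERHEAD_GB = 0.5


def weights_gb(params_b, bytes_per_param, fp4=False):
    """Weight size in GB for a model with `params_b` billion parameters."""
    size = params_b * bytes_per_param
    # Assumption: FP4 halving is applied to the weights only.
    return size / 2 if fp4 else size


def total_memory_gb(params_b, bytes_per_param, fp4=False):
    """Weights + KV cache + runtime overhead, in GB."""
    kv_cache = KV_GB_PER_BPARAM_PER_TOKEN * params_b * CONTEXT_TOKENS
    return weights_gb(params_b, bytes_per_param, fp4) + kv_cache + RUNTIME_OVERHEAD_GB


def fit_level(total_gb, available_gb):
    """Map memory utilization to the fit levels listed above."""
    utilization = total_gb / available_gb
    # Values between the listed bands (e.g. 70.5%) fall into the next band up.
    if utilization <= 0.70:
        return "Perfect"
    if utilization <= 0.90:
        return "Good"
    if utilization <= 1.00:
        return "Marginal"
    return "TooTight"


def estimated_tps(bandwidth_gb_s, effective_size_gb, efficiency=0.5, quant_multiplier=1.0):
    """Bandwidth-bound tok/s estimate from the speed formula above."""
    return bandwidth_gb_s / effective_size_gb * efficiency * quant_multiplier


if __name__ == "__main__":
    # Hypothetical 7B model at 0.6 bytes/param on an 8 GB, 102.4 GB/s profile.
    total = total_memory_gb(7, 0.6)
    print(fit_level(total, 8.0), round(estimated_tps(102.4, weights_gb(7, 0.6)), 1))
```

The 0.6 bytes-per-parameter figure is only a placeholder for a roughly 4-bit quant. In practice the available memory would come from the detected device rather than from the nominal RAM size.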
Project structure
    jetfit/
        __init__.py          -- version
        cli.py               -- Click CLI entry point, TUI launch
        hardware.py          -- Jetson/DGX hardware detection
        profiles.py          -- Hardware profile database (22 devices)
        fit.py               -- Scoring engine, quantization ladder, model catalog
        tui.py               -- Textual TUI (app state, rendering, keyboard events)
    tests/
        test_hardware.py     -- Hardware detection and TUI markup regression tests
        test_fit.py          -- Scoring engine unit tests
        test_calibration.py
        test_ros2.py
    pyproject.toml
    LICENSE
Dependencies
| Package | Purpose |
|---|---|
| `click` | CLI argument parsing |
| `rich` | CLI table and colored output |
| `textual` | Terminal UI framework |
License
MIT