# tamebi

**Detect your hardware. Know what you can run.**

tamebi is a CLI tool that automatically detects your machine's hardware (CPU, RAM, GPU, disk) and tells you which LLM models you can run, with estimated memory usage, throughput, and time to first token.
## Install

```bash
pip install tamebi
```

or with uv:

```bash
uv pip install tamebi
```

### Optional extras

```bash
# NVIDIA GPU detection (requires NVIDIA drivers)
pip install "tamebi[nvidia]"

# HuggingFace Hub model search
pip install "tamebi[hf]"

# Both
pip install "tamebi[nvidia,hf]"
```
## Quick Start

```bash
tamebi check
```

Example output:

```
┌──────────────────────────────────────────────────────────────────┐
│ tamebi check — context length: 4,096 tokens, batch size: 1 │
└──────────────────────────────────────────────────────────────────┘

🖥 Hardware Summary
┌──────────────────────┬─────────────────────────────────────────────────────┐
│ Component │ Details │
├──────────────────────┼─────────────────────────────────────────────────────┤
│ CPU │ AMD Ryzen 9 7950X 16-Core Processor │
│ Architecture │ x86_64 │
│ Cores / Threads │ 16 cores / 32 threads @ 4500 MHz │
│ RAM │ 64.0 GB total / 48.2 GB available │
│ GPU │ NVIDIA RTX 4090 — 24.0 GB VRAM (22.5 GB free) │
│ │ CUDA 12.4 | CC 8.9 │
│ Disk │ 834.2 GB free / 1000.0 GB total │
│ OS │ Linux 6.5.0 │
│ Available for infr. │ 22.5 GB (VRAM) │
└──────────────────────┴─────────────────────────────────────────────────────┘

📊 Model Compatibility Matrix
┌────────────────────────┬─────────┬──────────┬──────────┬──────────┐
│ Model │ Params │ FP16 │ INT8 │ INT4 │
├────────────────────────┼─────────┼──────────┼──────────┼──────────┤
│ Llama 3.1 8B │ 8.0B │ ✅ Runs │ ✅ Runs │ ✅ Runs │
│ │ │ 18.7 GB │ 9.7 GB │ 5.1 GB │
│ Llama 3.1 70B │ 70.0B │ ❌ No fit│ ❌ No fit│ ⚠️ Tight │
│ │ │ 163.5 GB │ 81.8 GB │ 41.3 GB │
│ Mistral 7B v0.3 │ 7.3B │ ✅ Runs │ ✅ Runs │ ✅ Runs │
│ Qwen 2.5 7B │ 7.6B │ ✅ Runs │ ✅ Runs │ ✅ Runs │
│ Phi-3 Mini 3.8B │ 3.8B │ ✅ Runs │ ✅ Runs │ ✅ Runs │
│ Gemma 2 9B │ 9.2B │ ❌ No fit│ ✅ Runs │ ✅ Runs │
│ ... │ │ │ │ │
└────────────────────────┴─────────┴──────────┴──────────┴──────────┘

⚡ Performance Estimates (runnable models)
┌────────────────────┬───────┬────────┬─────────┬──────────┬───────────┬──────────┬────────┐
│ Model │ Prec. │ VRAM │ Weights │ KV Cache │ Tokens/s │ TTFT (s) │ Status │
├────────────────────┼───────┼────────┼─────────┼──────────┼───────────┼──────────┼────────┤
│ Llama 3.1 8B │ INT4 │ 5.1 GB │ 4.0 GB │ 0.1 GB │ 120-200 │ 0.1-0.2 │ ✅ │
│ Mistral 7B v0.3 │ INT4 │ 4.7 GB │ 3.7 GB │ 0.1 GB │ 120-200 │ 0.1-0.2 │ ✅ │
│ ... │ │ │ │ │ │ │ │
└────────────────────┴───────┴────────┴─────────┴──────────┴───────────┴──────────┴────────┘

┌─ 🏆 Top Recommendations ───────────────────────────────────────────────────┐
│ 1. Llama 3.1 8B (INT4) — 5.1 GB, ~120-200 tok/s │
│ Example: ollama run llama-3.1-8b:int4 │
│ 2. Qwen 2.5 7B (INT4) — 4.9 GB, ~120-200 tok/s │
│ Example: ollama run qwen-2.5-7b:int4 │
│ 3. Gemma 2 9B (INT4) — 5.8 GB, ~120-200 tok/s │
│ Example: ollama run gemma-2-9b:int4 │
└────────────────────────────────────────────────────────────────────────────┘
```
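The ✅ / ⚠️ / ❌ statuses boil down to comparing an estimated memory requirement against the memory available for inference. A minimal sketch of that kind of check (the 90% headroom cutoff and function name are illustrative assumptions, not tamebi's internals):

```python
def fit_status(required_gb: float, available_gb: float, headroom: float = 0.9) -> str:
    """Classify an estimated memory requirement against available memory.

    The 90% headroom threshold is an illustrative assumption, not tamebi's
    actual cutoff.
    """
    if required_gb <= available_gb * headroom:
        return "runs"    # comfortable fit
    if required_gb <= available_gb:
        return "tight"   # fits, but with little headroom
    return "no fit"      # exceeds available memory

# 22.5 GB of free VRAM, as in the example output above
print(fit_status(18.7, 22.5))  # Llama 3.1 8B @ FP16 -> runs
print(fit_status(41.3, 22.5))  # Llama 3.1 70B @ INT4 -> no fit (VRAM alone)
```

Note that tamebi's real check appears to consider more than a single memory pool: the 70B INT4 row is marked "Tight" even though 41.3 GB exceeds the 22.5 GB of free VRAM, presumably because the 48.2 GB of available system RAM can absorb the remainder. This sketch covers only the single-pool case.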
## CLI Reference

### `tamebi check`

Detect hardware and estimate which LLM models can run.

| Flag | Short | Default | Description |
|---|---|---|---|
| `--json` | `-j` | `false` | Output as JSON instead of rich tables |
| `--context-length` | `-c` | `4096` | Context length in tokens. KV cache scales linearly with this; 4K vs. 128K changes memory dramatically |
| `--batch-size` | `-b` | `1` | Concurrent requests, each with its own KV cache. Set >1 if planning GPU serving |
| `--online` | | `false` | Also query HuggingFace Hub for model suggestions |
| `--verbose` | | `false` | Show detailed detection info (driver versions, etc.) |
### Examples

```bash
# Basic hardware check
tamebi check

# JSON output for scripting
tamebi check --json

# Estimate for serving 4 concurrent users with 8K context
tamebi check --batch-size 4 --context-length 8192

# Include HuggingFace model suggestions
tamebi check --online
```
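With `--json`, the report can be consumed by scripts. The JSON schema is not documented here, so the keys below (`models`, `fits`, `est_vram_gb`) are assumptions for illustration; inspect `tamebi check --json` on your own machine for the real shape:

```python
import json

# Hypothetical `tamebi check --json` output. This schema is an assumption
# for illustration, not the documented format.
raw = """
{
  "models": [
    {"name": "Llama 3.1 8B",  "precision": "INT4", "est_vram_gb": 5.1,  "fits": true},
    {"name": "Llama 3.1 70B", "precision": "INT4", "est_vram_gb": 41.3, "fits": false}
  ]
}
"""

report = json.loads(raw)
runnable = [m["name"] for m in report["models"] if m["fits"]]
print(runnable)  # ['Llama 3.1 8B']
```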
## Supported Hardware

| Vendor | Detection Method | Details |
|---|---|---|
| NVIDIA | `nvidia-ml-py` (NVML) | Model, VRAM, CUDA version, compute capability |
| AMD | `rocm-smi` (subprocess) | Model, VRAM (requires ROCm) |
| Apple Silicon | `system_profiler` | Chip model (M1/M2/M3/M4), unified memory |
| CPU-only | `psutil` + `py-cpuinfo` | Cores, threads, frequency, architecture |
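Detection like this typically probes vendors in priority order and falls back to CPU-only. A stdlib-only sketch of such a fallback chain, using `shutil.which` to check for vendor tooling (an illustration of the approach, not tamebi's implementation):

```python
import platform
import shutil

def detect_gpu_vendor() -> str:
    """Probe for GPU vendor tooling in priority order; fall back to CPU-only."""
    if shutil.which("nvidia-smi"):   # NVIDIA driver stack on PATH
        return "nvidia"
    if shutil.which("rocm-smi"):     # AMD ROCm stack on PATH
        return "amd"
    if platform.system() == "Darwin" and platform.machine() == "arm64":
        return "apple-silicon"       # unified memory, no discrete VRAM
    return "cpu-only"

print(detect_gpu_vendor())
```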
## Built-in Model Catalog

| Model | Params | Family |
|---|---|---|
| Llama 3.1 8B / 70B / 405B | 8B / 70B / 405B | Meta Llama |
| Mistral 7B v0.3 | 7.3B | Mistral AI |
| Mixtral 8x7B | 46.7B (MoE) | Mistral AI |
| Qwen 2.5 7B / 72B | 7.6B / 72.7B | Alibaba |
| Phi-3 Mini / Medium | 3.8B / 14B | Microsoft |
| Gemma 2 9B / 27B | 9.2B / 27.2B | Google |
| DeepSeek-V2 Lite / 236B | 15.7B / 236B | DeepSeek |
## How Estimation Works

Memory is estimated per model and precision:

```
Total VRAM = Model Weights + KV Cache + Overhead

Model Weights = params (billions) × bytes_per_param
  (FP32: 4 bytes | FP16: 2 bytes | INT8: 1 byte | INT4: 0.5 bytes)

KV Cache = 2 × layers × num_kv_heads × head_dim × context_len × bytes_per_param × batch_size
  (GQA-aware: uses KV heads, not Q heads; e.g. Llama 3.1 8B has 8 KV heads vs 32 Q heads)

Overhead = 15% of weights (activations + fragmentation) + 0.5 GB CUDA context (NVIDIA only)
```
Performance estimates (tokens/sec, time to first token) are based on hardware-class lookup tables. They show ranges, not exact numbers — actual performance depends on specific drivers, software stack, and workload.
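The memory formula can be written out as a few lines of arithmetic. The Llama 3.1 8B config values below (32 layers, 8 KV heads, head dim 128) are public architecture numbers, and the overhead terms follow the description above; treat the result as a ballpark that may differ slightly from tamebi's table (exact parameter counts and GB vs. GiB conventions vary):

```python
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}

def estimate_vram_gb(params_b: float, layers: int, kv_heads: int, head_dim: int,
                     context_len: int, precision: str, batch_size: int = 1,
                     cuda_overhead_gb: float = 0.5) -> float:
    """Estimate total VRAM in GB as weights + KV cache + overhead."""
    bpp = BYTES_PER_PARAM[precision]
    weights = params_b * 1e9 * bpp                      # model weights in bytes
    kv = 2 * layers * kv_heads * head_dim * context_len * bpp * batch_size
    overhead = 0.15 * weights + cuda_overhead_gb * 1e9  # activations + CUDA context
    return (weights + kv + overhead) / 1e9

# Llama 3.1 8B: 32 layers, 8 KV heads (GQA), head_dim 128, 4K context, FP16
print(round(estimate_vram_gb(8.0, 32, 8, 128, 4096, "fp16"), 1))  # 19.4
```

At FP16 the KV cache (about 0.5 GB at 4K context) is dwarfed by the weights; at 128K context the same cache grows 32-fold, which is why `--context-length` dominates the estimate for long contexts.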
## License
Copyright (c) 2026 Tamebi. All rights reserved. Proprietary and confidential.
## File details

Details for the file `tamebi-0.1.0.tar.gz`.

### File metadata

- Download URL: tamebi-0.1.0.tar.gz
- Upload date:
- Size: 19.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.4

### File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | `4f4252221ea0c3a4ab50802ebf2f039a83c2c9411bf5361b62fb739695c278d4` |
| MD5 | `5e4736828e28b641cb7784f9c0476ede` |
| BLAKE2b-256 | `b4ac2ea0a18ae66b5cbf6ad1b997954b6daf422b2a86f18a9b6f0d8e6c97263d` |
## File details

Details for the file `tamebi-0.1.0-py3-none-any.whl`.

### File metadata

- Download URL: tamebi-0.1.0-py3-none-any.whl
- Upload date:
- Size: 18.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.4

### File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | `ef4bd2526c3eb9009a44697161703dbdd0b1bac1795fd02176ae0dc18631453a` |
| MD5 | `f5164c13e9176b8c036a8f2a34ac6c43` |
| BLAKE2b-256 | `2a4095235dd235a32b4317c8b2744a3944dd4c5fbd068b708ae27cd0dbe96292` |