Detect hardware and estimate LLM model inference capability
tamebi
Detect your hardware. Know what you can run.
tamebi is a CLI tool that automatically detects your machine's hardware (CPU, RAM, GPU, disk) and tells you exactly which LLM models you can run, with estimated memory usage, throughput, and time to first token (TTFT).
Install
pip install tamebi
or with uv:
uv pip install tamebi
NVIDIA, AMD, and Apple Silicon are all detected automatically — no extra flags or extras needed.
Quick Start
tamebi check
CLI Reference
tamebi check
Detect hardware and show what's runnable. Output has three sections:
- Hardware — CPU, RAM, GPU, disk, and available inference memory
- Top Recommendations — the best 3 models for your machine with Ollama run commands
- Runnable Models — all models that fit, with release date, precision, memory breakdown, speed estimate, and TTFT
| Flag | Short | Default | Description |
|---|---|---|---|
| `--json` | `-j` | `false` | Output as JSON instead of rich tables |
| `--context-length` | `-c` | `4096` | Context length in tokens. The KV cache scales linearly with this; 4K vs. 128K changes memory requirements dramatically |
| `--batch-size` | `-b` | `1` | Concurrent requests, each with its own KV cache. Set >1 if you plan to serve multiple users |
| `--verbose` | | `false` | Show detailed detection info (driver versions, etc.) |
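To see why `--context-length` and `--batch-size` matter, here is a minimal sketch of the linear KV-cache scaling, using assumed Llama-3-8B-like shapes (32 layers, 8 KV heads, head dim 128, FP16 cache). These shapes are illustrative, not tamebi's internal catalog values:

```python
def kv_cache_gb(layers, num_kv_heads, head_dim, context_len,
                bytes_per_elem=2, batch_size=1):
    """KV cache size in GB; the leading 2 covers the separate K and V tensors."""
    return (2 * layers * num_kv_heads * head_dim
            * context_len * bytes_per_elem * batch_size) / 1e9

# 4K vs. 128K context for the same (assumed) 8B-class model:
print(round(kv_cache_gb(32, 8, 128, 4096), 2))    # 0.54
print(round(kv_cache_gb(32, 8, 128, 131072), 2))  # 17.18
```

Note that batch size multiplies the cache the same way: four concurrent requests at 4K context need four times the 0.54 GB shown above.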
tamebi models
Show the full model compatibility matrix — every model in the catalog across all precisions (INT4, INT8, FP16), with fit status and memory at each level.
tamebi models
| Flag | Short | Default | Description |
|---|---|---|---|
| `--context-length` | `-c` | `4096` | Context length for KV cache estimation |
| `--batch-size` | `-b` | `1` | Batch size for KV cache estimation |
tamebi update
Pull the latest model catalog from the remote source. The catalog also updates automatically in the background, but this command forces an immediate refresh.
tamebi update
Examples
# Basic hardware check
tamebi check
# JSON output for scripting
tamebi check --json
# Estimate for serving 4 concurrent users with 8K context
tamebi check --batch-size 4 --context-length 8192
# Use each model's native max context window instead of the 4K default
tamebi check --context-length 0
# Browse all models and their compatibility across precisions
tamebi models
# Force-refresh the model catalog
tamebi update
Supported Hardware
| Vendor | Detection Method | Details |
|---|---|---|
| NVIDIA | `nvidia-ml-py` (NVML) | Model, VRAM, CUDA version, compute capability |
| AMD | `rocm-smi` (subprocess) | Model, VRAM (requires ROCm) |
| Apple Silicon | `system_profiler` | Chip model (M1/M2/M3/M4), unified memory |
| CPU-only | `psutil` + `py-cpuinfo` | Cores, threads, frequency, architecture |
Model Catalog
The catalog is automatically updated weekly and covers the latest releases from major labs including Meta, Mistral, Google, Qwen, DeepSeek, GLM, MiniMax, Kimi, Liquid, and AllenAI. Models are fetched directly from HuggingFace Hub — no manual maintenance required.
Run tamebi update at any time to pull the latest catalog.
How Estimation Works
Memory is estimated per model and precision:
Total VRAM = Model Weights + KV Cache + Overhead
Model Weights = params (billions) × bytes_per_param
FP16: 2 bytes | INT8: 1 byte | INT4: 0.5 bytes
KV Cache = 2 × layers × num_kv_heads × head_dim × context_len × bytes × batch_size
(GQA-aware: uses KV heads, not Q heads)
Overhead = 15% of weights (activations + fragmentation) + 0.5 GB (NVIDIA only)
Performance estimates (tokens/sec, time to first token) are based on hardware-class lookup tables. They show ranges, not exact numbers — actual performance depends on drivers, software stack, and workload.
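A hardware-class lookup of this kind might be sketched as follows; the class names and throughput ranges below are invented purely for illustration and are not tamebi's actual tables:

```python
# Hypothetical hardware-class throughput table (tok/s ranges for a
# ~8B-parameter INT4 model). All values are made up for illustration.
SPEED_TOK_S = {
    "apple_m_series": (15, 40),
    "consumer_gpu_12gb": (30, 70),
    "datacenter_gpu_80gb": (80, 150),
    "cpu_only": (2, 8),
}

def speed_range(hardware_class):
    """Return a tok/s range string, with a conservative fallback."""
    lo, hi = SPEED_TOK_S.get(hardware_class, (1, 5))
    return f"{lo}-{hi} tok/s"

print(speed_range("consumer_gpu_12gb"))  # 30-70 tok/s
```

Reporting a range rather than a point estimate is the honest choice here, since driver versions, software stack, and workload shape can easily move real throughput by a factor of two within one hardware class.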
License
Copyright (c) 2026 Tamebi. All rights reserved. Proprietary and confidential.
File details
Details for the file tamebi-1.1.1.tar.gz.
File metadata
- Download URL: tamebi-1.1.1.tar.gz
- Upload date:
- Size: 21.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `07eaadc19e14b2b96a3994d23dbbb8fbba8c044c98fc5a92e1e7ac18908b9c87` |
| MD5 | `dc4ff6a2aed6c08055f1b9d1977e36f0` |
| BLAKE2b-256 | `79f848dde7749d8b9a52b8fefdf0745d5984085585cdb28bf0d499a0494dd24d` |
File details
Details for the file tamebi-1.1.1-py3-none-any.whl.
File metadata
- Download URL: tamebi-1.1.1-py3-none-any.whl
- Upload date:
- Size: 21.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `602a914430ec6bc840985a774b6dddd52cf3ddfe6aec8205d1fd23d096875991` |
| MD5 | `e1dbf388f12539aad965bc4ee9dd3ee5` |
| BLAKE2b-256 | `cce50592f2f67542542b04ae79deca4c0623db8b8cd3a4ea5f936af72448bdee` |