Hardware LLM capability scanner — know what runs on your machine
Project description
tinillm
A local-first LLM agent and hardware scanner — chat with your hardware, let it write code.
```bash
pipx install tinillm
tinillm
```
What it does
tinillm is an interactive agent that runs on your local Ollama models. Type anything at the prompt and it goes to the agent; type a /command and it runs that command.
- Chat with tools. The agent can read files, edit code, run bash, and search the web — all gated by a three-tier permission model (read-only / workspace-write / danger-full-access).
- Plan mode. `/plan` puts the model in a read-only planning pass. Approve the plan and it executes autonomously — no per-command confirmations.
- Hardware-aware. `/scan` tells you which LLMs fit your machine before you download 30 GB of weights. `/models` browses real Ollama models with fit analysis.
- Linux sandbox. When `bubblewrap` is installed, bash calls run in a namespace jail rooted at your workspace.
- Session persistence. Every conversation is written to a JSONL log you can resume with `/chat --resume`.
Install
```bash
pipx install tinillm   # recommended: isolated per-tool environment
# or
pip install tinillm
```
Requires Python 3.11+ and a running Ollama server for the agent. Hardware scanning works without Ollama.
Optional on Linux:
```bash
apt install bubblewrap   # enables workspace-rooted bash sandbox
```
The unified prompt
Everything happens at one prompt. There's no separate "chat mode."
```text
│ refactor agent_chat.py to extract a ChatSession dataclass
⚙ read_file(path='tinillm/run/agent_chat.py')
⚙ write_file(path='tinillm/run/chat_session.py', …)
⚙ edit_file(path='tinillm/run/agent_chat.py', …)
done — extracted ChatSession with create/handle_user_turn/handle_slash.

│ /scan
LLM Capability Matrix
  ~1B   Perfect   Q8_0     1.9 GB   580 t/s
  ~7B   Perfect   Q6_K     6.2 GB    88 t/s
 ~13B   Perfect   Q5_K_M  10.1 GB    47 t/s
```
Free text → the agent. `/scan`, `/models`, `/doctor` → Click commands. In-chat commands like `/plan`, `/cost`, `/rewind`, `/diff`, `/tasks` operate on the live agent session.
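For orientation, here is a minimal sketch of that routing pattern (all names below are illustrative, not tinillm's actual internals): slash input is dispatched to a command registry, everything else becomes an agent turn.

```python
# Sketch of a unified-prompt router (hypothetical names, not
# tinillm's real code): one loop, two destinations.
COMMANDS = {}  # slash name -> handler, e.g. "scan" -> scan_cmd

def register(name):
    def deco(fn):
        COMMANDS[name] = fn
        return fn
    return deco

@register("scan")
def scan_cmd(args):
    print("LLM Capability Matrix ...")

def handle_line(line, agent):
    line = line.strip()
    if line.startswith("/"):          # /cmd -> command dispatch
        name, *args = line[1:].split()
        handler = COMMANDS.get(name)
        if handler is None:
            print(f"unknown command: /{name}")
        else:
            handler(args)
    elif line:                        # free text -> the agent
        agent.handle_user_turn(line)
```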
Commands
Agent
| Command | What it does |
|---|---|
| (type anything) | Send a turn to the agent |
| `/plan` | Enter plan mode (read-only); produce a numbered plan that auto-executes on approval |
| `/plan-off` | Leave plan mode without executing |
| `/compact` | Summarize older turns to free context |
| `/summary` | Print a summary without touching history |
| `/rewind [N]` | Remove the last N turns |
| `/diff` | Show files changed this session |
| `/cost` | Token + time usage so far |
| `/tasks` | Current agent TODO list |
| `/load <model>` | Swap the live model mid-session |
| `/permissions [mode]` | View or set permission mode |
| `/chat [--resume ID]` | Re-initialise the agent (or resume a session) |
Hardware + models
| Command | What it does |
|---|---|
| `/scan` | Scan hardware and show which LLM sizes fit |
| `/scan --verbose` | Include sizes that don't fit |
| `/scan --json` | Machine-readable output |
| `/models` | Browse real Ollama models with fit analysis |
| `/models --fits-only` | Hide models that don't fit |
| `/models --ollama` | Show which models are locally installed |
| `/run [tag]` | Launch a model directly in Ollama |
| `/suggest --use-case coding` | Personalised recommendation |
System
| Command | What it does |
|---|---|
| `/doctor` | Health check (hardware, Ollama, sandbox) |
| `/memory show` · `/remember` · `/forget` | Persistent per-project memory |
| `/sessions` | List resumable chat sessions |
| `/skills` | List installed agent skills |
| `/init` | Scaffold a TINILLM.md for this repo |
| `/help` | List every command |
| `/clear` | Clear the terminal |
| `/exit` | Quit (Ctrl+D also works) |
Tab-completion works on every slash command, subcommand, and flag.
Plan mode
```text
│ /plan
plan mode ON — read-only. Describe your goal.

│ add a --json flag to the scan command
Plan:
  1. read render/scan.py to find the current output path
  2. add --json flag in commands.py scan_cmd
  3. route to a JSON renderer, gated on the flag
  4. add a test in tests/test_scan_cmd.py
plan auto-accepted — executing in workspace-write mode
⚙ read_file(path='tinillm/render/scan.py')
⚙ edit_file(path='tinillm/commands.py', …)
⚙ write_file(path='tinillm/render/scan_json.py', …)
done.
```
Once the plan is approved, destructive bash and write-tool calls run without confirmation prompts. Autonomy is scoped to the one execution phase — the next user turn gets a fresh confirm-on-destructive policy.
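One way to picture that scoping (a hypothetical structure, not tinillm's code): the confirmation policy is rebuilt on every turn, so plan autonomy cannot leak past the approved execution phase.

```python
# Sketch: per-turn policy reset keeps plan autonomy scoped to
# a single execution phase (illustrative names only).
from dataclasses import dataclass

@dataclass
class TurnPolicy:
    confirm_destructive: bool = True  # default: ask before destructive calls

def run_turn(user_input, agent, plan_approved: bool = False):
    # Autonomy applies only when this specific turn carries an
    # approved plan; otherwise destructive calls require confirmation.
    policy = TurnPolicy(confirm_destructive=not plan_approved)
    agent.execute(user_input, policy)
    # The next call builds a fresh TurnPolicy, so the default
    # confirm-on-destructive behaviour is restored automatically.
```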
Permissions
Three modes, set per-session:
| Mode | Reads | Workspace writes | Shell | Network |
|---|---|---|---|---|
| `read-only` | ✓ | — | — | — |
| `workspace-write` (default) | ✓ | ✓ | confirm on destructive | — |
| `danger-full-access` | ✓ | ✓ | ✓ | ✓ |
Switch mid-session with `/permissions read-only`.
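As a sketch, the table above boils down to a capability lookup like this (illustrative names only; tinillm's actual enforcer API is not shown here):

```python
# Three-tier permission check mirroring the table above
# (hypothetical structure, not tinillm's real enforcer).
from enum import Enum

class Mode(Enum):
    READ_ONLY = "read-only"
    WORKSPACE_WRITE = "workspace-write"
    DANGER_FULL_ACCESS = "danger-full-access"

def allowed(mode: Mode, action: str) -> str:
    """Return 'yes', 'no', or 'confirm' for an action in a mode."""
    table = {
        Mode.READ_ONLY:          {"read": "yes"},
        Mode.WORKSPACE_WRITE:    {"read": "yes", "write": "yes",
                                  "shell": "confirm"},
        Mode.DANGER_FULL_ACCESS: {"read": "yes", "write": "yes",
                                  "shell": "yes", "network": "yes"},
    }
    return table[mode].get(action, "no")

assert allowed(Mode.WORKSPACE_WRITE, "shell") == "confirm"
assert allowed(Mode.READ_ONLY, "write") == "no"
```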
Sandbox
On Linux with `bubblewrap` installed, every bash call is wrapped with `bwrap --unshare-net --bind <workspace> / --ro-bind /usr /usr …` — reads are confined to the workspace, writes can't escape it, and network is off by default. `sandbox_allow_network = true` in `.tinillmrc` opens it.
macOS / Windows: sandbox is a no-op for now; permission enforcer and workspace resolution still apply.
Check status with /doctor or the welcome panel.
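To make the wrapping concrete, here is a sketch of building such a bwrap invocation from Python. The flags mirror the example above; the exact mount list tinillm uses is an assumption.

```python
# Sketch of a bubblewrap-wrapped bash call (illustrative; only the
# flags shown in the README are confirmed, the rest is assumed).
import subprocess

def sandboxed_bash(cmd: str, workspace: str, allow_network: bool = False):
    argv = ["bwrap",
            "--bind", workspace, "/",      # workspace becomes the root
            "--ro-bind", "/usr", "/usr",   # system dirs stay read-only
            "--proc", "/proc",
            "--dev", "/dev"]
    if not allow_network:
        argv += ["--unshare-net"]          # network off by default
    argv += ["bash", "-c", cmd]
    return subprocess.run(argv, capture_output=True, text=True)
```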
Configuration
Per-project: `.tinillmrc` (TOML) in the repo root:

```toml
model = "qwen2.5-coder:14b"
permission_mode = "workspace-write"
auto_accept_plan = true
sandbox_enabled = true
sandbox_allow_network = false
allowed_tools = ["read_file", "write_file", "edit_file", "bash"]
denied_tools = []
```

Per-user: `~/.tinillm/settings.json` (written by `/load`).
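Since tinillm requires Python 3.11+, a `.tinillmrc` like the one above can be read with the stdlib `tomllib`. A sketch (the defaults here are taken from the example, not from tinillm's source):

```python
# Sketch: load .tinillmrc over a set of defaults (illustrative keys
# copied from the example config above).
import tomllib
from pathlib import Path

DEFAULTS = {
    "model": "qwen2.5-coder:14b",
    "permission_mode": "workspace-write",
    "sandbox_enabled": True,
    "sandbox_allow_network": False,
}

def load_config(repo_root: str = ".") -> dict:
    cfg = dict(DEFAULTS)
    rc = Path(repo_root) / ".tinillmrc"
    if rc.exists():
        with rc.open("rb") as f:   # tomllib requires binary mode
            cfg.update(tomllib.load(f))
    return cfg
```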
GPU support
| Vendor | Detection method |
|---|---|
| NVIDIA | `nvidia-smi` → sysfs fallback |
| AMD | `rocm-smi` → sysfs fallback |
| Apple Silicon | `system_profiler` (unified memory) |
| Intel Arc | sysfs + `lspci` |
| Windows (all) | PowerShell WMI |
| Any | `vulkaninfo` last-resort fallback |
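The "primary tool → sysfs fallback" pattern in the table looks roughly like this (a simplified sketch, not tinillm's actual probe code):

```python
# Sketch of a vendor-tool-first, sysfs-fallback VRAM probe
# (simplified; tinillm's real detection covers more vendors).
import shutil, subprocess
from pathlib import Path

def vram_bytes() -> int | None:
    # 1. Vendor tool first: nvidia-smi reports memory.total in MiB.
    if shutil.which("nvidia-smi"):
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=memory.total",
             "--format=csv,noheader,nounits"],
            capture_output=True, text=True)
        if out.returncode == 0 and out.stdout.strip():
            return int(out.stdout.splitlines()[0]) * 1024 * 1024
    # 2. sysfs fallback (exposed by the amdgpu driver).
    for node in Path("/sys/class/drm").glob("card*/device/mem_info_vram_total"):
        return int(node.read_text())
    return None  # nothing found; caller tries the next probe
```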
Fit levels
| Level | Meaning |
|---|---|
| Perfect | Fits comfortably at Q4_K_M or better with ≥20% headroom |
| Good | Fits but tightly |
| Marginal | Only at heavy compression / reduced context, or CPU-only |
| TooTight | Won't fit under any quantisation |
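A back-of-the-envelope version of this classification, using the stated Q4_K_M-with-≥20%-headroom rule (the other thresholds and the bits-per-weight figures are assumptions, not tinillm's constants):

```python
# Sketch of fit classification: model bytes ≈ params × bits/weight ÷ 8.
def model_bytes(params_b: float, bits_per_weight: float) -> float:
    return params_b * 1e9 * bits_per_weight / 8

def fit_level(params_b: float, mem_bytes: float) -> str:
    q4 = model_bytes(params_b, 4.5)   # Q4_K_M ≈ 4.5 bits/weight
    q2 = model_bytes(params_b, 2.6)   # heavy compression, e.g. Q2_K
    if mem_bytes >= q4 * 1.2:         # Q4_K_M or better with ≥20% headroom
        return "Perfect"
    if mem_bytes >= q4:               # fits, but tightly
        return "Good"
    if mem_bytes >= q2:               # only under heavy compression
        return "Marginal"
    return "TooTight"

# A 13B model on a 24 GB GPU: 13e9 × 4.5 / 8 ≈ 7.3 GB, ample headroom.
assert fit_level(13, 24e9) == "Perfect"
```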
Versioning
| Version | Feature |
|---|---|
| 2.4 | Unified REPL · plan autonomy · sandbox surfacing ← current |
| 2.3 | Plan mode · auto-accept · session JSONL |
| 2.2 | Agent tool calling · three-tier permissions |
| 2.1 | Linux bubblewrap sandbox |
| 1.9 | Hardware scanning + model runner |
| 1.8 | Interactive REPL, slash commands |
| 1.1 | First release — hardware scanner |
Part of the tini* family
| Tool | What it does |
|---|---|
| tiniRAG | Privacy-first RAG CLI |
| tinillm | Local LLM agent + hardware scanner |
Project details
Download files
Source distribution: `tinillm-2.4.0.tar.gz` (127.0 kB)
Built distribution: `tinillm-2.4.0-py3-none-any.whl` (115.4 kB)
File details
Details for the file `tinillm-2.4.0.tar.gz`.
File metadata
- Download URL: tinillm-2.4.0.tar.gz
- Size: 127.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `5eda83299fa68e271c497987190963b72512a487b976f3c4477136ad53ea0b51` |
| MD5 | `103e3db109dfe210ef460dc6b5c0fed4` |
| BLAKE2b-256 | `b69701bdb118d168e787140502f66006c41e13064d4d764d67ac53d1a14c407c` |
File details
Details for the file `tinillm-2.4.0-py3-none-any.whl`.
File metadata
- Download URL: tinillm-2.4.0-py3-none-any.whl
- Size: 115.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `f871272805eef5dd3119ff9ee9d5fd2cfff8329e9aa669ecef94260f2e96f347` |
| MD5 | `6fc17840e597457c43f9ecf79d7cda45` |
| BLAKE2b-256 | `6392a4d237d9b41813542adc3a46962ee178f31f38a428e946dfa4eb73491320` |