
Hardware LLM capability scanner — know what runs on your machine


tinillm

A local-first LLM agent and hardware scanner — chat with your hardware, let it write code.

pipx install tinillm
tinillm

What it does

tinillm is an interactive agent that runs on your local Ollama models. Type anything at the prompt and it goes to the agent; type a /command and it runs that command.

  • Chat with tools. The agent can read files, edit code, run bash, and search the web — all gated by a three-tier permission model (read-only / workspace-write / danger-full-access).
  • Plan mode. /plan puts the model in a read-only planning pass. Approve the plan and it executes autonomously — no per-command confirmations.
  • Hardware-aware. /scan tells you which LLMs fit your machine before you download 30 GB of weights. /models browses real Ollama models with fit analysis.
  • Linux sandbox. When bubblewrap is installed, bash calls run in a namespace jail rooted at your workspace.
  • Session persistence. Every conversation is written to a JSONL log you can /chat --resume.
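JSONL makes resume cheap: each turn is one self-contained JSON line, so persisting is an append and resuming is a replay. A minimal sketch of the pattern — the actual record schema tinillm writes is an assumption here; the `role`/`content` keys and both function names are illustrative:

```python
import json
from pathlib import Path

def append_turn(log_path: str, role: str, content: str) -> None:
    """Persist one conversation turn as a single JSON line (append-only)."""
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(json.dumps({"role": role, "content": content}) + "\n")

def read_session(log_path: str) -> list[dict]:
    """Rebuild a session by replaying the log line by line."""
    turns = []
    for line in Path(log_path).read_text(encoding="utf-8").splitlines():
        if line.strip():  # tolerate blank lines
            turns.append(json.loads(line))
    return turns
```

Because the log is append-only, a crash mid-session loses at most the turn being written; everything earlier replays intact.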

Install

pipx install tinillm     # recommended: isolated per-tool environment
# or
pip install tinillm

Requires Python 3.11+ and a running Ollama server for the agent. Hardware scanning works without Ollama.

Optional on Linux:

apt install bubblewrap   # enables workspace-rooted bash sandbox

The unified prompt

Everything happens at one prompt. There's no separate "chat mode."

│ refactor agent_chat.py to extract a ChatSession dataclass
⚙  read_file(path='tinillm/run/agent_chat.py')
⚙  write_file(path='tinillm/run/chat_session.py', …)
⚙  edit_file(path='tinillm/run/agent_chat.py', …)
done — extracted ChatSession with create/handle_user_turn/handle_slash.

│ /scan
  LLM Capability Matrix
  ~1B    Perfect   Q8_0     1.9 GB   580 t/s
  ~7B    Perfect   Q6_K     6.2 GB    88 t/s
  ~13B   Perfect   Q5_K_M  10.1 GB    47 t/s

Free text → the agent. /scan, /models, /doctor → Click commands. In-chat commands like /plan, /cost, /rewind, /diff, /tasks operate on the live agent session.
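The fit numbers in the /scan matrix reduce to simple arithmetic: a quantised model's footprint is roughly parameters × bits-per-weight ÷ 8, plus runtime overhead for the KV cache and buffers. A back-of-envelope sketch — the bits-per-weight figures are nominal GGUF averages and the overhead factor is a guess, not tinillm's actual tables:

```python
# Nominal average bits per weight for common GGUF quantisations
# (approximate; real files mix tensor precisions).
BITS_PER_WEIGHT = {"Q4_K_M": 4.85, "Q5_K_M": 5.69, "Q6_K": 6.56, "Q8_0": 8.5}

def est_model_gb(params_billions: float, quant: str, overhead: float = 1.15) -> float:
    """Rough in-memory footprint in GB for a quantised model."""
    weights_gb = params_billions * BITS_PER_WEIGHT[quant] / 8
    return weights_gb * overhead  # KV cache + runtime buffers, very roughly
```

For a 7B model at Q6_K this lands in the 6–7 GB range, in the same ballpark as the matrix above.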


Commands

Agent

Command               What it does
(type anything)       Send a turn to the agent
/plan                 Enter plan mode (read-only); produce a numbered plan that auto-executes on approval
/plan-off             Leave plan mode without executing
/compact              Summarize older turns to free context
/summary              Print a summary without touching history
/rewind [N]           Remove the last N turns
/diff                 Show files changed this session
/cost                 Token + time usage so far
/tasks                Current agent TODO list
/load <model>         Swap the live model mid-session
/permissions [mode]   View or set permission mode
/chat [--resume ID]   Re-initialise the agent (or resume a session)

Hardware + models

Command                     What it does
/scan                       Scan hardware and show which LLM sizes fit
/scan --verbose             Include sizes that don't fit
/scan --json                Machine-readable output
/models                     Browse real Ollama models with fit analysis
/models --fits-only         Hide models that don't fit
/models --ollama            Show which models are locally installed
/run [tag]                  Launch a model directly in Ollama
/suggest --use-case coding  Personalised recommendation

System

Command                             What it does
/doctor                             Health check (hardware, Ollama, sandbox)
/memory show · /remember · /forget  Persistent per-project memory
/sessions                           List resumable chat sessions
/skills                             List installed agent skills
/init                               Scaffold a TINILLM.md for this repo
/help                               List every command
/clear                              Clear the terminal
/exit                               Quit (Ctrl+D also works)

Tab-completion works on every slash command, subcommand, and flag.


Plan mode

│ /plan
plan mode ON — read-only. Describe your goal.

│ add a --json flag to the scan command
Plan:
1. read render/scan.py to find the current output path
2. add --json flag in commands.py scan_cmd
3. route to a JSON renderer, gated on the flag
4. add a test in tests/test_scan_cmd.py

plan auto-accepted — executing in workspace-write mode
⚙  read_file(path='tinillm/render/scan.py')
⚙  edit_file(path='tinillm/commands.py', …)
⚙  write_file(path='tinillm/render/scan_json.py', …)
done.

Once the plan is approved, destructive bash and write-tool calls run without confirmation prompts. Autonomy is scoped to the one execution phase — the next user turn gets a fresh confirm-on-destructive policy.


Permissions

Three modes, set per-session:

Mode                       Reads  Workspace writes  Shell                   Network
read-only                  ✓      —                 —                       —
workspace-write (default)  ✓      ✓                 confirm on destructive  —
danger-full-access         ✓      ✓                 ✓                       ✓

Switch mid-session with /permissions read-only.
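A three-tier model like this boils down to one decision function sitting between the agent and its tools. A sketch of how such a gate could look — the function and set names are hypothetical, not tinillm's internals:

```python
from enum import Enum

class Mode(Enum):
    READ_ONLY = "read-only"
    WORKSPACE_WRITE = "workspace-write"
    DANGER = "danger-full-access"

# Tools that can mutate the workspace.
WRITE_TOOLS = {"write_file", "edit_file"}

def decide(mode: Mode, tool: str, destructive: bool = False) -> str:
    """Return 'allow', 'confirm', or 'deny' for one tool call."""
    if mode is Mode.DANGER:
        return "allow"                      # everything passes unprompted
    if tool in WRITE_TOOLS or tool == "bash":
        if mode is Mode.READ_ONLY:
            return "deny"                   # no mutation in read-only
        return "confirm" if destructive else "allow"
    return "allow"                          # pure reads pass in every mode
```

Plan-mode autonomy then amounts to temporarily mapping "confirm" to "allow" for the duration of one approved execution phase.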


Sandbox

On Linux with bubblewrap installed, every bash call is wrapped with bwrap --unshare-net --bind <workspace> / --ro-bind /usr /usr … — reads are confined to the workspace, writes can't escape it, and network is off by default. sandbox_allow_network=true in .tinillmrc opens it.
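Building that wrapper is mostly argv assembly. A sketch, using the exact flags quoted above — the helper name is illustrative and the real wrapper surely binds more paths (e.g. /lib, /etc) than this minimal version:

```python
def bwrap_argv(workspace: str, command: str, allow_network: bool = False) -> list[str]:
    """Assemble a bubblewrap command line rooted at the workspace."""
    argv = ["bwrap"]
    if not allow_network:
        argv.append("--unshare-net")       # no sockets unless opted in
    argv += [
        "--bind", workspace, "/",          # workspace becomes the jail root
        "--ro-bind", "/usr", "/usr",       # toolchain visible, read-only
    ]
    return argv + ["bash", "-c", command]
```

Because the workspace is bound at /, any absolute path the model writes to resolves inside the workspace; escapes would need a path that simply doesn't exist in the namespace.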

macOS / Windows: sandbox is a no-op for now; permission enforcer and workspace resolution still apply.

Check status with /doctor or the welcome panel.


Configuration

Per-project: .tinillmrc (TOML) in the repo root:

model = "qwen2.5-coder:14b"
permission_mode = "workspace-write"
auto_accept_plan = true
sandbox_enabled = true
sandbox_allow_network = false
allowed_tools = ["read_file", "write_file", "edit_file", "bash"]
denied_tools = []

Per-user: ~/.tinillm/settings.json (written by /load).


GPU support

Vendor         Detection method
NVIDIA         nvidia-smi → sysfs fallback
AMD            rocm-smi → sysfs fallback
Apple Silicon  system_profiler (unified memory)
Intel Arc      sysfs + lspci
Windows (all)  PowerShell WMI
Any            vulkaninfo last-resort fallback

Fit levels

Level     Meaning
Perfect   Fits comfortably at Q4_K_M or better with ≥20% headroom
Good      Fits, but tightly
Marginal  Only at heavy compression / reduced context, or CPU-only
TooTight  Won't fit under any quantisation
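The levels map to simple headroom thresholds on required size vs. available memory. A sketch — the 20% headroom comes from the table above, but the Marginal cutoff and function name are illustrative assumptions:

```python
def classify_fit(model_gb: float, vram_gb: float) -> str:
    """Map required model size vs. available memory to a fit level."""
    if vram_gb >= model_gb * 1.2:       # >=20% headroom
        return "Perfect"
    if vram_gb >= model_gb:
        return "Good"
    if vram_gb >= model_gb * 0.6:       # heavy compression / partial offload
        return "Marginal"
    return "TooTight"
```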

Versioning

Version  Feature
2.4      Unified REPL · plan autonomy · sandbox surfacing ← current
2.3      Plan mode · auto-accept · session JSONL
2.2      Agent tool calling · three-tier permissions
2.1      Linux bubblewrap sandbox
1.9      Hardware scanning + model runner
1.8      Interactive REPL · slash commands
1.1      First release — hardware scanner

Part of the tini* family

Tool     What it does
tiniRAG  Privacy-first RAG CLI
tinillm  Local LLM agent + hardware scanner



Download files


Source Distribution

tinillm-2.4.0.tar.gz (127.0 kB)

Built Distribution


tinillm-2.4.0-py3-none-any.whl (115.4 kB)

File details

Details for the file tinillm-2.4.0.tar.gz.

File metadata

  • Download URL: tinillm-2.4.0.tar.gz
  • Upload date:
  • Size: 127.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for tinillm-2.4.0.tar.gz

  • SHA256: 5eda83299fa68e271c497987190963b72512a487b976f3c4477136ad53ea0b51
  • MD5: 103e3db109dfe210ef460dc6b5c0fed4
  • BLAKE2b-256: b69701bdb118d168e787140502f66006c41e13064d4d764d67ac53d1a14c407c


File details

Details for the file tinillm-2.4.0-py3-none-any.whl.

File metadata

  • Download URL: tinillm-2.4.0-py3-none-any.whl
  • Upload date:
  • Size: 115.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for tinillm-2.4.0-py3-none-any.whl

  • SHA256: f871272805eef5dd3119ff9ee9d5fd2cfff8329e9aa669ecef94260f2e96f347
  • MD5: 6fc17840e597457c43f9ecf79d7cda45
  • BLAKE2b-256: 6392a4d237d9b41813542adc3a46962ee178f31f38a428e946dfa4eb73491320

