Litmus — LLM scenario runner TUI
Litmus 🧪
Terminal UI for running LLM agent scenarios and comparing their performance.
Litmus executes coding tasks across multiple AI agents and models, runs tests against the results, and produces detailed evaluation reports — all from a single TUI.
What it does
- Detects agents installed on your system (Claude Code, Codex, Aider, Cursor Agent, KiloCode, OpenCode)
- Runs scenarios — each scenario is a coding task with tests and scoring criteria
- Evaluates results — an LLM judge scores agent and model performance across 20 criteria each
- Generates reports — HTML reports with per-scenario breakdowns, logs, and scores
Supported agents
| Agent | Binary | Model listing |
|---|---|---|
| Claude Code | claude | Built-in list |
| Codex | codex | Built-in list |
| OpenCode | opencode | opencode models |
| KiloCode | kilocode | kilocode models |
| Aider | aider | aider --list-models |
| Cursor Agent | agent | agent models |
Litmus auto-detects which agents are available and queries their model lists.
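Detection of this kind usually comes down to a PATH lookup per binary. The following is a minimal sketch under that assumption, not Litmus's actual code: `AGENT_BINARIES` and `detect_agents` are hypothetical names, and only the binary names come from the table above.

```python
import shutil

# Binary names are copied from the table above; the mapping and the
# function below are illustrative and not part of Litmus's API.
AGENT_BINARIES = {
    "Claude Code": "claude",
    "Codex": "codex",
    "OpenCode": "opencode",
    "KiloCode": "kilocode",
    "Aider": "aider",
    "Cursor Agent": "agent",
}


def detect_agents(binaries=AGENT_BINARIES, which=shutil.which):
    """Return {agent name: resolved path} for every binary found on PATH."""
    found = {}
    for name, binary in binaries.items():
        path = which(binary)  # None when the binary is not on PATH
        if path is not None:
            found[name] = path
    return found


if __name__ == "__main__":
    for name, path in detect_agents().items():
        print(f"{name}: {path}")
```

The `which` parameter exists only so the lookup can be swapped out in tests; by default it is `shutil.which` from the standard library.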
Quick start
Requires Python 3.12+ and uv.
# Run without installing — uv fetches everything automatically
uvx --from git+https://github.com/ivkond/litmus.git litmus
On first launch Litmus will detect installed agents, generate a config, and open the TUI.
Alternative ways to install
# Install as a global tool
uv tool install git+https://github.com/ivkond/litmus.git
litmus
# Or clone for development
git clone https://github.com/ivkond/litmus.git
cd litmus
uv sync
uv run litmus
Once published to PyPI, install will simplify to uvx litmus.
TUI workflow
- 📋 Models — select agents and models to test
- 🧩 Scenarios — pick which coding tasks to run
- ▶️ Run — watch execution progress in real time
- 📊 Analysis — review LLM-judged scores
- 📄 Reports — browse generated HTML reports
How it works
Each scenario lives in template/<id>/ and contains:
template/1-data-structure/
  prompt.txt    # Task description sent to the agent
  task.txt      # Detailed requirements
  scoring.csv   # Evaluation criteria
  project/      # Starter code with tests
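When writing a new scenario it can help to check that a folder matches this layout before running it. A small sketch, assuming all four entries are required; `missing_entries` is a hypothetical helper and not something Litmus is known to ship.

```python
from pathlib import Path

# Entry names come from the layout above; treating all of them as
# mandatory is an assumption made for this sketch.
REQUIRED_FILES = ("prompt.txt", "task.txt", "scoring.csv")
REQUIRED_DIRS = ("project",)


def missing_entries(scenario_dir):
    """List the expected files and directories absent from a scenario folder."""
    root = Path(scenario_dir)
    missing = [name for name in REQUIRED_FILES if not (root / name).is_file()]
    missing += [name + "/" for name in REQUIRED_DIRS if not (root / name).is_dir()]
    return missing
```

An empty list means the folder has everything the layout names, for example `missing_entries("template/1-data-structure")`.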
Execution pipeline per scenario:
uv sync -> agent call -> pytest -> collect logs
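The pipeline above can be sketched as a sequence of subprocess calls with one log file per step. This is an illustration under assumptions, not the contents of run.py: `run_scenario` is a hypothetical name, the agent command line is supplied by the caller, and invoking the tests as `uv run pytest` is a guess at how pytest is launched inside the synced environment.

```python
import subprocess
from pathlib import Path


def run_scenario(project_dir, agent_cmd, log_dir, run=subprocess.run):
    """Run sync, agent, and tests in order; write one log file per step.

    Every step runs even if an earlier one fails, so the logs always cover
    the whole pipeline. Returns {step name: exit code}.
    """
    steps = [
        ("sync", ["uv", "sync"]),
        ("agent", list(agent_cmd)),            # caller-supplied command line
        ("pytest", ["uv", "run", "pytest"]),   # assumed invocation
    ]
    log_dir = Path(log_dir)
    log_dir.mkdir(parents=True, exist_ok=True)
    exit_codes = {}
    for name, cmd in steps:
        result = run(cmd, cwd=project_dir, capture_output=True, text=True)
        (log_dir / f"{name}.log").write_text(result.stdout + result.stderr)
        exit_codes[name] = result.returncode
    return exit_codes
```

Passing `run` as a parameter keeps the function testable without launching real processes. A stricter variant could stop after a failed sync instead of continuing.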
After all runs complete, an LLM judge evaluates the results using 20 agent criteria (tool efficiency, error recovery, reasoning depth...) and 20 model criteria (code correctness, instruction following, hallucination resistance...).
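To make the two criteria groups concrete, here is a sketch of how per-criterion judge scores might be averaged per group. Everything in it is an assumption: Litmus uses Pydantic for its evaluation models, while this sketch uses standard-library dataclasses to stay dependency-free, and the `CriterionScore` shape, the 0-10 scale, and `group_averages` are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class CriterionScore:
    group: str    # "agent" or "model"
    name: str     # e.g. "error recovery"
    score: float  # assumed 0-10 scale; the real scale is not documented here


def group_averages(scores):
    """Average the per-criterion scores separately for each group."""
    by_group = {}
    for item in scores:
        by_group.setdefault(item.group, []).append(item.score)
    return {group: mean(values) for group, values in by_group.items()}
```

Keeping agent and model scores in separate groups mirrors the 20 + 20 split described above: the same run can score well on model criteria and poorly on agent criteria, or the reverse.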
Configuration
On first launch, Litmus generates a config file with detected agents and their settings. Configure the analysis model (any OpenAI-compatible API) through the TUI settings screen.
Scenario packs
Litmus supports exporting and importing scenario archives (.litmus-pack ZIP files) for sharing test suites between machines or teams.
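Since a pack is a ZIP file, a round trip can be sketched with the standard `zipfile` module. This is a minimal illustration, assuming a pack holds the scenario folder as-is; the real archive layout and any manifest inside it are not documented here, and `export_pack` and `import_pack` are hypothetical names.

```python
import zipfile
from pathlib import Path


def export_pack(scenario_dir, pack_path):
    """Zip a scenario folder, storing paths relative to its parent directory."""
    scenario_dir = Path(scenario_dir)
    with zipfile.ZipFile(pack_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(scenario_dir.rglob("*")):
            if path.is_file():
                arcname = path.relative_to(scenario_dir.parent).as_posix()
                archive.write(path, arcname)


def import_pack(pack_path, template_dir):
    """Extract a pack into template_dir and return the archived file names."""
    with zipfile.ZipFile(pack_path) as archive:
        archive.extractall(template_dir)
        return archive.namelist()
```

For example, `export_pack("template/1-data-structure", "demo.litmus-pack")` followed by `import_pack("demo.litmus-pack", "template")` on another machine would recreate the folder under `template/`.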
Project structure
src/litmus/
  __init__.py   # Entry point, PROJECT_ROOT
  app.py        # Textual TUI (screens, widgets)
  agents.py     # Agent registry, detection, model listing
  run.py        # Scenario execution engine
  analysis.py   # LLM-powered evaluation (20+20 criteria)
  report.py     # HTML report generation
  pack/         # Scenario export/import
Tech stack
- Textual — TUI framework
- Rich — terminal formatting
- Pydantic — structured evaluation models
- OpenAI SDK — LLM judge (any compatible API)
License