Litmus — LLM scenario runner TUI

Litmus 🧪

Terminal UI for running LLM agent scenarios and comparing their performance.

Litmus executes coding tasks across multiple AI agents and models, runs tests against the results, and produces detailed evaluation reports — all from a single TUI.

What it does

Detects agents installed on your system (Claude Code, Codex, Aider, Cursor Agent, KiloCode, OpenCode)
Runs scenarios — each scenario is a coding task with tests and scoring criteria
Evaluates results — an LLM judge scores agent and model performance across 20 criteria each
Generates reports — HTML reports with per-scenario breakdowns, logs, and scores

Supported agents

Agent	Binary	Model listing
Claude Code	`claude`	Built-in list
Codex	`codex`	Built-in list
OpenCode	`opencode`	`opencode models`
KiloCode	`kilocode`	`kilocode models`
Aider	`aider`	`aider --list-models`
Cursor Agent	`agent`	`agent models`

Litmus auto-detects which agents are available and queries their model lists.
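As a rough illustration of what auto-detection can look like, the sketch below checks `PATH` for the binaries listed in the table. The binary names come from the table; the `detect_agents` function and the use of `shutil.which` are assumptions made for this example, not Litmus's actual detection code.

```python
import shutil

# Binary names are taken from the "Supported agents" table.
# The lookup strategy is an illustrative assumption.
AGENT_BINARIES = {
    "Claude Code": "claude",
    "Codex": "codex",
    "OpenCode": "opencode",
    "KiloCode": "kilocode",
    "Aider": "aider",
    "Cursor Agent": "agent",
}


def detect_agents(binaries=AGENT_BINARIES, which=shutil.which):
    """Return {agent name: resolved path} for every binary found on PATH."""
    found = {}
    for name, binary in binaries.items():
        path = which(binary)
        if path is not None:
            found[name] = path
    return found


if __name__ == "__main__":
    for name, path in detect_agents().items():
        print(f"{name}: {path}")
```

The `which` parameter is injectable so the lookup can be exercised without any agent installed.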

Quick start

Requires Python 3.12+ and uv.

# Run without installing — uv fetches everything automatically
uvx --from git+https://github.com/ivkond/litmus.git litmus

On first launch Litmus will detect installed agents, generate a config, and open the TUI.

Alternative ways to install

# Install as a global tool
uv tool install git+https://github.com/ivkond/litmus.git
litmus

# Or clone for development
git clone https://github.com/ivkond/litmus.git
cd litmus
uv sync
uv run litmus

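The package is published on PyPI as litmus-llm and installs the litmus command, so the Git URL should no longer be needed: uvx --from litmus-llm litmus is expected to work.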

TUI workflow

📋 Models — select agents and models to test
🧩 Scenarios — pick which coding tasks to run
▶️ Run — watch execution progress in real time
📊 Analysis — review LLM-judged scores
📄 Reports — browse generated HTML reports

How it works

Each scenario lives in template/<id>/ and contains:

template/1-data-structure/
  prompt.txt        # Task description sent to the agent
  task.txt          # Detailed requirements
  scoring.csv       # Evaluation criteria
  project/          # Starter code with tests
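A scenario directory with this layout can be sanity-checked before a run. The sketch below is a minimal example; the file names come from the layout above, while the `missing_entries` helper and the choice to treat every entry as required are assumptions for illustration.

```python
from pathlib import Path

# Names come from the scenario layout above. Treating all of them as
# required is an assumption made for this sketch.
REQUIRED_FILES = ("prompt.txt", "task.txt", "scoring.csv")
REQUIRED_DIRS = ("project",)


def missing_entries(scenario_dir):
    """Return the expected files and directories absent from scenario_dir."""
    root = Path(scenario_dir)
    missing = [name for name in REQUIRED_FILES if not (root / name).is_file()]
    missing += [name + "/" for name in REQUIRED_DIRS if not (root / name).is_dir()]
    return missing
```

An empty list means the directory matches the layout; otherwise the list names what is missing, for example `["task.txt", "scoring.csv"]`.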

Execution pipeline per scenario:

uv sync  ->  agent call  ->  pytest  ->  collect logs
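The pipeline above can be sketched as a sequence of subprocess calls. This is a simplified illustration under stated assumptions: `run_pipeline` and the `agent_cmd` argument list are hypothetical, invoking the tests as `uv run pytest` is a guess, and stopping at the first failing step is a design choice of this sketch rather than documented Litmus behavior.

```python
import subprocess


def run_pipeline(project_dir, agent_cmd, runner=subprocess.run):
    """Run uv sync -> agent call -> pytest in project_dir and collect logs.

    agent_cmd is a hypothetical argument list; each real agent CLI has its
    own flags, which are not shown here.
    """
    steps = [
        ("uv sync", ["uv", "sync"]),
        ("agent call", list(agent_cmd)),
        ("pytest", ["uv", "run", "pytest"]),  # assumed invocation
    ]
    logs = []
    for name, cmd in steps:
        result = runner(cmd, cwd=str(project_dir), capture_output=True, text=True)
        logs.append(
            {
                "step": name,
                "returncode": result.returncode,
                "stdout": result.stdout,
                "stderr": result.stderr,
            }
        )
        # Assumption: a failed step stops the run. A failing pytest is still
        # recorded because it is the last step.
        if result.returncode != 0:
            break
    return logs
```

Passing a fake `runner` makes it possible to test the control flow without uv or any agent installed.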

After all runs complete, an LLM judge evaluates the results against 20 agent criteria (such as tool efficiency, error recovery, and reasoning depth) and 20 model criteria (such as code correctness, instruction following, and hallucination resistance).
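To make the two criteria groups concrete, here is a small sketch of how per-criterion scores could be averaged by category. Everything in it is hypothetical: the `CriterionScore` type, the `summarize` helper, and the 0-10 scale are assumptions, since the score format Litmus actually uses is not described here.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class CriterionScore:
    category: str  # "agent" or "model"
    name: str      # e.g. "tool efficiency"
    score: float   # assumed 0-10 scale; the real scale is not documented here


def summarize(scores):
    """Average the scores within each category."""
    by_category = {}
    for item in scores:
        by_category.setdefault(item.category, []).append(item.score)
    return {category: mean(values) for category, values in by_category.items()}
```

A plain mean is the simplest aggregate; weighted criteria would need an extra field.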

Configuration

On first launch, Litmus generates a config file with detected agents and their settings. Configure the analysis model (any OpenAI-compatible API) through the TUI settings screen.

Scenario packs

Litmus supports exporting and importing scenario archives (.litmus-pack ZIP files) for sharing test suites between machines or teams.

Project structure

src/litmus/
  __init__.py     # Entry point, PROJECT_ROOT
  app.py          # Textual TUI (screens, widgets)
  agents.py       # Agent registry, detection, model listing
  run.py          # Scenario execution engine
  analysis.py     # LLM-powered evaluation (20+20 criteria)
  report.py       # HTML report generation
  pack/           # Scenario export/import

Tech stack

Textual — TUI framework
Rich — terminal formatting
Pydantic — structured evaluation models
OpenAI SDK — LLM judge (any compatible API)

License

MIT

Project details

Release history

Version	Released
0.3.1	Mar 25, 2026
0.3.0	Mar 25, 2026
0.2.1	Mar 25, 2026
0.2.0 (this version)	Mar 25, 2026

Download files

Source Distribution

litmus_llm-0.2.0.tar.gz (43.4 kB view details)

Uploaded Mar 25, 2026 Source

Built Distribution

litmus_llm-0.2.0-py3-none-any.whl (52.3 kB view details)

Uploaded Mar 25, 2026 Python 3

File details

Details for the file litmus_llm-0.2.0.tar.gz.

File metadata

Download URL: litmus_llm-0.2.0.tar.gz
Upload date: Mar 25, 2026
Size: 43.4 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: uv/0.11.1 {"installer":{"name":"uv","version":"0.11.1","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for litmus_llm-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`991d2b12724bd824814437f317db82da849d253cec12616405a677744e8beb7a`
MD5	`8bf7ee85d5ce1598e27284939a3f10a2`
BLAKE2b-256	`19d6f88bebafc3e89a87b5f84ebf9d3d8b3f7177dcc7d54965c2d04cb7a11d84`

File details

Details for the file litmus_llm-0.2.0-py3-none-any.whl.

File metadata

Download URL: litmus_llm-0.2.0-py3-none-any.whl
Upload date: Mar 25, 2026
Size: 52.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: uv/0.11.1 {"installer":{"name":"uv","version":"0.11.1","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for litmus_llm-0.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`4b5f0a9a0636688de7e0ab62b021f68ec76d03b5f5123b723add32b82974d53b`
MD5	`db320c567ea29b4c5c4422729ec399e7`
BLAKE2b-256	`953e4dc8bee6a7cf1dddf5ac3708b21ee823c0cf29d37c15e27bfe3b2e5b9a38`

litmus-llm 0.2.0
