Skip to main content

Litmus — LLM scenario runner TUI

Project description

Litmus 🧪

CI Security (Bandit) Security (OSV) PyPI Python License: MIT

Terminal UI for running LLM agent scenarios and comparing their performance.

Litmus executes coding tasks across multiple AI agents and models, runs tests against the results, and produces detailed evaluation reports — all from a single TUI.

What it does

  1. Detects agents installed on your system (Claude Code, Codex, Aider, Cursor Agent, KiloCode, OpenCode)
  2. Runs scenarios — each scenario is a coding task with tests and scoring criteria
  3. Evaluates results — an LLM judge scores agent and model performance across 20 criteria each
  4. Generates reports — HTML reports with per-scenario breakdowns, logs, and scores

Supported agents

Agent Binary Model listing
Claude Code claude Built-in list
Codex codex Built-in list
OpenCode opencode opencode models
KiloCode kilocode kilocode models
Aider aider aider --list-models
Cursor Agent agent agent models

Litmus auto-detects which agents are available and queries their model lists.

Quick start

Requires Python 3.12+.

pip install litmus-llm
litmus init      # create a workspace with a sample scenario
litmus           # open the TUI

Or run without installing via uv:

uvx --from litmus-llm litmus

Development setup

git clone https://github.com/ivkond/litmus.git
cd litmus
uv sync
uv run litmus

TUI workflow

  1. 📋 Models — select agents and models to test
  2. 🧩 Scenarios — pick which coding tasks to run
  3. ▶️ Run — watch execution progress in real time
  4. 📊 Analysis — review LLM-judged scores
  5. 📄 Reports — browse generated HTML reports

How it works

Each scenario lives in template/<id>/ and contains:

template/1-data-structure/
  prompt.txt        # Task description sent to the agent
  task.txt          # Detailed requirements
  scoring.csv       # Evaluation criteria
  project/          # Starter code with tests

Execution pipeline per scenario:

uv sync  ->  agent call  ->  pytest  ->  collect logs

After all runs complete, an LLM judge evaluates the results using 20 agent criteria (tool efficiency, error recovery, reasoning depth...) and 20 model criteria (code correctness, instruction following, hallucination resistance...).

Configuration

On first launch, Litmus generates a config file with detected agents and their settings. Configure the analysis model (any OpenAI-compatible API) through the TUI settings screen.

Scenario packs

Litmus supports exporting and importing scenario archives (.litmus-pack ZIP files) for sharing test suites between machines or teams.

Project structure

src/litmus/
  __init__.py       # Entry point, workspace init
  app.py            # Main app, menu screen
  agents.py         # Agent registry, detection, model listing
  run.py            # Scenario execution engine
  analysis.py       # LLM-powered evaluation (20+20 criteria)
  report.py         # HTML report generation
  pack/             # Scenario export/import
  screens/          # TUI screens (models, scenarios, run, results, analysis)

Tech stack

License

MIT

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

litmus_llm-0.3.1.tar.gz (47.0 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

litmus_llm-0.3.1-py3-none-any.whl (55.9 kB view details)

Uploaded Python 3

File details

Details for the file litmus_llm-0.3.1.tar.gz.

File metadata

  • Download URL: litmus_llm-0.3.1.tar.gz
  • Upload date:
  • Size: 47.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv/0.11.1 {"installer":{"name":"uv","version":"0.11.1","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for litmus_llm-0.3.1.tar.gz
Algorithm Hash digest
SHA256 a895717e1dff8df8864fbb7dbab2fc4f5f6869f3eb4caf348a0ef7c2c1cc44e1
MD5 1ff2486dcd78c49ccd75a18826f71ac0
BLAKE2b-256 c521f2330c286af7c7ec209dfbe6a894da9c536a702afe43228524df1e9068a5

See more details on using hashes here.

File details

Details for the file litmus_llm-0.3.1-py3-none-any.whl.

File metadata

  • Download URL: litmus_llm-0.3.1-py3-none-any.whl
  • Upload date:
  • Size: 55.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv/0.11.1 {"installer":{"name":"uv","version":"0.11.1","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for litmus_llm-0.3.1-py3-none-any.whl
Algorithm Hash digest
SHA256 a65c8c3546fb1240fff816b3d55789bf49699daa50b20972f0e18d095b3eddc9
MD5 5222bb7274aa2c90eab3b652c23c510d
BLAKE2b-256 416cea2c944e72abbd166fb2cff4f9354753f819d430533d1ae558e8ac60eac7

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page