
A command-line interface for language models

Project description

llme, a CLI assistant for OpenAI-compatible chat servers

Alternative README

A simple, single-file command-line chat client compatible with the OpenAI API.

(or "I just want to quickly test my model hosted with llama.cpp but don't want to spin up openwebui")

Features

  • OpenAI API Compatible: Works with any self-hosted LLM platform that supports the OpenAI chat completions API.
  • Extremely simple: Single file, no installation required (but installation is still available).
  • Command-line interface: Run it from the terminal.
  • Tools included: Ask it to act on your file system and edit files (yolo).

The basic idea is that LLMs are trained on code and OS configuration and have already (machine) learnt to select the probable tools to use and actions to take. Therefore, there is no need to teach them to use made-up functions and tools with bad JSON schemas. Just give them a shell and a Python interpreter, and let you (only) live (once).

Use it as a helpful (dummy) assistant to inspect configuration and source code, run commands, and edit files.

Installation

Quick-start a local LLM server if you don't have one already

Example with llama.cpp if you use Homebrew. Look at https://github.com/ggerganov/llama.cpp for other options.

brew install llama.cpp
llama-server -hf unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF --ctx-size 0 --jinja

Example with ollama. Look at https://ollama.com/download for other options

curl -fsSL https://ollama.com/install.sh | sh
ollama pull qwen3-coder:30b

Qwen3-Coder-30B is a nice model. Smaller models can also work. See the benchmark for a comparison.

llme

Choose your preferred installation or execution method.

Install from PyPI (possibly an old version)

pipx install llme-cli
llme --help

Install from GitHub directly (latest dev version)

pipx install -f git+https://github.com/privat/llme.git
llme --help

Clone then install in development mode

git clone https://github.com/privat/llme.git
pipx install -e ./llme
llme --help

Clone and run from source (no installation)

git clone https://github.com/privat/llme.git
pip install -r llme/requirements.txt
./llme/llme/main.py --help

Usage

Run an interactive chat session

llme --base-url "http://localhost:8080/v1" # for default llama-server (llama.cpp)
llme --base-url "http://localhost:11434/v1" # for default ollama server

or if you want a specific model:

llme --base-url "http://localhost:8080/v1" --model "unsloth/Mistral-Small-3.2-24B-Instruct-2506-GGUF"

Ctrl-C to interrupt a response (or exit).

Set up a config (optional, but recommended):

Edit ~/.config/llme/config.toml. Look at config.toml for an example. More about options and configs below.

From now on, I assume there is a config file...
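
As a minimal sketch (the keys match the bracketed names in --help; the values are placeholders, see the shipped config.toml for the real reference):

# ~/.config/llme/config.toml
base_url = "http://localhost:8080/v1"
model = "qwen3-coder:30b"
temperature = 0.7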

Prompt engineered

The REPL interface allows you to navigate in the conversation history, fork it, and even edit it. It's easy to replay token generation, try different prompts, update parameters, or gaslight the assistant.

Run one-shot queries

Each prompt is run in order in the same chat session.

llme "What is the capital of France?" \
  "What the content of the current directory?" \
  "What is the current operating system?" \
  "What is the factorial of 153?" \
  "What is the weather at Tokyo right now?"

You can also pipe the query:

echo "What is the capital of France?" | llme

Note that interactive sessions are often better: if needed, the model is loaded at the start of the command, so it loads while you type. There are also no issues with escaping " or '.

See below for more detailed information about interactive and batch modes.

Tools included

The LLM has direct access to your shell (and files) and a python interpreter. The user is asked for confirmation before executing any command. Beware, some LLMs might be very persistent and persuasive in running dangerous commands. Do not trust the LLM blindly!

If you choose not to execute a command, it will be skipped, and you can provide an explanation to the LLM or ask for a better command.

Some LLMs might insist on not using a tool, ask the user to do it manually, or just simulate the action. Better prompt engineering might help. Proposals to improve the default system prompt are always welcome.
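
If a model is reluctant to act, you can experiment with the documented -s/--system option (or the system_prompt config key); the wording below is only an illustration:

llme -s "You are a command-line assistant. Always use the provided tools to act instead of describing what to do." "Find the five largest files in /var/log"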

Inspect content of files or stdin

ps aux | llme "Which process consumes the most memory?"

you can also pass file paths as attachments to a prompt:

llme "how many regular users and regular groups are there in these files?" /etc/passwd /etc/group

Note: the file content and the path will be given to the LLM.

Inspect images (for multimodal models)

Same as for files, but with images — duh, images are files!

llme "What is in this image?" < image.png

you can still use paths:

llme "What is in this image?" image.png

Run yolo

Note: no warranty, yada yada, etc. llme can just kill your OS and cats. Do not run the following command without understanding what it does.

sudo llme --batch --yolo "Dist-upgrade the system. You are root! Do as you wish."

Options (and config)

$ llme --help
usage: llme [options...] [prompts...]

OpenAI-compatible chat CLI.

positional arguments:
  prompts               An initial list of prompts

options:
  -h, --help            show this help message and exit
  -u, --base-url URL    API base URL [base_url]
  -m, --model NAME      Model name or identifier [model]
  --list-models         List available models then exit
  --api-key SECRET      The API key [api_key]
  -b, --batch           Run non-interactively. Implicit if stdin is not a tty
                        [batch]
  -p, --plain           No colors or tty fanciness. Implicit if stdout is not
                        a tty [plain]
  --bulk                Disable stream-mode. Not that useful but it helps
                        debugging APIs [bulk]
  -o, --chat-output FILE
                        Export the full raw conversation in json
  -i, --chat-input FILE
                        Continue a previous (exported) conversation
  --export-metrics FILE
                        Export metrics, usage, etc. in json
  -s, --system SYSTEM_PROMPT
                        System prompt [system_prompt]
  --temperature TEMPERATURE
                        Temperature of predictions [temperature]
  --tool-mode {markdown,native}
                        How tools and functions are given to the LLM
                        [tool_mode]
  -c, --config FILE     Custom configuration files
  --list-tools          List available tools then exit
  --dump-config         Print the effective config and quit
  --plugin PATH         Add additional tool (python file or directory)
                        [plugins]
  -v, --verbose         Increase verbosity level (can be used multiple times)
  --log-file FILE       Write logs to a file [log_file]
  -Y, --yolo            UNSAFE: Do not ask for confirmation before running
                        tools. Combine with --batch to reach the singularity.
  --version             Display version information and quit

Boolean flags can be negated with `--no-`. Example `--no-plain` to force
colors in a non TTY

Note: Run a fresh --help in case I forgot to update this README.

All options with names in brackets can be set in the config file (base_url for --base-url). They can also be set by environment variables (LLME_BASE_URL for --base-url).

For each option, the precedence order is the following:

  1. The explicit option in the command line (the higher precedence)
  2. The explicit config files (given by --config) in reverse order (last wins)
  3. The environment variables (LLME_SOMETHING)
  4. The user configuration file (~/.config/llme/config.toml)
  5. The system configuration file provided by the package (the lowest precedence)
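
For example, assuming two local servers, the environment variable sets the base URL, but the explicit command-line flag takes precedence over it:

LLME_BASE_URL="http://localhost:11434/v1" llme --list-models    # the env var is used
LLME_BASE_URL="http://localhost:11434/v1" llme --base-url "http://localhost:8080/v1" --list-models    # the flag wins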

Slash Commands

Special commands can be executed during the chat. They start with a / and can be used when a prompt is expected (interactively or on the command line). The command /help shows the available slash commands.

$ llme /help /quit
/models       list available models
/tools        list available tools
/metrics      list current metrics
/history      list condensed conversation history
/full-history list hierarchical conversation history (with forks)
/redo         cancel and regenerate the last assistant message
/undo         cancel the last user message (and the response) [PageUp]
/pass         go forward in history (cancel /undo) [PageDown]
/edit         run EDITOR on the chat (save,editor,load)
/save FILE    save chat
/load FILE    load chat
/clear        clear the conversation history
/goto M       jump after message M (e.g /goto 5c)
/config       list configuration options
/set OPT=VAL  change a config option
/quit         exit the program
/help         show this help

Note: Run a fresh /help in case I forgot to update this README.

Library, plugin system, and custom tools

Important: the API is far from stable.

llme is usable as a library, so you can reuse its features. For now, the main reason to import llme is to add new custom tools usable by LLMs.

You can transform a Python function into a tool with the @llme.tool decorator. Look at weather_plugin.py for an example.
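
As a hypothetical sketch (the @llme.tool decorator is the documented entry point; the exact signature and conventions may differ, so check weather_plugin.py):

#!/usr/bin/env python3
# my_tool_plugin.py: a hypothetical custom tool; the name and behavior are illustrative
import subprocess

import llme

@llme.tool
def disk_usage(path: str) -> str:
    """Return the human-readable disk usage of a directory."""
    return subprocess.run(["du", "-sh", path], capture_output=True, text=True).stdout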

Usages:

Run the weather plugin as a standalone program (it disables all LLM tools except the weather one).

./examples/weather_plugin.py 'Will it rain tomorrow in Paris?'

Use llme with the --plugin option to add one (or more) plugins and bring in all their tools.

llme --plugin examples/weather_plugin.py 'Will it rain tomorrow in Paris?'

Or whole directories!

llme --plugin examples 'Will it rain tomorrow in Paris?'

Batch mode

llme can be used in batch or in interactive mode.

The batch mode, with --batch, is the default when stdin is not a tty.

If there are no prompts on the command line, then stdin is read and used as a single big prompt, and the program terminates.

Otherwise, each prompt from the command line is used one after the other and the program terminates. If stdin is not a tty, it is read and used as attached data (text or image) sent with the first prompt of the command line.
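
For example, the following attaches the diff to the first prompt and runs non-interactively (the prompt is just an illustration):

git diff | llme "Review this patch and point out obvious bugs"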

Tools can be used by the assistant in batch mode, but if a confirmation is required, the program will exit with an error (unless --yolo is used).

Interactive mode

The interactive mode, with --no-batch, is the default when stdin is a tty.

When both stdin and stdout are ttys, the rich terminal interface with prompt_toolkit is used and provides completion, history, keybindings, etc. Otherwise, it falls back to a simple input() interface that processes each line as a prompt.

As in batch mode, the prompts from the command line are used first, one after the other, then the user can provide prompts interactively.

Tools can be used by the assistant in interactive mode, and the user might be asked for confirmation.

Also, most errors are not fatal in interactive mode.

Development

I do not like Python, nor LLMs, but I needed something simple to test things quickly and play around. My goal is to keep this simple and minimal: it should fit into a single file and still be manageable.

PRs are welcome!

TODO

  • OpenAI API features
    • API token (untested)
    • list models
    • stream mode
    • bulk mode (non stream mode)
    • thinking mode
    • multimodal
    • attached files
    • attached images
    • ?
  • Tools
    • markdown tools
    • native tools
    • run shell command
    • run Python code
    • user-defined tools
    • sandboxing
    • whitelist/blacklist
  • User interface & features
    • readline
    • better prompt & history
    • braille spinner
    • model warmup
    • save/load conversation
    • export metrics/usage/statistics
    • slash commands
    • completion for slash commands
    • undo/retry/edit
    • error recovery
    • better tool reporting
    • Usable in pipelines or without a TTY
    • post-processing output
    • attach files in interactive mode
    • ?
  • Customization and models
    • config files
    • config with env vars
    • type check / conversion
    • plugin system
    • better tool selection
    • temperature
    • other hyper parameters
    • handle non-conform thinking & tools
    • detect model features (is that even possible?)
    • bench system & reporting
    • user-defined additional data
    • user-defined filters
  • Code quality
    • docstring and comments
    • small code base
    • small methods
    • better logging
    • tests suites
    • robustness and error handling
    • better separation of CLI and LLM
    • better libification
  • Misc
    • README
    • Vibe README
    • TODO list :p
    • build file
    • PyPI package
    • plugin example
    • ?

OpenAI API

The two HTTP routes used by llme are the chat completions endpoint (/chat/completions) and the model listing endpoint (/models), relative to the base URL.

Images are uploaded as content parts, for multimodal models.

Tools are integrated either with --tool-mode=native, using the native function-calling API (https://platform.openai.com/docs/guides/function-calling), or with --tool-mode=markdown, a custom approach intended for models that do not support it (or perform poorly with it). Custom tools work with both modes; see the --plugin option.
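
For example, to force the markdown approach for a model that handles native function calling poorly (the prompt is just an illustration):

llme --tool-mode=markdown "List the files in the current directory and count them"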

Issues

  • The various OpenAI-compatible servers and models implement different subsets of the API. Compatibility is being worked on, and there are fewer random 4xx or 5xx responses. Major local LLM servers and models were tested. See the benchmark.
  • Models are really sensitive to prompts and system prompts, but you can create a custom config file for each.
  • Models are really sensitive to how the messages are structured, unfortunately that is currently hardcoded in the program. I do not want to hard-code many tweaks and workarounds. :(

Thanks

  • openwebui for inspiration, but it is too complex and web-oriented.
  • gptme for another inspiration, but it is also too complex and too focused on non-local LLMs.
  • openai-cli for a simpler approach I built on top of.
  • llama.cpp, nexa-sdk and others for your great work.
