Sequence-LLM
A sequence-based LLM orchestration framework
Sequence-LLM is a terminal-first CLI for running local LLMs through llama-server (llama.cpp) with profile-based model switching and automatic server lifecycle management.
It is designed for developers who run multiple local models and want a simple, reproducible workflow without writing shell scripts.
Cross-platform: Windows, Linux, macOS.
Why Sequence-LLM
Running local models often involves:
- Manually starting and stopping servers
- Remembering model paths and ports
- Managing multiple configurations
- Writing ad-hoc scripts to switch models
Sequence-LLM solves this by providing:
- Named model profiles
- Automatic start and shutdown of servers
- Interactive chat interface
- Consistent configuration across machines
Features
- Interactive CLI built with Typer and Rich
- Profile-based model switching (/brain, /coder, etc.)
- Automatic shutdown of the previous server before starting a new one
- Health checking with readiness polling
- Cross-platform process management using subprocess and psutil
- OS-aware configuration directory creation
- Conversation history management
- Status panel showing active model and server info
Installation
Requirements
- Python 3.9+
- llama-server binary from llama.cpp
Install from PyPI:
```bash
pip install sequence-llm
```
Quick Start
Run the CLI:

```bash
seq-llm
```
On first launch, a configuration file is created automatically.
Config locations:

- Windows: %APPDATA%\sequence-llm\config.yaml
- Linux: ~/.config/sequence-llm/config.yaml
- macOS: ~/Library/Application Support/sequence-llm/config.yaml
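These paths follow each operating system's conventions for per-user configuration. As a rough sketch of how such OS-aware resolution can be implemented (the function below is illustrative, not Sequence-LLM's actual API):

```python
import os
import sys
from pathlib import Path

def config_path(app: str = "sequence-llm") -> Path:
    """Return the per-OS location of config.yaml, matching the list above."""
    if sys.platform == "win32":
        base = Path(os.environ["APPDATA"])                      # %APPDATA%
    elif sys.platform == "darwin":
        base = Path.home() / "Library" / "Application Support"  # macOS
    else:
        base = Path.home() / ".config"                          # Linux
    return base / app / "config.yaml"
```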
Configuration Example
```yaml
llama_server: "/path/to/llama-server"

defaults:
  threads: 6
  threads_batch: 8
  batch_size: 512

profiles:
  brain:
    name: "Brain Model"
    model_path: "/path/to/model.gguf"
    system_prompt: "/path/to/system.txt"
    port: 8081
    ctx_size: 16384
    temperature: 0.7
  coder:
    name: "Coder Model"
    model_path: "/path/to/coder.gguf"
    system_prompt: "/path/to/coder.txt"
    port: 8082
    ctx_size: 32768
    temperature: 0.3
```
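Per-profile keys presumably override the shared defaults when a profile is activated. A minimal sketch of that merge, assuming plain top-level dictionary semantics (resolve_profile is an illustrative name, not the package's API):

```python
import yaml  # PyYAML

def resolve_profile(config: dict, name: str) -> dict:
    """Overlay a named profile's settings on top of the shared defaults."""
    settings = dict(config.get("defaults", {}))
    settings.update(config["profiles"][name])
    return settings

with open("config.yaml") as f:
    cfg = yaml.safe_load(f)

print(resolve_profile(cfg, "coder")["port"])  # 8082 with the example config above
```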
CLI Usage
- `/status` → show active model and server status
- `/brain` → switch to the brain profile
- `/coder` → switch to the coder profile
- `/clear` → clear conversation history
- `/quit` → stop the server and exit
Typing any text sends a message to the active model.
Example Workflow
1. Start the CLI
2. The default model loads automatically
3. Switch between models using profile commands
4. Chat interactively, without restarting processes by hand (the sketch below shows what a chat call looks like on the wire)
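At the API level, step 4 amounts to POSTing the conversation to the local llama-server, which exposes an OpenAI-compatible /v1/chat/completions endpoint. A hedged sketch of such a request using only the standard library (this is not the package's internal client; port 8081 matches the brain profile above):

```python
import json
import urllib.request

def chat(port: int, messages: list[dict]) -> str:
    """Send a chat request to a local llama-server and return the reply text."""
    req = urllib.request.Request(
        f"http://127.0.0.1:{port}/v1/chat/completions",
        data=json.dumps({"messages": messages}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.loads(resp.read())
    return body["choices"][0]["message"]["content"]

history = [{"role": "user", "content": "Hello!"}]
print(chat(8081, history))
```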
Architecture
Core components:
- CLI: interactive interface and command routing
- Server Manager: lifecycle control of llama-server
- API Client: communication with local inference server
- Config System: YAML-based profiles and defaults
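To make the Server Manager's role concrete, here is a minimal lifecycle sketch under the stated design: spawn llama-server with subprocess, poll its GET /health endpoint until it reports ready, and terminate the whole process tree with psutil on shutdown. llama-server does provide --model, --port, and /health, but every name and detail below is illustrative rather than the project's actual code:

```python
import subprocess
import time
import urllib.error
import urllib.request

import psutil

def start_server(binary: str, model: str, port: int, timeout: float = 60.0) -> subprocess.Popen:
    """Launch llama-server and block until its health check passes."""
    proc = subprocess.Popen([binary, "--model", model, "--port", str(port)])
    url = f"http://127.0.0.1:{port}/health"
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=2) as resp:
                if resp.status == 200:  # model loaded, server ready
                    return proc
        except (urllib.error.URLError, ConnectionError):
            pass  # not listening yet (or still loading); keep polling
        time.sleep(0.5)
    proc.kill()
    raise TimeoutError(f"llama-server did not become ready on port {port}")

def stop_server(proc: subprocess.Popen) -> None:
    """Terminate the server and any child processes it spawned."""
    parent = psutil.Process(proc.pid)
    for child in parent.children(recursive=True):
        child.terminate()
    parent.terminate()
    psutil.wait_procs([parent], timeout=5)
```

Switching profiles then reduces to stop_server followed by start_server with the new profile's model and port.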
Development
Clone the repository and install in editable mode:

```bash
git clone https://github.com/Ananay28425/Sequence-LLM.git
cd Sequence-LLM
pip install -e .
```

Run the tests:

```bash
pytest -v
```
License
MIT License. See LICENSE file for details.
Contributing
Pull requests and issues are welcome.
GitHub: https://github.com/Ananay28425/Sequence-LLM
Sequence-LLM provides a lightweight and predictable way to manage local LLM workflows from the terminal.
Download files
File details
Details for the file sequence_llm-0.1.1.tar.gz.
File metadata
- Download URL: sequence_llm-0.1.1.tar.gz
- Upload date:
- Size: 14.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1b9f15b26453f31d334ce8bb451e983b68440d7239be0700c1893f5a11314039 |
| MD5 | 4508d1cb1ccff7c8e977dd24f160dfdc |
| BLAKE2b-256 | cbe641e95c6879c9d3a21982d447ef916a8c6517d60c91a4d3da0d51bc5c52f9 |
File details
Details for the file sequence_llm-0.1.1-py3-none-any.whl.
File metadata
- Download URL: sequence_llm-0.1.1-py3-none-any.whl
- Upload date:
- Size: 13.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | bd29a3daa1cdd197b26ac6b1768f52c5dffbb79c2a66e33bc937b4c5a5023394 |
| MD5 | 03890c63b71a8d56b11979df9772adb4 |
| BLAKE2b-256 | fc25f440d9b01a287ad27db9846db6bafb912c83a25a6066d3dfa211c5f2718e |