Sequence-LLM
A sequence-based LLM orchestration framework
Sequence-LLM is a terminal-first orchestration tool for running local large language models through llama.cpp (llama-server) with automatic server lifecycle management, profile-based switching, and reproducible workflows.
It removes the need to manually start servers, remember commands, or write shell scripts when working with multiple models.
Sequence-LLM works with any hardware supported by llama.cpp — CPU, CUDA GPUs, ROCm, Metal, and more.
Cross-platform: Windows, Linux, macOS.
Why Sequence-LLM
Running local models often involves:
- Manually starting and stopping servers
- Remembering model paths and ports
- Managing multiple configurations
- Writing ad-hoc scripts to switch models
- Repeating setup across machines
Sequence-LLM solves this by providing:
- Named model profiles
- Automatic start and shutdown of servers
- Interactive chat interface
- Consistent configuration across machines
- Deterministic, script-free workflows
Who Is This For
- Developers running multiple local models
- AI engineers building local pipelines
- Researchers comparing architectures
- Self-hosting enthusiasts
- GPU workstation users
- Anyone who prefers a CLI-first workflow
If you currently use llama.cpp directly or glue models together with custom scripts, Sequence-LLM simplifies the workflow; support for Ollama and LM Studio backends is planned.
Features
- Interactive CLI built with Typer and Rich
- Profile-based model switching (/brain, /coder, etc.)
- Automatic shutdown of the previous server before starting a new one
- Health checking with readiness polling (see the sketch after this list)
- Context-window safety guard (prevents overflow / crashes)
- Cross-platform process management using subprocess and psutil
- OS-aware configuration directory creation
- Conversation history management
- Status panel showing active model and server info
- First-run configuration wizard
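As a sketch of what readiness polling can look like, the snippet below assumes llama-server's /health endpoint, which returns 503 while the model is still loading and 200 once it is ready to serve. The function name and timing values are illustrative, not Sequence-LLM's actual code:

```python
import time
import urllib.error
import urllib.request

def wait_until_ready(port: int, timeout: float = 60.0, interval: float = 0.5) -> bool:
    """Poll llama-server's /health endpoint until it reports ready or we time out."""
    url = f"http://127.0.0.1:{port}/health"
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=2) as resp:
                if resp.status == 200:  # 503 is raised as an error while loading
                    return True
        except (urllib.error.URLError, ConnectionResetError):
            pass  # server not accepting connections yet; keep polling
        time.sleep(interval)
    return False
```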
Hardware Support
Sequence-LLM does not perform inference itself.
It orchestrates llama-server, meaning it works with:
- CPU inference
- NVIDIA CUDA GPUs
- AMD ROCm GPUs
- Apple Metal
- Any backend supported by llama.cpp
Comparison with Other Tools
| Tool | Primary Focus | Sequence-LLM Advantage |
|---|---|---|
| Ollama | Easy installs | Multi-model orchestration workflow |
| LM Studio | GUI | Lightweight CLI automation |
| Raw llama.cpp | Flexible | No manual scripts needed |
| Open-WebUI | Web UI | Minimal overhead terminal workflow |
Sequence-LLM aims to sit between the two extremes: simpler than scripting raw llama.cpp, lighter than GUI-first tools.
Installation
Requirements
- Python 3.9+
- llama-server binary from llama.cpp
Install from PyPI:
pip install sequence-llm
Quick Start
Run the CLI:
seq-llm
On first launch, a configuration file is created automatically.
Config locations:
- Windows: %APPDATA%\sequence-llm\config.yaml
- Linux: ~/.config/sequence-llm/config.yaml
- macOS: ~/Library/Application Support/sequence-llm/config.yaml
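For illustration, OS-aware resolution of these paths needs only the standard library. This sketch mirrors the locations above but is not necessarily Sequence-LLM's implementation:

```python
import os
import sys
from pathlib import Path

def config_file() -> Path:
    """Return the per-OS config.yaml location, creating the directory if needed."""
    if sys.platform == "win32":
        base = Path(os.environ["APPDATA"])
    elif sys.platform == "darwin":
        base = Path.home() / "Library" / "Application Support"
    else:  # Linux and other XDG-style systems
        base = Path(os.environ.get("XDG_CONFIG_HOME", str(Path.home() / ".config")))
    cfg_dir = base / "sequence-llm"
    cfg_dir.mkdir(parents=True, exist_ok=True)
    return cfg_dir / "config.yaml"
```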
Configuration Example
```yaml
llama_server: "/path/to/llama-server"

defaults:
  threads: 6
  threads_batch: 8
  batch_size: 512

profiles:
  brain:
    name: "Brain Model"
    model_path: "/path/to/model.gguf"
    system_prompt: "/path/to/system.txt"
    port: 8081
    ctx_size: 16384
    temperature: 0.7
  coder:
    name: "Coder Model"
    model_path: "/path/to/coder.gguf"
    system_prompt: "/path/to/coder.txt"
    port: 8082
    ctx_size: 32768
    temperature: 0.3
```
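Each profile maps onto llama-server's own flags. As a hedged sketch (the flags are real llama.cpp options, but build_command and the exact mapping are assumptions for illustration):

```python
import yaml  # PyYAML

def build_command(config: dict, profile: str) -> list:
    """Translate defaults plus one profile into a llama-server argument list."""
    d = config.get("defaults", {})
    p = config["profiles"][profile]
    # temperature is typically sent per request rather than at server start
    return [
        config["llama_server"],
        "--model", p["model_path"],
        "--port", str(p["port"]),
        "--ctx-size", str(p["ctx_size"]),
        "--threads", str(d.get("threads", 4)),
        "--threads-batch", str(d.get("threads_batch", 4)),
        "--batch-size", str(d.get("batch_size", 512)),
    ]

with open("config.yaml") as f:
    print(build_command(yaml.safe_load(f), "coder"))
```

Switching profiles then reduces to stopping the old process and launching the new argument list.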
CLI Usage
/status → show active model and server status
/brain → switch to brain profile
/coder → switch to coder profile
/clear → clear conversation history
/quit → stop server and exit
Typing any text sends a message to the active model.
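Under the hood, a chat turn is a POST of the running history to the local server; llama-server exposes an OpenAI-compatible /v1/chat/completions endpoint. Below is a minimal sketch with a deliberately crude context-window guard; the chars-per-token estimate and function shape are assumptions, not the project's actual client:

```python
import json
import urllib.request

def chat(port: int, history: list, ctx_size: int, temperature: float = 0.7) -> str:
    """Send the conversation to llama-server and return the assistant's reply."""
    # Crude context-window guard: drop the oldest turns while a rough
    # character budget (~4 chars per token) would overflow the context.
    trimmed = list(history)
    while len(trimmed) > 1 and sum(len(m["content"]) for m in trimmed) > ctx_size * 4:
        trimmed.pop(0)
    payload = json.dumps({"messages": trimmed, "temperature": temperature}).encode()
    req = urllib.request.Request(
        f"http://127.0.0.1:{port}/v1/chat/completions",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Trimming by a character estimate is conservative; counting tokens against the model's own tokenizer would be tighter.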
Example Workflow
- Start the CLI
- The default model loads automatically
- Switch between models with slash commands
- Chat interactively, with no manual process restarts
Architecture
```
User → CLI → ServerManager → llama-server → Model
                   ↑
           Config + Profiles
```
Core components:
- CLI - interactive interface and command routing
- Server Manager - lifecycle control of llama-server
- API Client - communication with local inference server
- Config System - YAML-based profiles and defaults
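A hypothetical sketch of the Server Manager's core contract, using the subprocess and psutil modules mentioned in the features list; class and method names are illustrative:

```python
import subprocess
from typing import List, Optional

import psutil

class ServerManager:
    """Sketch: owns at most one llama-server process at a time."""

    def __init__(self) -> None:
        self.proc: Optional[subprocess.Popen] = None

    def start(self, cmd: List[str]) -> None:
        self.stop()  # free the port and GPU memory before launching the new model
        self.proc = subprocess.Popen(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )

    def stop(self) -> None:
        if self.proc is None or self.proc.poll() is not None:
            self.proc = None
            return
        parent = psutil.Process(self.proc.pid)
        # Terminate children first so nothing is orphaned, then the parent.
        for child in parent.children(recursive=True):
            child.terminate()
        parent.terminate()
        psutil.wait_procs([parent], timeout=10)
        self.proc = None
```

Terminating the full child tree avoids orphaned processes holding on to the port or GPU memory between switches.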
Roadmap
Planned evolution:
- v0.3 — Multi-model named workflows
- v0.4 — TUI interface
- v0.5 — Hardware auto-optimization
- v1.0 — Production stability
For Development and Contributors
Clone repository:
git clone https://github.com/Ananay28425/Sequence-LLM.git
cd Sequence-LLM
pip install -e .
Run tests:
pytest -v
License
AGPL-3.0 License. See LICENSE file for details.
Contributing
Pull requests and issues are welcome.
GitHub: https://github.com/Ananay28425/Sequence-LLM
Sequence-LLM provides a lightweight and predictable way to manage local LLM workflows from the terminal.
Sequence-LLM - Orchestrate LLM workflows with ease.
Download files
File details
Details for the file sequence_llm-0.2.1.tar.gz.
File metadata
- Download URL: sequence_llm-0.2.1.tar.gz
- Upload date:
- Size: 34.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | dffb6c4ad3bbb0de9f578f00f547ebabff617922b5845f66530370750e1b1b0f |
| MD5 | f3ba26b9e3a62c762d4ec36c7945c127 |
| BLAKE2b-256 | b16fb112a07c8d9d7a89d3c1c613e76931b9adfd09cc3036ec91d45442e60938 |
File details
Details for the file sequence_llm-0.2.1-py3-none-any.whl.
File metadata
- Download URL: sequence_llm-0.2.1-py3-none-any.whl
- Upload date:
- Size: 32.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 2baa63800eec0845ff008f6b5ed8aaca4ae26021f09f2d119f86af179220f48a |
| MD5 | 273447ac806a56d4f0428c59fd6b93e7 |
| BLAKE2b-256 | 1bd63d92c10a99a74bbad0a8f1ee8b8fb9820ac8203a0bb6e7675191111cb55c |