
Sequence-LLM

A sequence-based LLM orchestration framework

Sequence-LLM is a terminal-first orchestration tool for running local large language models through llama.cpp (llama-server) with automatic server lifecycle management, profile-based switching, and reproducible workflows.

It removes the need to manually start servers, remember commands, or write shell scripts when working with multiple models.

Sequence-LLM works with any hardware supported by llama.cpp — CPU, CUDA GPUs, ROCm, Metal, and more.

Cross-platform: Windows, Linux, macOS.


Why Sequence-LLM

Running local models often involves:

  • Manually starting and stopping servers
  • Remembering model paths and ports
  • Managing multiple configurations
  • Writing ad-hoc scripts to switch models
  • Repeating setup across machines

Sequence-LLM solves this by providing:

  • Named model profiles
  • Automatic start and shutdown of servers
  • Interactive chat interface
  • Consistent configuration across machines
  • Deterministic, script-free workflows

Who Is This For

  • Developers running multiple local models
  • AI engineers building local pipelines
  • Researchers comparing architectures
  • Self-hosting enthusiasts
  • GPU workstation users
  • Anyone who prefers CLI-first workflows

If you currently drive llama.cpp by hand or through custom scripts, Sequence-LLM simplifies that workflow; support for Ollama and LM Studio is planned.


Features

  • Interactive CLI built with Typer and Rich
  • Profile-based model switching (/brain, /coder, etc.)
  • Automatic shutdown of previous server before starting a new one
  • Health checking with readiness polling (see the sketch after this list)
  • Context-window safety guard (prevents overflow / crashes)
  • Cross-platform process management using subprocess and psutil
  • OS-aware configuration directory creation
  • Conversation history management
  • Status panel showing active model and server info
  • First-run configuration wizard
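
The readiness check is essentially a poll against llama-server's /health endpoint, which returns 200 once the model is loaded. A minimal standard-library sketch of that pattern (illustrative only; Sequence-LLM's actual implementation may differ):

import time
import urllib.error
import urllib.request

def wait_until_ready(port: int, timeout: float = 60.0) -> bool:
    """Poll llama-server's /health endpoint until the model is loaded."""
    url = f"http://127.0.0.1:{port}/health"
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=2) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # not accepting connections yet, or 503 while loading
        time.sleep(0.5)
    return False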

Hardware Support

Sequence-LLM does not perform inference itself.

It orchestrates llama-server, meaning it works with:

  • CPU inference
  • NVIDIA CUDA GPUs
  • AMD ROCm GPUs
  • Apple Metal
  • Any backend supported by llama.cpp
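
Concretely, Sequence-LLM assembles an ordinary llama-server command from your profile, so a launch might look something like this (the flags shown are standard llama.cpp options; the exact command the tool builds is internal):

llama-server -m /path/to/model.gguf --port 8081 -c 16384 -t 6

Backend selection (CPU, CUDA, ROCm, Metal) is determined by how your llama-server binary was built; GPU offload is controlled by the usual llama.cpp flags such as -ngl.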

Comparison with Other Tools

Tool            Primary Focus    Sequence-LLM Advantage
Ollama          Easy installs    Multi-model orchestration workflow
LM Studio       GUI              Lightweight CLI automation
Raw llama.cpp   Flexible         No manual scripts needed
Open-WebUI      Web UI           Minimal-overhead terminal workflow

Sequence-LLM sits between the two extremes: more automation than raw llama.cpp, less overhead than a GUI.


Installation

Requirements

  • Python 3.9+
  • llama-server binary from llama.cpp

Install from PyPI:

pip install sequence-llm

Quick Start

Run the CLI:

seq-llm

On first launch, a configuration file is created automatically.

Config locations:

  • Windows: %APPDATA%\sequence-llm\config.yaml
  • Linux: ~/.config/sequence-llm/config.yaml
  • macOS: ~/Library/Application Support/sequence-llm/config.yaml
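
A standard-library sketch of how this kind of OS-aware path resolution can be implemented (illustrative only; the helper name is hypothetical and the package may resolve paths differently):

import os
import platform
from pathlib import Path

def config_path() -> Path:
    """Return the per-OS config.yaml location listed above."""
    system = platform.system()
    if system == "Windows":
        base = Path(os.environ["APPDATA"])  # %APPDATA%
    elif system == "Darwin":  # macOS
        base = Path.home() / "Library" / "Application Support"
    else:  # Linux and other POSIX
        base = Path(os.environ.get("XDG_CONFIG_HOME", Path.home() / ".config"))
    return base / "sequence-llm" / "config.yaml"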

Configuration Example

llama_server: "/path/to/llama-server"

defaults:
  threads: 6
  threads_batch: 8
  batch_size: 512

profiles:
  brain:
    name: "Brain Model"
    model_path: "/path/to/model.gguf"
    system_prompt: "/path/to/system.txt"
    port: 8081
    ctx_size: 16384
    temperature: 0.7

  coder:
    name: "Coder Model"
    model_path: "/path/to/coder.gguf"
    system_prompt: "/path/to/coder.txt"
    port: 8082
    ctx_size: 32768
    temperature: 0.3
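
Profile values override the shared defaults block. A sketch of loading and merging such a config with PyYAML (hypothetical helper, not Sequence-LLM's internals):

import yaml  # PyYAML

def load_profile(config_file: str, profile: str) -> dict:
    """Merge the defaults block with one named profile (profile wins)."""
    with open(config_file, encoding="utf-8") as f:
        cfg = yaml.safe_load(f)
    merged = dict(cfg.get("defaults", {}))
    merged.update(cfg["profiles"][profile])
    merged["llama_server"] = cfg["llama_server"]
    return merged

settings = load_profile("config.yaml", "coder")
print(settings["port"])     # 8082
print(settings["threads"])  # 6, inherited from defaults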

CLI Usage

/status   → show active model and server status
/brain    → switch to brain profile
/coder    → switch to coder profile
/clear    → clear conversation history
/quit     → stop server and exit

Typing any text sends a message to the active model.
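
Under the hood, each message becomes an HTTP request to the running llama-server, which exposes an OpenAI-compatible API. A standard-library sketch of one chat-completion call (Sequence-LLM's API client may build its requests differently):

import json
import urllib.request

def chat(port: int, messages: list, temperature: float = 0.7) -> str:
    """Send one chat turn to llama-server's /v1/chat/completions endpoint."""
    payload = json.dumps({"messages": messages, "temperature": temperature})
    req = urllib.request.Request(
        f"http://127.0.0.1:{port}/v1/chat/completions",
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

history = [{"role": "user", "content": "Hello!"}]
print(chat(8081, history))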


Example Workflow

  1. Start the CLI
  2. The default model loads automatically
  3. Switch between models with slash commands
  4. Chat interactively without restarting processes manually

Architecture

User → CLI → ServerManager → llama-server → Model
           ↑
        Config + Profiles

Core components:

  • CLI - interactive interface and command routing
  • Server Manager - lifecycle control of llama-server (see the sketch after this list)
  • API Client - communication with the local inference server
  • Config System - YAML-based profiles and defaults
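
The invariant the Server Manager enforces is "at most one llama-server at a time": terminate the previous process tree, launch the new profile, then poll for readiness. A simplified sketch using subprocess and psutil (hypothetical class; the real ServerManager has more error handling):

import subprocess
from typing import Optional

import psutil  # cross-platform process management

class ServerManager:
    def __init__(self, llama_server: str):
        self.llama_server = llama_server
        self.proc: Optional[subprocess.Popen] = None

    def stop(self) -> None:
        """Terminate the current server and any children it spawned."""
        if self.proc and self.proc.poll() is None:
            parent = psutil.Process(self.proc.pid)
            for child in parent.children(recursive=True):
                child.terminate()
            parent.terminate()
            parent.wait(timeout=10)
        self.proc = None

    def switch(self, profile: dict) -> None:
        """Stop the previous server, then start one for the new profile."""
        self.stop()
        self.proc = subprocess.Popen([
            self.llama_server,
            "-m", profile["model_path"],
            "--port", str(profile["port"]),
            "-c", str(profile["ctx_size"]),
        ])
        # a readiness poll (e.g. against /health) would follow here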

Roadmap

Planned evolution:

  • v0.3 — Multi-model named workflows
  • v0.4 — TUI interface
  • v0.5 — Hardware auto-optimization
  • v1.0 — Production stability

For Development and Contributors

Clone repository:

git clone https://github.com/Ananay28425/Sequence-LLM.git
cd Sequence-LLM
pip install -e .

Run tests:

pytest -v

License

AGPL-3.0 License. See the LICENSE file for details.


Contributing

Pull requests and issues are welcome.

GitHub: https://github.com/Ananay28425/Sequence-LLM


Sequence-LLM provides a lightweight and predictable way to manage local LLM workflows from the terminal.

Sequence-LLM - Orchestrate LLM workflows with ease.
