FastAPI backend & desktop launcher for running local models via llama.cpp

🦙 LLamaStudio

A desktop chat interface and local server manager for llama.cpp, crafted with FastAPI + HTMX for ultra-lightweight, zero-framework execution.

LLamaStudio is a self-contained local workspace that manages model lifecycles, features a smart VRAM estimator, scans local folders, and lets you search and download models directly from the Hugging Face Hub.

📸 Screenshots & Showcase

1. Main Chat Dashboard

A dark interface styled to match Pop!_OS, with streaming responses, collapsible Markdown reasoning (thinking) blocks, and real-time agentic tool execution logs.

2. GGUF Model Browser & Settings

A dynamic local model explorer that scans your directories and lets you adjust context length, GPU offload layers, CPU threads, flash attention, and KV cache quantization on the fly.
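
Directory scanning of this kind can be pictured as a recursive search for `.gguf` files. The following is a minimal sketch, not the logic in `model_manager.py`: the function name `scan_gguf` and the choice to report sizes in decimal gigabytes are assumptions made for illustration.

```python
from pathlib import Path


def scan_gguf(directories):
    """Return (name, size_in_GB) pairs for every .gguf file under the given directories."""
    found = []
    for directory in directories:
        root = Path(directory).expanduser()
        if not root.is_dir():
            continue  # skip search paths that do not exist
        for path in sorted(root.rglob("*.gguf")):
            found.append((path.stem, path.stat().st_size / 1e9))
    return found


if __name__ == "__main__":
    # ~/.lmstudio/models is the example directory named in this README.
    for name, size_gb in scan_gguf(["~/.lmstudio/models"]):
        print(f"{name} ({size_gb:.1f} GB)")
```

Missing directories are skipped rather than treated as errors, so a default search path that does not exist on a given machine does no harm.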

3. Hugging Face Discover Hub

Browse the entire Hugging Face GGUF catalog. The tab includes a Smart VRAM Offload Estimator calibrated to your hardware and a floating background download progress card with live speed (MB/s), ETA, and cancel controls.
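
The speed and ETA figures on a progress card like this come down to simple arithmetic over bytes transferred and time elapsed. Here is a small sketch of that calculation; `download_progress` is a hypothetical helper, and the real downloader may smooth the speed over a window rather than averaging from the start.

```python
def download_progress(bytes_done, bytes_total, elapsed_seconds):
    """Return (speed in MB/s, ETA in seconds or None) for a transfer in progress."""
    if elapsed_seconds <= 0 or bytes_done <= 0:
        return 0.0, None  # not enough data yet to estimate anything
    speed_bytes = bytes_done / elapsed_seconds
    remaining = max(bytes_total - bytes_done, 0)
    return speed_bytes / 1_000_000, remaining / speed_bytes
```

For instance, 500 MB of a 2 GB file after 10 seconds gives 50 MB/s and an ETA of 30 seconds.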

✨ Key Features

⚡ Zero Node Modules: Built with HTMX, Tailwind CSS (via CDN), and Vanilla JS. It is incredibly fast, responsive, and has a memory footprint of just a few megabytes.
🧭 Hugging Face Discover Tab: Search the public Hugging Face Hub for GGUF models directly inside the app, view readmes, select quantizations, and download files in the background.
🚀 Smart VRAM Estimator: Calibrated to your hardware. Each model is flagged as fitting fully in VRAM (for example, on an RTX 5090 with 32 GB), needing partial offload, or falling back heavily to the CPU.
📂 Automatic Model Scanning: Scans standard directories (like ~/.lmstudio/models) automatically on startup or via a one-click rescan button.
🪐 Process Lifecycle Manager: The underlying llama-server process only spins up when you explicitly load a model, releasing all system resources and GPU VRAM instantly when you click "Eject".
🔧 Configurable Workspace Sandboxing: Supports sandboxed agentic tool use (file read/write, commands, etc.) with real-time logs in the UI. Workspace and permission defaults are stored in the first-class app config.
👁️ Multimodal Media Chat: Drag raster images or WAV/MP3/FLAC audio into chat, or attach workspace media with lls oneshot --image / --audio; FLAC is normalized to llama.cpp-compatible WAV with ffmpeg.
🎙️ Local Push-to-Talk: Click the microphone once to record and again to stop. A managed whisper.cpp service transcribes locally and places editable text in the chat box; audio is never sent to the chat model.
🖥️ XDG-Compliant Persistence: App config, conversations, and first-class model profiles are stored outside the codebase directory in standard ~/.config/llamastudio/ with automated backward-compatible migrations.
📦 Full Linux & macOS Portability: Server binaries and model directories are resolved dynamically on startup.
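
The three outcomes reported by the VRAM estimator can be illustrated as a size comparison. This is a rough sketch, not LlamaStudio's actual calibration: `classify_fit`, the 2 GiB overhead allowance, and the 50% cutoff for partial offload are all assumptions chosen for illustration.

```python
GIB = 1024 ** 3


def classify_fit(model_bytes, free_vram_bytes, overhead_bytes=2 * GIB):
    """Classify a model as 'full', 'partial', or 'cpu' for a given VRAM budget.

    overhead_bytes is a hypothetical allowance for KV cache and compute buffers.
    """
    usable = free_vram_bytes - overhead_bytes
    if usable <= 0:
        return "cpu"
    if model_bytes <= usable:
        return "full"  # every layer can be offloaded
    if usable / model_bytes >= 0.5:
        return "partial"  # at least half of the weights fit on the GPU
    return "cpu"  # most of the model would run from system RAM
```

A real estimator would also account for context length and KV cache quantization, since both change how much memory is needed beyond the weights themselves.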

🛠️ Installation & Setup

LLamaStudio runs on Linux and macOS out of the box. Choose your OS and preferred Python environment below.

🐧 1. Linux Installation

Prerequisites

Python 3.10+ (Recommended: Python 3.13)
llama.cpp built from source (or pre-compiled binary):
- By default, the app dynamically looks for the llama-server binary globally on your system PATH or locally inside your home directory at ~/llama.cpp/build/bin/llama-server.
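
The lookup described above can be sketched with the standard library. This is an illustration under assumptions, not the code in `app/config.py`: the function name `find_llama_server` is hypothetical, and checking PATH before the home-directory build is an assumed order.

```python
import shutil
from pathlib import Path


def find_llama_server(which=shutil.which, home=None):
    """Return the path to llama-server, or None if neither location has it."""
    on_path = which("llama-server")
    if on_path:
        return Path(on_path)
    home = Path(home) if home is not None else Path.home()
    fallback = home / "llama.cpp" / "build" / "bin" / "llama-server"
    return fallback if fallback.is_file() else None
```

The `which` and `home` parameters exist only so the lookup can be exercised without touching the real environment.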

Environment Setup

Option A: Install from PyPI

```
pip install llamastudio
```

Option B: Conda / Miniconda from source

```
# 1. Clone the repository
git clone https://github.com/gnulnx/LlamaStudio.git
cd LlamaStudio

# 2. Create and activate a conda environment
conda create -n llamastudio python=3.13 -y
conda activate llamastudio

# 3. Install LlamaStudio and its dependencies
pip install -e .
```

Option C: Python Virtualenv (`venv`) from source

```
# 1. Clone the repository
git clone https://github.com/gnulnx/LlamaStudio.git
cd LlamaStudio

# 2. Create and activate a Python venv environment
python3 -m venv .venv
source .venv/bin/activate

# 3. Install LlamaStudio and its dependencies
pip install -e .
```

🖥️ Linux Desktop Launcher Integration (Optional)

To integrate LLamaStudio directly into your Linux Application launcher menu (e.g., GNOME / Pop!_OS):

```
# 1. Copy the desktop file to your local applications directory
cp llamastudio.desktop ~/.local/share/applications/

# 2. Copy the custom SVG icon to your local icons directory
mkdir -p ~/.local/share/icons/hicolor/128x128/apps/
cp llamastudio.svg ~/.local/share/icons/hicolor/128x128/apps/

# 3. Update your desktop database and icon cache
update-desktop-database ~/.local/share/applications/
gtk-update-icon-cache -f -t ~/.local/share/icons
```

Note: If you are using a virtualenv, edit the executable path inside ~/.local/share/applications/llamastudio.desktop to point to your specific .venv/bin/python interpreter.

🍏 2. macOS Installation

Prerequisites

Python 3.10+
llama.cpp installed globally via Homebrew (highly recommended for macOS):
```
brew install llama.cpp
```
(This automatically places the llama-server binary globally on your system PATH, which LLamaStudio will auto-detect immediately!)

Environment Setup

Option A: Install from PyPI

```
pip install llamastudio
```

Option B: Conda / Miniconda from source

```
# 1. Clone the repository
git clone https://github.com/gnulnx/LlamaStudio.git
cd LlamaStudio

# 2. Create and activate environment
conda create -n llamastudio python=3.13 -y
conda activate llamastudio

# 3. Install LlamaStudio and its dependencies
pip install -e .
```

Option C: Python Virtualenv (`venv`) from source

```
# 1. Clone the repository
git clone https://github.com/gnulnx/LlamaStudio.git
cd LlamaStudio

# 2. Create and activate venv
python3 -m venv .venv
source .venv/bin/activate

# 3. Install LlamaStudio and its dependencies
pip install -e .
```

🪟 3. Windows Installation

Note: Native Windows execution is currently untested. However, you can run LLamaStudio on Windows seamlessly via WSL2 (Windows Subsystem for Linux) by following the standard Linux Installation guide above.

Pull requests extending native Windows support (e.g., resolving .exe binaries) are highly welcome!

🚀 Running the Application

Option A: Via Unified CLI (`lls` - Recommended)

Once LlamaStudio is installed, the `lls` CLI controls the desktop app and the model server from a terminal:

```
# Start the desktop application server and open browser UI
lls start
```

Option B: Via App Launcher Command

After installing from PyPI or source, run:

```
llamastudio
```

Via Application Menu (Linux)

Search for LLamaStudio in your desktop search bar (press Super, type "Llama") and click to launch!

🛠️ Unified Command-Line Interface (`lls`)

LlamaStudio features a CLI built using rich-click for visual dashboards and operational efficiency.

CLI Subcommands Reference

| Command | Usage | Description |
| --- | --- | --- |
| `start` | `lls start` | Starts the desktop app and opens the browser to the right first-run/chat/models/discover view. |
| `reload` | `lls reload` | Gracefully restarts the desktop FastAPI application backend. |
| `status` | `lls status` | Visual dashboard of FastAPI backend status, loaded model parameters, and GPU memory (VRAM). |
| `ls` | `lls ls` | Prints an elegant table of all GGUF models scanned across local directories. |
| `load` | `lls load [MODEL]` | Boots the server with a GGUF model. If `MODEL` is omitted, prompts you with an interactive menu. |
| `eject` | `lls eject` | Gracefully unloads the active model to free GPU and CPU RAM. |
| `oneshot` | `lls oneshot [--image PATH] [--audio PATH] [--no-thinking] [--max-tokens N] "prompt"` | Streams text, optional reasoning, tool calls, and multimodal workspace images/audio directly in your terminal. Use `--no-thinking` for low-latency direct answers. |
| `speech status` | `lls speech status` | Shows the local Whisper installation, selected model, compute mode, and server state. |
| `speech install` | `lls speech install [--model small.en]` | Installs pinned, checksum-verified `whisper.cpp` Linux binaries and a local Whisper model. |
| `speech load/eject` | `lls speech load [MODEL] [--gpu\|--cpu]` | Starts or stops the persistent speech-to-text server independently of the chat model. |
| `speech transcribe` | `lls speech transcribe AUDIO` | Transcribes a workspace audio file locally, with optional language and English translation controls. |
| `speech record` | `lls speech record [--device default]` | Starts terminal microphone capture immediately; press Enter to stop and print the transcript. |

Set up push-to-talk once, then use the microphone beside the chat input:

```
lls speech install --model small.en
lls speech status
lls speech record
```

The browser control is a toggle, not a hold action. The first click starts recording, the red stop button ends it, and the transcript is inserted without auto-sending so it can be corrected first. Browser microphone access requires the loopback URL (http://127.0.0.1:8765) or HTTPS.

For low-latency vision classification, disable reasoning and keep the answer budget small:

```
lls oneshot --no-thinking --temperature 0 --max-tokens 32 \
  --image camera-frame.png "Answer in 10 words or fewer: what is ahead?"
```

For audio transcription or translation with an audio-capable model and projector:

```
lls oneshot --no-thinking --audio recording.flac \
  "Transcribe this audio, translate it to English, and respond briefly."
```

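The FLAC-to-WAV normalization mentioned under Key Features can be approximated by a plain ffmpeg call. The sketch below is an assumption about how such a step might look, not LlamaStudio's implementation: the helper names are hypothetical, and it relies on ffmpeg's default WAV settings rather than any specific sample rate or channel layout the app may enforce.

```python
import shutil
import subprocess
from pathlib import Path


def flac_to_wav_command(source, target=None):
    """Build an ffmpeg argument list that converts an audio file to WAV."""
    source = Path(source)
    target = Path(target) if target else source.with_suffix(".wav")
    # -y overwrites an existing target; the .wav extension selects the WAV muxer.
    return ["ffmpeg", "-y", "-i", str(source), str(target)]


def convert(source):
    """Run the conversion, raising if ffmpeg is missing or fails."""
    if shutil.which("ffmpeg") is None:
        raise FileNotFoundError("ffmpeg is not on PATH")
    command = flac_to_wav_command(source)
    subprocess.run(command, check=True, capture_output=True)
    return Path(command[-1])
```

Building the argument list separately from running it keeps the command easy to inspect and avoids shell quoting issues with unusual file names.
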
For example, to boot a model interactively:

```
$ lls load
Available Scanned Models:
  1. Qwen3.6-35B-A3B-UD-Q5_K_M (25.2 GB)
  2. gemma-4-26B-A4B-it-Q8_0 (25.0 GB)
  3. DeepSeek-R1-Distill-Qwen-32B-Q5_K_M (21.7 GB)

Select a model number to load: 3
Loading model 'DeepSeek-R1-Distill-Qwen-32B-Q5_K_M'...
```
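
At the process level, loading and ejecting can be pictured as starting and stopping a single child process. The sketch below is an illustration under assumptions, not the code in `server_manager.py`: `llama_server_command`, `ServerProcess`, and the default values are hypothetical, while the four flags shown are standard llama-server options.

```python
import subprocess


def llama_server_command(binary, model_path, port=8080, ctx_size=4096, gpu_layers=0):
    """Build a llama-server argument list from a few common load settings."""
    return [
        str(binary),
        "--model", str(model_path),
        "--port", str(port),
        "--ctx-size", str(ctx_size),
        "--n-gpu-layers", str(gpu_layers),
    ]


class ServerProcess:
    """Hold at most one child process: load() starts it, eject() stops it."""

    def __init__(self):
        self._proc = None

    @property
    def running(self):
        return self._proc is not None and self._proc.poll() is None

    def load(self, command):
        self.eject()  # only one model server at a time
        self._proc = subprocess.Popen(command)

    def eject(self, timeout=10):
        if self._proc is None:
            return
        if self._proc.poll() is None:
            self._proc.terminate()  # polite stop request first
            try:
                self._proc.wait(timeout=timeout)
            except subprocess.TimeoutExpired:
                self._proc.kill()  # force stop if the request is ignored
                self._proc.wait()
        self._proc = None
```

Because the weights live in the child process, stopping that process is what returns RAM and VRAM to the system.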

⚙️ Configuration & Customization

The application runs fully out-of-the-box with no manual configuration. On first launch, LlamaStudio creates its runtime config under:

```
~/.config/llamastudio/
  config.json
  model_profiles.json
  conversations.json
  logs/
```

config.json stores app defaults, model search directories, workspace permissions, and launch state. model_profiles.json stores first-class per-model load and inference profiles. Older model_settings.json files are migrated automatically.
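
For scripts that need to locate these files, the directory can be resolved using the XDG Base Directory convention. This is a sketch under an assumption: the README only names `~/.config/llamastudio/`, so whether LlamaStudio itself honors `XDG_CONFIG_HOME` is not confirmed here, and `config_dir` is a hypothetical helper.

```python
import os
from pathlib import Path


def config_dir(env=None):
    """Resolve the llamastudio config directory using the XDG convention."""
    env = os.environ if env is None else env
    base = env.get("XDG_CONFIG_HOME") or ""
    # The XDG spec says relative values must be ignored; fall back to ~/.config.
    if base and Path(base).is_absolute():
        root = Path(base)
    else:
        root = Path.home() / ".config"
    return root / "llamastudio"
```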

🛡️ Workspace Sandboxing & Embodiment

By default, LlamaStudio restricts agent tools (like reading, writing, and listing files) to the configured workspace directory to prevent accidental path traversals. For CLI launches, the first-run workspace defaults to the directory where lls start was run.
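
A common way to enforce this kind of restriction is to resolve every requested path and confirm it still sits under the workspace root. The following is a minimal sketch of that general technique, not the checks in `app/tools.py`; `resolve_in_workspace` is a hypothetical name.

```python
from pathlib import Path


def resolve_in_workspace(workspace, requested):
    """Return the absolute path for `requested`, or raise if it escapes `workspace`."""
    root = Path(workspace).resolve()
    candidate = (root / requested).resolve()  # collapses ".." and follows symlinks
    if not candidate.is_relative_to(root):
        raise PermissionError(f"{requested!r} is outside the workspace")
    return candidate
```

Resolving before comparing matters: a naive string prefix check would accept `notes/../../secret`, while the resolved path shows that it leaves the workspace. Absolute paths are handled by the same check, because joining an absolute path onto the root simply yields that absolute path.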

Workspace configuration is saved in ~/.config/llamastudio/config.json. Environment variables are still supported for advanced/bootstrap overrides, but normal users should not need a .env file.

Developer details for the config/profile architecture live in DEV.md.

🧪 Testing Suite

LlamaStudio features both standard unit tests and comprehensive GGUF integration tests.

1. Standard Unit Tests

Verify local installation and confirm backend routing, regex parsing, and sandboxing safety behaviors by running our mock-based test suite:

```
python -m unittest discover tests
```

2. GGUF Model Integration Tests

For local environments containing active GPUs and downloaded models, you can run the full multi-model GGUF tool-calling integration suite to verify real-time execution robustness across various chat templates:

```
# Run GGUF model integration tests locally
./tests/test_all.sh
```

(These tests are automatically skipped in standard CI/CD environments and default pytest runs using @pytest.mark.skipif to keep pipeline checks fast.)
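
The same opt-in pattern can be shown with the standard library's `unittest.skipUnless`, which plays the role that `@pytest.mark.skipif` plays under pytest. This sketch is not taken from the project's test suite: the `RUN_GGUF_TESTS` variable and the test class are hypothetical.

```python
import os
import unittest

# Hypothetical opt-in switch; not a documented LlamaStudio variable.
RUN_GGUF = os.environ.get("RUN_GGUF_TESTS") == "1"


@unittest.skipUnless(RUN_GGUF, "set RUN_GGUF_TESTS=1 to run GPU-backed model tests")
class GGUFToolCallingTest(unittest.TestCase):
    def test_placeholder(self):
        # A real test would load a model and exercise a tool call here.
        self.assertTrue(RUN_GGUF)
```

With the variable unset, the test runner reports the class as skipped instead of failed, so CI stays green on machines without a GPU.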

🏗️ Project Structure

```
LlamaStudio/
├── pyproject.toml         # Package metadata, CLI entrypoint, and dependencies
├── DEV.md                 # Development notes for runtime config and profiles
├── llamastudio.desktop    # GNOME/Linux desktop launcher metadata
├── llamastudio.svg        # Custom application vector icon
├── app/
│   ├── config.py          # Settings & dynamic path configurations
│   ├── config_store.py    # First-class runtime config and model profiles
│   ├── main.py            # FastAPI backend endpoints & routing
│   ├── chat.py            # Conversations registry, templates & chat streaming
│   ├── downloader.py      # Async background download manager (chunked writes)
│   ├── model_manager.py   # Scans local paths and Hugging Face Hub
│   ├── server_manager.py  # llama-server subprocess lifecycle controller
│   ├── logger.py          # Centralized logger
│   ├── tools.py           # Sandboxed local workspace tools for LLM agent use
│   └── templates/
│       └── index.html     # Interactive HTMX frontend interface
├── tests/
│   └── *.py               # Unit and integration-adjacent test coverage
└── imgs/
    ├── chat_interface.png  # Screenshot: Main Chat interface
    ├── model_settings.png  # Screenshot: Model explorer & settings
    └── discover_models.png # Screenshot: HF Discover & Downloader panel
```

📄 License

LLamaStudio is open-source software licensed under the MIT License.

Release history

- 1.0.7 (this version): Jul 23, 2026
- 1.0.6: Jul 22, 2026
- 1.0.5: May 27, 2026
- 1.0.3: May 26, 2026
- 1.0.2: May 26, 2026
- 1.0.1: May 26, 2026
- 1.0.0: May 26, 2026

Download files

Source Distribution

llamastudio-1.0.7.tar.gz (262.4 kB), uploaded Jul 23, 2026

Built Distribution

llamastudio-1.0.7-py3-none-any.whl (245.3 kB), uploaded Jul 23, 2026, Python 3

File details

Details for the file llamastudio-1.0.7.tar.gz.

File metadata

Download URL: llamastudio-1.0.7.tar.gz
Upload date: Jul 23, 2026
Size: 262.4 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.14

File hashes

Hashes for llamastudio-1.0.7.tar.gz
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | `14b1ff89178ab507d7fbce9b46332dcbc3fc8221b889303ac3e72cedfc1063c3` |
| MD5 | `2644860f09cb9e5e8388a93f0563e56c` |
| BLAKE2b-256 | `318dc7e524f260650aef558f86c621a9cd6ddd6cb6d2daef66c2cc6655daee63` |

File details

Details for the file llamastudio-1.0.7-py3-none-any.whl.

File metadata

Download URL: llamastudio-1.0.7-py3-none-any.whl
Upload date: Jul 23, 2026
Size: 245.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.14

File hashes

Hashes for llamastudio-1.0.7-py3-none-any.whl
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | `b0d534ed13fa19a1855aa34ad971974c9a1750d6c85f833fb3f4106507aa4d4d` |
| MD5 | `b6d3c2bd78bd8dcdeca2a4d0159a08b9` |
| BLAKE2b-256 | `ea79241feb779c7f52cd118d20a36f84dd97a72e715943e4aa430a30ed9d2a86` |
