Interactive CLI chat client for vLLM inference servers with persistent sessions and automatic context management

These details have not been verified by PyPI

Project description

Zorac - Self-Hosted Local LLM Chat Client

Python License vLLM Platform GPU

A fun terminal chat client for running local LLMs on consumer hardware. Chat with powerful AI models like Mistral-24B privately on your own RTX 4090/3090 - no cloud, no costs, complete privacy.

Perfect for developers who want a self-hosted ChatGPT alternative running on their gaming PC or homelab server. Also good for local AI coding assistants, agentic workflows and agent development.

Named after ZORAC, the intelligent Ganymean computer from James P. Hogan's The Gentle Giants of Ganymede.

Why Self-Host Your LLM?

Zero ongoing costs - No API fees, run unlimited queries
Complete privacy - Your data never leaves your machine
Low latency - Sub-second responses on local hardware
Use existing hardware - Your gaming GPU works great for AI
Full control - Customize models, parameters, and behavior
Work offline - No internet required after initial setup

Features

Interactive CLI - Natural conversation flow with continuous input prompts
Rich Terminal UI - Beautiful formatted output with optimized readability
- Left-aligned content with 60% width constraint for comfortable reading
- Syntax-highlighted code blocks and formatted markdown
- Clean, modern design without unnecessary clutter
Streaming Responses - Real-time token streaming with live markdown display
Persistent Sessions - Automatically saves and restores conversation history
Smart Context Management - Automatically summarizes old messages when approaching token limits
Token Tracking - Real-time monitoring of token usage with tiktoken
Performance Metrics - Displays tokens/second, response time, and resource usage
Configurable - Adjust all parameters via .env, config file, or runtime commands

Demo

Rich Terminal UI with Live Streaming

Interactive chat with real-time streaming responses, markdown rendering, and performance metrics

Zorac Chat Interface

Token Management & Commands

Built-in commands for session management and token tracking

Token Usage and Commands

Quick Start

1. Install Zorac

Homebrew (macOS/Linux):

brew tap chris-colinsky/zorac
brew install zorac

pip/pipx (All Platforms):

# Using pipx (recommended - isolated environment)
pipx install zorac

# Using pip
pip install zorac

# Using uv
uv tool install zorac

Windows Users: Use WSL (Windows Subsystem for Linux) and follow the Linux/pip instructions.

2. Set Up vLLM Server

You need a vLLM inference server running. See SERVER_SETUP.md for complete setup instructions.

3. Configure & Run

First Run:

When you start Zorac for the first time, you'll be greeted with a setup wizard:

$ zorac

     ███████╗ ██████╗ ██████╗  █████╗  ██████╗
     ╚══███╔╝██╔═══██╗██╔══██╗██╔══██╗██╔════╝
       ███╔╝ ██║   ██║██████╔╝███████║██║
      ███╔╝  ██║   ██║██╔══██╗██╔══██║██║
     ███████╗╚██████╔╝██║  ██║██║  ██║╚██████╗
     ╚══════╝ ╚═════╝ ╚═╝  ╚═╝╚═╝  ╚═╝ ╚═════╝
        intelligence running on localhost

────────────────────── Welcome to Zorac! ──────────────────────

This appears to be your first time running Zorac.
Let's configure your vLLM server connection.

Server Configuration:
  Default: http://localhost:8000/v1
  vLLM Server URL (or press Enter for default):

  Default: stelterlab/Mistral-Small-24B-Instruct-2501-AWQ
  Model name (or press Enter for default):

✓ Configuration saved to ~/.zorac/config.json
You can change these settings anytime with /config

Viewing Configuration:

After setup, you can view or modify your configuration anytime:

# View all settings
You: /config list

Configuration:
  VLLM_BASE_URL:      http://localhost:8000/v1
  VLLM_MODEL:         stelterlab/Mistral-Small-24B-Instruct-2501-AWQ
  MAX_INPUT_TOKENS:   12000
  MAX_OUTPUT_TOKENS:  4000
  TEMPERATURE:        0.1

# Update a setting
You: /config set VLLM_BASE_URL http://YOUR_SERVER:8000/v1
✓ Updated VLLM_BASE_URL in ~/.zorac/config.json

# See all available commands
You: /help

Alternative (Source Users):

If running from source, you can also create a .env file:

VLLM_BASE_URL=http://localhost:8000/v1
VLLM_MODEL=stelterlab/Mistral-Small-24B-Instruct-2501-AWQ

Documentation

User Guides

Installation Guide - All installation methods (binary, source, development)
Configuration Guide - Server setup, token limits, model parameters
Usage Guide - Commands, session management, tips & tricks

Technical Documentation

Development Guide - Contributing, testing, release process
Server Setup - Complete vLLM server installation and optimization
Claude.md - AI assistant development guide
Changelog - Version history and release notes
Contributing - Contribution guidelines

Supported Hardware

This works on consumer gaming GPUs:

GPU	VRAM	Model Size	Performance
RTX 4090	24GB	Up to 24B (AWQ)	60-65 tok/s ⭐
RTX 3090 Ti	24GB	Up to 24B (AWQ)	55-60 tok/s
RTX 3090	24GB	Up to 24B (AWQ)	55-60 tok/s
RTX 4080	16GB	Up to 14B (AWQ)	45-50 tok/s
RTX 4070 Ti	12GB	Up to 7B (AWQ)	40-45 tok/s
RTX 3080	10GB	Up to 7B (AWQ)	35-40 tok/s

Recommended configuration. See SERVER_SETUP.md for optimization details.

Use Cases

Local ChatGPT alternative - Private conversations, no data collection
Coding assistant - Works with Continue.dev, Cline, and other AI coding tools
Agentic workflows - LangChain/LangGraph running entirely local
Content generation - Write, summarize, analyze - all offline
AI experimentation - Test prompts and models without API costs
Learning AI/ML - Understand LLM inference without cloud dependencies

Why Mistral-Small-24B-AWQ?

This application is optimized for Mistral-Small-24B-Instruct-2501-AWQ:

Superior Intelligence - 24B parameters offers significantly better reasoning than 7B/8B models
Consumer Hardware Ready - 4-bit AWQ quantization fits in 24GB VRAM
High Performance - AWQ with Marlin kernel enables 60-65 tok/s on RTX 4090

You can use any vLLM-compatible model by changing the VLLM_MODEL setting.

FAQ

Can I run this without a GPU?

No, this requires an NVIDIA GPU with at least 10GB VRAM. CPU-only inference is too slow for interactive chat (would take minutes per response).

How does this compare to running Ollama?

Zorac uses vLLM for faster inference (60+ tok/s vs Ollama's 20-30 tok/s on the same hardware) and supports more advanced features like tool calling for agentic workflows. Ollama is easier to set up but slower for production use.

Do I need to be online?

Only for the initial model download (~14GB for Mistral-24B-AWQ). After that, everything runs completely offline on your local machine.

Is this legal? Can I use this commercially?

Yes! Mistral-Small is Apache 2.0 licensed, which allows free commercial use. vLLM is also open source (Apache 2.0). No restrictions.

What about AMD GPUs or Mac M-series chips?

This guide is specifically for NVIDIA GPUs using CUDA. For AMD GPUs, you'd need ROCm support (experimental). For Mac M-series, check out MLX or llama.cpp instead.

How much does it cost to run?

Electricity cost for an RTX 4090 running at ~300W is roughly $0.05-0.10 per hour (depending on your electricity rates). Far cheaper than API costs for heavy usage.

What other models can I run?

Any model with vLLM support: Llama, Qwen, Phi, DeepSeek, etc. Just change the VLLM_MODEL setting. Check vLLM's supported models.

Requirements

For Binary Users: Nothing! Just download and run.
For Source Users: Python 3.13+, uv package manager
For Server: NVIDIA GPU with 10GB+ VRAM, vLLM inference server

License

MIT License - see LICENSE for details.

Contributing

Contributions are welcome! See CONTRIBUTING.md for guidelines.

Support

Read the Documentation
Report bugs via GitHub Issues
Request features via GitHub Issues
Check vLLM Documentation for server issues

Star this repo if you find it useful! ⭐

Project details

These details have not been verified by PyPI

Release history Release notifications | RSS feed

1.4.1

Feb 27, 2026

1.4.0

Feb 25, 2026

1.3.1

Feb 17, 2026

1.3.0

Feb 16, 2026

This version

1.2.0

Feb 3, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

zorac-1.2.0.tar.gz (950.5 kB view details)

Uploaded Feb 3, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

zorac-1.2.0-py3-none-any.whl (20.6 kB view details)

Uploaded Feb 3, 2026 Python 3

File details

Details for the file zorac-1.2.0.tar.gz.

File metadata

Download URL: zorac-1.2.0.tar.gz
Upload date: Feb 3, 2026
Size: 950.5 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for zorac-1.2.0.tar.gz
Algorithm	Hash digest
SHA256	`a9cc33bcb644ada4fd1a4c799b5fd7062f66d47c0d9880f9aa3770db3b3cfbd5`
MD5	`29742f7106f011ead95f811039482d87`
BLAKE2b-256	`43ad2f3316edfb0ca63b9c3bacdaf201ab8adcea4d73847bb118366142b19829`

See more details on using hashes here.

Provenance

The following attestation bundles were made for zorac-1.2.0.tar.gz:

Publisher: release.yml on chris-colinsky/Zorac

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: zorac-1.2.0.tar.gz
- Subject digest: a9cc33bcb644ada4fd1a4c799b5fd7062f66d47c0d9880f9aa3770db3b3cfbd5
- Sigstore transparency entry: 907828586
- Sigstore integration time: Feb 3, 2026
Source repository:
- Permalink: chris-colinsky/Zorac@3b0ff21767b41cde5b8de177faa271d8dc9a771f
- Branch / Tag: refs/tags/v1.2.0
- Owner: https://github.com/chris-colinsky
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@3b0ff21767b41cde5b8de177faa271d8dc9a771f
- Trigger Event: push

File details

Details for the file zorac-1.2.0-py3-none-any.whl.

File metadata

Download URL: zorac-1.2.0-py3-none-any.whl
Upload date: Feb 3, 2026
Size: 20.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for zorac-1.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`650971b31671383116a21dfc6c7e32ccc9b3ca4c72a0fa9cb0f5336b69c22fb9`
MD5	`64856c7eb1eb4006d4699fefb7874379`
BLAKE2b-256	`43d5e61e79adb80de811b29ad22b7732ea076e8363cf9309a209545da507605d`

See more details on using hashes here.

Provenance

The following attestation bundles were made for zorac-1.2.0-py3-none-any.whl:

Publisher: release.yml on chris-colinsky/Zorac

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: zorac-1.2.0-py3-none-any.whl
- Subject digest: 650971b31671383116a21dfc6c7e32ccc9b3ca4c72a0fa9cb0f5336b69c22fb9
- Sigstore transparency entry: 907828597
- Sigstore integration time: Feb 3, 2026
Source repository:
- Permalink: chris-colinsky/Zorac@3b0ff21767b41cde5b8de177faa271d8dc9a771f
- Branch / Tag: refs/tags/v1.2.0
- Owner: https://github.com/chris-colinsky
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@3b0ff21767b41cde5b8de177faa271d8dc9a771f
- Trigger Event: push

zorac 1.2.0

Navigation

Verified details

Project links

GitHub Statistics

Maintainers

Unverified details

Meta

Classifiers

Project description

Zorac - Self-Hosted Local LLM Chat Client

Why Self-Host Your LLM?

Features

Demo

Rich Terminal UI with Live Streaming

Token Management & Commands

Quick Start

1. Install Zorac

2. Set Up vLLM Server

3. Configure & Run

Documentation

User Guides

Technical Documentation

Supported Hardware

Use Cases

Why Mistral-Small-24B-AWQ?

FAQ

Requirements

License

Contributing

Support

Project details

Verified details

Project links

GitHub Statistics

Maintainers

Unverified details

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

Provenance

File details

File metadata

File hashes

Provenance