LinkLLM
The unified LLM runtime — local inference, API proxy, and monitoring in one blazing-fast tool.
A powerful alternative to Ollama + LiteLLM, built from the ground up in Rust.
Quick Start · Features · Installation · Usage · API · Documentation · Contributing
# Install and run your first model in under 60 seconds
curl -fsSL https://install.linkllm.dev | sh
linkllm pull mistralai/Mistral-7B-Instruct-v0.3-GGUF
linkllm chat mistral
What is LinkLLM?
LinkLLM is a single tool that replaces both Ollama and LiteLLM, and then goes further. It gives you:
- Local inference of any GGUF model via llama.cpp or the pure-Rust candle backend
- API proxy to OpenAI, Gemini, Anthropic, Groq, and any OpenAI-compatible endpoint
- Model management — pull any model from HuggingFace with one command
- Production-ready REST API with OpenAI-compatible routes, auth, rate limiting, TLS
- Real-time monitoring dashboard right inside your terminal
- Multi-model routing with fallback chains and cost tracking
All in a single binary. No Docker required. Works on Windows, macOS, Linux, and Termux.
✨ Features
🦀 Rust-Powered Core
Built on Tokio + Axum — async from the ground up. Memory safe, no garbage collector pauses, minimal footprint.
🤖 Local Model Inference
- Run GGUF models via llama.cpp FFI bindings — same performance, Rust-safe wrapper
- Pure-Rust inference with candle (no C++ dependency)
- GPU acceleration: CUDA, ROCm, Apple Metal — auto-detected
- Quantization: Q4_K_M, Q5_K_S, Q8_0, F16 and more
🌐 Universal API Proxy
Route requests to any provider through a single unified API:
| Provider | Models |
|---|---|
| OpenAI | gpt-4o, o1, gpt-4-turbo, ... |
| Google Gemini | gemini-2.0-flash, gemini-1.5-pro, ... |
| Anthropic | claude-3-5-sonnet, claude-3-opus, ... |
| Groq | llama3, mixtral (ultra-fast) |
| Together AI | 50+ open models |
| Any OpenAI-compat | Custom base URL |
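Because every provider sits behind the same OpenAI-compatible endpoint, switching providers is just a change of model name. A minimal sketch using the OpenAI Python SDK, assuming the server is running on the default port and the relevant provider keys were added with linkllm key add:
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="your-api-key")

# The same client and endpoint serve every provider; the model name picks the backend.
for model in ["mistral", "gpt-4o", "gemini-2.0-flash", "claude-3-5-sonnet"]:
    reply = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": "Say hello in one sentence."}],
    )
    print(f"{model}: {reply.choices[0].message.content}")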
📦 HuggingFace Model Pull
linkllm pull mistralai/Mistral-7B-Instruct-v0.3-GGUF
linkllm pull TheBloke/Llama-2-13B-chat-GGUF --quant Q4_K_M
linkllm pull google/gemma-2-9b-it
Resume interrupted downloads. SHA-256 integrity check. Auto-conversion to GGUF.
📊 Terminal Monitoring Dashboard
linkllm monitor
Real-time TUI powered by Ink:
- Tokens/second live graph
- Latency histograms (p50 / p95 / p99)
- Active model memory usage
- Per-provider cost breakdown
- Request log (live tail)
- API key usage tracker
- Error rate + alerts
🔐 Security-First Design
- AES-256-GCM encrypted API key store (OS keychain integration)
- TLS 1.3 by default, mTLS for production
- HMAC request signing in the Rust SDK
- JWT bearer tokens for server access
- Per-key rate limits and quotas
- Sandboxed model inference
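To make the request-signing idea concrete, here is a rough Python sketch of HMAC-SHA256 signing over a request body. The header names and canonicalization below are hypothetical illustrations, not the Rust SDK's actual scheme:
import hashlib
import hmac
import json
import time

def sign_request(secret: str, body: dict) -> dict:
    # Sign a timestamp plus the canonical JSON body with a shared secret.
    timestamp = str(int(time.time()))
    payload = timestamp + json.dumps(body, separators=(",", ":"), sort_keys=True)
    signature = hmac.new(secret.encode(), payload.encode(), hashlib.sha256).hexdigest()
    # "X-Signature" and "X-Timestamp" are illustrative names, not the SDK's real headers.
    return {"X-Signature": signature, "X-Timestamp": timestamp}

print(sign_request("shared-secret", {"model": "mistral", "messages": []}))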
🔀 Multi-Model Routing
Define routing rules in linkllm.toml:
[routing]
default = "mistral"
[[routing.rules]]
match = "code"
model = "deepseek-coder"
[[routing.rules]]
match = "long-context"
model = "gemini-1.5-pro"
fallback = ["gpt-4o", "claude-3-opus"]
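The fallback list means the router tries the matched model first and moves down the chain only when a request fails. A minimal client-side sketch of that behavior (not LinkLLM's internal implementation), using the OpenAI-compatible endpoint:
from openai import OpenAI, APIError

client = OpenAI(base_url="http://localhost:11434/v1", api_key="your-api-key")

def chat_with_fallback(models, messages):
    # Try each model in order and return the first successful response.
    last_error = None
    for model in models:
        try:
            return client.chat.completions.create(model=model, messages=messages)
        except APIError as err:
            last_error = err  # provider error: fall through to the next model
    raise last_error

reply = chat_with_fallback(
    ["gemini-1.5-pro", "gpt-4o", "claude-3-opus"],
    [{"role": "user", "content": "Summarize a very long document."}],
)
print(reply.choices[0].message.content)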
⚡ Quick Start
1. Install
Linux / macOS / Termux:
curl -fsSL https://install.linkllm.dev | sh
Windows (PowerShell):
irm https://install.linkllm.dev/windows | iex
Homebrew:
brew install linkllm/tap/linkllm
npm (CLI only):
npm install -g linkllm
pip (Python SDK + CLI):
pip install linkllm
From source:
git clone https://github.com/linkllm/linkllm
cd linkllm
cargo build --release
2. Pull a Model
# Pull from HuggingFace (GGUF auto-detected)
linkllm pull mistralai/Mistral-7B-Instruct-v0.3-GGUF
# Specify quantization
linkllm pull TheBloke/Llama-2-13B-chat-GGUF --quant Q4_K_M
# List downloaded models
linkllm list
3. Chat in Terminal
linkllm chat mistral
linkllm chat gpt-4o # routes to OpenAI (needs API key)
linkllm chat gemini-flash # routes to Google Gemini
4. Start the Server
linkllm serve
# Server running at http://localhost:11434
# OpenAI-compatible API at http://localhost:11434/v1
5. Monitor
linkllm monitor
🔌 API
LinkLLM exposes a fully OpenAI-compatible REST API. Drop it in as a replacement for api.openai.com:
Chat Completions
curl http://localhost:11434/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer your-api-key" \
-d '{
"model": "mistral",
"messages": [{"role": "user", "content": "Hello!"}],
"stream": true
}'
Python — OpenAI SDK Compatible
from openai import OpenAI
client = OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="your-api-key"
)

response = client.chat.completions.create(
    model="mistral",
    messages=[{"role": "user", "content": "Explain Rust ownership"}],
    stream=True
)

for chunk in response:
    print(chunk.choices[0].delta.content or "", end="")
Python — LinkLLM Native SDK
pip install linkllm
import linkllm
client = linkllm.Client()
# Chat with any model — local or API
response = client.chat("mistral", "What is the capital of France?")
print(response.text)
# Streaming
for token in client.stream("gpt-4o", "Write a haiku about Rust"):
    print(token, end="", flush=True)
# Pull a model programmatically
client.pull("TheBloke/Mistral-7B-Instruct-v0.2-GGUF")
# List local models
models = client.list()
for m in models:
    print(f"{m.name} — {m.size_gb:.1f} GB")
TypeScript / JavaScript
npm install linkllm
import { LinkLLM } from "linkllm";
const client = new LinkLLM({ baseUrl: "http://localhost:11434" });
// Chat
const response = await client.chat({
  model: "mistral",
  messages: [{ role: "user", content: "Hello from TypeScript!" }],
});
console.log(response.content);
// Streaming
const stream = client.stream({
  model: "gpt-4o",
  messages: [{ role: "user", content: "Tell me a story" }],
});
for await (const token of stream) {
  process.stdout.write(token);
}
Rust SDK
# Cargo.toml
[dependencies]
linkllm = "0.1"
tokio = { version = "1", features = ["full"] }
use linkllm::{Client, ChatMessage, Role};
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = Client::new("http://localhost:11434")?;

    let response = client
        .chat("mistral")
        .message(Role::User, "What is Rust?")
        .send()
        .await?;

    println!("{}", response.content());
    Ok(())
}
⚙️ Configuration
LinkLLM is configured via ~/.linkllm/config.toml:
[server]
host = "127.0.0.1"
port = 11434
tls = false
[models]
default = "mistral"
model_dir = "~/.linkllm/models"
[inference]
gpu_layers = -1 # -1 = auto (offload all to GPU)
context_size = 4096
threads = 8
[api_keys]
# Encrypted. Use `linkllm key add` to set these safely.
openai = ""
gemini = ""
anthropic = ""
groq = ""
[routing]
default = "mistral"
fallback_chain = ["mistral", "gpt-4o-mini"]
[monitoring]
enabled = true
metrics_port = 9090 # Prometheus-compatible /metrics
log_level = "info"
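With monitoring enabled, anything that speaks the Prometheus text format can scrape the metrics endpoint. A quick sketch, assuming the default metrics_port above; the specific metric names exposed are not assumed here:
import urllib.request

# Read the Prometheus-compatible text exposition from the metrics port.
with urllib.request.urlopen("http://127.0.0.1:9090/metrics") as resp:
    for line in resp.read().decode().splitlines():
        if line and not line.startswith("#"):  # skip HELP/TYPE comment lines
            print(line)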
Managing API Keys
linkllm key add openai sk-...
linkllm key add gemini AIza...
linkllm key add anthropic sk-ant-...
linkllm key list
linkllm key rm openai
Keys are stored encrypted with AES-256-GCM, tied to your OS keychain.
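For reference, this is what the AES-256-GCM primitive looks like with the Python cryptography package. It only illustrates the cipher itself, not LinkLLM's actual key-store code, and the OS keychain integration is not shown:
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

# Generate a 256-bit key (in LinkLLM this material would live behind the OS keychain).
key = AESGCM.generate_key(bit_length=256)
aesgcm = AESGCM(key)

nonce = os.urandom(12)  # GCM requires a unique nonce per encryption
ciphertext = aesgcm.encrypt(nonce, b"sk-your-provider-key", None)
print(aesgcm.decrypt(nonce, ciphertext, None))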
📋 CLI Reference
linkllm <command> [options]
Commands:
serve Start the LinkLLM server
chat [model] Start an interactive chat session
pull <user/model> Pull a model from HuggingFace
push <model> Push a model to the LinkLLM registry
list List all local models
rm <model> Remove a local model
show <model> Show model info and metadata
monitor Open the TUI monitoring dashboard
key <add|rm|list> Manage encrypted API keys
config <get|set> View or update configuration
run <model> Pull (if needed) and start chatting
Options:
--host Server host (default: 127.0.0.1)
--port Server port (default: 11434)
--model-dir Override model storage directory
--log-level Log verbosity: error|warn|info|debug|trace
-v, --version Print version
-h, --help Show help
🆚 Comparison
| Feature | LinkLLM | Ollama | LiteLLM |
|---|---|---|---|
| Local GGUF inference | ✅ | ✅ | ❌ |
| API proxy (OpenAI / Gemini / etc.) | ✅ | ❌ | ✅ |
| HuggingFace model pull | ✅ | Partial | ❌ |
| TUI monitoring dashboard | ✅ | ❌ | Web UI only |
| Multi-model routing + fallback | ✅ | ❌ | ✅ |
| Encrypted API key management | ✅ | ❌ | Partial |
| Rust core (memory safe) | ✅ | Go | Python |
| OpenAI-compatible REST API | ✅ | ✅ | ✅ |
| Native Rust SDK | ✅ | ❌ | ❌ |
| Pure-Rust inference (candle) | ✅ | ❌ | ❌ |
| Mobile / Termux | ✅ | Limited | Limited |
| Cost tracking per request | ✅ | ❌ | ✅ |
| Single binary, no Docker | ✅ | ✅ | ❌ |
🏗️ Architecture
┌─────────────────────────────────────────────────┐
│ User Interface Layer │
│ CLI Chat · TUI Monitor · Model Manager │
│ (TypeScript + Ink) │
└────────────────────┬────────────────────────────┘
│
┌────────────────────▼────────────────────────────┐
│ API Gateway (Rust/Axum) │
│ REST API · Auth · Rate Limiter · TLS │
└────────────────────┬────────────────────────────┘
│
┌────────────────────▼────────────────────────────┐
│ Core Engine (Rust/Tokio) │
│ Router · Pipeline · Context · Metrics │
└──────┬─────────────┬───────────────┬────────────┘
│ │ │
┌──────▼──────┐ ┌────▼─────┐ ┌──────▼──────┐
│ Local GGUF │ │ Python │ │ API Proxy │
│ llama.cpp │ │ Bridge │ │ OAI/Gemini/ │
│ + candle │ │ HF │ │ Anthropic │
└─────────────┘ └──────────┘ └─────────────┘
See the full Architecture Document for details.
📦 Packages
| Package | Registry | Install |
|---|---|---|
| linkllm (binary) | GitHub Releases | curl -fsSL https://install.linkllm.dev \| sh |
| linkllm (CLI) | npm | npm install -g linkllm |
| linkllm (Python SDK) | PyPI | pip install linkllm |
| linkllm (Rust SDK) | crates.io | cargo add linkllm |
🚀 Roadmap
- Core Rust engine + Axum server
- OpenAI-compatible API
- llama.cpp GGUF inference
- HuggingFace model pull
- API proxy (OpenAI, Gemini, Anthropic)
- TUI monitoring dashboard
- Encrypted API key management
- Multi-model routing (in progress)
- candle pure-Rust inference
- WebUI dashboard
- Model fine-tuning support
- Plugin / middleware system
- LoRA adapter merge
- Distributed inference
- LinkLLM Cloud (hosted)
🤝 Contributing
Contributions are welcome! Please read CONTRIBUTING.md before submitting a PR.
git clone https://github.com/linkllm/linkllm
cd linkllm
# Build Rust core
cargo build
# Run tests
cargo test
# Build CLI
cd cli && npm install && npm run build
# Run Python bridge tests
cd python && pip install -e ".[dev]" && pytest
Good first issues are labeled good-first-issue on GitHub.
📄 License
MIT License © 2025 AJ Ashik