LinkLLM

The unified LLM runtime — local inference, API proxy, and monitoring in one blazing-fast tool.
A powerful alternative to Ollama + LiteLLM, built from the ground up in Rust.

Quick Start · Features · Installation · Usage · API · Documentation · Contributing

# Install and run your first model in under 60 seconds
curl -fsSL https://install.linkllm.dev | sh
linkllm pull mistralai/Mistral-7B-Instruct-v0.3-GGUF
linkllm chat mistral

What is LinkLLM?

LinkLLM is a single tool that replaces both Ollama and LiteLLM, and then goes further. It gives you:

  • Local inference of any GGUF model (llama.cpp FFI or pure-Rust candle backend)
  • API proxy to OpenAI, Gemini, Anthropic, Groq, and any OpenAI-compatible endpoint
  • Model management — pull any model from HuggingFace with one command
  • Production-ready REST API with OpenAI-compatible routes, auth, rate limiting, TLS
  • Real-time monitoring dashboard right inside your terminal
  • Multi-model routing with fallback chains and cost tracking

All in a single binary. No Docker required. Works on Windows, macOS, Linux, and Termux.


✨ Features

🦀 Rust-Powered Core

Built on Tokio + Axum — async from the ground up. Memory safe, no garbage collector pauses, minimal footprint.

🤖 Local Model Inference

  • Run GGUF models via llama.cpp FFI bindings — same performance, Rust-safe wrapper
  • Pure Rust inference with candle (no C++ dependency)
  • GPU acceleration: CUDA, ROCm, Apple Metal — auto-detected
  • Quantization: Q4_K_M, Q5_K_S, Q8_0, F16 and more

🌐 Universal API Proxy

Route requests to any provider through a single unified API:

Provider             Models
OpenAI               gpt-4o, o1, gpt-4-turbo, ...
Google Gemini        gemini-2.0-flash, gemini-1.5-pro, ...
Anthropic            claude-3-5-sonnet, claude-3-opus, ...
Groq                 llama3, mixtral (ultra-fast)
Together AI          50+ open models
Any OpenAI-compat    Custom base URL
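
Because every provider sits behind the same OpenAI-compatible endpoint, switching providers is just a matter of changing the model name. A minimal sketch using the standard OpenAI Python client (model names are taken from the table above; routing purely by model name is an assumption based on the chat examples later in this README):

from openai import OpenAI

# One client, one base URL; LinkLLM decides which backend serves each model name.
client = OpenAI(base_url="http://localhost:11434/v1", api_key="your-api-key")

for model in ["mistral", "gpt-4o", "gemini-2.0-flash"]:
    reply = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": "Say hello in one word."}],
    )
    print(f"{model}: {reply.choices[0].message.content}")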

📦 HuggingFace Model Pull

linkllm pull mistralai/Mistral-7B-Instruct-v0.3-GGUF
linkllm pull TheBloke/Llama-2-13B-chat-GGUF --quant Q4_K_M
linkllm pull google/gemma-2-9b-it

Resume interrupted downloads. SHA-256 integrity check. Auto-conversion to GGUF.
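
The integrity check boils down to hashing the downloaded file with SHA-256, so you can also verify a model yourself. A minimal sketch in Python (the file name is a placeholder and the directory assumes the default model_dir from the configuration section below; compare the digest against the one published on HuggingFace):

import hashlib
from pathlib import Path

def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    # Stream the file so multi-gigabyte GGUF models never need to fit in memory.
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

model_path = Path.home() / ".linkllm" / "models" / "mistral-7b-instruct-v0.3.Q4_K_M.gguf"
print(sha256_of(model_path))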

📊 Terminal Monitoring Dashboard

linkllm monitor

Real-time TUI powered by Ink:

  • Tokens/second live graph
  • Latency histograms (p50 / p95 / p99)
  • Active model memory usage
  • Per-provider cost breakdown
  • Request log (live tail)
  • API key usage tracker
  • Error rate + alerts

🔐 Security-First Design

  • AES-256-GCM encrypted API key store (OS keychain integration)
  • TLS 1.3 by default, mTLS for production
  • HMAC request signing in the Rust SDK
  • JWT bearer tokens for server access
  • Per-key rate limits and quotas
  • Sandboxed model inference

🔀 Multi-Model Routing

Define routing rules in linkllm.toml:

[routing]
default = "mistral"

[[routing.rules]]
match = "code"
model = "deepseek-coder"

[[routing.rules]]
match = "long-context"
model = "gemini-1.5-pro"
fallback = ["gpt-4o", "claude-3-opus"]

⚡ Quick Start

1. Install

Linux / macOS / Termux:

curl -fsSL https://install.linkllm.dev | sh

Windows (PowerShell):

irm https://install.linkllm.dev/windows | iex

Homebrew:

brew install linkllm/tap/linkllm

npm (CLI only):

npm install -g linkllm

pip (Python SDK + CLI):

pip install linkllm

From source:

git clone https://github.com/linkllm/linkllm
cd linkllm
cargo build --release

2. Pull a Model

# Pull from HuggingFace (GGUF auto-detected)
linkllm pull mistralai/Mistral-7B-Instruct-v0.3-GGUF

# Specify quantization
linkllm pull TheBloke/Llama-2-13B-chat-GGUF --quant Q4_K_M

# List downloaded models
linkllm list

3. Chat in Terminal

linkllm chat mistral
linkllm chat gpt-4o          # routes to OpenAI (needs API key)
linkllm chat gemini-flash    # routes to Google Gemini

4. Start the Server

linkllm serve
# Server running at http://localhost:11434
# OpenAI-compatible API at http://localhost:11434/v1

5. Monitor

linkllm monitor

🔌 API

LinkLLM exposes a fully OpenAI-compatible REST API. Drop it in as a replacement for api.openai.com:

Chat Completions

curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-api-key" \
  -d '{
    "model": "mistral",
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": true
  }'

Python — OpenAI SDK Compatible

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="your-api-key"
)

response = client.chat.completions.create(
    model="mistral",
    messages=[{"role": "user", "content": "Explain Rust ownership"}],
    stream=True
)

for chunk in response:
    if chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="")

Python — LinkLLM Native SDK

# pip install linkllm
import linkllm

client = linkllm.Client()

# Chat with any model — local or API
response = client.chat("mistral", "What is the capital of France?")
print(response.text)

# Streaming
for token in client.stream("gpt-4o", "Write a haiku about Rust"):
    print(token, end="", flush=True)

# Pull a model programmatically
client.pull("TheBloke/Mistral-7B-Instruct-v0.2-GGUF")

# List local models
models = client.list()
for m in models:
    print(f"{m.name}{m.size_gb:.1f} GB")

TypeScript / JavaScript

// npm install linkllm
import { LinkLLM } from "linkllm";

const client = new LinkLLM({ baseUrl: "http://localhost:11434" });

// Chat
const response = await client.chat({
  model: "mistral",
  messages: [{ role: "user", content: "Hello from TypeScript!" }],
});
console.log(response.content);

// Streaming
const stream = client.stream({
  model: "gpt-4o",
  messages: [{ role: "user", content: "Tell me a story" }],
});

for await (const token of stream) {
  process.stdout.write(token);
}

Rust SDK

# Cargo.toml
[dependencies]
linkllm = "0.1"
tokio = { version = "1", features = ["full"] }

// main.rs
use linkllm::{Client, ChatMessage, Role};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = Client::new("http://localhost:11434")?;

    let response = client
        .chat("mistral")
        .message(Role::User, "What is Rust?")
        .send()
        .await?;

    println!("{}", response.content());
    Ok(())
}

⚙️ Configuration

LinkLLM is configured via ~/.linkllm/config.toml:

[server]
host = "127.0.0.1"
port = 11434
tls = false

[models]
default = "mistral"
model_dir = "~/.linkllm/models"

[inference]
gpu_layers = -1        # -1 = auto (offload all to GPU)
context_size = 4096
threads = 8

[api_keys]
# Encrypted. Use `linkllm key add` to set these safely.
openai = ""
gemini = ""
anthropic = ""
groq = ""

[routing]
default = "mistral"
fallback_chain = ["mistral", "gpt-4o-mini"]

[monitoring]
enabled = true
metrics_port = 9090    # Prometheus-compatible /metrics
log_level = "info"

Managing API Keys

linkllm key add openai sk-...
linkllm key add gemini AIza...
linkllm key add anthropic sk-ant-...
linkllm key list
linkllm key rm openai

Keys are stored encrypted with AES-256-GCM, tied to your OS keychain.


📋 CLI Reference

linkllm <command> [options]

Commands:
  serve               Start the LinkLLM server
  chat [model]        Start an interactive chat session
  pull <user/model>   Pull a model from HuggingFace
  push <model>        Push a model to the LinkLLM registry
  list                List all local models
  rm <model>          Remove a local model
  show <model>        Show model info and metadata
  monitor             Open the TUI monitoring dashboard
  key <add|rm|list>   Manage encrypted API keys
  config <get|set>    View or update configuration
  run <model>         Pull (if needed) and start chatting

Options:
  --host              Server host (default: 127.0.0.1)
  --port              Server port (default: 11434)
  --model-dir         Override model storage directory
  --log-level         Log verbosity: error|warn|info|debug|trace
  -v, --version       Print version
  -h, --help          Show help

🆚 Comparison

Feature                               LinkLLM   Ollama        LiteLLM
Local GGUF inference                  Yes       Yes           No
API proxy (OpenAI / Gemini / etc.)    Yes       No            Yes
HuggingFace model pull                Yes       Partial       No
TUI monitoring dashboard              Yes       No            Web UI only
Multi-model routing + fallback        Yes       No            Yes
Encrypted API key management          Yes       No            Partial
Rust core (memory safe)               Yes       No (Go)       No (Python)
OpenAI-compatible REST API            Yes       Yes           Yes
Native Rust SDK                       Yes       No            No
Pure-Rust inference (candle)          Yes       No            No
Mobile / Termux                       Yes       Limited       Limited
Cost tracking per request             Yes       No            Yes
Single binary, no Docker              Yes       Yes           No

🏗️ Architecture

┌─────────────────────────────────────────────────┐
│              User Interface Layer                │
│   CLI Chat · TUI Monitor · Model Manager        │
│              (TypeScript + Ink)                  │
└────────────────────┬────────────────────────────┘
                     │
┌────────────────────▼────────────────────────────┐
│              API Gateway (Rust/Axum)             │
│   REST API · Auth · Rate Limiter · TLS          │
└────────────────────┬────────────────────────────┘
                     │
┌────────────────────▼────────────────────────────┐
│            Core Engine (Rust/Tokio)              │
│   Router · Pipeline · Context · Metrics         │
└──────┬─────────────┬───────────────┬────────────┘
       │             │               │
┌──────▼──────┐ ┌────▼─────┐ ┌──────▼──────┐
│  Local GGUF │ │  Python  │ │  API Proxy  │
│  llama.cpp  │ │  Bridge  │ │ OAI/Gemini/ │
│  + candle   │ │    HF    │ │  Anthropic  │
└─────────────┘ └──────────┘ └─────────────┘

See the full Architecture Document for details.


📦 Packages

Package                 Registry          Install
linkllm (binary)        GitHub Releases   curl -fsSL https://install.linkllm.dev | sh
linkllm (CLI)           npm               npm install -g linkllm
linkllm (Python SDK)    PyPI              pip install linkllm
linkllm (Rust SDK)      crates.io         cargo add linkllm

🚀 Roadmap

  • Core Rust engine + Axum server
  • OpenAI-compatible API
  • llama.cpp GGUF inference
  • HuggingFace model pull
  • API proxy (OpenAI, Gemini, Anthropic)
  • TUI monitoring dashboard
  • Encrypted API key management
  • Multi-model routing (in progress)
  • candle pure-Rust inference
  • WebUI dashboard
  • Model fine-tuning support
  • Plugin / middleware system
  • LoRA adapter merge
  • Distributed inference
  • LinkLLM Cloud (hosted)

🤝 Contributing

Contributions are welcome! Please read CONTRIBUTING.md before submitting a PR.

git clone https://github.com/linkllm/linkllm
cd linkllm

# Build Rust core
cargo build

# Run tests
cargo test

# Build CLI
cd cli && npm install && npm run build

# Run Python bridge tests
cd python && pip install -e ".[dev]" && pytest

Good first issues are labeled good-first-issue on GitHub.


📄 License

MIT License © 2025 AJ Ashik


Built with ❤️ in Rust · Twitter · Discord · Docs

