LinkLLM
The unified LLM runtime — local inference, API proxy, and monitoring in one blazing-fast tool.
A powerful alternative to Ollama + LiteLLM, built from the ground up in Rust.
Quick Start · Features · Installation · Usage · API · Documentation · Contributing
# Install and run your first model in under 60 seconds
curl -fsSL https://install.linkllm.dev | sh
linkllm pull mistralai/Mistral-7B-Instruct-v0.3-GGUF
linkllm chat mistral
What is LinkLLM?
LinkLLM is a single tool that replaces both Ollama and LiteLLM, and then goes further. It gives you:
- Local inference of any GGUF model via llama.cpp or the pure-Rust candle backend
- API proxy to OpenAI, Gemini, Anthropic, Groq, and any OpenAI-compatible endpoint
- Model management — pull any model from HuggingFace with one command
- Production-ready REST API with OpenAI-compatible routes, auth, rate limiting, TLS
- Real-time monitoring dashboard right inside your terminal
- Multi-model routing with fallback chains and cost tracking
All in a single binary. No Docker required. Works on Windows, macOS, Linux, and Termux.
✨ Features
🦀 Rust-Powered Core
Built on Tokio + Axum — async from the ground up. Memory safe, no garbage collector pauses, minimal footprint.
🤖 Local Model Inference
- Run GGUF models via llama.cpp FFI bindings — same performance, Rust-safe wrapper
- Pure-Rust inference with candle (no C++ dependency)
- GPU acceleration: CUDA, ROCm, Apple Metal — auto-detected
- Quantization: Q4_K_M, Q5_K_S, Q8_0, F16 and more
🌐 Universal API Proxy
Route requests to any provider through a single unified API:
| Provider | Models |
|---|---|
| OpenAI | gpt-4o, o1, gpt-4-turbo, ... |
| Google Gemini | gemini-2.0-flash, gemini-1.5-pro, ... |
| Anthropic | claude-3-5-sonnet, claude-3-opus, ... |
| Groq | llama3, mixtral (ultra-fast) |
| Together AI | 50+ open models |
| Any OpenAI-compat | Custom base URL |
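Because every provider sits behind the same OpenAI-compatible endpoint, switching providers is just a change of model name. A minimal sketch using the OpenAI Python SDK, assuming the server is running on the default port and the relevant provider keys were added with linkllm key add:
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="your-api-key")

# The same client and endpoint serve every provider; the model name picks the backend.
for model in ["mistral", "gpt-4o", "gemini-2.0-flash", "claude-3-5-sonnet"]:
    reply = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": "Say hello in one sentence."}],
    )
    print(f"{model}: {reply.choices[0].message.content}")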
📦 HuggingFace Model Pull
linkllm pull mistralai/Mistral-7B-Instruct-v0.3-GGUF
linkllm pull TheBloke/Llama-2-13B-chat-GGUF --quant Q4_K_M
linkllm pull google/gemma-2-9b-it
Resume interrupted downloads. SHA-256 integrity check. Auto-conversion to GGUF.
📊 Terminal Monitoring Dashboard
linkllm monitor
Real-time TUI powered by Ink:
- Tokens/second live graph
- Latency histograms (p50 / p95 / p99)
- Active model memory usage
- Per-provider cost breakdown
- Request log (live tail)
- API key usage tracker
- Error rate + alerts
🔐 Security-First Design
- AES-256-GCM encrypted API key store (OS keychain integration)
- TLS 1.3 by default, mTLS for production
- HMAC request signing in the Rust SDK
- JWT bearer tokens for server access
- Per-key rate limits and quotas
- Sandboxed model inference
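To make the request-signing idea concrete, here is a rough Python sketch of HMAC-SHA256 signing over a request body. The header names and canonicalization below are hypothetical illustrations, not the Rust SDK's actual scheme:
import hashlib
import hmac
import json
import time

def sign_request(secret: str, body: dict) -> dict:
    # Sign a timestamp plus the canonical JSON body with a shared secret.
    timestamp = str(int(time.time()))
    payload = timestamp + json.dumps(body, separators=(",", ":"), sort_keys=True)
    signature = hmac.new(secret.encode(), payload.encode(), hashlib.sha256).hexdigest()
    # "X-Signature" and "X-Timestamp" are illustrative names, not the SDK's real headers.
    return {"X-Signature": signature, "X-Timestamp": timestamp}

print(sign_request("shared-secret", {"model": "mistral", "messages": []}))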
🔀 Multi-Model Routing
Define routing rules in linkllm.toml:
[routing]
default = "mistral"
[[routing.rules]]
match = "code"
model = "deepseek-coder"
[[routing.rules]]
match = "long-context"
model = "gemini-1.5-pro"
fallback = ["gpt-4o", "claude-3-opus"]
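The fallback list means the router tries the matched model first and moves down the chain only when a request fails. A minimal client-side sketch of that behavior (not LinkLLM's internal implementation), using the OpenAI-compatible endpoint:
from openai import OpenAI, APIError

client = OpenAI(base_url="http://localhost:11434/v1", api_key="your-api-key")

def chat_with_fallback(models, messages):
    # Try each model in order and return the first successful response.
    last_error = None
    for model in models:
        try:
            return client.chat.completions.create(model=model, messages=messages)
        except APIError as err:
            last_error = err  # provider error: fall through to the next model
    raise last_error

reply = chat_with_fallback(
    ["gemini-1.5-pro", "gpt-4o", "claude-3-opus"],
    [{"role": "user", "content": "Summarize a very long document."}],
)
print(reply.choices[0].message.content)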
⚡ Quick Start
1. Install
Linux / macOS / Termux:
curl -fsSL https://install.linkllm.dev | sh
Windows (PowerShell):
irm https://install.linkllm.dev/windows | iex
Homebrew:
brew install linkllm/tap/linkllm
npm (CLI only):
npm install -g linkllm
pip (Python SDK + CLI):
pip install linkllm
From source:
git clone https://github.com/linkllm/linkllm
cd linkllm
cargo build --release
2. Pull a Model
# Pull from HuggingFace (GGUF auto-detected)
linkllm pull mistralai/Mistral-7B-Instruct-v0.3-GGUF
# Specify quantization
linkllm pull TheBloke/Llama-2-13B-chat-GGUF --quant Q4_K_M
# List downloaded models
linkllm list
3. Chat in Terminal
linkllm chat mistral
linkllm chat gpt-4o # routes to OpenAI (needs API key)
linkllm chat gemini-flash # routes to Google Gemini
4. Start the Server
linkllm serve
# Server running at http://localhost:11434
# OpenAI-compatible API at http://localhost:11434/v1
5. Monitor
linkllm monitor
🔌 API
LinkLLM exposes a fully OpenAI-compatible REST API. Drop it in as a replacement for api.openai.com:
Chat Completions
curl http://localhost:11434/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer your-api-key" \
-d '{
"model": "mistral",
"messages": [{"role": "user", "content": "Hello!"}],
"stream": true
}'
Python — OpenAI SDK Compatible
from openai import OpenAI
client = OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="your-api-key"
)

response = client.chat.completions.create(
    model="mistral",
    messages=[{"role": "user", "content": "Explain Rust ownership"}],
    stream=True
)

for chunk in response:
    print(chunk.choices[0].delta.content or "", end="")
Python — LinkLLM Native SDK
pip install linkllm
import linkllm
client = linkllm.Client()
# Chat with any model — local or API
response = client.chat("mistral", "What is the capital of France?")
print(response.text)
# Streaming
for token in client.stream("gpt-4o", "Write a haiku about Rust"):
    print(token, end="", flush=True)
# Pull a model programmatically
client.pull("TheBloke/Mistral-7B-Instruct-v0.2-GGUF")
# List local models
models = client.list()
for m in models:
    print(f"{m.name} — {m.size_gb:.1f} GB")
TypeScript / JavaScript
npm install linkllm
import { LinkLLM } from "linkllm";
const client = new LinkLLM({ baseUrl: "http://localhost:11434" });
// Chat
const response = await client.chat({
  model: "mistral",
  messages: [{ role: "user", content: "Hello from TypeScript!" }],
});
console.log(response.content);
// Streaming
const stream = client.stream({
  model: "gpt-4o",
  messages: [{ role: "user", content: "Tell me a story" }],
});
for await (const token of stream) {
  process.stdout.write(token);
}
Rust SDK
# Cargo.toml
[dependencies]
linkllm = "0.1"
tokio = { version = "1", features = ["full"] }
use linkllm::{Client, ChatMessage, Role};
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = Client::new("http://localhost:11434")?;

    let response = client
        .chat("mistral")
        .message(Role::User, "What is Rust?")
        .send()
        .await?;

    println!("{}", response.content());
    Ok(())
}
⚙️ Configuration
LinkLLM is configured via ~/.linkllm/config.toml:
[server]
host = "127.0.0.1"
port = 11434
tls = false
[models]
default = "mistral"
model_dir = "~/.linkllm/models"
[inference]
gpu_layers = -1 # -1 = auto (offload all to GPU)
context_size = 4096
threads = 8
[api_keys]
# Encrypted. Use `linkllm key add` to set these safely.
openai = ""
gemini = ""
anthropic = ""
groq = ""
[routing]
default = "mistral"
fallback_chain = ["mistral", "gpt-4o-mini"]
[monitoring]
enabled = true
metrics_port = 9090 # Prometheus-compatible /metrics
log_level = "info"
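With monitoring enabled, anything that speaks the Prometheus text format can scrape the metrics endpoint. A quick sketch, assuming the default metrics_port above; the specific metric names exposed are not assumed here:
import urllib.request

# Read the Prometheus-compatible text exposition from the metrics port.
with urllib.request.urlopen("http://127.0.0.1:9090/metrics") as resp:
    for line in resp.read().decode().splitlines():
        if line and not line.startswith("#"):  # skip HELP/TYPE comment lines
            print(line)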
Managing API Keys
linkllm key add openai sk-...
linkllm key add gemini AIza...
linkllm key add anthropic sk-ant-...
linkllm key list
linkllm key rm openai
Keys are stored encrypted with AES-256-GCM, tied to your OS keychain.
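For reference, this is what the AES-256-GCM primitive looks like with the Python cryptography package. It only illustrates the cipher itself, not LinkLLM's actual key-store code, and the OS keychain integration is not shown:
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

# Generate a 256-bit key (in LinkLLM this material would live behind the OS keychain).
key = AESGCM.generate_key(bit_length=256)
aesgcm = AESGCM(key)

nonce = os.urandom(12)  # GCM requires a unique nonce per encryption
ciphertext = aesgcm.encrypt(nonce, b"sk-your-provider-key", None)
print(aesgcm.decrypt(nonce, ciphertext, None))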
📋 CLI Reference
linkllm <command> [options]
Commands:
serve Start the LinkLLM server
chat [model] Start an interactive chat session
pull <user/model> Pull a model from HuggingFace
push <model> Push a model to the LinkLLM registry
list List all local models
rm <model> Remove a local model
show <model> Show model info and metadata
monitor Open the TUI monitoring dashboard
key <add|rm|list> Manage encrypted API keys
config <get|set> View or update configuration
run <model> Pull (if needed) and start chatting
Options:
--host Server host (default: 127.0.0.1)
--port Server port (default: 11434)
--model-dir Override model storage directory
--log-level Log verbosity: error|warn|info|debug|trace
-v, --version Print version
-h, --help Show help
🆚 Comparison
| Feature | LinkLLM | Ollama | LiteLLM |
|---|---|---|---|
| Local GGUF inference | ✅ | ✅ | ❌ |
| API proxy (OpenAI / Gemini / etc.) | ✅ | ❌ | ✅ |
| HuggingFace model pull | ✅ | Partial | ❌ |
| TUI monitoring dashboard | ✅ | ❌ | Web UI only |
| Multi-model routing + fallback | ✅ | ❌ | ✅ |
| Encrypted API key management | ✅ | ❌ | Partial |
| Rust core (memory safe) | ✅ | Go | Python |
| OpenAI-compatible REST API | ✅ | ✅ | ✅ |
| Native Rust SDK | ✅ | ❌ | ❌ |
| Pure-Rust inference (candle) | ✅ | ❌ | ❌ |
| Mobile / Termux | ✅ | Limited | Limited |
| Cost tracking per request | ✅ | ❌ | ✅ |
| Single binary, no Docker | ✅ | ✅ | ❌ |
🏗️ Architecture
┌─────────────────────────────────────────────────┐
│ User Interface Layer │
│ CLI Chat · TUI Monitor · Model Manager │
│ (TypeScript + Ink) │
└────────────────────┬────────────────────────────┘
│
┌────────────────────▼────────────────────────────┐
│ API Gateway (Rust/Axum) │
│ REST API · Auth · Rate Limiter · TLS │
└────────────────────┬────────────────────────────┘
│
┌────────────────────▼────────────────────────────┐
│ Core Engine (Rust/Tokio) │
│ Router · Pipeline · Context · Metrics │
└──────┬─────────────┬───────────────┬────────────┘
│ │ │
┌──────▼──────┐ ┌────▼─────┐ ┌──────▼──────┐
│ Local GGUF │ │ Python │ │ API Proxy │
│ llama.cpp │ │ Bridge │ │ OAI/Gemini/ │
│ + candle │ │ HF │ │ Anthropic │
└─────────────┘ └──────────┘ └─────────────┘
See the full Architecture Document for details.
📦 Packages
| Package | Registry | Install |
|---|---|---|
| linkllm (binary) | GitHub Releases | curl -fsSL https://install.linkllm.dev \| sh |
| linkllm (CLI) | npm | npm install -g linkllm |
| linkllm (Python SDK) | PyPI | pip install linkllm |
| linkllm (Rust SDK) | crates.io | cargo add linkllm |
🚀 Roadmap
- Core Rust engine + Axum server
- OpenAI-compatible API
- llama.cpp GGUF inference
- HuggingFace model pull
- API proxy (OpenAI, Gemini, Anthropic)
- TUI monitoring dashboard
- Encrypted API key management
- Multi-model routing (in progress)
- candle pure-Rust inference
- WebUI dashboard
- Model fine-tuning support
- Plugin / middleware system
- LoRA adapter merge
- Distributed inference
- LinkLLM Cloud (hosted)
🤝 Contributing
Contributions are welcome! Please read CONTRIBUTING.md before submitting a PR.
git clone https://github.com/linkllm/linkllm
cd linkllm
# Build Rust core
cargo build
# Run tests
cargo test
# Build CLI
cd cli && npm install && npm run build
# Run Python bridge tests
cd python && pip install -e ".[dev]" && pytest
Good first issues are labeled good-first-issue on GitHub.
📄 License
MIT License © 2025 AJ Ashik