Python client for OffGrid LLM - Run AI models completely offline
OffGrid Python Client
Python client library for OffGrid LLM - Run AI models completely offline.
Installation
pip install offgrid
Quick Start
import offgrid
# Connect to server
client = offgrid.Client() # localhost:11611
# Chat
response = client.chat("What is Python?")
print(response)
# List available models
models = client.list_models()
for m in models:
    print(f"- {m['id']}")
Full Usage
Chat
from offgrid import Client
client = Client()
# Basic chat (uses first available model)
response = client.chat("Explain quantum computing")
print(response)
# Specify model
response = client.chat("Hello!", model="Llama-3.2-3B-Instruct")
# With system prompt
response = client.chat(
    "Write a poem about AI",
    model="Llama-3.2-3B-Instruct",
    system="You are a creative poet.",
    temperature=0.9
)
# Streaming
for chunk in client.chat("Tell me a long story", stream=True):
    print(chunk, end="", flush=True)
# Full conversation
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
    {"role": "assistant", "content": "Hi there! How can I help?"},
    {"role": "user", "content": "What's the weather like?"}
]
response = client.chat(messages=messages)
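A conversation like the one above can be grown turn by turn and trimmed before it gets too long. The following is a minimal sketch that assumes only the `role`/`content` dictionary shape shown above; `add_turn` and `trim_history` are hypothetical helpers written for illustration and are not part of the `offgrid` package.

```python
from typing import Dict, List

Message = Dict[str, str]


def add_turn(messages: List[Message], role: str, content: str) -> List[Message]:
    """Return a new history with one more message appended (hypothetical helper)."""
    if role not in ("system", "user", "assistant"):
        raise ValueError(f"unexpected role: {role!r}")
    return messages + [{"role": role, "content": content}]


def trim_history(messages: List[Message], max_turns: int) -> List[Message]:
    """Keep every system message plus the last `max_turns` non-system messages."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    if max_turns <= 0:
        return system
    return system + rest[-max_turns:]
```

The resulting list could then be passed as `messages=` to `client.chat`, as in the example above.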
API Key Authentication
from offgrid import Client
# With API key (if server has OFFGRID_API_KEY set)
client = Client(api_key="your-secret-key")
# All requests automatically include the Authorization header
response = client.chat("Hello!")
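To illustrate what "automatically include the Authorization header" could mean in a stdlib-only client, here is a rough sketch built on `urllib.request`. It is not the library's actual code: the `Bearer` scheme, the `build_request` helper, and the placeholder URL are all assumptions made for the example.

```python
import json
import urllib.request
from typing import Optional


def build_request(url: str, payload: dict, api_key: Optional[str] = None) -> urllib.request.Request:
    """Build a JSON POST request, attaching an API key if one is given."""
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Assumption: the server expects a Bearer token; the README does not say.
        headers["Authorization"] = f"Bearer {api_key}"
    data = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(url, data=data, headers=headers, method="POST")


# Placeholder URL for illustration only; no request is actually sent here.
req = build_request("http://localhost:11611/example", {"prompt": "Hello!"}, api_key="your-secret-key")
print(req.get_header("Authorization"))
```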
Session Management
# Sessions preserve conversation context on the server
sessions = client.sessions
# Create a new session
session = sessions.create("my-chat")
# List all sessions
for s in sessions.list():
    print(f"- {s['name']}: {len(s.get('messages', []))} messages")
# Chat with session (context is preserved)
response1 = sessions.chat_with_session("my-chat", "My name is Alice")
response2 = sessions.chat_with_session("my-chat", "What is my name?")
# response2 will correctly reference "Alice"
# Add messages manually
sessions.add_message("my-chat", "user", "Hello")
sessions.add_message("my-chat", "assistant", "Hi there!")
# Get session details
details = sessions.get("my-chat")
print(f"Messages: {len(details['messages'])}")
# Delete session
sessions.delete("my-chat")
Server Statistics
# Get comprehensive server statistics
stats = client.stats()
# Server info
print(f"Uptime: {stats['server']['uptime']}")
print(f"Version: {stats['server']['version']}")
# Inference metrics
print(f"Total requests: {stats['inference']['aggregate']['total_requests']}")
print(f"Total tokens: {stats['inference']['aggregate']['total_tokens']}")
# System resources
print(f"CPU usage: {stats['resources']['cpu_usage_percent']:.1f}%")
print(f"Memory: {stats['resources']['memory_used_mb']}MB")
# RAG status
if stats['rag']['enabled']:
    print(f"Documents: {stats['rag']['documents']}")
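The aggregate counters shown above can be combined into derived metrics. As a small sketch, the helper below computes average tokens per request from a dictionary with the same `inference` / `aggregate` keys used above; `average_tokens_per_request` is a hypothetical name, and the sample values are made up for illustration.

```python
def average_tokens_per_request(stats: dict) -> float:
    """Average tokens per request, or 0.0 if no requests were served yet."""
    aggregate = stats["inference"]["aggregate"]
    requests = aggregate["total_requests"]
    if requests == 0:
        return 0.0
    return aggregate["total_tokens"] / requests


# Made-up sample shaped like the stats dictionary above.
sample_stats = {"inference": {"aggregate": {"total_requests": 4, "total_tokens": 1000}}}
print(average_tokens_per_request(sample_stats))  # → 250.0
```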
Model Management
# List installed models
for model in client.list_models():
    print(model['id'])
# Search for models
results = client.models.search("llama", ram=8)
for model in results:
    print(f"{model['id']} - {model['size_gb']}GB")
# Download a model
client.models.download(
    "bartowski/Llama-3.2-3B-Instruct-GGUF",
    "Llama-3.2-3B-Instruct-Q4_K_M.gguf",
    progress_callback=lambda pct, done, total: print(f"\r{pct:.1f}%", end="")
)
# Delete a model
client.models.delete("old-model")
# Import from USB
imported = client.models.import_usb("/media/usb")
# Export to USB
client.models.export_usb("Llama-3.2-3B-Instruct-Q4_K_M", "/media/usb")
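The download example above passes a one-line lambda as `progress_callback`. A slightly richer callback with the same `(pct, done, total)` signature might look like the sketch below. It assumes `done` and `total` are byte counts, which the README does not state; `format_progress` and `print_progress` are hypothetical names.

```python
def format_progress(pct: float, done: int, total: int, width: int = 20) -> str:
    """Render a text progress bar; pct is clamped to 0..100 for the bar only."""
    filled = int(width * min(max(pct, 0.0), 100.0) / 100)
    bar = "#" * filled + "-" * (width - filled)
    # Assumption: done/total are bytes, shown here in megabytes.
    return f"\r[{bar}] {pct:5.1f}% ({done / 1e6:.1f}/{total / 1e6:.1f} MB)"


def print_progress(pct: float, done: int, total: int) -> None:
    """Callback with the (pct, done, total) signature used above."""
    print(format_progress(pct, done, total), end="", flush=True)
```

`print_progress` could then be supplied as `progress_callback=print_progress` in the download call.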
Knowledge Base (RAG)
# Add documents
client.kb.add("notes.md")
client.kb.add("meeting", content="Meeting notes from today...")
client.kb.add_directory("./docs", extensions=[".md", ".txt", ".pdf"])
# List documents
for doc in client.kb.list():
    print(f"{doc['id']}: {doc['chunks']} chunks")
# Search
results = client.kb.search("project deadline")
for r in results:
    print(f"[{r['score']:.2f}] {r['content'][:100]}...")
# Chat with Knowledge Base context
response = client.chat(
    "What are the main action items from the meeting?",
    use_kb=True
)
# Remove documents
client.kb.remove("notes.md")
client.kb.clear() # Remove all
AI Agents (New in v0.2.3)
Run autonomous agents that can use tools to complete tasks:
# Run an agent task
result = client.agent.run(
    "Calculate 127 * 48 + 356",
    model="llama3.2:3b"
)
print(result["result"])
# List available tools
tools = client.agent.tools()
for tool in tools:
    status = "✓" if tool["enabled"] else "✗"
    print(f"[{status}] {tool['name']}: {tool['description']}")
# Toggle tools on/off
client.agent.disable_tool("shell") # Security: disable shell access
client.agent.enable_tool("calculator")
# Complex multi-step tasks
result = client.agent.run(
    "Read the VERSION file and tell me what version it is",
    model="llama3.2:3b",
    max_steps=5
)
Built-in Tools:
- calculator - Evaluate mathematical expressions
- current_time - Get current date/time
- read_file - Read file contents
- write_file - Write content to files
- list_files - List directory contents
- shell - Execute shell commands
- http_get - Make HTTP GET requests
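Since several built-in tools can touch the filesystem, the shell, or the network, it may be worth auditing which of them are enabled before running an agent. The sketch below works on a list of dictionaries with the `name` and `enabled` keys used in the tool listing above. The `RISKY_TOOLS` set is an illustrative choice made here, not a classification from the OffGrid project, and `enabled_risky_tools` is a hypothetical helper.

```python
# Assumption: which tools count as "risky" is a judgment call made for this example.
RISKY_TOOLS = {"shell", "write_file", "http_get"}


def enabled_risky_tools(tools: list) -> list:
    """Return the sorted names of enabled tools that appear in RISKY_TOOLS."""
    return sorted(
        t["name"] for t in tools if t["enabled"] and t["name"] in RISKY_TOOLS
    )


# Made-up sample shaped like the tool listing above.
sample_tools = [
    {"name": "calculator", "enabled": True},
    {"name": "shell", "enabled": True},
    {"name": "write_file", "enabled": False},
    {"name": "http_get", "enabled": True},
]
print(enabled_risky_tools(sample_tools))  # → ['http_get', 'shell']
```

Each returned name could then be passed to `client.agent.disable_tool`.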
MCP Integration (New in v0.2.3)
Connect external tools via Model Context Protocol:
# Add an MCP server
client.agent.mcp.add(
    "filesystem",
    "npx -y @modelcontextprotocol/server-filesystem /tmp"
)
# List configured servers
servers = client.agent.mcp.list()
for s in servers:
    print(f"{s['name']}: {s['url']}")
# Test a server connection
result = client.agent.mcp.test(url="npx -y @modelcontextprotocol/server-github")
print(f"Found {len(result.get('tools', []))} tools")
# Remove a server
client.agent.mcp.remove("filesystem")
LoRA Adapters (New in v0.2.3)
Manage LoRA adapters for fine-tuned models:
# List registered adapters
adapters = client.lora.list()
for a in adapters:
    print(f"{a['name']}: {a['path']}")
# Register a new adapter
client.lora.register(
    "coding-assistant",
    "/path/to/code-lora.gguf",
    scale=0.8
)
# Get adapter details
adapter = client.lora.get("coding-assistant")
# Remove an adapter
client.lora.remove("coding-assistant")
Audio: Speech-to-Text & Text-to-Speech (New in v0.2.4)
Transcribe audio files and generate speech completely offline:
# Setup: Download Whisper model for transcription
client.audio.setup_whisper("base") # Options: tiny, base, small, medium, large
# Setup: Download a voice for text-to-speech
client.audio.setup_piper("en_US-amy-medium")
# Transcribe audio (Speech-to-Text)
text = client.audio.transcribe("recording.wav", model="base")
print(f"Transcription: {text}")
# Transcribe with options
text = client.audio.transcribe(
    "recording.mp3",
    model="small",          # Larger = more accurate
    language="en",          # Optional: specify language
    response_format="text"  # text, json, verbose_json
)
# Generate speech (Text-to-Speech)
audio_data = client.audio.speak("Hello, how are you today?", voice="en_US-amy-medium")
with open("output.wav", "wb") as f:
    f.write(audio_data)
# Text-to-speech with options
audio_data = client.audio.speak(
    "Welcome to OffGrid!",
    voice="en_US-amy-medium",
    speed=1.0,             # 0.5 = slow, 2.0 = fast
    response_format="wav"  # wav, mp3, opus, flac
)
# List available voices
voices = client.audio.voices()
for v in voices:
    print(f"- {v['id']}: {v['language']}")
# List Whisper models
models = client.audio.models()
for m in models:
    status = "✓ installed" if m["installed"] else "✗ not installed"
    print(f"- {m['id']} ({m['size']}): {status}")
# Check audio status
status = client.audio.status()
print(f"Whisper installed: {status['whisper']['installed']}")
print(f"Piper installed: {status['piper']['installed']}")
Available Whisper Models:
| Model | Size | RAM | Speed | Quality |
|---|---|---|---|---|
| tiny | 75MB | ~1GB | Fastest | Basic |
| base | 142MB | ~1GB | Fast | Good |
| small | 466MB | ~2GB | Medium | Better |
| medium | 1.5GB | ~5GB | Slower | Great |
| large | 2.9GB | ~10GB | Slowest | Best |
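The RAM column of the table above can drive a simple model choice. The sketch below picks the largest Whisper model whose approximate RAM requirement fits a given budget. The figures are copied from the table; `pick_whisper_model` is a hypothetical helper, not part of the client.

```python
# (model name, approximate RAM in GB), ordered from smallest to largest as in the table.
WHISPER_MODELS = [("tiny", 1), ("base", 1), ("small", 2), ("medium", 5), ("large", 10)]


def pick_whisper_model(available_ram_gb: float) -> str:
    """Return the largest model whose approximate RAM need fits the budget."""
    best = None
    for name, ram_gb in WHISPER_MODELS:
        if ram_gb <= available_ram_gb:
            best = name  # later entries are larger, so keep overwriting
    if best is None:
        raise ValueError("not enough RAM for any Whisper model")
    return best


print(pick_whisper_model(4))  # → small
```

The result could be passed to `client.audio.setup_whisper` and as `model=` to `client.audio.transcribe`.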
Popular Voices:
- en_US-amy-medium - American English, female
- en_US-ryan-medium - American English, male
- en_GB-alba-medium - British English, female
- de_DE-thorsten-medium - German, male
- fr_FR-siwis-medium - French, female
System Configuration (New in v0.2.3)
# Get server configuration and feature flags
config = client.config()
print(f"Version: {config['version']}")
print(f"Multi-user mode: {config['multi_user_mode']}")
print(f"Agent enabled: {config['features']['agent']}")
# Get real-time system stats
stats = client.system_stats()
print(f"CPU: {stats['cpu_percent']}%")
print(f"Memory: {stats['memory_percent']}%")
Embeddings
# Single text
embedding = client.embed("Hello world")
print(f"Dimensions: {len(embedding)}")
# Multiple texts
embeddings = client.embed(["Hello", "World", "AI"])
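A common use of embeddings is comparing two texts by cosine similarity. The sketch below uses only the standard library and assumes each embedding is a plain sequence of floats, which is consistent with the `len(embedding)` call above but not otherwise confirmed; `cosine_similarity` is a helper written for this example.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)
```

With a running server, the inputs would come from `client.embed`, for example `cosine_similarity(embeddings[0], embeddings[1])`.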
System Info
# Check server health
if client.health():
    print("Server is running")
# Get detailed info
info = client.info()
print(f"Uptime: {info['uptime']}")
print(f"CPU: {info['system']['cpu_percent']}%")
print(f"Memory: {info['system']['memory_percent']}%")
Configuration
from offgrid import Client
# Default: localhost:11611
client = Client()
# Custom server URL
client = Client(host="http://192.168.1.100:11611")
# Just hostname (auto-adds http://)
client = Client(host="192.168.1.100:11611")
# With API key authentication
client = Client(api_key="your-secret-key")
# Custom timeout (for slow models)
client = Client(timeout=600) # 10 minutes
# Combined options
client = Client(
    host="http://192.168.1.100:11611",
    api_key="your-secret-key",
    timeout=300
)
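The "auto-adds http://" behavior mentioned above can be pictured as a small normalization step. This is a sketch of one plausible way to do it, not the client's actual implementation; `normalize_host` is a hypothetical function, and stripping a trailing slash is an extra assumption.

```python
def normalize_host(host: str = "localhost:11611") -> str:
    """Prefix a bare host with http:// and drop any trailing slash."""
    host = host.strip().rstrip("/")
    if not host.startswith(("http://", "https://")):
        host = "http://" + host
    return host


print(normalize_host("192.168.1.100:11611"))  # → http://192.168.1.100:11611
```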
Automatic Retry
The client automatically retries failed requests with exponential backoff:
- Up to 3 retry attempts
- Delays: 1s → 2s → 4s between retries
- Only retries on connection errors, not HTTP errors
# Retries are automatic
response = client.chat("Hello!") # Will retry on transient failures
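The retry policy described above (up to 3 retries, delays of 1s, 2s, 4s, connection errors only) can be sketched as follows. This is an illustration of exponential backoff in general, not the client's internal code: `retry_with_backoff` is a hypothetical helper, the built-in `ConnectionError` stands in for whatever the client treats as a connection failure, and the `sleep` parameter exists only so the delays can be observed without waiting.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    func: Callable[[], T],
    retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying on ConnectionError with delays of 1s, 2s, 4s, ..."""
    attempt = 0
    while True:
        try:
            return func()
        except ConnectionError:
            # Other exceptions (e.g. HTTP-level errors) propagate immediately.
            if attempt >= retries:
                raise
            sleep(base_delay * (2 ** attempt))
            attempt += 1
```

With the defaults, a call that keeps failing is attempted four times in total (one initial attempt plus three retries) before the last error is re-raised.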
Error Handling
from offgrid import Client, OffGridError
client = Client()
try:
    response = client.chat("Hello")
except OffGridError as e:
    print(f"Error: {e.message}")
    if e.code:
        print(f"Code: {e.code}")
Requirements
- Python 3.8+
- OffGrid LLM server running (offgrid serve)
- No external dependencies (uses only stdlib)
Links
- OffGrid LLM - Main project
- API Reference
- Issue Tracker
License
MIT License