Python client for OffGrid LLM - Run AI models completely offline

These details have not been verified by PyPI

Project links

Project description

OffGrid Python Client

Python client library for OffGrid LLM - Run AI models completely offline.

Installation

pip install offgrid

Quick Start

import offgrid

# Connect to server
client = offgrid.Client()  # localhost:11611

# Chat
response = client.chat("What is Python?")
print(response)

# List available models
models = client.list_models()
for m in models:
    print(f"- {m['id']}")

Full Usage

Chat

from offgrid import Client

client = Client()

# Basic chat (uses first available model)
response = client.chat("Explain quantum computing")
print(response)

# Specify model
response = client.chat("Hello!", model="Llama-3.2-3B-Instruct")

# With system prompt
response = client.chat(
    "Write a poem about AI",
    model="Llama-3.2-3B-Instruct",
    system="You are a creative poet.",
    temperature=0.9
)

# Streaming
for chunk in client.chat("Tell me a long story", stream=True):
    print(chunk, end="", flush=True)

# Full conversation
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
    {"role": "assistant", "content": "Hi there! How can I help?"},
    {"role": "user", "content": "What's the weather like?"}
]
response = client.chat(messages=messages)

API Key Authentication

from offgrid import Client

# With API key (if server has OFFGRID_API_KEY set)
client = Client(api_key="your-secret-key")

# All requests automatically include the Authorization header
response = client.chat("Hello!")

Session Management

# Sessions preserve conversation context on the server
sessions = client.sessions

# Create a new session
session = sessions.create("my-chat")

# List all sessions
for s in sessions.list():
    print(f"- {s['name']}: {len(s.get('messages', []))} messages")

# Chat with session (context is preserved)
response1 = sessions.chat_with_session("my-chat", "My name is Alice")
response2 = sessions.chat_with_session("my-chat", "What is my name?")
# response2 will correctly reference "Alice"

# Add messages manually
sessions.add_message("my-chat", "user", "Hello")
sessions.add_message("my-chat", "assistant", "Hi there!")

# Get session details
details = sessions.get("my-chat")
print(f"Messages: {len(details['messages'])}")

# Delete session
sessions.delete("my-chat")

Server Statistics

# Get comprehensive server statistics
stats = client.stats()

# Server info
print(f"Uptime: {stats['server']['uptime']}")
print(f"Version: {stats['server']['version']}")

# Inference metrics
print(f"Total requests: {stats['inference']['aggregate']['total_requests']}")
print(f"Total tokens: {stats['inference']['aggregate']['total_tokens']}")

# System resources
print(f"CPU usage: {stats['resources']['cpu_usage_percent']:.1f}%")
print(f"Memory: {stats['resources']['memory_used_mb']}MB")

# RAG status
if stats['rag']['enabled']:
    print(f"Documents: {stats['rag']['documents']}")

Model Management

# List installed models
for model in client.list_models():
    print(model['id'])

# Search for models
results = client.models.search("llama", ram=8)
for model in results:
    print(f"{model['id']} - {model['size_gb']}GB")

# Download a model
client.models.download(
    "bartowski/Llama-3.2-3B-Instruct-GGUF",
    "Llama-3.2-3B-Instruct-Q4_K_M.gguf",
    progress_callback=lambda pct, done, total: print(f"\r{pct:.1f}%", end="")
)

# Delete a model
client.models.delete("old-model")

# Import from USB
imported = client.models.import_usb("/media/usb")

# Export to USB
client.models.export_usb("Llama-3.2-3B-Instruct-Q4_K_M", "/media/usb")

Knowledge Base (RAG)

# Add documents
client.kb.add("notes.md")
client.kb.add("meeting", content="Meeting notes from today...")
client.kb.add_directory("./docs", extensions=[".md", ".txt", ".pdf"])

# List documents
for doc in client.kb.list():
    print(f"{doc['id']}: {doc['chunks']} chunks")

# Search
results = client.kb.search("project deadline")
for r in results:
    print(f"[{r['score']:.2f}] {r['content'][:100]}...")

# Chat with Knowledge Base context
response = client.chat(
    "What are the main action items from the meeting?",
    use_kb=True
)

# Remove documents
client.kb.remove("notes.md")
client.kb.clear()  # Remove all

AI Agents (New in v0.2.3)

Run autonomous agents that can use tools to complete tasks:

# Run an agent task
result = client.agent.run(
    "Calculate 127 * 48 + 356",
    model="llama3.2:3b"
)
print(result["result"])

# List available tools
tools = client.agent.tools()
for tool in tools:
    status = "✓" if tool["enabled"] else "✗"
    print(f"[{status}] {tool['name']}: {tool['description']}")

# Toggle tools on/off
client.agent.disable_tool("shell")  # Security: disable shell access
client.agent.enable_tool("calculator")

# Complex multi-step tasks
result = client.agent.run(
    "Read the VERSION file and tell me what version it is",
    model="llama3.2:3b",
    max_steps=5
)

Built-in Tools:

calculator - Evaluate mathematical expressions
current_time - Get current date/time
read_file - Read file contents
write_file - Write content to files
list_files - List directory contents
shell - Execute shell commands
http_get - Make HTTP GET requests

MCP Integration (New in v0.2.3)

Connect external tools via Model Context Protocol:

# Add an MCP server
client.agent.mcp.add(
    "filesystem",
    "npx -y @modelcontextprotocol/server-filesystem /tmp"
)

# List configured servers
servers = client.agent.mcp.list()
for s in servers:
    print(f"{s['name']}: {s['url']}")

# Test a server connection
result = client.agent.mcp.test(url="npx -y @modelcontextprotocol/server-github")
print(f"Found {len(result.get('tools', []))} tools")

# Remove a server
client.agent.mcp.remove("filesystem")

LoRA Adapters (New in v0.2.3)

Manage LoRA adapters for fine-tuned models:

# List registered adapters
adapters = client.lora.list()
for a in adapters:
    print(f"{a['name']}: {a['path']}")

# Register a new adapter
client.lora.register(
    "coding-assistant",
    "/path/to/code-lora.gguf",
    scale=0.8
)

# Get adapter details
adapter = client.lora.get("coding-assistant")

# Remove an adapter
client.lora.remove("coding-assistant")

Audio: Speech-to-Text & Text-to-Speech (New in v0.2.4)

Transcribe audio files and generate speech completely offline:

# Setup: Download Whisper model for transcription
client.audio.setup_whisper("base")  # Options: tiny, base, small, medium, large

# Setup: Download a voice for text-to-speech  
client.audio.setup_piper("en_US-amy-medium")

# Transcribe audio (Speech-to-Text)
text = client.audio.transcribe("recording.wav", model="base")
print(f"Transcription: {text}")

# Transcribe with options
text = client.audio.transcribe(
    "recording.mp3",
    model="small",       # Larger = more accurate
    language="en",       # Optional: specify language
    response_format="text"  # text, json, verbose_json
)

# Generate speech (Text-to-Speech)
audio_data = client.audio.speak("Hello, how are you today?", voice="en_US-amy-medium")
with open("output.wav", "wb") as f:
    f.write(audio_data)

# Text-to-speech with options
audio_data = client.audio.speak(
    "Welcome to OffGrid!",
    voice="en_US-amy-medium",
    speed=1.0,              # 0.5 = slow, 2.0 = fast
    response_format="wav"   # wav, mp3, opus, flac
)

# List available voices
voices = client.audio.voices()
for v in voices:
    print(f"- {v['id']}: {v['language']}")

# List Whisper models
models = client.audio.models()
for m in models:
    status = "✓ installed" if m["installed"] else "✗ not installed"
    print(f"- {m['id']} ({m['size']}): {status}")

# Check audio status
status = client.audio.status()
print(f"Whisper installed: {status['whisper']['installed']}")
print(f"Piper installed: {status['piper']['installed']}")

Available Whisper Models:

Model	Size	RAM	Speed	Quality
tiny	75MB	~1GB	Fastest	Basic
base	142MB	~1GB	Fast	Good
small	466MB	~2GB	Medium	Better
medium	1.5GB	~5GB	Slower	Great
large	2.9GB	~10GB	Slowest	Best

Popular Voices:

en_US-amy-medium - American English, female
en_US-ryan-medium - American English, male
en_GB-alba-medium - British English, female
de_DE-thorsten-medium - German, male
fr_FR-siwis-medium - French, female

System Configuration (New in v0.2.3)

# Get server configuration and feature flags
config = client.config()
print(f"Version: {config['version']}")
print(f"Multi-user mode: {config['multi_user_mode']}")
print(f"Agent enabled: {config['features']['agent']}")

# Get real-time system stats
stats = client.system_stats()
print(f"CPU: {stats['cpu_percent']}%")
print(f"Memory: {stats['memory_percent']}%")

Embeddings

# Single text
embedding = client.embed("Hello world")
print(f"Dimensions: {len(embedding)}")

# Multiple texts
embeddings = client.embed(["Hello", "World", "AI"])

System Info

# Check server health
if client.health():
    print("Server is running")

# Get detailed info
info = client.info()
print(f"Uptime: {info['uptime']}")
print(f"CPU: {info['system']['cpu_percent']}%")
print(f"Memory: {info['system']['memory_percent']}%")

Configuration

from offgrid import Client

# Default: localhost:11611
client = Client()

# Custom server URL
client = Client(host="http://192.168.1.100:11611")

# Just hostname (auto-adds http://)
client = Client(host="192.168.1.100:11611")

# With API key authentication
client = Client(api_key="your-secret-key")

# Custom timeout (for slow models)
client = Client(timeout=600)  # 10 minutes

# Combined options
client = Client(
    host="http://192.168.1.100:11611",
    api_key="your-secret-key",
    timeout=300
)

Automatic Retry

The client automatically retries failed requests with exponential backoff:

Up to 3 retry attempts
Delays: 1s → 2s → 4s between retries
Only retries on connection errors, not HTTP errors

# Retries are automatic
response = client.chat("Hello!")  # Will retry on transient failures

Error Handling

from offgrid import Client, OffGridError

client = Client()

try:
    response = client.chat("Hello")
except OffGridError as e:
    print(f"Error: {e.message}")
    if e.code:
        print(f"Code: {e.code}")

Requirements

Python 3.8+
OffGrid LLM server running (offgrid serve)
No external dependencies (uses only stdlib)

License

MIT License

Custom server URL

client = Client(host="http://192.168.1.100:11611")

Just hostname (auto-adds http://)

client = Client(host="192.168.1.100:11611")

Custom timeout (for slow models)

client = Client(timeout=600) # 10 minutes


## Error Handling

```python
from offgrid import Client, OffGridError

client = Client()

try:
    response = client.chat("Hello")
except OffGridError as e:
    print(f"Error: {e.message}")
    if e.code:
        print(f"Code: {e.code}")

Requirements

Python 3.8+
OffGrid LLM server running (offgrid serve)
No external dependencies (uses only stdlib)

License

MIT License

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

0.2.11

Dec 29, 2025

This version

0.1.8

Dec 19, 2025

0.1.7

Dec 17, 2025

0.1.6

Dec 12, 2025

0.1.3

Dec 7, 2025

0.1.2

Dec 3, 2025

0.1.1

Dec 2, 2025

0.1.0

Dec 2, 2025

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

offgrid-0.1.8.tar.gz (26.7 kB view details)

Uploaded Dec 19, 2025 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

offgrid-0.1.8-py3-none-any.whl (23.1 kB view details)

Uploaded Dec 19, 2025 Python 3

File details

Details for the file offgrid-0.1.8.tar.gz.

File metadata

Download URL: offgrid-0.1.8.tar.gz
Upload date: Dec 19, 2025
Size: 26.7 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.8.0 colorama/0.4.4 importlib-metadata/4.6.4 keyring/23.5.0 pkginfo/1.8.2 readme-renderer/34.0 requests-toolbelt/0.9.1 requests/2.25.1 rfc3986/1.5.0 tqdm/4.57.0 urllib3/1.26.5 CPython/3.10.12

File hashes

Hashes for offgrid-0.1.8.tar.gz
Algorithm	Hash digest
SHA256	`f41216c442394aba3ee4855c1f28328bf53a9566a73e9fc7f6aef6b143d7ac71`
MD5	`5c194d5e261fcdc55e75e00fb50bffd0`
BLAKE2b-256	`312e43a6284c8ae6ad888df4a642a93b7516177545cc86bfc0f0502ffd8fdef5`

See more details on using hashes here.

File details

Details for the file offgrid-0.1.8-py3-none-any.whl.

File metadata

Download URL: offgrid-0.1.8-py3-none-any.whl
Upload date: Dec 19, 2025
Size: 23.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.8.0 colorama/0.4.4 importlib-metadata/4.6.4 keyring/23.5.0 pkginfo/1.8.2 readme-renderer/34.0 requests-toolbelt/0.9.1 requests/2.25.1 rfc3986/1.5.0 tqdm/4.57.0 urllib3/1.26.5 CPython/3.10.12

File hashes

Hashes for offgrid-0.1.8-py3-none-any.whl
Algorithm	Hash digest
SHA256	`bb9d3bab8018c7a63ef1399216d281be134440775bbb3bcd9a012fef30e5e2b4`
MD5	`b0ffdfefaed8b130fea63364d726db21`
BLAKE2b-256	`072f4e0f27b01dc57aa42ab5c6855639d2af900ae8b1a4f5725191a5f0d96790`

See more details on using hashes here.

offgrid 0.1.8

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

OffGrid Python Client

Installation

Quick Start

Full Usage

Chat

API Key Authentication

Session Management

Server Statistics

Model Management

Knowledge Base (RAG)

AI Agents (New in v0.2.3)

MCP Integration (New in v0.2.3)

LoRA Adapters (New in v0.2.3)

Audio: Speech-to-Text & Text-to-Speech (New in v0.2.4)

System Configuration (New in v0.2.3)

Embeddings

System Info

Configuration

Automatic Retry

Error Handling

Requirements

Links

License

Custom server URL

Just hostname (auto-adds http://)

Custom timeout (for slow models)

Requirements

Links

License

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes