
OffGrid Python Client

Python client library for OffGrid LLM - Run AI models completely offline.

Installation

pip install offgrid

Quick Start

import offgrid

# Connect to server
client = offgrid.Client()  # defaults to localhost:11611

# Chat
response = client.chat("What is Python?")
print(response)

# List available models
models = client.list_models()
for m in models:
    print(f"- {m['id']}")

Full Usage

Chat

from offgrid import Client

client = Client()

# Basic chat (uses first available model)
response = client.chat("Explain quantum computing")
print(response)

# Specify model
response = client.chat("Hello!", model="Llama-3.2-3B-Instruct")

# With system prompt
response = client.chat(
    "Write a poem about AI",
    model="Llama-3.2-3B-Instruct",
    system="You are a creative poet.",
    temperature=0.9
)

# Streaming
for chunk in client.chat("Tell me a long story", stream=True):
    print(chunk, end="", flush=True)
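When the full reply is also needed after streaming, the chunks can be collected as they are printed. A minimal sketch, with the stream stubbed as a plain list standing in for the iterator returned by `client.chat(..., stream=True)`:

```python
# Stand-in for the chunk iterator returned by client.chat(..., stream=True)
chunks = []
for chunk in ["Once ", "upon ", "a time."]:
    print(chunk, end="", flush=True)  # show output as it arrives
    chunks.append(chunk)              # keep a copy for later use

full_text = "".join(chunks)
```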

# Full conversation
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
    {"role": "assistant", "content": "Hi there! How can I help?"},
    {"role": "user", "content": "What's the weather like?"}
]
response = client.chat(messages=messages)
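The `messages` list is plain data, so multi-turn context can be maintained client-side by appending each exchange before the next call. A small sketch (`extend_history` is an illustrative helper, not part of the library):

```python
def extend_history(messages, user_text, assistant_text):
    """Append one user/assistant exchange so context carries into the next turn."""
    return messages + [
        {"role": "user", "content": user_text},
        {"role": "assistant", "content": assistant_text},
    ]

history = [{"role": "system", "content": "You are a helpful assistant."}]
history = extend_history(history, "Hello!", "Hi there! How can I help?")
# The next request would pass the accumulated history, e.g.:
# response = client.chat(messages=history + [{"role": "user", "content": "..."}])
```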

API Key Authentication

from offgrid import Client

# With API key (if server has OFFGRID_API_KEY set)
client = Client(api_key="your-secret-key")

# All requests automatically include the Authorization header
response = client.chat("Hello!")

Session Management

# Sessions preserve conversation context on the server
sessions = client.sessions

# Create a new session
session = sessions.create("my-chat")

# List all sessions
for s in sessions.list():
    print(f"- {s['name']}: {len(s.get('messages', []))} messages")

# Chat with session (context is preserved)
response1 = sessions.chat_with_session("my-chat", "My name is Alice")
response2 = sessions.chat_with_session("my-chat", "What is my name?")
# response2 will correctly reference "Alice"

# Add messages manually
sessions.add_message("my-chat", "user", "Hello")
sessions.add_message("my-chat", "assistant", "Hi there!")

# Get session details
details = sessions.get("my-chat")
print(f"Messages: {len(details['messages'])}")

# Delete session
sessions.delete("my-chat")

Server Statistics

# Get comprehensive server statistics
stats = client.stats()

# Server info
print(f"Uptime: {stats['server']['uptime']}")
print(f"Version: {stats['server']['version']}")

# Inference metrics
print(f"Total requests: {stats['inference']['aggregate']['total_requests']}")
print(f"Total tokens: {stats['inference']['aggregate']['total_tokens']}")

# System resources
print(f"CPU usage: {stats['resources']['cpu_usage_percent']:.1f}%")
print(f"Memory: {stats['resources']['memory_used_mb']}MB")

# RAG status
if stats['rag']['enabled']:
    print(f"Documents: {stats['rag']['documents']}")

Model Management

# List installed models
for model in client.list_models():
    print(model['id'])

# Search for models
results = client.models.search("llama", ram=8)
for model in results:
    print(f"{model['id']} - {model['size_gb']}GB")

# Download a model
client.models.download(
    "bartowski/Llama-3.2-3B-Instruct-GGUF",
    "Llama-3.2-3B-Instruct-Q4_K_M.gguf",
    progress_callback=lambda pct, done, total: print(f"\r{pct:.1f}%", end="")
)
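The `progress_callback` receives `(pct, done, total)`. A reusable callback can render a simple text bar; this is a sketch that assumes `done`/`total` are byte counts, as the lambda above suggests:

```python
def render_progress(pct, done, total):
    """Build a one-line progress bar string for a (pct, done, total) callback."""
    bar = "#" * int(pct // 10)
    return f"[{bar:<10}] {pct:5.1f}% ({done}/{total} bytes)"

# Used as:
# client.models.download(repo, filename,
#     progress_callback=lambda pct, done, total:
#         print("\r" + render_progress(pct, done, total), end=""))
print(render_progress(50.0, 512, 1024))
```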

# Delete a model
client.models.delete("old-model")

# Import from USB
imported = client.models.import_usb("/media/usb")

# Export to USB
client.models.export_usb("Llama-3.2-3B-Instruct-Q4_K_M", "/media/usb")

Knowledge Base (RAG)

# Add documents
client.kb.add("notes.md")
client.kb.add("meeting", content="Meeting notes from today...")
client.kb.add_directory("./docs", extensions=[".md", ".txt", ".pdf"])

# List documents
for doc in client.kb.list():
    print(f"{doc['id']}: {doc['chunks']} chunks")

# Search
results = client.kb.search("project deadline")
for r in results:
    print(f"[{r['score']:.2f}] {r['content'][:100]}...")

# Chat with Knowledge Base context
response = client.chat(
    "What are the main action items from the meeting?",
    use_kb=True
)

# Remove documents
client.kb.remove("notes.md")
client.kb.clear()  # Remove all

AI Agents (New in v0.2.3)

Run autonomous agents that can use tools to complete tasks:

# Run an agent task
result = client.agent.run(
    "Calculate 127 * 48 + 356",
    model="llama3.2:3b"
)
print(result["result"])

# List available tools
tools = client.agent.tools()
for tool in tools:
    status = "✓" if tool["enabled"] else "✗"
    print(f"[{status}] {tool['name']}: {tool['description']}")

# Toggle tools on/off
client.agent.disable_tool("shell")  # Security: disable shell access
client.agent.enable_tool("calculator")

# Complex multi-step tasks
result = client.agent.run(
    "Read the VERSION file and tell me what version it is",
    model="llama3.2:3b",
    max_steps=5
)

Built-in Tools:

  • calculator - Evaluate mathematical expressions
  • current_time - Get current date/time
  • read_file - Read file contents
  • write_file - Write content to files
  • list_files - List directory contents
  • shell - Execute shell commands
  • http_get - Make HTTP GET requests

MCP Integration (New in v0.2.3)

Connect external tools via Model Context Protocol:

# Add an MCP server
client.agent.mcp.add(
    "filesystem",
    "npx -y @modelcontextprotocol/server-filesystem /tmp"
)

# List configured servers
servers = client.agent.mcp.list()
for s in servers:
    print(f"{s['name']}: {s['url']}")

# Test a server connection
result = client.agent.mcp.test(url="npx -y @modelcontextprotocol/server-github")
print(f"Found {len(result.get('tools', []))} tools")

# Remove a server
client.agent.mcp.remove("filesystem")

LoRA Adapters (New in v0.2.3)

Manage LoRA adapters for fine-tuned models:

# List registered adapters
adapters = client.lora.list()
for a in adapters:
    print(f"{a['name']}: {a['path']}")

# Register a new adapter
client.lora.register(
    "coding-assistant",
    "/path/to/code-lora.gguf",
    scale=0.8
)

# Get adapter details
adapter = client.lora.get("coding-assistant")

# Remove an adapter
client.lora.remove("coding-assistant")

System Configuration (New in v0.2.3)

# Get server configuration and feature flags
config = client.config()
print(f"Version: {config['version']}")
print(f"Multi-user mode: {config['multi_user_mode']}")
print(f"Agent enabled: {config['features']['agent']}")

# Get real-time system stats
stats = client.system_stats()
print(f"CPU: {stats['cpu_percent']}%")
print(f"Memory: {stats['memory_percent']}%")

Embeddings

# Single text
embedding = client.embed("Hello world")
print(f"Dimensions: {len(embedding)}")

# Multiple texts
embeddings = client.embed(["Hello", "World", "AI"])
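Embeddings come back as lists of floats, so similarity between texts can be computed client-side. A stdlib-only cosine-similarity sketch, using toy vectors standing in for `client.embed(...)` output:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length float vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for embed() output
print(cosine_similarity([1.0, 0.0, 1.0], [1.0, 0.0, 1.0]))  # identical -> 1.0
```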

System Info

# Check server health
if client.health():
    print("Server is running")

# Get detailed info
info = client.info()
print(f"Uptime: {info['uptime']}")
print(f"CPU: {info['system']['cpu_percent']}%")
print(f"Memory: {info['system']['memory_percent']}%")

Configuration

from offgrid import Client

# Default: localhost:11611
client = Client()

# Custom server URL
client = Client(host="http://192.168.1.100:11611")

# Just hostname (auto-adds http://)
client = Client(host="192.168.1.100:11611")

# With API key authentication
client = Client(api_key="your-secret-key")

# Custom timeout (for slow models)
client = Client(timeout=600)  # 10 minutes

# Combined options
client = Client(
    host="http://192.168.1.100:11611",
    api_key="your-secret-key",
    timeout=300
)

Automatic Retry

The client automatically retries failed requests with exponential backoff:

  • Up to 3 retry attempts
  • Delays: 1s → 2s → 4s between retries
  • Only retries on connection errors, not HTTP errors

# Retries are automatic
response = client.chat("Hello!")  # Will retry on transient failures
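The 1s → 2s → 4s schedule above is plain exponential doubling; as an illustrative sketch (not the client's internal code):

```python
def backoff_delays(base=1.0, retries=3):
    """Delay before each retry attempt: base * 2**attempt."""
    return [base * 2 ** attempt for attempt in range(retries)]

print(backoff_delays())  # [1.0, 2.0, 4.0]
```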

Error Handling

from offgrid import Client, OffGridError

client = Client()

try:
    response = client.chat("Hello")
except OffGridError as e:
    print(f"Error: {e.message}")
    if e.code:
        print(f"Code: {e.code}")

Requirements

  • Python 3.8+
  • OffGrid LLM server running (offgrid serve)
  • No external dependencies (uses only stdlib)

License

MIT License

