A clean interface for interacting with the Lemonade LLM server

🍋 Lemonade Python SDK

License: MIT | Python 3.8+

A robust, production-grade Python wrapper for the Lemonade C++ Backend.

This SDK provides a clean, Pythonic interface for interacting with local LLMs running on Lemonade. It was built to power Sorana (a personal AI knowledge workspace), and the core integration logic was then extracted into a standalone, open-source library for the developer community.

🚀 Key Features

  • Auto-Discovery: Automatically scans 8 discrete ports (8000, 8020, 8040, 8060, 8080, 9000, 13305, 11434) to find active Lemonade instances. Distinguishes between real Ollama and Lemonade on port 11434.
  • Low-Overhead Architecture: Designed as a thin, efficient wrapper to leverage Lemonade's C++ performance with minimal Python latency.
  • Health Checks & Server Stats: Lightweight /api/v1/health endpoint plus get_stats() for token usage, requests served, and performance metrics.
  • Type-Safe Client: Full Python type hinting for better developer experience (IDE autocompletion).
  • Model Labels & Capabilities: Detect vision, reasoning, coding, and other model capabilities via the official Lemonade labels system.
  • Embeddings API: Generate text embeddings for semantic search, RAG, and clustering (FLM & llamacpp backends).
  • Audio API: Whisper speech-to-text and Kokoro text-to-speech.
  • Reranking API: Reorder documents by relevance for better RAG results.
  • Image Generation: Create images from text prompts using Stable Diffusion.
  • WebSocket Streaming: Real-time audio transcription with VAD.
  • Model Cache Utilities: Check local HuggingFace cache for installed models without starting the server.

📦 Installation

From a local clone of the repository:

pip install .

Alternatively, you can install it directly from GitHub:

pip install git+https://github.com/Tetramatrix/lemonade-python-sdk.git

⚡ Quick Start

1. Connecting to Lemonade

The SDK automatically handles port discovery, so you don't need to hardcode localhost:8000.

from lemonade_sdk import LemonadeClient, find_available_lemonade_port

# Auto-discover running instance
port = find_available_lemonade_port()
if port:
    client = LemonadeClient(base_url=f"http://localhost:{port}")
    if client.health_check():
        print(f"Connected to Lemonade on port {port}")
else:
    print("No Lemonade instance found.")
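
The discovery step can be pictured as a loop over candidate ports that stops at the first one whose health endpoint answers. The following is a minimal sketch of that idea, not the SDK's implementation: `http_probe` and `first_responding_port` are hypothetical names, while the port list and the /api/v1/health path are the ones listed in this README. Unlike the SDK, the sketch makes no attempt to tell Ollama and Lemonade apart on port 11434.

```python
import http.client
import urllib.request
from typing import Callable, Iterable, Optional

# Candidate ports as listed in this README.
CANDIDATE_PORTS = (8000, 8020, 8040, 8060, 8080, 9000, 13305, 11434)


def http_probe(port: int, timeout: float = 0.5) -> bool:
    """Return True if GET /api/v1/health on localhost:<port> answers with HTTP 200."""
    url = f"http://localhost:{port}/api/v1/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, HTTP error status, or a malformed reply.
        return False


def first_responding_port(
    ports: Iterable[int] = CANDIDATE_PORTS,
    probe: Callable[[int], bool] = http_probe,
) -> Optional[int]:
    """Return the first port for which probe(port) is True, or None if none answers."""
    for port in ports:
        if probe(port):
            return port
    return None
```

Passing the probe as a parameter keeps the loop testable without a running server, for example `first_responding_port([8000, 8020], probe=lambda p: p == 8020)`.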

1.1 Health Check & Stats

# Check if server is alive (uses /api/v1/health endpoint)
if client.health_check():
    print("Lemonade is running!")

# Get server statistics (performance metrics from last request)
stats = client.get_stats()
if stats:
    print(f"Time to first token: {stats.get('time_to_first_token', 0):.2f}s")
    print(f"Tokens/sec: {stats.get('tokens_per_second', 0):.1f}")
    print(f"Input tokens: {stats.get('input_tokens', 0)}")
    print(f"Output tokens: {stats.get('output_tokens', 0)}")
    print(f"Prompt tokens: {stats.get('prompt_tokens', 0)}")

Available stats fields: time_to_first_token, tokens_per_second, input_tokens, output_tokens, decode_token_times, prompt_tokens.
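
For logging, the fields above can be folded into a single line. The helper below is a hypothetical sketch (`summarize_stats` is not part of the SDK). It assumes that time_to_first_token is in seconds and that tokens_per_second describes the decode rate, so the total time it prints is only a rough estimate.

```python
from typing import Any, Mapping


def summarize_stats(stats: Mapping[str, Any]) -> str:
    """Build a one-line summary from a stats dict; missing fields count as 0."""
    ttft = float(stats.get("time_to_first_token") or 0.0)
    tps = float(stats.get("tokens_per_second") or 0.0)
    out_tokens = int(stats.get("output_tokens") or 0)
    # Rough estimate (assumption): first-token latency plus decode time for the output.
    est_total = ttft + (out_tokens / tps if tps > 0 else 0.0)
    return (
        f"TTFT {ttft:.2f}s | {tps:.1f} tok/s | "
        f"{out_tokens} output tokens | ~{est_total:.2f}s total"
    )
```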

2. Chat Completion

response = client.chat_completion(
    model="Llama-3-8B-Instruct",
    messages=[
        {"role": "system", "content": "You are a helpful coding assistant."},
        {"role": "user", "content": "Write a Hello World in C++"}
    ],
    temperature=0.7
)

print(response['choices'][0]['message']['content'])
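
For multi-turn conversations, the assistant's reply has to be appended to the message list before the next call. The helpers below are a small sketch of that bookkeeping. `extract_reply` and `with_reply` are hypothetical names, and the response shape is assumed to match the `choices[0].message.content` layout used in the example above.

```python
from typing import Any, Dict, List, Mapping

Message = Dict[str, str]


def extract_reply(response: Mapping[str, Any]) -> str:
    """Return the first choice's message content, or "" if it is missing."""
    choices = response.get("choices") or []
    if not choices:
        return ""
    return (choices[0].get("message") or {}).get("content") or ""


def with_reply(history: List[Message], response: Mapping[str, Any]) -> List[Message]:
    """Return a new history list with the assistant's reply appended."""
    return history + [{"role": "assistant", "content": extract_reply(response)}]
```

The returned list can then be extended with the next user message and passed back in as `messages`.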

3. Model Management

# List all available models
models = client.list_models()
for m in models:
    print(f"Found model: {m['id']}")

# Load a specific model into memory
client.load_model("Mistral-7B-v0.1")

3.1 Model Labels & Capability Detection

Lemonade models include a labels array that describes their capabilities. The SDK provides ModelInfo objects for easy capability checking.

from lemonade_sdk import ModelInfo, LemonadeClient

client = LemonadeClient()

# Get all models with capability info
models = client.list_models_with_info()
for model in models:
    print(f"{model.name}: {model.get_capabilities_summary()}")

# Check if a specific model supports vision
if client.has_vision("Qwen3.5-122B"):
    print("This model can process images!")

# Find all vision models
vision_models = client.list_vision_models()
for m in vision_models:
    print(f"Vision model: {m.name}")

# Check other capabilities
model_id = "Qwen3.5-122B"
print(client.has_reasoning(model_id))         # Extended thinking
print(client.has_tool_calling(model_id))      # Function calling
print(client.has_coding(model_id))            # Code generation
print(client.has_embeddings(model_id))        # Embedding model
print(client.has_reranking(model_id))         # Reranking model
print(client.has_image_generation(model_id))  # Stable Diffusion

Official Lemonade Labels:

| Label | Meaning |
|---|---|
| vision | Model supports image input (VLM) |
| reasoning | Model uses extended thinking/chain-of-thought |
| coding | Optimized for code generation tasks |
| tool-calling | Supports function/tool calling |
| embeddings | Text embedding model |
| reranking | Reranking model (for RAG pipelines) |
| image | Image generation model (Stable Diffusion etc.) |
| transcription | Audio transcription model (speech-to-text) |
| realtime-transcription | Real-time audio transcription model |
| stt | Speech-to-text model |
| speech | Speech-capable model (STT or TTS) |
| hot | Featured/recommended by Lemonade |
| custom | User-added model |

You can also use labels directly:

from lemonade_sdk import LABEL_VISION, ModelInfo

# api_data: one raw model entry (a dict that includes the "labels" array) from the Lemonade API
model = ModelInfo.from_api_response(api_data)
if model.has_label(LABEL_VISION):
    print("This is a vision model")
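
Because capabilities are plain strings in each model's labels array, the same check can be run on raw dictionaries without any SDK objects. The sketch below is illustrative only: `entry_has_label` and `filter_by_label` are hypothetical helpers, and the entry shape (an "id" plus a "labels" list) is an assumption based on the description above.

```python
from typing import Any, Iterable, List, Mapping


def entry_has_label(entry: Mapping[str, Any], label: str) -> bool:
    """True if the raw model entry lists the given label; tolerates a missing array."""
    return label in (entry.get("labels") or [])


def filter_by_label(entries: Iterable[Mapping[str, Any]], label: str) -> List[Mapping[str, Any]]:
    """Keep only the entries that carry the given label, preserving order."""
    return [e for e in entries if entry_has_label(e, label)]
```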

3.2 Model Installation & Cache Utilities (NEW)

Check if models are installed in your local HuggingFace cache without starting the Lemonade server.

from lemonade_sdk import (
    get_hf_cache_dir,
    find_model_in_cache,
    is_model_installed,
    list_installed_models,
    is_whisper_model_installed,
    is_llm_model_installed,
)

# Get the HuggingFace cache directory
cache_dir = get_hf_cache_dir()
print(f"Cache location: {cache_dir}")

# Check if a specific model file exists
path = find_model_in_cache(
    repo_id="ggerganov/whisper.cpp",
    filename="ggml-large-v3-turbo.bin"
)
if path:
    print(f"Found at: {path}")

# Quick boolean check
if is_model_installed("ggerganov/whisper.cpp", "ggml-large-v3-turbo.bin"):
    print("Whisper model is installed!")

# Check Whisper models by name (handles naming variations)
if is_whisper_model_installed("Whisper-Large-v3-Turbo"):
    print("Whisper Large v3 Turbo is ready")

# Check LLM models with fuzzy matching
if is_llm_model_installed("Qwen3.5-122B-A10B-UD-IQ3_S"):
    print("LLM model is installed")

# List all installed model files
models = list_installed_models()
for m in models:
    print(f"{m['repo_id']}/{m['filename']} ({m['size_bytes'] / 1e9:.1f} GB)")

Use cases:

  • Verify model downloads completed before starting Lemonade
  • Build custom model managers with install-state awareness
  • Scan cache for disk usage reporting
  • Pre-flight checks before expensive operations
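
These checks work without a server because the HuggingFace hub cache is an ordinary directory tree: each repository lives under models--<org>--<name>/snapshots/<revision>/. The sketch below walks that layout with the standard library only. It is not the SDK's code, and `default_hf_hub_cache` and `find_cached_file` are hypothetical names. The environment variables HF_HUB_CACHE and HF_HOME are the ones documented by huggingface_hub.

```python
import os
from pathlib import Path
from typing import Optional


def default_hf_hub_cache() -> Path:
    """Resolve the hub cache directory from HF_HUB_CACHE, HF_HOME, or the default."""
    if os.environ.get("HF_HUB_CACHE"):
        return Path(os.environ["HF_HUB_CACHE"])
    hf_home = os.environ.get("HF_HOME")
    base = Path(hf_home) if hf_home else Path.home() / ".cache" / "huggingface"
    return base / "hub"


def find_cached_file(cache_dir: Path, repo_id: str, filename: str) -> Optional[Path]:
    """Return the path of filename in any cached snapshot of repo_id, else None."""
    repo_dir = cache_dir / ("models--" + repo_id.replace("/", "--"))
    snapshots = repo_dir / "snapshots"
    if not snapshots.is_dir():
        return None
    for revision in sorted(snapshots.iterdir()):
        candidate = revision / filename
        if candidate.exists():
            return candidate
    return None
```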

4. Embeddings (NEW)

Generate text embeddings for semantic search, RAG pipelines, and clustering.

# List available embedding models (filtered by 'embeddings' label)
embedding_models = client.list_embedding_models()
for model in embedding_models:
    print(f"Embedding model: {model['id']}")

# Generate embeddings for single text
response = client.embeddings(
    input="Hello, world!",
    model="nomic-embed-text-v1-GGUF"
)

embedding_vector = response["data"][0]["embedding"]
print(f"Vector length: {len(embedding_vector)}")

# Generate embeddings for multiple texts
texts = ["Text 1", "Text 2", "Text 3"]
response = client.embeddings(
    input=texts,
    model="nomic-embed-text-v1-GGUF"
)

for item in response["data"]:
    print(f"Text {item['index']}: {len(item['embedding'])} dimensions")

Supported Lemonade Backends:

  • FLM (FastFlowLM) - NPU-accelerated on Windows
  • llamacpp (.GGUF models) - CPU/GPU
  • ❌ ONNX/OGA - Not supported
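
For semantic search, the vectors returned above are usually compared with cosine similarity. The sketch below shows that step in pure Python; `cosine_similarity` and `rank_by_similarity` are illustrative helpers, not SDK functions, and a real pipeline would more likely use NumPy or a vector store.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    query_vec: Sequence[float], doc_vecs: Sequence[Sequence[float]]
) -> List[Tuple[int, float]]:
    """Return (document index, similarity) pairs, most similar first."""
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(doc_vecs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

With the batch example above, `query_vec` would be one `item["embedding"]` and `doc_vecs` the embeddings of the texts being searched.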

5. Audio Transcription (Whisper) (NEW)

Transcribe audio files to text using Whisper.

# List available audio models (Whisper + Kokoro)
audio_models = client.list_audio_models()
for model in audio_models:
    print(f"Audio model: {model['id']}")

# Transcribe an audio file
result = client.transcribe_audio(
    file_path="meeting.wav",
    model="Whisper-Tiny",
    language="en",  # Optional: None for auto-detection
    response_format="json"  # Options: "json", "text", "verbose_json"
)

if "error" not in result:
    print(f"Transcription: {result['text']}")
    # Verbose format also includes: duration, language, segments

Supported Models:

  • Whisper-Tiny (~39M parameters)
  • Whisper-Base (~74M parameters)
  • Whisper-Small (~244M parameters)

Supported Formats: WAV, MP3, FLAC, OGG, WebM

Backend: whisper.cpp (NPU-accelerated on Windows)
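
A small pre-flight check can reject unsupported files before they are uploaded, and the "error" key convention from the example above can be turned into an exception. Both helpers below are hypothetical sketches rather than SDK functions. The extension set mirrors the format list above and assumes the file extension reflects the actual container.

```python
from pathlib import Path
from typing import Any, Mapping

# Extensions derived from the "Supported Formats" list in this README.
SUPPORTED_AUDIO_EXTENSIONS = {".wav", ".mp3", ".flac", ".ogg", ".webm"}


def is_supported_audio(path: str) -> bool:
    """True if the file extension is one of the listed formats (case-insensitive)."""
    return Path(path).suffix.lower() in SUPPORTED_AUDIO_EXTENSIONS


def transcript_text(result: Mapping[str, Any]) -> str:
    """Return the transcription text, raising if the result carries an "error" key."""
    if "error" in result:
        raise RuntimeError(f"transcription failed: {result['error']}")
    return str(result.get("text", ""))
```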

6. Text-to-Speech (Kokoro) (NEW)

Generate speech from text using Kokoro TTS.

# Generate speech and save to file
client.text_to_speech(
    input_text="Hello, Lemonade can now speak!",
    model="kokoro-v1",
    voice="shimmer",  # Options: shimmer, corey, af_bella, am_adam, etc.
    speed=1.0,  # 0.5 - 2.0
    response_format="mp3",  # Options: mp3, wav, opus, pcm, aac, flac
    output_file="speech.mp3"  # Saves directly to file
)

# Or get audio bytes directly
audio_bytes = client.text_to_speech(
    input_text="Short test!",
    model="kokoro-v1",
    voice="corey",
    response_format="mp3"
)

with open("speech.mp3", "wb") as f:
    f.write(audio_bytes)

Supported Models:

  • kokoro-v1 (~82M parameters)

Available Voices:

| Voice ID | Language | Gender |
|---|---|---|
| shimmer | EN | Female |
| corey | EN | Male |
| af_bella, af_nicole | EN-US | Female |
| am_adam, am_michael | EN-US | Male |
| bf_emma, bf_isabella | EN-GB | Female |
| bm_george, bm_lewis | EN-GB | Male |

Audio Formats: MP3, WAV, OPUS, PCM, AAC, FLAC

Backend: Kokoros (.onnx, CPU)
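
When the voice is chosen at runtime, the voice table above can be kept as data and queried. The sketch below is illustrative: `voices_for` and `clamp_speed` are hypothetical helpers, the table content is copied from this README, and the 0.5 to 2.0 speed range comes from the comment in the example above.

```python
from typing import List, Tuple

# (voice ID, language, gender) rows copied from the voice table in this README.
VOICES: List[Tuple[str, str, str]] = [
    ("shimmer", "EN", "Female"),
    ("corey", "EN", "Male"),
    ("af_bella", "EN-US", "Female"),
    ("af_nicole", "EN-US", "Female"),
    ("am_adam", "EN-US", "Male"),
    ("am_michael", "EN-US", "Male"),
    ("bf_emma", "EN-GB", "Female"),
    ("bf_isabella", "EN-GB", "Female"),
    ("bm_george", "EN-GB", "Male"),
    ("bm_lewis", "EN-GB", "Male"),
]


def voices_for(language: str, gender: str) -> List[str]:
    """Return the voice IDs matching a language and gender (case-insensitive)."""
    lang, gen = language.upper(), gender.capitalize()
    return [vid for vid, vlang, vgen in VOICES if vlang == lang and vgen == gen]


def clamp_speed(speed: float) -> float:
    """Clamp a requested speed into the documented 0.5 to 2.0 range."""
    return max(0.5, min(2.0, speed))
```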

7. Reranking (NEW)

Rerank documents based on relevance to a query.

result = client.rerank(
    query="What is the capital of France?",
    documents=[
        "Berlin is the capital of Germany.",
        "Paris is the capital of France.",
        "London is the capital of the UK."
    ],
    model="bge-reranker-v2-m3-GGUF"
)

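# Results are sorted by relevance score; 'index' is the document's position in the input list
for rank, r in enumerate(result["results"], start=1):
    print(f"Rank {rank}: document {r['index']} (score={r['relevance_score']:.2f})")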

Supported Models:

  • bge-reranker-v2-m3-GGUF
  • Other BGE reranker models

Backend: llamacpp (.GGUF only, not available for FLM or OGA)
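
In a RAG pipeline, the next step is usually to map the scores back to the original texts and keep the best few. The sketch below assumes that each result's index field refers to the document's position in the input list, as in the example above. `top_documents` is a hypothetical helper and not part of the SDK.

```python
from typing import Any, List, Mapping, Sequence, Tuple


def top_documents(
    documents: Sequence[str], result: Mapping[str, Any], k: int = 1
) -> List[Tuple[str, float]]:
    """Return the k highest-scoring (document text, score) pairs."""
    # Sort defensively instead of relying on the server's ordering.
    ranked = sorted(
        result.get("results", []),
        key=lambda r: r["relevance_score"],
        reverse=True,
    )
    return [(documents[r["index"]], r["relevance_score"]) for r in ranked[:k]]
```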

8. Image Generation (NEW)

Generate images from text prompts using Stable Diffusion.

# Generate and save to file
client.generate_image(
    prompt="A sunset over mountains with lake reflection",
    model="SD-Turbo",
    size="512x512",
    steps=4,  # SD-Turbo needs only 4 steps
    cfg_scale=1.0,
    output_file="sunset.png"
)

# Or get image bytes
image_bytes = client.generate_image(
    prompt="A cute cat",
    model="SD-Turbo"
)

Supported Models:

  • SD-Turbo (fast, 4 steps)
  • SDXL-Turbo (fast, 4 steps)
  • SD-1.5 (standard, 20 steps)
  • SDXL-Base-1.0 (high quality, 20 steps)

Image Sizes: 512x512, 1024x1024, or custom

Backend: stable-diffusion.cpp
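
The step counts and the size strings above lend themselves to a small lookup and parser. The sketch below is illustrative only: `steps_for` and `parse_size` are hypothetical helpers, the step values are taken from the model list in this README, and the fallback of 20 steps for unknown models is an assumption.

```python
from typing import Dict, Tuple

# Step counts as stated in the "Supported Models" list of this README.
DEFAULT_STEPS: Dict[str, int] = {
    "SD-Turbo": 4,
    "SDXL-Turbo": 4,
    "SD-1.5": 20,
    "SDXL-Base-1.0": 20,
}


def steps_for(model: str, fallback: int = 20) -> int:
    """Return the listed step count for a model, or the fallback (an assumption)."""
    return DEFAULT_STEPS.get(model, fallback)


def parse_size(size: str) -> Tuple[int, int]:
    """Parse a "WIDTHxHEIGHT" string such as "512x512" into (width, height)."""
    width_text, sep, height_text = size.strip().lower().partition("x")
    if not sep or not width_text.isdigit() or not height_text.isdigit():
        raise ValueError(f"expected WIDTHxHEIGHT, got {size!r}")
    width, height = int(width_text), int(height_text)
    if width == 0 or height == 0:
        raise ValueError("width and height must be positive")
    return width, height
```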

9. WebSocket Streaming (NEW)

Real-time audio transcription with Voice Activity Detection (VAD).

from lemonade_sdk import WhisperWebSocketClient

# Create streaming client
stream = client.create_whisper_stream(model="Whisper-Tiny")
stream.connect()

# Set callback for transcriptions
def on_transcript(text):
    print(f"Heard: '{text}'")

stream.on_transcription(on_transcript)

# Stream audio file (PCM16, 16kHz, mono)
for text in stream.stream("audio.pcm"):
    pass  # Callback handles output

# Or stream from microphone (requires pyaudio)
# for text in stream.stream_microphone():
#     print(f"Heard: {text}")

stream.disconnect()

Audio Format: 16kHz, mono, PCM16 (16-bit)

Features:

  • Voice Activity Detection (VAD)
  • Real-time streaming
  • Microphone support (with pyaudio)
  • Configurable sensitivity

Backend: whisper.cpp (NPU-accelerated on Windows)
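
Since the stream expects raw 16 kHz, mono, 16-bit PCM, a WAV recording has to be checked and unwrapped first. The sketch below does this with Python's standard `wave` module. `wav_to_pcm16` and `chunk_pcm` are hypothetical helpers, and the 100 ms chunk size is an arbitrary choice for illustration, not a requirement stated by the SDK.

```python
import wave
from typing import Iterator


def wav_to_pcm16(wav_path: str) -> bytes:
    """Return the raw PCM frames of a WAV file after verifying 16 kHz, mono, 16-bit."""
    with wave.open(wav_path, "rb") as wf:
        if (wf.getnchannels(), wf.getsampwidth(), wf.getframerate()) != (1, 2, 16000):
            raise ValueError(
                "expected 16 kHz mono 16-bit audio, got "
                f"{wf.getframerate()} Hz, {wf.getnchannels()} channel(s), "
                f"{wf.getsampwidth() * 8}-bit"
            )
        return wf.readframes(wf.getnframes())


def chunk_pcm(data: bytes, chunk_ms: int = 100, sample_rate: int = 16000) -> Iterator[bytes]:
    """Yield consecutive chunks of about chunk_ms milliseconds of 16-bit mono audio."""
    bytes_per_chunk = sample_rate * 2 * chunk_ms // 1000  # 2 bytes per sample
    for start in range(0, len(data), bytes_per_chunk):
        yield data[start:start + bytes_per_chunk]
```

The bytes returned by `wav_to_pcm16` can be written to a .pcm file and passed to `stream.stream(...)` as in the example above.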

📚 Documentation

🖼️ Production Showcase:

This SDK powers three real-world production applications:

Sorana — personal AI knowledge workspace

  • SDK drives your personal AI knowledge workspace, a second brain that actually acts.
  • SDK handles auto-discovery and connection to local Lemonade instances (zero config)

Aicono — AI Desktop Icon Organizer (Featured in CHIP Magazine 🇩🇪)

  • SDK drives AI inference for grouping and categorizing desktop icons
  • Reached millions of readers via COMPUTERBILD and CHIP, two of Germany's largest IT publications

TabNeuron — AI-Powered Tab Organizer

  • SDK enables local AI inference for grouping and categorizing browser tabs
  • Desktop companion app + browser extension, demonstrating SDK viability in lightweight client architectures

🛠️ Project Structure

  • client.py: Main entry point for API interactions (chat, embeddings, audio, reranking, images, model management).
  • port_scanner.py: Utilities for detecting Lemonade instances across 8 discrete ports (8000, 8020, 8040, 8060, 8080, 9000, 13305, 11434).
  • model_discovery.py: Logic for fetching and parsing model metadata.
  • model_info.py: ModelInfo class with capability detection via labels (vision, reasoning, coding, etc.).
  • model_install_check.py: Utilities for checking HuggingFace cache for installed models.
  • request_builder.py: Helper functions to construct compliant payloads (chat, embeddings, audio, reranking, images).
  • audio_stream.py: WebSocket client for real-time audio transcription with VAD.
  • utils.py: Additional utility functions.
  • model_recovery.py: LemonadeModelRecovery class for handling model installation and recovery.

🤝 Contributing

Contributions are welcome! This project is intended to help the AMD Ryzen AI and Lemonade community build downstream applications faster.

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.
