A clean interface for interacting with the Lemonade LLM server

🍋 Lemonade Python SDK

A robust, production-grade Python wrapper for the Lemonade C++ Backend.

This SDK provides a clean, pythonic interface for interacting with local LLMs running on Lemonade. It was built to power Sorana (a visual workspace for AI), extracting the core integration logic into a standalone, open-source library for the developer community.

🚀 Key Features

Auto-Discovery: Automatically scans multiple ports and hosts to find active Lemonade instances.
Low-Overhead Architecture: Designed as a thin, efficient wrapper to leverage Lemonade's C++ performance with minimal Python latency.
Health Checks & Server Stats: Lightweight /api/v1/health endpoint plus get_stats() for token usage, requests served, and performance metrics.
Type-Safe Client: Full Python type hinting for better developer experience (IDE autocompletion).
Model Management: Simple API to load, unload, and list models dynamically.
Embeddings API: Generate text embeddings for semantic search, RAG, and clustering (FLM & llamacpp backends).
Audio API: Whisper speech-to-text and Kokoro text-to-speech.
Reranking API: Reorder documents by relevance for better RAG results.
Image Generation: Create images from text prompts using Stable Diffusion.
WebSocket Streaming: Real-time audio transcription with VAD.

📦 Installation

From a local clone of the repository:

pip install .

Alternatively, you can install it directly from GitHub:

pip install git+https://github.com/Tetramatrix/lemonade-python-sdk.git

⚡ Quick Start

1. Connecting to Lemonade

The SDK automatically handles port discovery, so you don't need to hardcode localhost:8000.

from lemonade_sdk import LemonadeClient, find_available_lemonade_port

# Auto-discover a running instance
port = find_available_lemonade_port()
if port:
    client = LemonadeClient(base_url=f"http://localhost:{port}")
    if client.health_check():
        print(f"Connected to Lemonade on port {port}")
    else:
        # A port was found, but the server did not pass the health check
        print(f"Found port {port}, but the health check failed.")
else:
    print("No Lemonade instance found.")
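For readers curious how discovery of this kind can work, the following is a minimal standard-library sketch. It is not the SDK's implementation: `is_port_open` and `find_first_open_port` are hypothetical names introduced here, and the real `find_available_lemonade_port()` may probe hosts, ports, and endpoints differently.

```python
import socket
from typing import Callable, Iterable, Optional


def is_port_open(host: str, port: int, timeout: float = 0.2) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        return sock.connect_ex((host, port)) == 0


def find_first_open_port(
    ports: Iterable[int],
    host: str = "localhost",
    probe: Callable[[str, int], bool] = is_port_open,
) -> Optional[int]:
    """Return the first port for which probe(host, port) is True, else None."""
    for port in ports:
        if probe(host, port):
            return port
    return None


# Hypothetical usage: scan the 8000-9000 range mentioned in Project Structure.
# port = find_first_open_port(range(8000, 9001))
```

An open TCP port only shows that something is listening, so a check such as `client.health_check()` is still needed to confirm that the listener is Lemonade. The `probe` parameter is there so the scan logic can be exercised without a network.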

1.1 Health Check & Stats

# Check if server is alive (uses /api/v1/health endpoint)
if client.health_check():
    print("Lemonade is running!")

# Get server statistics (performance metrics from last request)
stats = client.get_stats()
if stats:
    print(f"Time to first token: {stats.get('time_to_first_token', 0):.2f}s")
    print(f"Tokens/sec: {stats.get('tokens_per_second', 0):.1f}")
    print(f"Input tokens: {stats.get('input_tokens', 0)}")
    print(f"Output tokens: {stats.get('output_tokens', 0)}")
    print(f"Prompt tokens: {stats.get('prompt_tokens', 0)}")

Available stats fields: time_to_first_token, tokens_per_second, input_tokens, output_tokens, decode_token_times, prompt_tokens.
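The fields above can be combined into a single log line. The sketch below works on a plain dict shaped like the one `get_stats()` is shown returning; `summarize_stats` is a hypothetical helper, and it assumes `time_to_first_token` is in seconds and `tokens_per_second` describes decode throughput, which the source does not state explicitly.

```python
from typing import Any, Dict, Optional


def summarize_stats(stats: Optional[Dict[str, Any]]) -> str:
    """Build a one-line summary from a stats dict, tolerating missing fields."""
    if not stats:
        return "no stats available"
    ttft = stats.get("time_to_first_token") or 0.0
    tps = stats.get("tokens_per_second") or 0.0
    out_tokens = stats.get("output_tokens") or 0
    # Rough decode time estimate: output tokens divided by throughput.
    decode_s = out_tokens / tps if tps > 0 else 0.0
    return (
        f"TTFT {ttft:.2f}s, {tps:.1f} tok/s, "
        f"{out_tokens} output tokens, ~{ttft + decode_s:.2f}s total"
    )


# Made-up sample values, for illustration only.
sample = {"time_to_first_token": 0.25, "tokens_per_second": 40.0, "output_tokens": 100}
print(summarize_stats(sample))
```

The total is only an estimate derived from the other fields, not a value reported by the server.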

2. Chat Completion

response = client.chat_completion(
    model="Llama-3-8B-Instruct",
    messages=[
        {"role": "system", "content": "You are a helpful coding assistant."},
        {"role": "user", "content": "Write a Hello World in C++"}
    ],
    temperature=0.7
)

print(response['choices'][0]['message']['content'])
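Indexing straight into `response['choices'][0]` raises if the server returns an error payload or an empty choice list. A defensive accessor might look like the sketch below; `extract_reply` is a hypothetical helper, and the response shape is assumed to match the one used in the example above.

```python
from typing import Any, Dict, Optional


def extract_reply(response: Dict[str, Any]) -> Optional[str]:
    """Return the first choice's message content, or None if it is absent."""
    choices = response.get("choices") or []
    if not choices:
        return None
    message = choices[0].get("message") or {}
    return message.get("content")


# A hand-written stand-in for a chat_completion() result.
fake_response = {
    "choices": [{"message": {"role": "assistant", "content": "#include <iostream>"}}]
}

history = [{"role": "user", "content": "Write a Hello World in C++"}]
reply = extract_reply(fake_response)
if reply is not None:
    # Keep the assistant turn so a follow-up request has the full context.
    history.append({"role": "assistant", "content": reply})
```

Appending the assistant turn to `history` is one simple way to carry a multi-turn conversation into the next `messages` argument.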

3. Model Management

# List all available models
models = client.list_models()
for m in models:
    print(f"Found model: {m['id']}")

# Load a specific model into memory
client.load_model("Mistral-7B-v0.1")
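Before loading, it can help to confirm that the model is actually listed. The sketch below picks the first available entry from a preference list; `pick_model` is a hypothetical helper that only assumes each listed model is a dict with an `id` key, as in the loop above.

```python
from typing import Any, Dict, Iterable, List, Optional


def pick_model(models: Iterable[Dict[str, Any]], preferred: List[str]) -> Optional[str]:
    """Return the first preferred model id that appears in models, else None."""
    available = {m["id"] for m in models if "id" in m}
    for model_id in preferred:
        if model_id in available:
            return model_id
    return None


# Hand-written stand-in for a list_models() result.
models = [{"id": "Llama-3-8B-Instruct"}, {"id": "Mistral-7B-v0.1"}]
choice = pick_model(models, ["Some-Missing-Model", "Mistral-7B-v0.1"])
print(choice)
```

The returned id could then be passed to `client.load_model(...)`, with a `None` result handled as "nothing suitable installed".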

4. Embeddings (NEW)

Generate text embeddings for semantic search, RAG pipelines, and clustering.

# List available embedding models (filtered by 'embeddings' label)
embedding_models = client.list_embedding_models()
for model in embedding_models:
    print(f"Embedding model: {model['id']}")

# Generate embeddings for single text
response = client.embeddings(
    input="Hello, world!",
    model="nomic-embed-text-v1-GGUF"
)

embedding_vector = response["data"][0]["embedding"]
print(f"Vector length: {len(embedding_vector)}")

# Generate embeddings for multiple texts
texts = ["Text 1", "Text 2", "Text 3"]
response = client.embeddings(
    input=texts,
    model="nomic-embed-text-v1-GGUF"
)

for item in response["data"]:
    print(f"Text {item['index']}: {len(item['embedding'])} dimensions")

Supported Backends (Lemonade):

✅ FLM (FastFlowLM) - NPU-accelerated on Windows
✅ llamacpp (.GGUF models) - CPU/GPU
❌ ONNX/OGA - Not supported
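For semantic search, embedding vectors are usually compared with cosine similarity. The sketch below is plain Python and independent of the SDK; `cosine_similarity` and `rank_by_similarity` are hypothetical helpers, and the toy 3-dimensional vectors stand in for the much longer `response["data"][i]["embedding"]` lists.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    query_vec: Sequence[float], doc_vecs: Sequence[Sequence[float]]
) -> List[Tuple[int, float]]:
    """Return (document index, similarity) pairs, most similar first."""
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(doc_vecs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


query = [1.0, 0.0, 0.0]
docs = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [1.0, 1.0, 0.0]]
ranking = rank_by_similarity(query, docs)
print([index for index, _ in ranking])
```

For larger collections a vectorized library or a vector index would be the more practical choice; the loop here is only meant to show the computation.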

5. Audio Transcription (Whisper) - NEW

Transcribe audio files to text using Whisper.

# List available audio models (Whisper + Kokoro)
audio_models = client.list_audio_models()
for model in audio_models:
    print(f"Audio model: {model['id']}")

# Transcribe an audio file
result = client.transcribe_audio(
    file_path="meeting.wav",
    model="Whisper-Tiny",
    language="en",  # Optional: None for auto-detection
    response_format="json"  # Options: "json", "text", "verbose_json"
)

if "error" not in result:
    print(f"Transcription: {result['text']}")
    # Verbose format also includes: duration, language, segments

Supported Models:

Whisper-Tiny (~39M parameters)
Whisper-Base (~74M parameters)
Whisper-Small (~244M parameters)

Supported Formats: WAV, MP3, FLAC, OGG, WebM

Backend: whisper.cpp (NPU-accelerated on Windows)
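When `response_format="verbose_json"` is used, the segments can be turned into timestamped lines. The sketch below assumes each segment is a dict with `start` and `end` (in seconds) and `text`, as in OpenAI-style Whisper output; the source does not document the segment layout, so treat those keys as an assumption. `format_timestamp` and `format_segments` are hypothetical helpers.

```python
from typing import Any, Dict, List


def format_timestamp(seconds: float) -> str:
    """Format a duration in seconds as MM:SS.mmm."""
    total_ms = int(round(seconds * 1000))
    minutes, rem_ms = divmod(total_ms, 60_000)
    secs, ms = divmod(rem_ms, 1000)
    return f"{minutes:02d}:{secs:02d}.{ms:03d}"


def format_segments(result: Dict[str, Any]) -> List[str]:
    """Return one '[start - end] text' line per segment; empty if none exist."""
    lines = []
    for seg in result.get("segments") or []:
        # Assumed keys: "start", "end" (seconds) and "text".
        start = format_timestamp(float(seg.get("start", 0.0)))
        end = format_timestamp(float(seg.get("end", 0.0)))
        lines.append(f"[{start} - {end}] {str(seg.get('text', '')).strip()}")
    return lines


# Hand-written stand-in for a verbose transcription result.
sample = {
    "text": "Hello there.",
    "segments": [{"start": 0.0, "end": 1.5, "text": " Hello there."}],
}
for line in format_segments(sample):
    print(line)
```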

6. Text-to-Speech (Kokoro) - NEW

Generate speech from text using Kokoro TTS.

# Generate speech and save to file
client.text_to_speech(
    input_text="Hello, Lemonade can now speak!",
    model="kokoro-v1",
    voice="shimmer",  # Options: shimmer, corey, af_bella, am_adam, etc.
    speed=1.0,  # 0.5 - 2.0
    response_format="mp3",  # Options: mp3, wav, opus, pcm, aac, flac
    output_file="speech.mp3"  # Saves directly to file
)

# Or get audio bytes directly
audio_bytes = client.text_to_speech(
    input_text="Short test!",
    model="kokoro-v1",
    voice="corey",
    response_format="mp3"
)

with open("speech.mp3", "wb") as f:
    f.write(audio_bytes)

Supported Models:

kokoro-v1 (~82M parameters)

Available Voices:

Voice ID	Language	Gender
`shimmer`	EN	Female
`corey`	EN	Male
`af_bella`, `af_nicole`	EN-US	Female
`am_adam`, `am_michael`	EN-US	Male
`bf_emma`, `bf_isabella`	EN-GB	Female
`bm_george`, `bm_lewis`	EN-GB	Male

Audio Formats: MP3, WAV, OPUS, PCM, AAC, FLAC

Backend: Kokoros (.onnx, CPU)
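The speed range and format list above lend themselves to a quick local check before a request is sent. The sketch below is illustrative only: `tts_param_problems` is a hypothetical helper, the constants are copied from this README, and the server remains the authority on what it accepts.

```python
from typing import List

# Values copied from the options listed in this README.
TTS_FORMATS = ("mp3", "wav", "opus", "pcm", "aac", "flac")
MIN_SPEED, MAX_SPEED = 0.5, 2.0


def tts_param_problems(speed: float, response_format: str, output_file: str) -> List[str]:
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    fmt = response_format.lower()
    if fmt not in TTS_FORMATS:
        problems.append(f"unsupported format: {response_format}")
    if not MIN_SPEED <= speed <= MAX_SPEED:
        problems.append(f"speed {speed} is outside {MIN_SPEED}-{MAX_SPEED}")
    if not output_file.lower().endswith("." + fmt):
        problems.append(f"file name {output_file} does not end with .{fmt}")
    return problems


print(tts_param_problems(1.0, "mp3", "speech.mp3"))
print(tts_param_problems(3.0, "mp3", "speech.wav"))
```

Returning a list rather than raising makes it easy to show every problem at once, for example in a settings dialog.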

7. Reranking (NEW)

Rerank documents based on relevance to a query.

result = client.rerank(
    query="What is the capital of France?",
    documents=[
        "Berlin is the capital of Germany.",
        "Paris is the capital of France.",
        "London is the capital of the UK."
    ],
    model="bge-reranker-v2-m3-GGUF"
)

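# Results sorted by relevance score; r['index'] is taken here to be the
# document's position in the input list, so the rank comes from enumerate()
for rank, r in enumerate(result["results"], start=1):
    print(f"Rank {rank}: document {r['index']}, score={r['relevance_score']:.2f}")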

Supported Models:

bge-reranker-v2-m3-GGUF
Other BGE reranker models

Backend: llamacpp (.GGUF only, not available for FLM or OGA)
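In a RAG pipeline the reranked entries usually need to be mapped back to the document text. The sketch below assumes, as is common for rerank APIs but not confirmed by the source, that each entry's `index` is the document's position in the input list; `top_documents` is a hypothetical helper.

```python
from typing import Any, Dict, List, Tuple


def top_documents(
    result: Dict[str, Any], documents: List[str], k: int = 3
) -> List[Tuple[str, float]]:
    """Return up to k (document, score) pairs, highest score first."""
    ordered = sorted(
        result.get("results") or [],
        key=lambda r: r["relevance_score"],
        reverse=True,
    )
    return [(documents[r["index"]], r["relevance_score"]) for r in ordered[:k]]


documents = [
    "Berlin is the capital of Germany.",
    "Paris is the capital of France.",
    "London is the capital of the UK.",
]
# Hand-written stand-in for a rerank() result; the scores are made up.
fake_result = {
    "results": [
        {"index": 0, "relevance_score": 0.10},
        {"index": 1, "relevance_score": 0.95},
        {"index": 2, "relevance_score": 0.20},
    ]
}
best = top_documents(fake_result, documents, k=2)
print(best[0][0])
```

Sorting again locally keeps the helper correct even if a backend returns results in input order.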

8. Image Generation (NEW)

Generate images from text prompts using Stable Diffusion.

# Generate and save to file
client.generate_image(
    prompt="A sunset over mountains with lake reflection",
    model="SD-Turbo",
    size="512x512",
    steps=4,  # SD-Turbo needs only 4 steps
    cfg_scale=1.0,
    output_file="sunset.png"
)

# Or get image bytes
image_bytes = client.generate_image(
    prompt="A cute cat",
    model="SD-Turbo"
)

Supported Models:

SD-Turbo (fast, 4 steps)
SDXL-Turbo (fast, 4 steps)
SD-1.5 (standard, 20 steps)
SDXL-Base-1.0 (high quality, 20 steps)

Image Sizes: 512x512, 1024x1024, or custom

Backend: stable-diffusion.cpp
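The step counts in the model list and the `WIDTHxHEIGHT` size strings can be handled with two small helpers. This is a sketch under stated assumptions: `DEFAULT_STEPS`, `steps_for`, and `parse_size` are hypothetical names, and the step values are simply copied from the "Supported Models" list above.

```python
from typing import Dict, Tuple

# Step counts taken from the "Supported Models" list in this README.
DEFAULT_STEPS: Dict[str, int] = {
    "SD-Turbo": 4,
    "SDXL-Turbo": 4,
    "SD-1.5": 20,
    "SDXL-Base-1.0": 20,
}


def steps_for(model: str, fallback: int = 20) -> int:
    """Return the listed step count for a model, or fallback if it is unknown."""
    return DEFAULT_STEPS.get(model, fallback)


def parse_size(size: str) -> Tuple[int, int]:
    """Parse a 'WIDTHxHEIGHT' string such as '512x512' into (width, height)."""
    width_text, sep, height_text = size.lower().partition("x")
    if not sep or not width_text.isdigit() or not height_text.isdigit():
        raise ValueError(f"expected WIDTHxHEIGHT, got {size!r}")
    width, height = int(width_text), int(height_text)
    if width == 0 or height == 0:
        raise ValueError("width and height must be positive")
    return width, height


print(parse_size("512x512"), steps_for("SD-Turbo"))
```

The parsed size is handy for local validation, while the original string is what `generate_image(size=...)` is shown accepting.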

9. WebSocket Streaming (NEW)

Real-time audio transcription with Voice Activity Detection (VAD).

from lemonade_sdk import WhisperWebSocketClient

# Create streaming client
stream = client.create_whisper_stream(model="Whisper-Tiny")
stream.connect()

# Set callback for transcriptions
def on_transcript(text):
    print(f"Heard: '{text}'")

stream.on_transcription(on_transcript)

# Stream audio file (PCM16, 16kHz, mono); always disconnect, even on error
try:
    for text in stream.stream("audio.pcm"):
        pass  # Callback handles output

    # Or stream from microphone (requires pyaudio)
    # for text in stream.stream_microphone():
    #     print(f"Heard: {text}")
finally:
    stream.disconnect()

Audio Format: 16kHz, mono, PCM16 (16-bit)

Features:

Voice Activity Detection (VAD)
Real-time streaming
Microphone support (with pyaudio)
Configurable sensitivity

Backend: whisper.cpp (NPU-accelerated on Windows)
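Because the stream expects raw 16 kHz, mono, 16-bit PCM, a WAV file has to be checked and stripped of its header first. The sketch below uses only the standard-library `wave` module; `wav_to_pcm16` is a hypothetical helper and does no resampling, so it rejects files in any other format.

```python
import io
import wave
from typing import BinaryIO, Union


def wav_to_pcm16(source: Union[str, BinaryIO]) -> bytes:
    """Return the raw frames of a WAV file that is already 16 kHz, mono, 16-bit."""
    with wave.open(source, "rb") as wav:
        if wav.getnchannels() != 1:
            raise ValueError("expected mono audio")
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit samples")
        if wav.getframerate() != 16000:
            raise ValueError("expected a 16 kHz sample rate")
        return wav.readframes(wav.getnframes())


# Build one second of silence in memory to demonstrate the check.
buffer = io.BytesIO()
with wave.open(buffer, "wb") as out:
    out.setnchannels(1)
    out.setsampwidth(2)
    out.setframerate(16000)
    out.writeframes(b"\x00\x00" * 16000)
buffer.seek(0)

pcm = wav_to_pcm16(buffer)
print(len(pcm))  # 16000 frames * 2 bytes per frame = 32000
```

The resulting bytes could be written to a `.pcm` file and passed to `stream.stream(...)` as in the example above.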

📚 Documentation

Embeddings API - Complete guide for using embeddings
Audio API - Whisper transcription and Kokoro TTS (documentation)
Implementation Plan - Audio API implementation roadmap
Lemonade Server Docs - Official Lemonade documentation

🖼️ Production Showcase

This SDK powers three real-world production applications:

Sorana — AI Visual Workspace

SDK drives semantic AI grouping of files and folders onto a spatial 2D canvas
SDK handles auto-discovery and connection to local Lemonade instances (zero config)

Aicono — AI Desktop Icon Organizer (Featured in CHIP Magazine 🇩🇪)

SDK drives AI inference for grouping and categorizing desktop icons
Reached millions of readers via CHIP, one of Germany's largest IT publications

TabNeuron — AI-Powered Tab Organizer

SDK enables local AI inference for grouping and categorizing browser tabs
Desktop companion app + browser extension, demonstrating SDK viability in lightweight client architectures

🛠️ Project Structure

client.py: Main entry point for API interactions (chat, embeddings, audio, reranking, images, model management).
port_scanner.py: Utilities for detecting Lemonade instances across ports (8000-9000).
model_discovery.py: Logic for fetching and parsing model metadata.
request_builder.py: Helper functions to construct compliant payloads (chat, embeddings, audio, reranking, images).
audio_stream.py: WebSocket client for real-time audio transcription with VAD.
utils.py: Additional utility functions.

🤝 Contributing

Contributions are welcome! This project is intended to help the AMD Ryzen AI and Lemonade community build downstream applications faster.

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

Release history

1.0.10

May 19, 2026

1.0.9

May 19, 2026

1.0.8

May 19, 2026

1.0.7

Apr 9, 2026

1.0.6

Apr 9, 2026

This version

1.0.5

Apr 5, 2026

1.0.4

Apr 5, 2026

1.0.3

Mar 21, 2026

1.0.2

Mar 21, 2026

1.0.1

Mar 21, 2026

Download files

Source Distribution

lemonade_python_sdk-1.0.5.tar.gz (18.8 kB view details)

Uploaded Apr 5, 2026 Source

Built Distribution

lemonade_python_sdk-1.0.5-py3-none-any.whl (21.1 kB view details)

Uploaded Apr 5, 2026 Python 3

File details

Details for the file lemonade_python_sdk-1.0.5.tar.gz.

File metadata

Download URL: lemonade_python_sdk-1.0.5.tar.gz
Upload date: Apr 5, 2026
Size: 18.8 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.0

File hashes

Hashes for lemonade_python_sdk-1.0.5.tar.gz
Algorithm	Hash digest
SHA256	`2d63e5ff8c81d0565a3b93cdc2942dfe2194f41afa3c5432f89c5b38fc7d39b0`
MD5	`4360d2b13937974b2e02a6087bfc07fe`
BLAKE2b-256	`13d1300b0e1a5ecccdad8e14d1e46bdb597428bc098abb5f42d717f05fcf6efb`

File details

Details for the file lemonade_python_sdk-1.0.5-py3-none-any.whl.

File metadata

Download URL: lemonade_python_sdk-1.0.5-py3-none-any.whl
Upload date: Apr 5, 2026
Size: 21.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.0

File hashes

Hashes for lemonade_python_sdk-1.0.5-py3-none-any.whl
Algorithm	Hash digest
SHA256	`a7525332cc18c4f07d1ac8c639dcb6acd37d1c98e9f8a062228e8c7544e8f516`
MD5	`7a51c7d39f786d8191614154cdce9b99`
BLAKE2b-256	`f5a716b63159fbff2769f4216be75e2c3efb651df7516aa413e044d3ee204c8d`
