
A clean interface for interacting with the Lemonade LLM server


🍋 Lemonade Python SDK

License: MIT | Python 3.8+

A robust, production-grade Python wrapper for the Lemonade C++ Backend.

This SDK provides a clean, pythonic interface for interacting with local LLMs running on Lemonade. It was built to power Sorana (a visual workspace for AI), extracting the core integration logic into a standalone, open-source library for the developer community.

🚀 Key Features

  • Auto-Discovery: Automatically scans multiple ports and hosts to find active Lemonade instances.
  • Low-Overhead Architecture: Designed as a thin, efficient wrapper to leverage Lemonade's C++ performance with minimal Python latency.
  • Health Checks & Recovery: Built-in utilities to verify server status and handle connection drops.
  • Type-Hinted Client: Full Python type hints for a better developer experience, including IDE autocompletion.
  • Model Management: Simple API to load, unload, and list models dynamically.
  • Embeddings API: Generate text embeddings for semantic search, RAG, and clustering (FLM & llamacpp backends).
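The "Health Checks & Recovery" idea can be illustrated with a small retry loop. This is only a sketch: `wait_until_healthy` is a hypothetical helper, not part of the SDK, and it assumes the check is a zero-argument callable that returns a truthy value when the server is up (as `client.health_check()` is used in the Quick Start below).

```python
import time
from typing import Callable


def wait_until_healthy(
    check: Callable[[], bool],
    attempts: int = 5,
    initial_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call `check` up to `attempts` times, doubling the delay between tries."""
    delay = initial_delay
    for attempt in range(attempts):
        try:
            if check():
                return True
        except OSError:
            # Assumption: connection-level failures surface as OSError subclasses.
            pass
        if attempt < attempts - 1:
            sleep(delay)
            delay *= 2
    return False
```

With a connected client this could be called as `wait_until_healthy(client.health_check)`; the `sleep` parameter exists only so the delay can be replaced in tests.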

📦 Installation

From a local clone of the repository:

pip install .

Alternatively, you can install it directly from GitHub:

pip install git+https://github.com/Tetramatrix/lemonade-python-sdk.git

⚡ Quick Start

1. Connecting to Lemonade

The SDK automatically handles port discovery, so you don't need to hardcode localhost:8000.

from lemonade_integration.client import LemonadeClient
from lemonade_integration.port_scanner import find_available_lemonade_port

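# Auto-discover a running instance
client = None
port = find_available_lemonade_port()
if port:
    client = LemonadeClient(base_url=f"http://localhost:{port}")
    if client.health_check():
        print(f"Connected to Lemonade on port {port}")
    else:
        # The port answered, but the server did not report as healthy
        print(f"Found port {port}, but the health check failed.")
else:
    print("No Lemonade instance found.")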
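For readers curious what port discovery involves, here is a minimal sketch using only the standard library. It is not the SDK's `port_scanner` implementation; `first_open_port` is a hypothetical name, and the sketch only tests whether a TCP port accepts connections.

```python
import socket
from typing import Iterable, Optional


def first_open_port(host: str, ports: Iterable[int], timeout: float = 0.2) -> Optional[int]:
    """Return the first port on `host` that accepts a TCP connection, else None."""
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.settimeout(timeout)
            # connect_ex returns 0 on success and an error code on failure.
            if sock.connect_ex((host, port)) == 0:
                return port
    return None
```

An open port is not necessarily a Lemonade server, which is why the Quick Start follows discovery with `client.health_check()`.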

2. Chat Completion

response = client.chat_completion(
    model="Llama-3-8B-Instruct",
    messages=[
        {"role": "system", "content": "You are a helpful coding assistant."},
        {"role": "user", "content": "Write a Hello World in C++"}
    ],
    temperature=0.7
)

print(response['choices'][0]['message']['content'])
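The indexing above raises `KeyError` or `IndexError` if the response has no choices. A defensive variant might look like the following sketch; `first_message_text` is a hypothetical helper, and it assumes the response is a plain dict shaped as in the example above.

```python
from typing import Any, Dict, Optional


def first_message_text(response: Dict[str, Any]) -> Optional[str]:
    """Return the first choice's message content, or None if it is absent."""
    choices = response.get("choices") or []
    if not choices:
        return None
    message = choices[0].get("message") or {}
    return message.get("content")


# Hand-written sample response, not real server output.
example = {"choices": [{"message": {"role": "assistant", "content": "Hello!"}}]}
print(first_message_text(example))
```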

3. Model Management

# List all available models
models = client.list_models()
for m in models:
    print(f"Found model: {m['id']}")

# Load a specific model into memory
client.load_model("Mistral-7B-v0.1")

4. Embeddings (NEW)

Generate text embeddings for semantic search, RAG pipelines, and clustering.

# List available embedding models (filtered by 'embeddings' label)
embedding_models = client.list_embedding_models()
for model in embedding_models:
    print(f"Embedding model: {model['id']}")

# Generate an embedding for a single text
response = client.embeddings(
    input="Hello, world!",
    model="nomic-embed-text-v1-GGUF"
)

embedding_vector = response["data"][0]["embedding"]
print(f"Vector length: {len(embedding_vector)}")

# Generate embeddings for multiple texts
texts = ["Text 1", "Text 2", "Text 3"]
response = client.embeddings(
    input=texts,
    model="nomic-embed-text-v1-GGUF"
)

for item in response["data"]:
    print(f"Text {item['index']}: {len(item['embedding'])} dimensions")
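Once you have embedding vectors, semantic search usually comes down to ranking them by cosine similarity against a query vector. The sketch below uses plain Python lists; `cosine_similarity` and `rank_by_similarity` are illustrative helpers, not SDK functions.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    query: Sequence[float], vectors: Sequence[Sequence[float]]
) -> List[Tuple[int, float]]:
    """Return (index, score) pairs sorted from most to least similar."""
    scores = [(i, cosine_similarity(query, v)) for i, v in enumerate(vectors)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)
```

In practice, `query` and `vectors` would come from the `embedding` fields of `client.embeddings(...)` responses like the ones above.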

Supported Backends:

  • FLM (FastFlowLM) - NPU-accelerated on Windows
  • llamacpp (.GGUF models) - CPU/GPU
  • ❌ ONNX/OGA - Not supported

🖼️ Production Showcase: Sorana

This SDK was extracted from the core engine of Sorana, a professional visual workspace for AI. Sorana shows that the SDK can handle complex, real-world requirements on AMD Ryzen AI hardware:

  • Low Latency: Powers sub-second response times for multi-model chat interfaces.
  • Dynamic Workflows: Manages the loading and unloading of 20+ different LLMs based on user activity to optimize local NPU/GPU memory.
  • Zero-Config UX: Uses the built-in port scanner to automatically connect the Sorana frontend to the Lemonade backend without user intervention.
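The "Dynamic Workflows" point describes keeping only some models resident at a time. Sorana's actual policy is not documented here; the sketch below assumes a least-recently-used policy purely for illustration. `LruModelSlots` is a hypothetical class, and the load and unload actions are passed in as callables because the SDK's unload method is not shown in this README.

```python
from collections import OrderedDict
from typing import Callable, List


class LruModelSlots:
    """Keep at most `capacity` models resident, evicting the least recently used."""

    def __init__(
        self,
        capacity: int,
        load: Callable[[str], None],
        unload: Callable[[str], None],
    ) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self._capacity = capacity
        self._load = load
        self._unload = unload
        self._resident: "OrderedDict[str, None]" = OrderedDict()

    def use(self, model_id: str) -> None:
        """Mark `model_id` as just used, loading it (and evicting) if needed."""
        if model_id in self._resident:
            self._resident.move_to_end(model_id)
            return
        if len(self._resident) >= self._capacity:
            evicted, _ = self._resident.popitem(last=False)
            self._unload(evicted)
        self._load(model_id)
        self._resident[model_id] = None

    def resident(self) -> List[str]:
        """Resident model ids, least recently used first."""
        return list(self._resident)
```

The `load` callable could be `client.load_model`; the `unload` callable would wrap whatever unload call your client version exposes.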

🛠️ Project Structure

  • client.py: Main entry point for API interactions (chat, embeddings, model management).
  • port_scanner.py: Utilities for detecting Lemonade instances across ports (8000-9000).
  • model_discovery.py: Logic for fetching and parsing model metadata.
  • request_builder.py: Helper functions to construct compliant payloads (chat, embeddings).
  • utils.py: Additional utility functions.
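To give a feel for what a request builder does, here is a minimal sketch that validates chat messages and assembles a payload from the same fields used in the Chat Completion example (`model`, `messages`, `temperature`). `build_chat_payload` is a hypothetical function and may differ from what `request_builder.py` actually contains.

```python
from typing import Any, Dict, List


def build_chat_payload(
    model: str,
    messages: List[Dict[str, str]],
    temperature: float = 0.7,
) -> Dict[str, Any]:
    """Validate the messages and return a chat-completion request body."""
    if not messages:
        raise ValueError("messages must not be empty")
    for message in messages:
        if "role" not in message or "content" not in message:
            raise ValueError("each message needs 'role' and 'content' keys")
    return {"model": model, "messages": messages, "temperature": temperature}


payload = build_chat_payload(
    "Llama-3-8B-Instruct",
    [{"role": "user", "content": "Write a Hello World in C++"}],
)
print(sorted(payload))
```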

📚 Documentation

🤝 Contributing

Contributions are welcome! This project is intended to help the AMD Ryzen AI and Lemonade community build downstream applications faster.

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.
