A clean interface for interacting with the Lemonade LLM server

🍋 Lemonade Python SDK

License: MIT | Python 3.8+

A robust, production-grade Python wrapper for the Lemonade C++ Backend.

This SDK provides a clean, pythonic interface for interacting with local LLMs running on Lemonade. It was built to power Sorana (a visual workspace for AI), extracting the core integration logic into a standalone, open-source library for the developer community.

🚀 Key Features

  • Auto-Discovery: Automatically scans multiple ports and hosts to find active Lemonade instances.
  • Low-Overhead Architecture: Designed as a thin, efficient wrapper to leverage Lemonade's C++ performance with minimal Python latency.
  • Health Checks & Recovery: Built-in utilities to verify server status and handle connection drops.
  • Type-Safe Client: Full Python type hints for a better developer experience (for example, IDE autocompletion).
  • Model Management: Simple API to load, unload, and list models dynamically.
  • Embeddings API: Generate text embeddings for semantic search, RAG, and clustering (FLM & llamacpp backends).
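The health-check and recovery feature listed above could be used from application code roughly as follows. This is a minimal sketch, assuming only that the client exposes a `health_check()` method that returns a truthy value when the server is reachable; `wait_until_healthy` and its parameters are hypothetical names for illustration, not part of the SDK.

```python
import time


def wait_until_healthy(client, attempts=5, base_delay=0.5, sleep=time.sleep):
    """Poll client.health_check() with exponential backoff.

    Hypothetical helper (not part of the SDK): returns True as soon as a
    check succeeds, False if every attempt fails.
    """
    for attempt in range(attempts):
        try:
            if client.health_check():
                return True
        except Exception:
            # Assumption: a dropped connection may surface as an exception,
            # so it is treated the same as a failed check.
            pass
        if attempt < attempts - 1:
            # Delay doubles after each failure: base, 2*base, 4*base, ...
            sleep(base_delay * (2 ** attempt))
    return False
```

The `sleep` parameter is injectable so the backoff schedule can be checked in tests without actually waiting.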

📦 Installation

From a local clone of the repository:

pip install .

Alternatively, you can install it directly from GitHub:

pip install git+https://github.com/Tetramatrix/lemonade-python-sdk.git

⚡ Quick Start

1. Connecting to Lemonade

The SDK automatically handles port discovery, so you don't need to hardcode localhost:8000.

from lemonade_integration.client import LemonadeClient
from lemonade_integration.port_scanner import find_available_lemonade_port

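# Auto-discover a running instance
port = find_available_lemonade_port()
if port:
    client = LemonadeClient(base_url=f"http://localhost:{port}")
    if client.health_check():
        print(f"Connected to Lemonade on port {port}")
    else:
        # A port answered, but the server did not pass the health check
        print(f"Lemonade on port {port} failed the health check.")
else:
    print("No Lemonade instance found.")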
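To illustrate what port discovery involves, here is a rough standard-library sketch of a TCP port scan. It is not the SDK's actual implementation: `find_open_ports` is a hypothetical function, and the default range of 8000 to 9000 is taken from the project structure notes in this README.

```python
import socket


def find_open_ports(host="localhost", ports=range(8000, 9001), timeout=0.2):
    """Return the ports in `ports` that accept a TCP connection on `host`.

    Illustrative only: an open port is not necessarily a Lemonade server,
    so a real scanner would follow up with a health check.
    """
    open_ports = []
    for port in ports:
        try:
            # create_connection raises OSError if nothing is listening
            with socket.create_connection((host, port), timeout=timeout):
                open_ports.append(port)
        except OSError:
            continue
    return open_ports
```

A caller could then build a client for each returned port and keep the first one whose health check succeeds.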

2. Chat Completion

response = client.chat_completion(
    model="Llama-3-8B-Instruct",
    messages=[
        {"role": "system", "content": "You are a helpful coding assistant."},
        {"role": "user", "content": "Write a Hello World in C++"}
    ],
    temperature=0.7
)

print(response['choices'][0]['message']['content'])
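The example above indexes straight into the response, which raises an error if the server returns no choices. A small defensive helper might look like the sketch below. It assumes the dict shape used in the example (`choices`, then `message`, then `content`); `first_reply` is a hypothetical name, not an SDK function.

```python
def first_reply(response):
    """Return the first choice's message content, or None if it is absent.

    Assumes the response dict shape shown in the chat completion example.
    """
    choices = response.get("choices") or []
    if not choices:
        return None
    message = choices[0].get("message") or {}
    return message.get("content")
```

With this helper, calling code can check for `None` rather than catching `KeyError` or `IndexError`.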

3. Model Management

# List all available models
models = client.list_models()
for m in models:
    print(f"Found model: {m['id']}")

# Load a specific model into memory
client.load_model("Mistral-7B-v0.1")
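The two calls above can be combined so that a model is only loaded when the server actually lists it. The sketch below uses only `list_models()` and `load_model()` as shown in this section; `load_if_available` itself is a hypothetical helper.

```python
def load_if_available(client, model_id):
    """Load `model_id` only if the server lists it.

    Returns True if the load was requested, False if the model is unknown.
    Hypothetical helper built on list_models() and load_model().
    """
    available = {m["id"] for m in client.list_models()}
    if model_id not in available:
        return False
    client.load_model(model_id)
    return True
```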

4. Embeddings (NEW)

Generate text embeddings for semantic search, RAG pipelines, and clustering.

# List available embedding models (filtered by 'embeddings' label)
embedding_models = client.list_embedding_models()
for model in embedding_models:
    print(f"Embedding model: {model['id']}")

# Generate embeddings for single text
response = client.embeddings(
    input="Hello, world!",
    model="nomic-embed-text-v1-GGUF"
)

embedding_vector = response["data"][0]["embedding"]
print(f"Vector length: {len(embedding_vector)}")

# Generate embeddings for multiple texts
texts = ["Text 1", "Text 2", "Text 3"]
response = client.embeddings(
    input=texts,
    model="nomic-embed-text-v1-GGUF"
)

for item in response["data"]:
    print(f"Text {item['index']}: {len(item['embedding'])} dimensions")
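For semantic search, the returned vectors are typically compared with cosine similarity. The following is a minimal pure-Python sketch that ranks the items of an embeddings response against a query vector. It assumes the response shape shown above (`data`, with `index` and `embedding` per item); `cosine_similarity` and `rank_by_similarity` are hypothetical helpers, not SDK functions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, response):
    """Return (index, score) pairs sorted by descending similarity."""
    scored = [
        (item["index"], cosine_similarity(query_vec, item["embedding"]))
        for item in response["data"]
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

In practice the query vector would come from a separate `client.embeddings(...)` call made with the same model as the documents.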

Supported Backends:

  • FLM (FastFlowLM) - NPU-accelerated on Windows
  • llamacpp (.GGUF models) - CPU/GPU
  • ❌ ONNX/OGA - Not supported

🖼️ Production Showcase: Sorana

This SDK was extracted from the core engine of Sorana, a professional visual workspace for AI. It demonstrates the SDK's capability to handle complex, real-world requirements on AMD Ryzen AI hardware:

  • Low Latency: Powers sub-second response times for multi-model chat interfaces.
  • Dynamic Workflows: Manages the loading and unloading of 20+ different LLMs based on user activity to optimize local NPU/GPU memory.
  • Zero-Config UX: Uses the built-in port scanner to automatically connect the Sorana frontend to the Lemonade backend without user intervention.
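One way to approach the dynamic loading and unloading described above is a least-recently-used pool. The sketch below is not how Sorana is implemented; it is an illustration under stated assumptions. `ModelPool` is a hypothetical class, and `unload_model` is an assumed method name: this README mentions unloading but does not show the call.

```python
from collections import OrderedDict


class ModelPool:
    """Keep at most `capacity` models loaded, evicting the least recently used.

    Assumptions: the client has load_model(name) as in the Quick Start, and
    an unload_model(name) counterpart whose name is a guess.
    """

    def __init__(self, client, capacity=2):
        self.client = client
        self.capacity = capacity
        self._loaded = OrderedDict()  # oldest entry first

    def use(self, name):
        """Ensure `name` is loaded and mark it as most recently used."""
        if name in self._loaded:
            self._loaded.move_to_end(name)
            return
        while len(self._loaded) >= self.capacity:
            evicted, _ = self._loaded.popitem(last=False)
            self.client.unload_model(evicted)  # assumed method name
        self.client.load_model(name)
        self._loaded[name] = True

    @property
    def loaded(self):
        """Currently loaded model names, least recently used first."""
        return list(self._loaded)
```

Calling `pool.use(model_name)` before each request keeps recently used models resident and frees memory held by idle ones.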

🛠️ Project Structure

  • client.py: Main entry point for API interactions (chat, embeddings, model management).
  • port_scanner.py: Utilities for detecting Lemonade instances across ports (8000-9000).
  • model_discovery.py: Logic for fetching and parsing model metadata.
  • request_builder.py: Helper functions to construct compliant payloads (chat, embeddings).
  • utils.py: Additional utility functions.

📚 Documentation

🤝 Contributing

Contributions are welcome! This project is intended to help the AMD Ryzen AI and Lemonade community build downstream applications faster.

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

