A clean interface for interacting with the Lemonade LLM server
🍋 Lemonade Python SDK
A robust, production-grade Python wrapper for the Lemonade C++ Backend.
This SDK provides a clean, Pythonic interface for interacting with local LLMs running on Lemonade. It was built to power Sorana (a visual workspace for AI); the core integration logic has since been extracted into a standalone, open-source library for the developer community.
🚀 Key Features
- Auto-Discovery: Automatically scans multiple ports and hosts to find active Lemonade instances.
- Low-Overhead Architecture: Designed as a thin, efficient wrapper to leverage Lemonade's C++ performance with minimal Python latency.
- Health Checks & Recovery: Built-in utilities to verify server status and handle connection drops.
- Type-Safe Client: Full Python type hinting for better developer experience (IDE autocompletion).
- Model Management: Simple API to load, unload, and list models dynamically.
- Embeddings API: Generate text embeddings for semantic search, RAG, and clustering (FLM & llamacpp backends).
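As an illustration of the health-check and recovery idea, the sketch below wraps any zero-argument check (for example `client.health_check`) in a retry loop with exponential backoff. `wait_until_healthy` is a hypothetical helper, not part of the SDK, and it assumes that connection failures surface as `OSError` subclasses.

```python
import time
from typing import Callable


def wait_until_healthy(
    check: Callable[[], bool],
    attempts: int = 5,
    delay: float = 0.5,
    backoff: float = 2.0,
) -> bool:
    """Call `check` until it returns a truthy value or `attempts` run out.

    Hypothetical helper: `check` is any zero-argument callable. Errors that
    derive from OSError (ConnectionError does) count as a failed attempt.
    """
    for attempt in range(attempts):
        try:
            if check():
                return True
        except OSError:
            pass  # Treat a dropped connection like an unhealthy response.
        if attempt < attempts - 1:
            time.sleep(delay)
            delay *= backoff  # Wait longer before each further attempt.
    return False
```

A caller could pass `client.health_check` directly and fall back to re-running port discovery when the function returns `False`.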
📦 Installation
pip install .
Alternatively, you can install it directly from GitHub:
pip install git+https://github.com/Tetramatrix/lemonade-python-sdk.git
⚡ Quick Start
1. Connecting to Lemonade
The SDK automatically handles port discovery, so you don't need to hardcode localhost:8000.
from lemonade_integration.client import LemonadeClient
from lemonade_integration.port_scanner import find_available_lemonade_port

# Auto-discover running instance
port = find_available_lemonade_port()
if port:
    client = LemonadeClient(base_url=f"http://localhost:{port}")
    if client.health_check():
        print(f"Connected to Lemonade on port {port}")
else:
    print("No Lemonade instance found.")
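To give a rough idea of what port discovery involves, here is a minimal sketch that probes a list of ports with plain TCP connections. It is not the SDK's implementation: `find_open_port` and its parameters are hypothetical, and an open port only shows that something is listening, so a real check would still follow up with `client.health_check()`.

```python
import socket
from typing import Iterable, Optional


def find_open_port(
    ports: Iterable[int],
    host: str = "localhost",
    timeout: float = 0.2,
) -> Optional[int]:
    """Return the first port that accepts a TCP connection, or None.

    Hypothetical sketch: this only detects a listening socket and does not
    verify that the listener is a Lemonade server.
    """
    for port in ports:
        try:
            # create_connection raises OSError on refusal or timeout.
            with socket.create_connection((host, port), timeout=timeout):
                return port
        except OSError:
            continue  # Nothing reachable here: try the next port.
    return None
```

For example, `find_open_port(range(8000, 8011))` would probe ports 8000 through 8010 in order and stop at the first one that answers.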
2. Chat Completion
response = client.chat_completion(
    model="Llama-3-8B-Instruct",
    messages=[
        {"role": "system", "content": "You are a helpful coding assistant."},
        {"role": "user", "content": "Write a Hello World in C++"}
    ],
    temperature=0.7
)
print(response['choices'][0]['message']['content'])
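Indexing straight into `response['choices'][0]` raises if the server returns an error body or an empty list of choices. The sketch below shows one defensive way to read the reply, assuming only the response shape used in the example above. `extract_reply` is a hypothetical helper, not an SDK function.

```python
from typing import Any, Mapping, Optional


def extract_reply(response: Mapping[str, Any]) -> Optional[str]:
    """Return the first choice's message content, or None if it is absent.

    Hypothetical helper: assumes the layout shown in the example above,
    i.e. response["choices"][0]["message"]["content"].
    """
    choices = response.get("choices") or []
    if not choices:
        return None
    message = choices[0].get("message") or {}
    return message.get("content")
```

With this helper, the last line of the example could become `print(extract_reply(response) or "No reply received.")`.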
3. Model Management
# List all available models
models = client.list_models()
for m in models:
    print(f"Found model: {m['id']}")

# Load a specific model into memory
client.load_model("Mistral-7B-v0.1")
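The two calls above can be combined so that a model is only loaded when the server actually lists it. This is a sketch under the assumptions visible in the example: `list_models()` yields dictionaries with an `id` key and `load_model()` takes the model name. `load_if_available` is a hypothetical helper, not part of the SDK.

```python
from typing import Any


def load_if_available(client: Any, model_id: str) -> bool:
    """Load `model_id` only if the server lists it.

    Hypothetical helper: relies on client.list_models() returning dicts
    with an "id" key and client.load_model(name), as in the example above.
    Returns True if a load was requested, False if the model is not listed.
    """
    available = {m["id"] for m in client.list_models()}
    if model_id not in available:
        return False
    client.load_model(model_id)
    return True
```

A call such as `load_if_available(client, "Mistral-7B-v0.1")` then reports a missing model as `False` without sending a load request.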
4. Embeddings (NEW)
Generate text embeddings for semantic search, RAG pipelines, and clustering.
# List available embedding models (filtered by 'embeddings' label)
embedding_models = client.list_embedding_models()
for model in embedding_models:
    print(f"Embedding model: {model['id']}")

# Generate embeddings for a single text
response = client.embeddings(
    input="Hello, world!",
    model="nomic-embed-text-v1-GGUF"
)
embedding_vector = response["data"][0]["embedding"]
print(f"Vector length: {len(embedding_vector)}")

# Generate embeddings for multiple texts
texts = ["Text 1", "Text 2", "Text 3"]
response = client.embeddings(
    input=texts,
    model="nomic-embed-text-v1-GGUF"
)
for item in response["data"]:
    print(f"Text {item['index']}: {len(item['embedding'])} dimensions")
Supported Backends:
- ✅ FLM (FastFlowLM) - NPU-accelerated on Windows
- ✅ llamacpp (.GGUF models) - CPU/GPU
- ❌ ONNX/OGA - Not supported
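For semantic search, the embedding vectors returned above are typically compared with cosine similarity. The following is a dependency-free sketch of that comparison. `cosine_similarity` and `rank_by_similarity` are illustrative names, not SDK functions, and the vectors are plain lists of floats such as `item["embedding"]`.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    query: Sequence[float],
    candidates: Sequence[Sequence[float]],
) -> List[Tuple[int, float]]:
    """Return (candidate index, score) pairs, most similar first."""
    scores = [(i, cosine_similarity(query, vec)) for i, vec in enumerate(candidates)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)
```

In a search setting, `query` would be the embedding of the user's question and `candidates` the embeddings of the stored texts, and the top-ranked indices point back into the `texts` list.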
🖼️ Production Showcase: Sorana
This SDK was extracted from the core engine of Sorana, a professional visual workspace for AI. It demonstrates the SDK's capability to handle complex, real-world requirements on AMD Ryzen AI hardware:
- Low Latency: Powers sub-second response times for multi-model chat interfaces.
- Dynamic Workflows: Manages the loading and unloading of 20+ different LLMs based on user activity to optimize local NPU/GPU memory.
- Zero-Config UX: Uses the built-in port scanner to automatically connect the Sorana frontend to the Lemonade backend without user intervention.
🛠️ Project Structure
- client.py: Main entry point for API interactions (chat, embeddings, model management).
- port_scanner.py: Utilities for detecting Lemonade instances across ports (8000-9000).
- model_discovery.py: Logic for fetching and parsing model metadata.
- request_builder.py: Helper functions to construct compliant payloads (chat, embeddings).
- utils.py: Additional utility functions.
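To illustrate the kind of work a request builder does, the sketch below validates chat messages and assembles request bodies as plain dictionaries. It is not the code in request_builder.py: the function names are hypothetical, and the field names are simply those used in the Quick Start examples (`model`, `messages`, `temperature`, `input`).

```python
from typing import Any, Dict, List, Sequence, Union


def build_chat_payload(
    model: str,
    messages: Sequence[Dict[str, str]],
    temperature: float = 0.7,
) -> Dict[str, Any]:
    """Validate chat messages and return a request body as a plain dict.

    Hypothetical sketch: field names mirror the Quick Start examples.
    """
    if not messages:
        raise ValueError("messages must not be empty")
    for message in messages:
        if "role" not in message or "content" not in message:
            raise ValueError("each message needs 'role' and 'content' keys")
    return {"model": model, "messages": list(messages), "temperature": temperature}


def build_embeddings_payload(model: str, text: Union[str, List[str]]) -> Dict[str, Any]:
    """Return an embeddings request body; `text` is one string or a list of strings."""
    return {"model": model, "input": text}
```

Validating before sending keeps malformed requests from reaching the server and makes the resulting error message easier to act on.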
📚 Documentation
- Embeddings API - Complete guide for using embeddings
- Lemonade Server Docs - Official Lemonade documentation
🤝 Contributing
Contributions are welcome! This project is intended to help the AMD Ryzen AI and Lemonade community build downstream applications faster.
📄 License
This project is licensed under the MIT License - see the LICENSE file for details.
File details
Details for the file lemonade_integration-1.0.0.tar.gz.
File metadata
- Download URL: lemonade_integration-1.0.0.tar.gz
- Upload date:
- Size: 12.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 817b9ad67e1b06c9fa116745ad3a5caeb9dbf862db74a1df8b28d2c51137a023 |
| MD5 | b098226d0293a4709f3f7bd696b2b94b |
| BLAKE2b-256 | 40056a69cc0cadd12940d97f99094dcc1ff117079dfd76133cc20c4f441098b2 |
File details
Details for the file lemonade_integration-1.0.0-py3-none-any.whl.
File metadata
- Download URL: lemonade_integration-1.0.0-py3-none-any.whl
- Upload date:
- Size: 12.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | fdeab6b4443d7666ed2b73c6c1d0f03ed776249877e494262512b63f4a44e252 |
| MD5 | 28743bb5864d7bef3793196ad726a755 |
| BLAKE2b-256 | 7dedb3b74fdb4501227b8ac3f0d195442fdc8df68cfd598812835f7dbd88a69e |