An SDK for programmatically running local AI.
Programmable AI on any device.
Run LLMs locally on any hardware. If you can build a container, you can deploy AI.
Ramalama Python SDK
Build local-first AI apps on top of the Ramalama CLI. The SDK provisions models in containers and exposes a simple API for on-device inference.
Capabilities
- LLM chat with OpenAI-compatible endpoints, which you can also query with direct HTTP requests (see the sketch after this list).
- Speech-to-Text (STT) with Whisper (coming soon).
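Because the served model speaks the OpenAI chat-completions protocol, you can also talk to it directly over HTTP. The sketch below is illustrative only: the host, port, and path are assumptions, since the SDK manages the container and the actual address depends on your configuration.

```python
import requests

# Hypothetical endpoint; the SDK-managed server address is an assumption.
BASE_URL = "http://localhost:8080/v1"

payload = {
    "model": "tinyllama",
    "messages": [{"role": "user", "content": "How tall is Michael Jordan?"}],
}

# Standard OpenAI-compatible chat-completions request.
resp = requests.post(f"{BASE_URL}/chat/completions", json=payload, timeout=30)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```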
Installation
Requirements
- Docker or Podman running locally.
- Python 3.10+
PyPI
pip install ramalama-sdk
Quick Start
Basic Chat
from ramalama_sdk import RamalamaModel

with RamalamaModel(model="tinyllama") as model:
    response = model.chat("How tall is Michael Jordan?")
    print(response["content"])

Michael Jordan is 6 feet 6 inches (1.98 m) tall.
For multi-turn conversations, the chat method accepts an additional history argument, which can also be used to set a system prompt.
sys_prompt = {
    "role": "system",
    "content": "Respond to all conversations as if you were a dog with variations of bark and woof.",
}
history = [sys_prompt]

with RamalamaModel(model="tinyllama") as model:
    response = model.chat("How tall is Michael Jordan?", history)
    print(response["content"])

Woof woof. Bark bark bark. Rrr-woooooof.
Arf arf arf arf arf arf. Ruff!
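To carry context across turns, keep appending messages to the same history list. It is not stated here whether chat() records turns in history itself, so the sketch below tracks them manually, assuming OpenAI-style role/content message dicts (consistent with the response["content"] access above).

```python
from ramalama_sdk import RamalamaModel

history = [sys_prompt]  # sys_prompt as defined above

with RamalamaModel(model="tinyllama") as model:
    # First turn: ask, then record both sides of the exchange.
    first = model.chat("How tall is Michael Jordan?", history)
    history.append({"role": "user", "content": "How tall is Michael Jordan?"})
    history.append({"role": "assistant", "content": first["content"]})

    # Second turn: the model now sees the earlier exchange in history.
    followup = model.chat("And how much is that in centimeters?", history)
    print(followup["content"])
```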
Model Management
Models can be pulled from a variety of sources, including HuggingFace, Ollama, ModelScope, any OCI registry, local files, and any downloadable URL.
with RamalamaModel(model="hf://ggml-org/gpt-oss-20b-GGUF") as model:
    response = model.chat("How tall is Michael Jordan?")
    print(response["content"])
The full suite of supported prefixes can be found below.
| Transport | Prefixes/Schemes | Description |
|---|---|---|
| huggingface | huggingface://, hf://, hf.co/ | HuggingFace model hub |
| modelscope | modelscope://, ms:// | ModelScope |
| ollama | ollama://, ollama.com/library/ | Ollama model library |
| rlcr | rlcr:// | Ramalama Container Registry |
| oci | oci://, docker:// | OCI container images / Docker registries |
| url | http://, https:// | Generic URLs (HTTP/HTTPS) |
| file | file:// | Local file paths |
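The same API works regardless of where the model comes from. The identifiers below are illustrative; the specific model names and paths are assumptions:

```python
from ramalama_sdk import RamalamaModel

# Model from the Ollama library (model name is illustrative).
with RamalamaModel(model="ollama://tinyllama") as model:
    print(model.chat("Hello")["content"])

# Model from a GGUF file already on disk (path is illustrative).
with RamalamaModel(model="file:///models/tinyllama.gguf") as model:
    print(model.chat("Hello")["content"])
```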
Runtime Customization
The model exposes a variety of customization parameters, including base_image, which lets you swap out the container runtime the model is served from. This is especially useful if you need to run inference on custom hardware that requires a specifically compiled build of llama.cpp, vLLM, etc.
from ramalama_sdk import RamalamaModel

model = RamalamaModel(
    model="tinyllama",
    base_image="artifactory.corp.com/llama-runtime:prod",
    temp=0.7,
    ngl=20,
    max_tokens=256,
    threads=8,
    ctx_size=4096,
    timeout=30,
)
| Field | Type | Description | Default |
|---|---|---|---|
| model | str | Model name or identifier. | required |
| base_image | str | Container image to use for serving, if different from config. | quay.io/ramalama/ramalama |
| temp | float | Temperature override for sampling. | 0.8 |
| ngl | int | GPU layers override. | -1 (all) |
| max_tokens | int | Maximum tokens for completions. | 0 (unlimited) |
| threads | int | CPU threads override. | -1 (all) |
| ctx_size | int | Context window override. | 0 (loaded from the model) |
| timeout | int | Seconds to wait for server readiness. | 30 |
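As a quick illustration of the fields above, a CPU-only configuration might offload no layers to the GPU and size the context explicitly. The values here are arbitrary examples, not recommendations:

```python
from ramalama_sdk import RamalamaModel

cpu_model = RamalamaModel(
    model="tinyllama",
    ngl=0,          # offload no layers to the GPU
    threads=4,      # cap CPU threads instead of using all of them
    ctx_size=2048,  # explicit context window instead of the model default
)
```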
Async Models
The async model API is identical to the synchronous API shown in the examples above.
from ramalama_sdk import AsyncRamalamaModel

async with AsyncRamalamaModel(model="tinyllama") as model:
    response = await model.chat("How tall is Michael Jordan?")
    print(response["content"])
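The async client is most useful when issuing several requests at once. A minimal sketch, assuming the served model can handle overlapping requests (concurrency behavior is not documented here):

```python
import asyncio

from ramalama_sdk import AsyncRamalamaModel

async def main():
    questions = [
        "How tall is Michael Jordan?",
        "How many championships did he win?",
    ]
    async with AsyncRamalamaModel(model="tinyllama") as model:
        # Fire off both requests and wait for all of the replies.
        responses = await asyncio.gather(*(model.chat(q) for q in questions))
    for r in responses:
        print(r["content"])

asyncio.run(main())
```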
Documentation
- Python SDK: https://docs.ramalama.com/sdk/python
- Quick start: https://docs.ramalama.com/sdk/python/quickstart
Download files
File details
Details for the file ramalama_sdk-0.1.4.tar.gz.
File metadata
- Download URL: ramalama_sdk-0.1.4.tar.gz
- Upload date:
- Size: 14.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.9.26 (macOS)
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 54d1e1775265d209b15a9ceac48b1cec229392814e3ca8505ea31e9f47c5e447 |
| MD5 | 35ada2070402d8e998a5adaf1305f209 |
| BLAKE2b-256 | 674c5e90852155a07a19cb2add1c204e8e8cb0734f87066142a0c246885b7f2f |
File details
Details for the file ramalama_sdk-0.1.4-py3-none-any.whl.
File metadata
- Download URL: ramalama_sdk-0.1.4-py3-none-any.whl
- Upload date:
- Size: 10.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.9.26 (macOS)
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 16b2002f9dcdaed003fc84ec39cc1af0c1c0bb84256555061afbdacf7935cff3 |
| MD5 | 4998d9e832217a337912c027c57f7c98 |
| BLAKE2b-256 | 10ba2ef46a79123075dd15cdc2e80828f3f425d8fc3eb3a0037f9a1ab0ed7230 |