
An SDK for programmatically running local AI.

Project description


Programmable AI on any device.
Run LLMs locally on any hardware. If you can build a container you can deploy AI.

Ramalama Python SDK

Build local-first AI apps on top of the Ramalama CLI. The SDK provisions models in containers and exposes a simple API for on-device inference.

Capabilities

  • LLM chat with OpenAI-compatible endpoints for direct requests.
  • Speech-to-Text (STT) with Whisper (coming soon).
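Because the served endpoint is OpenAI-compatible, any HTTP client can talk to it directly. The sketch below builds a standard /v1/chat/completions request body; the localhost URL and port are placeholders for wherever the SDK binds the server, not values taken from the SDK itself.

```python
import json

# Standard OpenAI-compatible chat completion payload. Any HTTP client can POST
# this to the server the SDK provisions (the URL below is a placeholder).
base_url = "http://127.0.0.1:8080/v1/chat/completions"  # hypothetical bind address

payload = {
    "model": "tinyllama",
    "messages": [
        {"role": "user", "content": "How tall is Michael Jordan?"},
    ],
    "temperature": 0.8,
}
body = json.dumps(payload)
print(body)
```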

Installation

Requirements

  • Docker or Podman running locally.
  • Python 3.10+

PyPI

pip install ramalama-sdk

Quick Start

Basic Chat

from ramalama_sdk import RamalamaModel

with RamalamaModel(model="tinyllama") as model:
    response = model.chat("How tall is Michael Jordan?")
    print(response["content"])
Michael Jordan is 6 feet 6 inches (1.98 m) tall.

For multi-turn conversations, the chat method accepts an optional history argument, which can also be used to set a system prompt.

sys_prompt = {
  "role": "system", 
  "content": "Respond to all conversations as if you were a dog with variations of bark and woof."
}
history = [sys_prompt]
with RamalamaModel(model="tinyllama") as model:
    response = model.chat("How tall is Michael Jordan?", history)
    print(response["content"])
Woof woof. Bark bark bark. Rrr-woooooof.
Arf arf arf arf arf arf. Ruff!
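To keep a conversation going across turns, append each exchange to the history list yourself; the message dicts follow the usual role/content shape. A minimal sketch (the append-after-each-turn pattern is an assumption about how callers are expected to grow the history, not a documented SDK helper):

```python
history = [
    {"role": "system", "content": "You are a terse assistant."},
]

def record_turn(history, user_msg, assistant_reply):
    """Append one user/assistant exchange to the running history list."""
    history.append({"role": "user", "content": user_msg})
    history.append({"role": "assistant", "content": assistant_reply})
    return history

record_turn(history, "How tall is Michael Jordan?", "6 ft 6 in.")
# history now holds system + user + assistant messages,
# ready to pass to the next chat() call
print(len(history))  # → 3
```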

Model Management

Models can be pulled from a variety of sources including HuggingFace, Ollama, ModelScope, any OCI registry, local files, and any downloadable URL.

with RamalamaModel(model="hf://ggml-org/gpt-oss-20b-GGUF") as model:
    response = model.chat("How tall is Michael Jordan?")
    print(response["content"])

The full suite of supported prefixes can be found below.

Transport     Prefixes/Schemes                 Description
huggingface   huggingface://, hf://, hf.co/    HuggingFace model hub
modelscope    modelscope://, ms://             ModelScope
ollama        ollama://, ollama.com/library/   Ollama model library
rlcr          rlcr://                          Ramalama Container Registry
oci           oci://, docker://                OCI container images / Docker registries
url           http://, https://                Generic URLs (HTTP/HTTPS)
file          file://                          Local file paths
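The table above amounts to prefix dispatch: the scheme on the model string selects the transport. An illustrative resolver (this is a sketch mirroring the table, not the SDK's actual implementation, and the no-scheme fallback is an assumption):

```python
# Illustrative scheme-to-transport mapping, mirroring the table above.
PREFIXES = {
    "huggingface://": "huggingface", "hf://": "huggingface", "hf.co/": "huggingface",
    "modelscope://": "modelscope", "ms://": "modelscope",
    "ollama://": "ollama", "ollama.com/library/": "ollama",
    "rlcr://": "rlcr",
    "oci://": "oci", "docker://": "oci",
    "http://": "url", "https://": "url",
    "file://": "file",
}

def transport_for(model_ref: str) -> str:
    """Return the transport selected by the model reference's prefix."""
    for prefix, transport in PREFIXES.items():
        if model_ref.startswith(prefix):
            return transport
    return "ollama"  # assumed fallback when no scheme is given

print(transport_for("hf://ggml-org/gpt-oss-20b-GGUF"))  # → huggingface
```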

Runtime Customization

The model exposes a variety of customization parameters, including base_image, which lets you swap the container runtime image. This is especially useful when inference must run on custom hardware that requires a specifically compiled build of llama.cpp, vLLM, etc.

from ramalama_sdk import RamalamaModel

model = RamalamaModel(
    model="tinyllama",
    base_image="artifactory.corp.com/llama-runtime:prod",
    temp=0.7,
    ngl=20,
    max_tokens=256,
    threads=8,
    ctx_size=4096,
    timeout=30,
)
Field       Type    Description                                                     Default
model       str     Model name or identifier.                                       required
base_image  str     Container image to use for serving, if different from config.   quay.io/ramalama/ramalama
temp        float   Temperature override for sampling.                              0.8
ngl         int     GPU layers override.                                            -1 (all)
max_tokens  int     Maximum tokens for completions.                                 0 (unlimited)
threads     int     CPU threads override.                                           -1 (all)
ctx_size    int     Context window override.                                        0 (loaded from the model)
timeout     int     Seconds to wait for server readiness.                           30

Global SDK host defaults are configured once per process:

from ramalama_sdk import settings

settings.connection.bind_host = "127.0.0.1"
settings.connection.connect_host = "127.0.0.1"

When connect_host is not explicitly configured, the SDK resolves it automatically:

  • host process: 127.0.0.1
  • containerized SDK + Docker daemon: host.docker.internal
  • containerized SDK + Podman daemon: host.containers.internal
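The resolution rules above can be sketched as a small function; detecting "running inside a container" and which daemon is in use is simplified to plain arguments here, since the real detection logic is more involved:

```python
def resolve_connect_host(in_container: bool, daemon: str) -> str:
    """Mirror the default connect_host rules listed above.

    daemon is "docker" or "podman"; how both flags are actually detected
    is out of scope for this sketch.
    """
    if not in_container:
        # Host process talks to the server over loopback.
        return "127.0.0.1"
    # Containerized SDK must reach back to the daemon's host.
    return "host.docker.internal" if daemon == "docker" else "host.containers.internal"

print(resolve_connect_host(False, "docker"))  # → 127.0.0.1
print(resolve_connect_host(True, "podman"))   # → host.containers.internal
```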

Async Models

The async model API mirrors the synchronous examples above; simply await the calls.

from ramalama_sdk import AsyncRamalamaModel

async with AsyncRamalamaModel(model="tinyllama") as model:
    response = await model.chat("How tall is Michael Jordan?")
    print(response["content"])
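The main payoff of the async API is overlapping multiple requests. The pattern below uses asyncio.gather with a stand-in coroutine so it runs without a model; in real code you would swap the fake_chat stub for model.chat on an AsyncRamalamaModel.

```python
import asyncio

async def fake_chat(prompt: str) -> dict:
    """Stand-in for AsyncRamalamaModel.chat; sleeps instead of inferring."""
    await asyncio.sleep(0.01)
    return {"content": f"echo: {prompt}"}

async def main() -> list:
    prompts = ["How tall is Michael Jordan?", "Who won the 1996 NBA Finals?"]
    # gather() issues both requests concurrently instead of awaiting them one by one.
    return await asyncio.gather(*(fake_chat(p) for p in prompts))

results = asyncio.run(main())
print([r["content"] for r in results])
```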


Project details


Download files

Download the file for your platform.

Source Distribution

ramalama_sdk-0.1.6.tar.gz (16.5 kB)

Uploaded Source

Built Distribution


ramalama_sdk-0.1.6-py3-none-any.whl (12.4 kB)

Uploaded Python 3

File details

Details for the file ramalama_sdk-0.1.6.tar.gz.

File metadata

  • Download URL: ramalama_sdk-0.1.6.tar.gz
  • Upload date:
  • Size: 16.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.26 (publish, on macOS)

File hashes

Hashes for ramalama_sdk-0.1.6.tar.gz
Algorithm    Hash digest
SHA256       accb5fe3cff5f7006e1d1168352033e359668b84ea69892f14c3e192fad27748
MD5          e8b79ff1db7b29f41aa6718652339114
BLAKE2b-256  6e403a7b47f791f46c1cb6ab500f68e8a8086d3265238020b781ac451884b048


File details

Details for the file ramalama_sdk-0.1.6-py3-none-any.whl.

File metadata

  • Download URL: ramalama_sdk-0.1.6-py3-none-any.whl
  • Upload date:
  • Size: 12.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.26 (publish, on macOS)

File hashes

Hashes for ramalama_sdk-0.1.6-py3-none-any.whl
Algorithm    Hash digest
SHA256       f2d15f3f8ff75f6f2ab4989a7e304d18a8837c4cc7dcd8e4a10697d10ef6db36
MD5          2eeea84c65526c0e5bc1abca9d271215
BLAKE2b-256  006a9f2a406a321cb7333df09cfba29c9707fb3e775de7eb422b3b3a05859cd5

