An SDK for programmatically running local AI.

Programmable AI on any device.
Run LLMs locally on any hardware. If you can build a container, you can deploy AI.

Ramalama Python SDK

Build local-first AI apps on top of the Ramalama CLI. The SDK provisions models in containers and exposes a simple API for on-device inference.

Capabilities

  • LLM chat with OpenAI-compatible endpoints for direct requests.
  • Speech-to-Text (STT) with Whisper (coming soon).

Installation

Requirements

  • Docker or Podman running locally.
  • Python 3.10+

PyPI

pip install ramalama-sdk
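
The requirements listed above can be checked with a small preflight script before first use. This is only a sketch using the Python standard library: `check_requirements` is a hypothetical helper, not part of the SDK, and it only detects whether a `docker` or `podman` executable is on `PATH`, not whether the engine is actually running.

```python
import shutil
import sys


def check_requirements():
    """Report whether the documented prerequisites appear to be met."""
    return {
        # The SDK requires Python 3.10 or newer.
        "python_ok": sys.version_info >= (3, 10),
        # shutil.which only finds the executable on PATH; it does not
        # confirm that the Docker daemon or Podman service is running.
        "docker_found": shutil.which("docker") is not None,
        "podman_found": shutil.which("podman") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_requirements().items():
        print(f"{name}: {ok}")
```

Either container engine is enough, so a reasonable pass condition is `python_ok` together with at least one of the two `*_found` flags.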

Quick Start

Basic Chat

from ramalama_sdk import RamalamaModel

with RamalamaModel(model="tinyllama") as model:
    response = model.chat("How tall is Michael Jordan")
    print(response["content"])

Michael Jordan is 6 feet 6 inches (1.98 m) tall.

For multi-turn conversations, the chat method accepts an additional history argument, which can also be used to set a system prompt.

sys_prompt = {
  "role": "system", 
  "content": "Respond to all conversations as if you were a dog with variations of bark and woof."
}
history = [sys_prompt]
with RamalamaModel(model="tinyllama") as model:
    response = model.chat("How tall is Michael Jordan?", history)
    print(response["content"])

Woof woof. Bark bark bark. Rrr-woooooof.
Arf arf arf arf arf arf. Ruff!
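
To carry a conversation across several turns, the caller has to keep the history list up to date. The sketch below shows one way to do that. It makes two assumptions that the text above does not confirm: that earlier user and assistant turns use the same role/content dict shape as the system prompt, and that `chat` does not append to the history itself. `run_conversation` and `echo_chat` are hypothetical names; `echo_chat` stands in for `model.chat` so the example runs without a container.

```python
def run_conversation(chat, prompts, system_prompt=None):
    """Send prompts in order, carrying earlier turns forward as history.

    `chat` is any callable shaped like model.chat(message, history) that
    returns a dict with a "content" key.
    """
    history = []
    if system_prompt is not None:
        history.append({"role": "system", "content": system_prompt})
    for prompt in prompts:
        # Pass a copy so the callee cannot change our record of the turns.
        response = chat(prompt, list(history))
        # Assumption: turns are recorded as role/content dicts, the same
        # shape as the system prompt.
        history.append({"role": "user", "content": prompt})
        history.append({"role": "assistant", "content": response["content"]})
    return history


def echo_chat(message, history):
    """Stand-in for model.chat so the sketch runs without a container."""
    return {"content": f"echo: {message} (saw {len(history)} earlier messages)"}


history = run_conversation(echo_chat, ["Hello", "And again"], "Be brief.")
for turn in history:
    print(turn["role"], "-", turn["content"])
```

With a real model, `model.chat` would be passed in place of `echo_chat` inside the `with RamalamaModel(...)` block.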

Model Management

Models can be pulled from a variety of sources including HuggingFace, Ollama, ModelScope, any OCI registry, local files, and any downloadable URL.

with RamalamaModel(model="hf://ggml-org/gpt-oss-20b-GGUF") as model:
    response = model.chat("How tall is Michael Jordan")
    print(response["content"])

The full set of supported prefixes is listed below.

| Transport | Prefixes/Schemes | Description |
| --- | --- | --- |
| huggingface | huggingface://, hf://, hf.co/ | HuggingFace model hub |
| modelscope | modelscope://, ms:// | ModelScope |
| ollama | ollama://, ollama.com/library/ | Ollama model library |
| rlcr | rlcr:// | Ramalama Container Registry |
| oci | oci://, docker:// | OCI container images / Docker registries |
| url | http://, https:// | Generic URLs (HTTP/HTTPS) |
| file | file:// | Local file paths |
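
As an illustration of how the prefixes map to transports, the sketch below classifies a model reference by its leading prefix. It is not the SDK's own resolver: `TRANSPORT_PREFIXES` and `transport_for` are hypothetical names, and the handling of unprefixed names such as `tinyllama` is left open because the prefix list does not cover it.

```python
# Prefixes copied from the list of supported transports.
TRANSPORT_PREFIXES = {
    "huggingface": ("huggingface://", "hf://", "hf.co/"),
    "modelscope": ("modelscope://", "ms://"),
    "ollama": ("ollama://", "ollama.com/library/"),
    "rlcr": ("rlcr://",),
    "oci": ("oci://", "docker://"),
    "url": ("http://", "https://"),
    "file": ("file://",),
}


def transport_for(model_ref):
    """Return the transport whose prefix matches model_ref, or None."""
    for transport, prefixes in TRANSPORT_PREFIXES.items():
        # str.startswith accepts a tuple and matches any of its entries.
        if model_ref.startswith(prefixes):
            return transport
    # Unprefixed names such as "tinyllama" fall through; which transport
    # the SDK picks for them is not covered by the prefix list.
    return None


print(transport_for("hf://ggml-org/gpt-oss-20b-GGUF"))
print(transport_for("tinyllama"))
```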

Runtime Customization

RamalamaModel exposes several customization parameters, including base_image, which lets you customize the model container runtime. This is especially useful when you need to run inference on custom hardware that requires a specifically compiled version of llama.cpp, vLLM, etc.

from ramalama_sdk import RamalamaModel

model = RamalamaModel(
    model="tinyllama",
    base_image="artifactory.corp.com/llama-runtime:prod",
    temp=0.7,
    ngl=20,
    max_tokens=256,
    threads=8,
    ctx_size=4096,
    timeout=30,
)

| Field | Type | Description | Default |
| --- | --- | --- | --- |
| model | str | Model name or identifier. | required |
| base_image | str | Container image to use for serving, if different from config. | quay.io/ramalama/ramalama |
| temp | float | Temperature override for sampling. | 0.8 |
| ngl | int | GPU layers override. | -1 (all) |
| max_tokens | int | Maximum tokens for completions. | 0 (unlimited) |
| threads | int | CPU threads override. | -1 (all) |
| ctx_size | int | Context window override. | 0 (loaded from the model) |
| timeout | int | Seconds to wait for server readiness. | 30 |
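
When the same application runs on different machines, it can help to keep per-machine overrides as plain dicts and check them against the documented field names before constructing the model. The following is a minimal sketch of that idea; `KNOWN_FIELDS`, `build_model_kwargs`, and the `laptop_cpu` profile are hypothetical and not part of the SDK, and the profile values are illustrative only.

```python
# Field names taken from the parameter list above (excluding "model").
KNOWN_FIELDS = {
    "base_image", "temp", "ngl", "max_tokens",
    "threads", "ctx_size", "timeout",
}


def build_model_kwargs(model, **overrides):
    """Return constructor kwargs, rejecting misspelled option names."""
    unknown = set(overrides) - KNOWN_FIELDS
    if unknown:
        raise TypeError(f"unknown option(s): {sorted(unknown)}")
    # Only explicit overrides are returned, so anything omitted keeps
    # whatever default the SDK or its config applies.
    return {"model": model, **overrides}


# Hypothetical profile for a machine where only a few CPU threads
# and a short completion budget are wanted.
laptop_cpu = {"threads": 4, "max_tokens": 128}

kwargs = build_model_kwargs("tinyllama", **laptop_cpu)
print(kwargs)
```

The resulting dict could then be unpacked into the constructor, for example `RamalamaModel(**kwargs)`.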

Global SDK host defaults are configured once per process:

from ramalama_sdk import settings

settings.connection.bind_host = "127.0.0.1"
settings.connection.connect_host = "127.0.0.1"

When connect_host is not explicitly configured, the SDK resolves it automatically:

  • host process: 127.0.0.1
  • containerized SDK + Docker daemon: host.docker.internal
  • containerized SDK + Podman daemon: host.containers.internal
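
The resolution rules above can be written out as a small decision function. This is a sketch of the documented behavior only, not the SDK's implementation: `resolve_connect_host` and its parameters are hypothetical, and how the SDK detects that it is running in a container, or which engine is in use, is not described here.

```python
def resolve_connect_host(explicit=None, in_container=False, engine="docker"):
    """Pick the host used to reach the model server.

    explicit:     value the user configured, if any
    in_container: whether the SDK itself runs inside a container
    engine:       "docker" or "podman" (only used when in_container)
    """
    # An explicitly configured host always wins.
    if explicit is not None:
        return explicit
    # A host process talks to the server over loopback.
    if not in_container:
        return "127.0.0.1"
    # Inside a container, use the engine's alias for the host machine.
    if engine == "docker":
        return "host.docker.internal"
    if engine == "podman":
        return "host.containers.internal"
    raise ValueError(f"unsupported engine: {engine!r}")


print(resolve_connect_host())
print(resolve_connect_host(in_container=True, engine="podman"))
```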

Async Models

The async model API mirrors the sync examples above: use async with to manage the model and await each chat call.

import asyncio

from ramalama_sdk import AsyncRamalamaModel

async def main():
    async with AsyncRamalamaModel(model="tinyllama") as model:
        response = await model.chat("How tall is Michael Jordan")
        print(response["content"])

asyncio.run(main())
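
One reason to reach for the async API is to issue several requests without blocking on each in turn. The sketch below shows the general `asyncio.gather` pattern with a stand-in coroutine so it runs without a container. `fake_chat` and `ask_all` are hypothetical names, and whether a single model instance actually serves concurrent requests in parallel is an assumption not confirmed by this document.

```python
import asyncio


async def fake_chat(message):
    """Stand-in for `await model.chat(message)`; no container needed."""
    await asyncio.sleep(0.01)  # simulate waiting on the inference server
    return {"content": f"reply to: {message}"}


async def ask_all(chat, prompts):
    # gather schedules every coroutine at once and returns the results
    # in the same order as the prompts.
    return await asyncio.gather(*(chat(p) for p in prompts))


responses = asyncio.run(ask_all(fake_chat, ["first", "second", "third"]))
for response in responses:
    print(response["content"])
```

With a real model, `model.chat` would be passed as `chat` from inside the `async with` block.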
