
An SDK for programmatically running local AI.

Project description


Programmable AI on any device.
Run LLMs locally on any hardware. If you can build a container you can deploy AI.

Ramalama Python SDK

Build local-first AI apps on top of the Ramalama CLI. The SDK provisions models in containers and exposes a simple API for on-device inference.

Capabilities

  • LLM chat with OpenAI-compatible endpoints for direct requests.
  • Speech-to-Text (STT) with Whisper (coming soon).
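Because the served endpoint is OpenAI-compatible, any HTTP client can talk to it directly. The sketch below builds a standard /v1/chat/completions request body; the localhost URL and port are placeholders for wherever the SDK binds the server, not values taken from the SDK itself.

```python
import json

# Standard OpenAI-compatible chat completion payload. Any HTTP client can POST
# this to the server the SDK provisions (the URL below is a placeholder).
base_url = "http://127.0.0.1:8080/v1/chat/completions"  # hypothetical bind address

payload = {
    "model": "tinyllama",
    "messages": [
        {"role": "user", "content": "How tall is Michael Jordan?"},
    ],
    "temperature": 0.8,
}
body = json.dumps(payload)
print(body)
```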

Installation

Requirements

  • Docker or Podman running locally.
  • Python 3.10+

PyPI

pip install ramalama-sdk

Quick Start

Basic Chat

from ramalama_sdk import RamalamaModel

with RamalamaModel(model="tinyllama") as model:
    response = model.chat("How tall is Michael Jordan?")
    print(response["content"])
Michael Jordan is 6 feet 6 inches (1.98 m) tall.

For multi-turn conversations, the chat method accepts an optional history argument, which can also be used to set a system prompt.

sys_prompt = {
  "role": "system", 
  "content": "Respond to all conversations as if you were a dog with variations of bark and woof."
}
history = [sys_prompt]
with RamalamaModel(model="tinyllama") as model:
    response = model.chat("How tall is Michael Jordan?", history)
    print(response["content"])
Woof woof. Bark bark bark. Rrr-woooooof.
Arf arf arf arf arf arf. Ruff!
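To keep a conversation going across turns, append each exchange to the history list yourself; the message dicts follow the usual role/content shape. A minimal sketch (the append-after-each-turn pattern is an assumption about how callers are expected to grow the history, not a documented SDK helper):

```python
history = [
    {"role": "system", "content": "You are a terse assistant."},
]

def record_turn(history, user_msg, assistant_reply):
    """Append one user/assistant exchange to the running history list."""
    history.append({"role": "user", "content": user_msg})
    history.append({"role": "assistant", "content": assistant_reply})
    return history

record_turn(history, "How tall is Michael Jordan?", "6 ft 6 in.")
# history now holds system + user + assistant messages,
# ready to pass to the next chat() call
print(len(history))  # → 3
```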

Model Management

Models can be pulled from a variety of sources including HuggingFace, Ollama, ModelScope, any OCI registry, local files, and any downloadable URL.

with RamalamaModel(model="hf://ggml-org/gpt-oss-20b-GGUF") as model:
    response = model.chat("How tall is Michael Jordan?")
    print(response["content"])

The full suite of supported prefixes can be found below.

Transport     Prefixes/Schemes                 Description
huggingface   huggingface://, hf://, hf.co/    HuggingFace model hub
modelscope    modelscope://, ms://             ModelScope
ollama        ollama://, ollama.com/library/   Ollama model library
rlcr          rlcr://                          Ramalama Container Registry
oci           oci://, docker://                OCI container images / Docker registries
url           http://, https://                Generic URLs (HTTP/HTTPS)
file          file://                          Local file paths
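The table above amounts to prefix dispatch: the scheme on the model string selects the transport. An illustrative resolver (this is a sketch mirroring the table, not the SDK's actual implementation, and the no-scheme fallback is an assumption):

```python
# Illustrative scheme-to-transport mapping, mirroring the table above.
PREFIXES = {
    "huggingface://": "huggingface", "hf://": "huggingface", "hf.co/": "huggingface",
    "modelscope://": "modelscope", "ms://": "modelscope",
    "ollama://": "ollama", "ollama.com/library/": "ollama",
    "rlcr://": "rlcr",
    "oci://": "oci", "docker://": "oci",
    "http://": "url", "https://": "url",
    "file://": "file",
}

def transport_for(model_ref: str) -> str:
    """Return the transport selected by the model reference's prefix."""
    for prefix, transport in PREFIXES.items():
        if model_ref.startswith(prefix):
            return transport
    return "ollama"  # assumed fallback when no scheme is given

print(transport_for("hf://ggml-org/gpt-oss-20b-GGUF"))  # → huggingface
```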

Runtime Customization

The model exposes a variety of customization parameters, including base_image, which lets you swap the container runtime image. This is especially useful when inference must run on custom hardware that requires a specifically compiled build of llama.cpp, vLLM, etc.

from ramalama_sdk import RamalamaModel

model = RamalamaModel(
    model="tinyllama",
    base_image="artifactory.corp.com/llama-runtime:prod",
    temp=0.7,
    ngl=20,
    max_tokens=256,
    threads=8,
    ctx_size=4096,
    timeout=30,
)
Field       Type    Description                                                     Default
model       str     Model name or identifier.                                       required
base_image  str     Container image to use for serving, if different from config.   quay.io/ramalama/ramalama
temp        float   Temperature override for sampling.                              0.8
ngl         int     GPU layers override.                                            -1 (all)
max_tokens  int     Maximum tokens for completions.                                 0 (unlimited)
threads     int     CPU threads override.                                           -1 (all)
ctx_size    int     Context window override.                                        0 (loaded from the model)
timeout     int     Seconds to wait for server readiness.                           30

Global SDK host defaults are configured once per process:

from ramalama_sdk import settings

settings.connection.bind_host = "127.0.0.1"
settings.connection.connect_host = "127.0.0.1"

When connect_host is not explicitly configured, the SDK resolves it automatically:

  • host process: 127.0.0.1
  • containerized SDK + Docker daemon: host.docker.internal
  • containerized SDK + Podman daemon: host.containers.internal
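The resolution rules above can be sketched as a small function; detecting "running inside a container" and which daemon is in use is simplified to plain arguments here, since the real detection logic is more involved:

```python
def resolve_connect_host(in_container: bool, daemon: str) -> str:
    """Mirror the default connect_host rules listed above.

    daemon is "docker" or "podman"; how both flags are actually detected
    is out of scope for this sketch.
    """
    if not in_container:
        # Host process talks to the server over loopback.
        return "127.0.0.1"
    # Containerized SDK must reach back to the daemon's host.
    return "host.docker.internal" if daemon == "docker" else "host.containers.internal"

print(resolve_connect_host(False, "docker"))  # → 127.0.0.1
print(resolve_connect_host(True, "podman"))   # → host.containers.internal
```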

Async Models

The async model API mirrors the synchronous examples above; simply await the calls.

from ramalama_sdk import AsyncRamalamaModel

async with AsyncRamalamaModel(model="tinyllama") as model:
    response = await model.chat("How tall is Michael Jordan?")
    print(response["content"])
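The main payoff of the async API is overlapping multiple requests. The pattern below uses asyncio.gather with a stand-in coroutine so it runs without a model; in real code you would swap the fake_chat stub for model.chat on an AsyncRamalamaModel.

```python
import asyncio

async def fake_chat(prompt: str) -> dict:
    """Stand-in for AsyncRamalamaModel.chat; sleeps instead of inferring."""
    await asyncio.sleep(0.01)
    return {"content": f"echo: {prompt}"}

async def main() -> list:
    prompts = ["How tall is Michael Jordan?", "Who won the 1996 NBA Finals?"]
    # gather() issues both requests concurrently instead of awaiting them one by one.
    return await asyncio.gather(*(fake_chat(p) for p in prompts))

results = asyncio.run(main())
print([r["content"] for r in results])
```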


Project details


Download files

Download the file for your platform.

Source Distribution

ramalama_sdk-0.1.6.tar.gz (16.5 kB)

Uploaded Source

Built Distribution


ramalama_sdk-0.1.6-py3-none-any.whl (12.4 kB)

Uploaded Python 3

File details

Details for the file ramalama_sdk-0.1.6.tar.gz.

File metadata

  • Download URL: ramalama_sdk-0.1.6.tar.gz
  • Upload date:
  • Size: 16.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.26 (publish, on macOS)

File hashes

Hashes for ramalama_sdk-0.1.6.tar.gz
Algorithm    Hash digest
SHA256       accb5fe3cff5f7006e1d1168352033e359668b84ea69892f14c3e192fad27748
MD5          e8b79ff1db7b29f41aa6718652339114
BLAKE2b-256  6e403a7b47f791f46c1cb6ab500f68e8a8086d3265238020b781ac451884b048


File details

Details for the file ramalama_sdk-0.1.6-py3-none-any.whl.

File metadata

  • Download URL: ramalama_sdk-0.1.6-py3-none-any.whl
  • Upload date:
  • Size: 12.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.26 (publish, on macOS)

File hashes

Hashes for ramalama_sdk-0.1.6-py3-none-any.whl
Algorithm    Hash digest
SHA256       f2d15f3f8ff75f6f2ab4989a7e304d18a8837c4cc7dcd8e4a10697d10ef6db36
MD5          2eeea84c65526c0e5bc1abca9d271215
BLAKE2b-256  006a9f2a406a321cb7333df09cfba29c9707fb3e775de7eb422b3b3a05859cd5

