An SDK for programmatically running local AI.


Programmable AI on any device.
Run LLMs locally on any hardware. If you can build a container, you can deploy AI.

Ramalama Python SDK

Build local-first AI apps on top of the Ramalama CLI. The SDK provisions models in containers and exposes a simple API for on-device inference.

Capabilities

  • LLM chat, with OpenAI-compatible endpoints for direct requests (see the sketch after this list).
  • Speech-to-Text (STT) with Whisper (coming soon).
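
Because the endpoints are OpenAI-compatible, a served model can also be queried directly with any OpenAI client. A minimal sketch, assuming the server listens on localhost:8080 and exposes the standard /v1 route; both the address and the port are assumptions, not documented SDK defaults.

from openai import OpenAI

# Sketch only: base URL and port are assumptions; check where your
# Ramalama container actually serves before relying on this.
client = OpenAI(base_url="http://localhost:8080/v1", api_key="none")

completion = client.chat.completions.create(
    model="tinyllama",  # illustrative model name
    messages=[{"role": "user", "content": "How tall is Michael Jordan?"}],
)
print(completion.choices[0].message.content)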

Installation

Requirements

  • Docker or Podman running locally.
  • Python 3.10+

PyPI

pip install ramalama-sdk

Quick Start

Basic Chat

from ramalama_sdk import RamalamaModel

with RamalamaModel(model="tinyllama") as model:
    response = model.chat("How tall is Michael Jordan?")
    print(response["content"])

Michael Jordan is 6 feet 6 inches (1.98 m) tall.

For multi-turn conversations, the chat method accepts an additional history argument, which can also be used to set a system prompt.

sys_prompt = {
    "role": "system",
    "content": "Respond to all conversations as if you were a dog with variations of bark and woof.",
}
history = [sys_prompt]
with RamalamaModel(model="tinyllama") as model:
    response = model.chat("How tall is Michael Jordan?", history)
    print(response["content"])

Woof woof. Bark bark bark. Rrr-woooooof.
Arf arf arf arf arf arf. Ruff!
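
The example above covers a single turn; here is a minimal multi-turn sketch. It assumes responses use OpenAI-style role/content dicts (as the examples suggest) and that chat does not append to history itself; verify both against the SDK.

from ramalama_sdk import RamalamaModel

history = [{"role": "system", "content": "You are a concise assistant."}]
with RamalamaModel(model="tinyllama") as model:
    for question in ["Who is Michael Jordan?", "How tall is he?"]:
        response = model.chat(question, history)
        # Record both sides of the exchange so the next turn has context.
        # Assumption: chat() does not mutate history on its own.
        history.append({"role": "user", "content": question})
        history.append({"role": "assistant", "content": response["content"]})
        print(response["content"])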

Model Management

Models can be pulled from a variety of sources, including HuggingFace, Ollama, ModelScope, any OCI registry, local files, and any downloadable URL.

with RamalamaModel(model="hf://ggml-org/gpt-oss-20b-GGUF") as model:
    response = model.chat("How tall is Michael Jordan")
    print(response["content"])

The full set of supported prefixes is listed below.

Transport     Prefixes/Schemes                 Description
-----------   ------------------------------   ----------------------------------------
huggingface   huggingface://, hf://, hf.co/    HuggingFace model hub
modelscope    modelscope://, ms://             ModelScope
ollama        ollama://, ollama.com/library/   Ollama model library
rlcr          rlcr://                          Ramalama Container Registry
oci           oci://, docker://                OCI container images / Docker registries
url           http://, https://                Generic URLs (HTTP/HTTPS)
file          file://                          Local file paths
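
To make the table concrete, a short sketch exercising a few of the schemes. The identifiers are placeholders and the local path is hypothetical; substitute artifacts that actually exist for you.

from ramalama_sdk import RamalamaModel

sources = [
    "ollama://tinyllama",                # Ollama model library
    "hf://ggml-org/gpt-oss-20b-GGUF",    # HuggingFace model hub
    "file:///models/tinyllama.gguf",     # hypothetical local GGUF file
]
for source in sources:
    with RamalamaModel(model=source) as model:
        print(model.chat("Say hello.")["content"])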

Runtime Customization

The model constructor exposes a variety of customization parameters, including base_image, which lets you replace the container image used for serving. This is especially useful when inference must run on custom hardware that requires a specifically compiled build of llama.cpp, vLLM, or another runtime.

from ramalama_sdk import RamalamaModel

model = RamalamaModel(
    model="tinyllama",
    base_image="artifactory.corp.com/llama-runtime:prod",
    temp=0.7,
    ngl=20,
    max_tokens=256,
    threads=8,
    ctx_size=4096,
    timeout=30,
)
Field        Type    Description                                                     Default
-----------  ------  --------------------------------------------------------------  -------------------------
model        str     Model name or identifier.                                       required
base_image   str     Container image to use for serving, if different from config.  quay.io/ramalama/ramalama
temp         float   Temperature override for sampling.                              0.8
ngl          int     GPU layers override.                                            -1 (all)
max_tokens   int     Maximum tokens for completions.                                 0 (unlimited)
threads      int     CPU threads override.                                           -1 (all)
ctx_size     int     Context window override.                                        0 (loaded from the model)
timeout      int     Seconds to wait for server readiness.                           30

Async Models

The async API mirrors the sync examples above; use AsyncRamalamaModel with async with and await.

from ramalama_sdk import AsyncRamalamaModel

async with AsyncRamalamaModel(model="tinyllama") as model:
    response = await model.chat("How tall is Michael Jordan?")
    print(response["content"])
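
The main payoff of the async API is concurrency. A minimal sketch, assuming the served backend can accept overlapping requests (queueing behavior is not documented here):

import asyncio
from ramalama_sdk import AsyncRamalamaModel

async def main():
    prompts = ["How tall is Michael Jordan?", "Which team drafted him?"]
    async with AsyncRamalamaModel(model="tinyllama") as model:
        # Issue both requests concurrently instead of one at a time.
        responses = await asyncio.gather(*(model.chat(p) for p in prompts))
        for response in responses:
            print(response["content"])

asyncio.run(main())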


Download files

Source distribution

ramalama_sdk-0.1.3.tar.gz (13.3 kB)

Built distribution

ramalama_sdk-0.1.3-py3-none-any.whl (10.0 kB)

File details

Details for the file ramalama_sdk-0.1.3.tar.gz.

File metadata

  • Download URL: ramalama_sdk-0.1.3.tar.gz
  • Upload date:
  • Size: 13.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.26

File hashes

Hashes for ramalama_sdk-0.1.3.tar.gz

Algorithm    Hash digest
-----------  ----------------------------------------------------------------
SHA256       f4e8dc9eeab22600934642938376e3bf17dbdc79ec479b16bfe9e4b9f0e61b7c
MD5          e4314e243b193e8e01cbcdad48d758a8
BLAKE2b-256  7e2207b3b6fadf083066fe57fb933672e1a2053288a3019b6cf3994464663454

File details

Details for the file ramalama_sdk-0.1.3-py3-none-any.whl.

File metadata

  • Download URL: ramalama_sdk-0.1.3-py3-none-any.whl
  • Upload date:
  • Size: 10.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.26

File hashes

Hashes for ramalama_sdk-0.1.3-py3-none-any.whl

Algorithm    Hash digest
-----------  ----------------------------------------------------------------
SHA256       924a0a3168f75bc345e974d991b4f8f637ca94c23dc9c450f847ae2667bc15fb
MD5          14162cc5bbf7572e1ebbc814cc699efd
BLAKE2b-256  6f310a41247a1d770adb33ea8f5e2e3d0e279be9f806e6d474282cd95b13662d
