
An SDK for programmatically running local AI.

Programmable AI on any device.
Run LLMs locally on any hardware. If you can build a container, you can deploy AI.

Ramalama Python SDK

Build local-first AI apps on top of the Ramalama CLI. The SDK provisions models in containers and exposes a simple API for on-device inference.

Capabilities

  • LLM chat with OpenAI-compatible endpoints for direct requests.
  • Speech-to-Text (STT) with Whisper (coming soon).
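
As a rough illustration of what a direct request to an OpenAI-compatible endpoint might look like, the sketch below builds (but does not send) a chat completion request with the standard library. The base URL, port, and the helper name build_chat_request are assumptions for illustration; this README does not say where the SDK exposes its endpoint. Only the /v1/chat/completions path and the messages payload follow the usual OpenAI chat convention.

```python
import json
from urllib.request import Request


def build_chat_request(base_url: str, model: str, prompt: str) -> Request:
    """Build an OpenAI-style chat completion request without sending it."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical address: replace it with wherever your model server listens.
req = build_chat_request("http://localhost:8080", "tinyllama", "Hello")
```

Passing req to urllib.request.urlopen would send it once a server is actually listening at that address.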

Installation

Requirements

  • Docker or Podman running locally.
  • Python 3.10+

PyPI

pip install ramalama-sdk

Quick Start

Basic Chat

from ramalama_sdk import RamalamaModel

with RamalamaModel(model="tinyllama") as model:
    response = model.chat("How tall is Michael Jordan?")
    print(response["content"])

Michael Jordan is 6 feet 6 inches (1.98 m) tall.

For multi-turn conversations, the chat method accepts an optional history argument: a list of message dicts, which can also carry a system prompt.

from ramalama_sdk import RamalamaModel

# A system prompt is an ordinary history entry with the "system" role.
sys_prompt = {
    "role": "system",
    "content": "Respond to all conversations as if you were a dog with variations of bark and woof."
}
history = [sys_prompt]
with RamalamaModel(model="tinyllama") as model:
    response = model.chat("How tall is Michael Jordan?", history)
    print(response["content"])

Woof woof. Bark bark bark. Rrr-woooooof.
Arf arf arf arf arf arf. Ruff!
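
To carry a conversation across several calls, the history list has to grow with each turn. The helper below is a minimal sketch of that bookkeeping, assuming history entries keep the role/content shape of the system prompt above and that the caller is responsible for appending turns. The README does not say whether chat updates history itself, and extend_history is a hypothetical name, not part of the SDK.

```python
def extend_history(history, user_message, assistant_reply):
    """Return a new history list with one completed turn appended."""
    return history + [
        {"role": "user", "content": user_message},
        {"role": "assistant", "content": assistant_reply},
    ]


history = [{"role": "system", "content": "Answer in one sentence."}]
# In real use, assistant_reply would be response["content"] from model.chat(...).
history = extend_history(
    history,
    "How tall is Michael Jordan?",
    "He is 6 feet 6 inches tall.",
)
```

The updated list can then be passed as the history argument of the next chat call.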

Model Management

Models can be pulled from a variety of sources including HuggingFace, Ollama, ModelScope, any OCI registry, local files, and any downloadable URL.

with RamalamaModel(model="hf://ggml-org/gpt-oss-20b-GGUF") as model:
    response = model.chat("How tall is Michael Jordan")
    print(response["content"])

The full list of supported prefixes is shown below.

Transport     Prefixes/Schemes                  Description
huggingface   huggingface://, hf://, hf.co/     HuggingFace model hub
modelscope    modelscope://, ms://              ModelScope
ollama        ollama://, ollama.com/library/    Ollama model library
rlcr          rlcr://                           Ramalama Container Registry
oci           oci://, docker://                 OCI container images / Docker registries
url           http://, https://                 Generic URLs (HTTP/HTTPS)
file          file://                           Local file paths
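
The prefix table can be expressed as a small lookup, which is handy for validating a model string before handing it to the SDK. This is only a sketch built from the table above: detect_transport is a hypothetical helper, not an SDK function, and how the SDK resolves names without a prefix (such as "tinyllama") is not described here, so the helper returns None for them.

```python
from typing import Optional

# Prefixes copied from the table above, keyed by transport name.
PREFIXES = {
    "huggingface": ("huggingface://", "hf://", "hf.co/"),
    "modelscope": ("modelscope://", "ms://"),
    "ollama": ("ollama://", "ollama.com/library/"),
    "rlcr": ("rlcr://",),
    "oci": ("oci://", "docker://"),
    "url": ("http://", "https://"),
    "file": ("file://",),
}


def detect_transport(model: str) -> Optional[str]:
    """Return the transport whose prefix the model string starts with, if any."""
    for transport, prefixes in PREFIXES.items():
        if model.startswith(prefixes):  # str.startswith accepts a tuple
            return transport
    return None


transport = detect_transport("hf://ggml-org/gpt-oss-20b-GGUF")
```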

Runtime Customization

RamalamaModel exposes several customization parameters, including base_image, which lets you replace the container image used to serve the model. This is especially useful when inference runs on custom hardware that requires a specially compiled build of llama.cpp, vLLM, or a similar runtime.

from ramalama_sdk import RamalamaModel

model = RamalamaModel(
    model="tinyllama",
    base_image="artifactory.corp.com/llama-runtime:prod",
    temp=0.7,
    ngl=20,
    max_tokens=256,
    threads=8,
    ctx_size=4096,
    timeout=30,
)

Field        Type    Description                                                      Default
model        str     Model name or identifier.                                        required
base_image   str     Container image to use for serving, if different from config.    quay.io/ramalama/ramalama
temp         float   Temperature override for sampling.                               0.8
ngl          int     GPU layers override.                                             -1 (all)
max_tokens   int     Maximum tokens for completions.                                  0 (unlimited)
threads      int     CPU threads override.                                            -1 (all)
ctx_size     int     Context window override.                                         0 (loaded from the model)
timeout      int     Seconds to wait for server readiness.                            30
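
One way to keep these settings in a single place is a small dataclass that mirrors the documented defaults and emits only the values that differ from them. This is a sketch under stated assumptions: RuntimeOptions and overrides are hypothetical names, not part of the SDK, and the defaults are simply copied from the table above.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class RuntimeOptions:
    """Tunable fields and defaults as listed in the table above."""
    temp: float = 0.8
    ngl: int = -1
    max_tokens: int = 0
    threads: int = -1
    ctx_size: int = 0
    timeout: int = 30

    def overrides(self) -> dict:
        """Return only the fields whose values differ from the defaults."""
        defaults = asdict(RuntimeOptions())
        return {k: v for k, v in asdict(self).items() if v != defaults[k]}


opts = RuntimeOptions(temp=0.7, ngl=20, max_tokens=256)
kwargs = opts.overrides()
# Intended use (requires the SDK and a container runtime):
# model = RamalamaModel(model="tinyllama", **kwargs)
```

Passing only the overrides leaves every other parameter at whatever default the installed SDK version uses.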

Async Models

The async API mirrors the sync examples above: use AsyncRamalamaModel with async with, and await each chat call.

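import asyncio
from ramalama_sdk import AsyncRamalamaModel

async def main():
    # async with is only valid inside a coroutine
    async with AsyncRamalamaModel(model="tinyllama") as model:
        response = await model.chat("How tall is Michael Jordan?")
        print(response["content"])

asyncio.run(main())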
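
An awaitable chat method makes it possible to issue several prompts at once with asyncio.gather. The sketch below shows the pattern against a stand-in object so it runs without the SDK; ask_all and EchoModel are hypothetical names. Whether a single AsyncRamalamaModel instance handles concurrent requests efficiently is not stated in this README, so treat that as an assumption to verify.

```python
import asyncio


async def ask_all(model, prompts):
    """Send every prompt concurrently and return the reply texts in order."""
    responses = await asyncio.gather(*(model.chat(p) for p in prompts))
    return [r["content"] for r in responses]


class EchoModel:
    """Stand-in with the same awaitable chat() shape, for trying the helper offline."""

    async def chat(self, message, history=None):
        return {"content": f"echo: {message}"}


replies = asyncio.run(ask_all(EchoModel(), ["one", "two"]))
```

With the real SDK, the model opened by async with AsyncRamalamaModel(...) would be passed in place of EchoModel().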
