RamaLama is a command line tool for working with AI large language models (LLMs).

RamaLama

The RamaLama project's goal is to make working with AI boring through the use of OCI containers.

The RamaLama tool facilitates local management and serving of AI Models.

On first run, RamaLama inspects your system for GPU support, falling back to CPU support if no GPUs are present.

RamaLama uses container engines like Podman or Docker to pull the appropriate OCI image with all of the software necessary to run an AI Model for your system's setup.

Running in containers eliminates the need for users to configure the host system for AI. After the initialization, RamaLama runs the AI Models within a container based on the OCI image.

RamaLama then pulls AI Models from model registries, so you can start a chatbot or a REST API service with a single command. Models are treated similarly to how Podman and Docker treat container images.

When both Podman and Docker are installed, RamaLama defaults to Podman. The RAMALAMA_CONTAINER_ENGINE=docker environment variable can override this behaviour. When neither is installed, RamaLama attempts to run the model with software on the local system.
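
The selection order described above can be sketched in a few lines of Python. This is an illustrative sketch only, not RamaLama's actual implementation: the function name select_container_engine and its injectable env and which parameters are hypothetical, chosen so the logic can be exercised without touching the real system.

```python
import os
import shutil
from typing import Callable, Mapping, Optional


def select_container_engine(
    env: Mapping[str, str] = os.environ,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[str]:
    """Return the engine name to use, or None to run without a container.

    Hypothetical helper: mirrors the order described in the text, not
    RamaLama's real code.
    """
    # An explicit RAMALAMA_CONTAINER_ENGINE setting takes precedence.
    override = env.get("RAMALAMA_CONTAINER_ENGINE")
    if override:
        return override
    # Otherwise prefer Podman, then Docker, whichever is found on PATH.
    for engine in ("podman", "docker"):
        if which(engine):
            return engine
    # Neither engine is installed: fall back to software on the local system.
    return None


if __name__ == "__main__":
    print(select_container_engine())
```

Passing a plain dict for env and a stub for which makes it easy to check each branch, for example select_container_engine({"RAMALAMA_CONTAINER_ENGINE": "docker"}).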

RamaLama supports multiple types of AI model registries, called transports. Supported transports:

TRANSPORTS

Transports                Web Site
HuggingFace               huggingface.co
Ollama                    ollama.com
OCI Container Registries  opencontainers.org (examples: quay.io, Docker Hub, and Artifactory)

RamaLama uses the Ollama registry transport by default. Use the RAMALAMA_TRANSPORT environment variable to modify the default. For example, export RAMALAMA_TRANSPORT=huggingface changes RamaLama to use the huggingface transport.

The transport for an individual model can be set when specifying the model, via the huggingface://, oci://, or ollama:// prefix.

ramalama pull huggingface://afrideva/Tiny-Vicuna-1B-GGUF/tiny-vicuna-1b.q2_k.gguf
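
As a rough illustration of how a model reference maps to a transport, the following Python sketch splits off a known prefix and otherwise falls back to the default. It is not RamaLama's real parser: split_transport and KNOWN_TRANSPORTS are hypothetical names, and the environment variable name is taken from the export example above.

```python
import os
from typing import Mapping, Tuple

# Prefixes named in the text; the tuple itself is a hypothetical helper.
KNOWN_TRANSPORTS = ("huggingface", "oci", "ollama")


def split_transport(
    model: str, env: Mapping[str, str] = os.environ
) -> Tuple[str, str]:
    """Return (transport, model_path) for a model reference."""
    for transport in KNOWN_TRANSPORTS:
        prefix = transport + "://"
        if model.startswith(prefix):
            # An explicit prefix always wins over the default transport.
            return transport, model[len(prefix):]
    # No prefix: use RAMALAMA_TRANSPORT if set, else the Ollama default.
    return env.get("RAMALAMA_TRANSPORT", "ollama"), model


if __name__ == "__main__":
    print(split_transport("granite-code"))
```

With no environment override, split_transport("granite-code", {}) yields the ollama transport, while a huggingface:// reference keeps its own transport regardless of the default.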

To make it easier for users, RamaLama uses shortname files, which contain alias names for fully specified AI Models, allowing users to specify the shorter names when referring to models. RamaLama reads shortnames.conf files if they exist. These files contain a list of name/value pairs that map each shortname to a model. The following table specifies the order in which RamaLama reads the files. A shortname defined in a later file overrides the same shortname defined in an earlier one.

Shortnames type  Path
Distribution     /usr/share/ramalama/shortnames.conf
Administrators   /etc/ramalama/shortnames.conf
Users            $HOME/.config/ramalama/shortnames.conf

$ cat /usr/share/ramalama/shortnames.conf
[shortnames]
  "tiny" = "ollama://tinyllama"
  "granite" = "huggingface://instructlab/granite-7b-lab-GGUF/granite-7b-lab-Q4_K_M.gguf"
  "granite:7b" = "huggingface://instructlab/granite-7b-lab-GGUF/granite-7b-lab-Q4_K_M.gguf"
  "ibm/granite" = "huggingface://instructlab/granite-7b-lab-GGUF/granite-7b-lab-Q4_K_M.gguf"
  "merlinite" = "huggingface://instructlab/merlinite-7b-lab-GGUF/merlinite-7b-lab-Q4_K_M.gguf"
  "merlinite:7b" = "huggingface://instructlab/merlinite-7b-lab-GGUF/merlinite-7b-lab-Q4_K_M.gguf"
...
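
The override behaviour can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not RamaLama's own loader: parse_shortnames, merge_shortnames, and load_shortnames are hypothetical names, and the parser only handles the quoted "name" = "value" layout shown in the example file.

```python
from pathlib import Path
from typing import Dict, Iterable


def parse_shortnames(text: str) -> Dict[str, str]:
    """Parse the [shortnames] section of one file (simplified parser)."""
    names: Dict[str, str] = {}
    in_section = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("["):
            # Only entries under [shortnames] are collected.
            in_section = line == "[shortnames]"
            continue
        if in_section and "=" in line:
            key, _, value = line.partition("=")
            names[key.strip().strip('"')] = value.strip().strip('"')
    return names


def merge_shortnames(texts: Iterable[str]) -> Dict[str, str]:
    """Merge file contents in read order; later definitions override."""
    merged: Dict[str, str] = {}
    for text in texts:
        merged.update(parse_shortnames(text))
    return merged


def load_shortnames(paths: Iterable[Path]) -> Dict[str, str]:
    """Read the files that exist, in the given order, and merge them."""
    return merge_shortnames(p.read_text() for p in paths if p.is_file())
```

Calling load_shortnames with the paths in the order of the table (distribution, administrators, users) gives the user's file the last word on any duplicate name.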

Install

Install via PyPI

RamaLama is available via PyPI: https://pypi.org/project/ramalama

pipx install ramalama

Install by script

Install RamaLama by running this one-liner (on macOS run without sudo):

Linux:

curl -fsSL https://raw.githubusercontent.com/containers/ramalama/s/install.sh | sudo bash

macOS:

curl -fsSL https://raw.githubusercontent.com/containers/ramalama/s/install.sh | bash

Hardware Support

Hardware                            Enabled
CPU                                 yes
Apple Silicon GPU (Linux / Asahi)   yes
Apple Silicon GPU (macOS)           yes
Apple Silicon GPU (podman-machine)  yes
Nvidia GPU (cuda)                   no (Containerfile available but not published to quay.io)
AMD GPU (rocm)                      yes

COMMANDS

Command                 Description
ramalama(1)             primary RamaLama man page
ramalama-containers(1)  list all RamaLama containers
ramalama-info(1)        display RamaLama configuration information
ramalama-list(1)        list all downloaded AI Models
ramalama-login(1)       login to remote registry
ramalama-logout(1)      logout from remote registry
ramalama-pull(1)        pull AI Model from Model registry to local storage
ramalama-push(1)        push AI Model from local storage to remote registry
ramalama-rm(1)          remove AI Model from local storage
ramalama-run(1)         run specified AI Model as a chatbot
ramalama-serve(1)       serve REST API on specified AI Model
ramalama-stop(1)        stop named container that is running AI Model
ramalama-version(1)     display version of RamaLama

Usage

Running Models

You can run a chatbot on a model using the run command. By default, it pulls from the Ollama registry.

Note: RamaLama will inspect your machine for native GPU support and then use a container engine like Podman to pull an OCI container image with the appropriate code and libraries to run the AI Model. This can take a long time to set up, but only on the first run.

$ ramalama run instructlab/merlinite-7b-lab
Copying blob 5448ec8c0696 [--------------------------------------] 0.0b / 63.6MiB (skipped: 0.0b = 0.00%)
Copying blob cbd7e392a514 [--------------------------------------] 0.0b / 65.3MiB (skipped: 0.0b = 0.00%)
Copying blob 5d6c72bcd967 done  208.5MiB / 208.5MiB (skipped: 0.0b = 0.00%)
Copying blob 9ccfa45da380 [--------------------------------------] 0.0b / 7.6MiB (skipped: 0.0b = 0.00%)
Copying blob 4472627772b1 [--------------------------------------] 0.0b / 120.0b (skipped: 0.0b = 0.00%)
>

After the initial container image has been downloaded, you can interact with different models using the same container image.

$ ramalama run granite-code
> Write a hello world application in python

print("Hello World")

In a different terminal window, you can see the running Podman container.

$ podman ps
CONTAINER ID  IMAGE                             COMMAND               CREATED        STATUS        PORTS       NAMES
91df4a39a360  quay.io/ramalama/ramalama:latest  /home/dwalsh/rama...  4 minutes ago  Up 4 minutes              gifted_volhard

Listing Models

You can list all models pulled into local storage.

$ ramalama list
NAME                                                                MODIFIED     SIZE
ollama://tiny-llm:latest                                            16 hours ago 5.5M
huggingface://afrideva/Tiny-Vicuna-1B-GGUF/tiny-vicuna-1b.q2_k.gguf 14 hours ago 460M
ollama://granite-code:3b                                            5 days ago   1.9G
ollama://granite-code:latest                                        1 day ago    1.9G
ollama://moondream:latest                                           6 days ago   791M

Pulling Models

You can pull a model using the pull command. By default, it pulls from the Ollama registry.

$ ramalama pull granite-code
###################################################                       32.5%

Serving Models

You can serve multiple models using the serve command. By default, it pulls from the Ollama registry.

$ ramalama serve --name mylama llama3

Stopping servers

You can stop a running model if it is running in a container.

$ ramalama stop mylama
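
If you want to drive the serve and stop commands from a script, a minimal Python sketch might look like the following. It only uses the command forms shown above (serve --name NAME MODEL and stop NAME); the helper names serve_command, stop_command, and run_cli are hypothetical, and the dry-run default is an illustrative choice so nothing is executed by accident.

```python
import shlex
import subprocess
from typing import List


def serve_command(model: str, name: str) -> List[str]:
    """Build the argv for: ramalama serve --name NAME MODEL."""
    return ["ramalama", "serve", "--name", name, model]


def stop_command(name: str) -> List[str]:
    """Build the argv for: ramalama stop NAME."""
    return ["ramalama", "stop", name]


def run_cli(argv: List[str], dry_run: bool = True) -> int:
    """Print the command, and run it only when dry_run is False."""
    print(shlex.join(argv))
    if dry_run:
        return 0
    # check=False: return the exit code instead of raising on failure.
    return subprocess.run(argv, check=False).returncode


if __name__ == "__main__":
    run_cli(serve_command("llama3", "mylama"))
    run_cli(stop_command("mylama"))
```

Building the argument list separately from running it keeps model and container names out of shell quoting, and makes the commands easy to inspect before they are executed.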

Diagram

+---------------------------+
|                           |
| ramalama run granite-code |
|                           |
+-------+-------------------+
        |
        |
        |                                          +------------------+
        |                                          | Pull model layer |
        +----------------------------------------->| granite-code     |
                                                   +------------------+
                                                   | Repo options:    |
                                                   +-+-------+------+-+
                                                     |       |      |
                                                     v       v      v
                                             +---------+ +------+ +----------+
                                             | Hugging | | quay | | Ollama   |
                                             | Face    | |      | | Registry |
                                             +-------+-+ +---+--+ +-+--------+
                                                     |       |      |
                                                     v       v      v
                                                   +------------------+
                                                   | Start with       |
                                                   | llama.cpp and    |
                                                   | granite-code     |
                                                   | model            |
                                                   +------------------+

In development

Regard this as alpha software. Everything is under development, so expect breaking changes. Luckily, it's easy to reset everything and re-install:

rm -rf /var/lib/ramalama # only required if running as root user
rm -rf $HOME/.local/share/ramalama

and install again.

Credit where credit is due

This project wouldn't be possible without the help of other projects like:

llama.cpp, whisper.cpp, vllm, podman, omlmd, huggingface

so if you like this tool, give some of these repos a star, and hey, give us a star too while you are at it.

Community

Matrix

Contributors

Open to contributors
