# ramalama-stack

An external provider for Llama Stack allowing for the use of RamaLama for inference.
## Installing

You can install ramalama-stack from PyPI via

```bash
pip install ramalama-stack
```

This will also install Llama Stack and RamaLama if they are not already installed.
## Usage

> [!WARNING]
> The following workaround is currently needed to run this provider - see https://github.com/containers/ramalama-stack/issues/53 for more details:
>
> ```bash
> curl --create-dirs --output ~/.llama/providers.d/remote/inference/ramalama.yaml https://raw.githubusercontent.com/containers/ramalama-stack/refs/tags/v0.2.5/src/ramalama_stack/providers.d/remote/inference/ramalama.yaml
> curl --create-dirs --output ~/.llama/distributions/ramalama/ramalama-run.yaml https://raw.githubusercontent.com/containers/ramalama-stack/refs/tags/v0.2.5/src/ramalama_stack/ramalama-run.yaml
> ```
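Both files are pinned to the `v0.2.5` tag. If you need the files for a different release, the raw URLs follow a predictable pattern (a small sketch; `raw_url` is a hypothetical helper for illustration, not part of the package):

```python
def raw_url(tag: str, path: str) -> str:
    """Build a raw.githubusercontent.com URL for a file in this repo,
    pinned to a release tag (pattern taken from the curl commands above)."""
    return (
        "https://raw.githubusercontent.com/containers/ramalama-stack/"
        f"refs/tags/{tag}/src/ramalama_stack/{path}"
    )


print(raw_url("v0.2.5", "ramalama-run.yaml"))
```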
- First you will need a RamaLama server running - see the RamaLama project docs for more information.

- Ensure you set your `INFERENCE_MODEL` environment variable to the name of the model you have running via RamaLama.

- You can then run the RamaLama external provider via

  ```bash
  llama stack run ~/.llama/distributions/ramalama/ramalama-run.yaml
  ```
> [!NOTE]
> You can also run the RamaLama external provider inside of a container via Podman:
>
> ```bash
> podman run \
>   --net=host \
>   --env RAMALAMA_URL=http://0.0.0.0:8080 \
>   --env INFERENCE_MODEL=$INFERENCE_MODEL \
>   quay.io/ramalama/llama-stack
> ```
This will start a Llama Stack server which will use port 8321 by default. You can test this works by configuring the Llama Stack Client to run against this server and sending a test request.

- If your client is running on the same machine as the server, you can run

  ```bash
  llama-stack-client configure --endpoint http://0.0.0.0:8321 --api-key none
  ```

- If your client is running on a different machine, you can run

  ```bash
  llama-stack-client configure --endpoint http://<hostname>:8321 --api-key none
  ```

- The client should give you a message similar to

  `Done! You can now use the Llama Stack Client CLI with endpoint <endpoint>`

- You can then test the server by running

  ```bash
  llama-stack-client inference chat-completion --message "tell me a joke"
  ```

  which should return something like
```
ChatCompletionResponse(
    completion_message=CompletionMessage(
        content='A man walked into a library and asked the librarian, "Do you have any books on Pavlov\'s dogs
and Schrödinger\'s cat?" The librarian replied, "It rings a bell, but I\'m not sure if it\'s here or not."',
        role='assistant',
        stop_reason='end_of_turn',
        tool_calls=[]
    ),
    logprobs=None,
    metrics=[
        Metric(metric='prompt_tokens', value=14.0, unit=None),
        Metric(metric='completion_tokens', value=63.0, unit=None),
        Metric(metric='total_tokens', value=77.0, unit=None)
    ]
)
```
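If you are scripting against the server rather than using the CLI, a response like the one above can be handled as plain data. A minimal sketch (the field names mirror the `ChatCompletionResponse` printed above, modeled here as dicts; `total_tokens` is a hypothetical helper):

```python
def total_tokens(metrics: list[dict]) -> float:
    """Sum prompt and completion token counts from a metrics list
    shaped like the one in the response above."""
    return sum(
        m["value"]
        for m in metrics
        if m["metric"] in ("prompt_tokens", "completion_tokens")
    )


# Values taken from the example response above.
metrics = [
    {"metric": "prompt_tokens", "value": 14.0, "unit": None},
    {"metric": "completion_tokens", "value": 63.0, "unit": None},
    {"metric": "total_tokens", "value": 77.0, "unit": None},
]
print(total_tokens(metrics))  # 77.0, matching the total_tokens metric
```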
## Llama Stack User Interface

Llama Stack includes an experimental user interface - check it out here.

To deploy the UI, run:

```bash
podman run -d --rm --network=container:ramalama --name=streamlit quay.io/redhat-et/streamlit_client:0.1.0
```
> [!NOTE]
> If running on macOS (not Linux), `--network=host` doesn't work. You'll need to publish the additional ports `8321:8321` and `8501:8501` with the `ramalama serve` command, then run with `--network=container:ramalama`. If running on Linux, use `--network=host` or `-p 8501:8501` instead. The streamlit container will be able to access the ramalama endpoint with either.
## File details

Details for the file `ramalama_stack-0.2.5.tar.gz`.

### File metadata

- Download URL: ramalama_stack-0.2.5.tar.gz
- Upload date:
- Size: 159.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.12.9

### File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | 25cd60b436e93f743def57b51cdb3af04d09150df26f3e10ec79c3fa0e8cca09 |
| MD5 | 974da599f03bd35ba3e7c5fcca88c9ed |
| BLAKE2b-256 | 175f2a7a87b5802a82492628beb47fdbc515910b3c2365ad776921ff3a6ca0f1 |
### Provenance

The following attestation bundles were made for `ramalama_stack-0.2.5.tar.gz`:

Publisher: pypi.yml on containers/ramalama-stack

- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: ramalama_stack-0.2.5.tar.gz
- Subject digest: 25cd60b436e93f743def57b51cdb3af04d09150df26f3e10ec79c3fa0e8cca09
- Sigstore transparency entry: 268895079
- Sigstore integration time:
- Permalink: containers/ramalama-stack@395b3802f3a2d6aed416a46fd44a85e6b7b317b7
- Branch / Tag: refs/tags/v0.2.5
- Owner: https://github.com/containers
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: pypi.yml@395b3802f3a2d6aed416a46fd44a85e6b7b317b7
- Trigger Event: release
## File details

Details for the file `ramalama_stack-0.2.5-py3-none-any.whl`.

### File metadata

- Download URL: ramalama_stack-0.2.5-py3-none-any.whl
- Upload date:
- Size: 17.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.12.9

### File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | 96057559ce95205f32d058cb707c432bf0f11f219a6a8040b579477e7ac13891 |
| MD5 | 3cc985d33ac525cd03b1d7777dd4885f |
| BLAKE2b-256 | 79f1632d723b3dc126a3980f4abad42f30840c5d7726a0ce23f95e7db5129d8f |
### Provenance

The following attestation bundles were made for `ramalama_stack-0.2.5-py3-none-any.whl`:

Publisher: pypi.yml on containers/ramalama-stack

- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: ramalama_stack-0.2.5-py3-none-any.whl
- Subject digest: 96057559ce95205f32d058cb707c432bf0f11f219a6a8040b579477e7ac13891
- Sigstore transparency entry: 268895082
- Sigstore integration time:
- Permalink: containers/ramalama-stack@395b3802f3a2d6aed416a46fd44a85e6b7b317b7
- Branch / Tag: refs/tags/v0.2.5
- Owner: https://github.com/containers
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: pypi.yml@395b3802f3a2d6aed416a46fd44a85e6b7b317b7
- Trigger Event: release