Provider-specific Swarmauri import package for local llama.cpp OpenAI-compatible chat, streaming, async, and batch LLM workflows.
Project description
Swarmauri llama.cpp LLM
swarmauri_llm_llamacpp provides the provider-specific Swarmauri import package for LlamaCppModel. It is intended for local or self-hosted llama.cpp deployments that expose an OpenAI-compatible API, not a hosted SaaS provider.
The runtime delegates to swarmauri_standard.llms.LlamaCppModel, which talks to a local llama.cpp server at http://localhost:8080/v1 by default, discovers models from /models, and sends chat completion requests to /chat/completions.
Why Use This Package?
- Keep local
llama.cppmodel usage explicit in Swarmauri applications. - Run self-hosted LLM inference through the same
Conversationworkflow used by other Swarmauri provider packages. - Support sync, async, streaming, and batch execution against an OpenAI-compatible local endpoint.
- Avoid coupling local model runtime choices to hosted-provider packages.
FAQ
What does swarmauri_llm_llamacpp install?
It installs the LlamaCppModel provider package entry point under swarmauri.llms.
Does this package download or bundle a model?
No. You must run your own llama.cpp server and make at least one model available through its OpenAI-compatible API.
Which endpoint does the runtime call?
By default the underlying runtime targets http://localhost:8080/v1, queries /models for model discovery, and calls /chat/completions for inference.
Does it require an API key?
Usually no for local development. The runtime can include an API key if your self-hosted endpoint is configured to require one.
Does it support streaming and async workflows?
Yes. LlamaCppModel supports predict, apredict, stream, astream, batch, and abatch.
Where should I verify model pricing?
There is no provider pricing surface for llama.cpp itself. Cost depends on your hardware, hosting, and model selection. See docs/LLM_PROVIDER_MODEL_PRICING_LINKS.md for the project-level note on local-runtime pricing.
Features
LlamaCppModelimport package for local or self-hostedllama.cppinference.- OpenAI-compatible chat completion requests against a local
llama.cppserver. - Automatic model discovery via the
/modelsendpoint. - Sync, async, streaming, and batch generation workflows.
- Optional JSON response mode and stop-sequence support.
- Compatibility with Python 3.10, 3.11, 3.12, 3.13, and 3.14.
Installation
uv add swarmauri_llm_llamacpp
pip install swarmauri_llm_llamacpp
Usage
Start a llama.cpp server that exposes an OpenAI-compatible API before creating the model.
Basic Chat Completion
from swarmauri_llm_llamacpp import LlamaCppModel
from swarmauri_standard.conversations.Conversation import Conversation
from swarmauri_standard.messages.HumanMessage import HumanMessage
conversation = Conversation()
conversation.add_message(HumanMessage(content="Explain Swarmauri in one paragraph."))
model = LlamaCppModel()
result = model.predict(conversation=conversation, max_tokens=200)
print(result.get_last().content)
Streaming
from swarmauri_llm_llamacpp import LlamaCppModel
from swarmauri_standard.conversations.Conversation import Conversation
from swarmauri_standard.messages.HumanMessage import HumanMessage
conversation = Conversation()
conversation.add_message(HumanMessage(content="Write a short haiku about local inference."))
model = LlamaCppModel()
for token in model.stream(conversation=conversation):
print(token, end="", flush=True)
Async
import asyncio
from swarmauri_llm_llamacpp import LlamaCppModel
from swarmauri_standard.conversations.Conversation import Conversation
from swarmauri_standard.messages.HumanMessage import HumanMessage
async def main() -> None:
conversation = Conversation()
conversation.add_message(HumanMessage(content="List three reasons to self-host an LLM."))
model = LlamaCppModel()
result = await model.apredict(conversation=conversation)
print(result.get_last().content)
# asyncio.run(main())
Examples
- Use
LlamaCppModelwhen you want Swarmauri agents to run against a localllama.cppserver instead of a remote provider. - Use
streamorastreamwhen the UI should render tokens as they are produced by the local model. - Use
enable_json=Truewhen your downstream logic expects structured JSON output from the model.
Related Packages
- swarmauri_llm_openai
- swarmauri_llm_mistral
- swarmauri_llm_deepinfra
- swarmauri_llm_groq
- swarmauri_llm_hyperbolic
- swarmauri_llm_leptonai
Foundational Swarmauri Packages
More Documentation
Best Practices
- Keep your
llama.cppserver configuration aligned with the OpenAI-compatible routes this runtime expects. - Confirm the local server is reachable before starting agent workflows that depend on it.
- Use self-hosted model names returned by
/modelsinstead of hard-coding assumptions about local model IDs. - Tune
temperature,max_tokens,stop, andenable_jsonto match your application contract.
License
Apache-2.0
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file swarmauri_llm_llamacpp-0.11.0.dev1.tar.gz.
File metadata
- Download URL: swarmauri_llm_llamacpp-0.11.0.dev1.tar.gz
- Upload date:
- Size: 8.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.11.26 {"installer":{"name":"uv","version":"0.11.26","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
e004464923197106c9dcce004eaef136356348bb08b8c88ffe9e99b978179e48
|
|
| MD5 |
d7a5e616a65cb19435dbbfa4ea2422a8
|
|
| BLAKE2b-256 |
e6bfc097814fc33d7bd9b2e67dece60494d0f9cdf864452b9328377a43e97542
|
File details
Details for the file swarmauri_llm_llamacpp-0.11.0.dev1-py3-none-any.whl.
File metadata
- Download URL: swarmauri_llm_llamacpp-0.11.0.dev1-py3-none-any.whl
- Upload date:
- Size: 8.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.11.26 {"installer":{"name":"uv","version":"0.11.26","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
b75f1a57266719df905236058e68cd46d2b61d0ac29764c1446e76e0adab4bc1
|
|
| MD5 |
a72b2e16caab0cb52657e019ec0c07a9
|
|
| BLAKE2b-256 |
4939e9e95ad36403fde0f16bf249062c2987b5ca7f3905b7603f947e628cf403
|