Skip to main content

Provider-specific Swarmauri import package for local llama.cpp OpenAI-compatible chat, streaming, async, and batch LLM workflows.

Project description

Swarmauri Logo

PyPI - Downloads Hits PyPI - Python Version PyPI - License PyPI - swarmauri_llm_llamacpp Discord

Swarmauri llama.cpp LLM

swarmauri_llm_llamacpp provides the provider-specific Swarmauri import package for LlamaCppModel. It is intended for local or self-hosted llama.cpp deployments that expose an OpenAI-compatible API, not a hosted SaaS provider.

The runtime delegates to swarmauri_standard.llms.LlamaCppModel, which talks to a local llama.cpp server at http://localhost:8080/v1 by default, discovers models from /models, and sends chat completion requests to /chat/completions.

Why Use This Package?

  • Keep local llama.cpp model usage explicit in Swarmauri applications.
  • Run self-hosted LLM inference through the same Conversation workflow used by other Swarmauri provider packages.
  • Support sync, async, streaming, and batch execution against an OpenAI-compatible local endpoint.
  • Avoid coupling local model runtime choices to hosted-provider packages.

FAQ

What does swarmauri_llm_llamacpp install?

It installs the LlamaCppModel provider package entry point under swarmauri.llms.

Does this package download or bundle a model?

No. You must run your own llama.cpp server and make at least one model available through its OpenAI-compatible API.

Which endpoint does the runtime call?

By default the underlying runtime targets http://localhost:8080/v1, queries /models for model discovery, and calls /chat/completions for inference.

Does it require an API key?

Usually no for local development. The runtime can include an API key if your self-hosted endpoint is configured to require one.

Does it support streaming and async workflows?

Yes. LlamaCppModel supports predict, apredict, stream, astream, batch, and abatch.

Where should I verify model pricing?

There is no provider pricing surface for llama.cpp itself. Cost depends on your hardware, hosting, and model selection. See docs/LLM_PROVIDER_MODEL_PRICING_LINKS.md for the project-level note on local-runtime pricing.

Features

  • LlamaCppModel import package for local or self-hosted llama.cpp inference.
  • OpenAI-compatible chat completion requests against a local llama.cpp server.
  • Automatic model discovery via the /models endpoint.
  • Sync, async, streaming, and batch generation workflows.
  • Optional JSON response mode and stop-sequence support.
  • Compatibility with Python 3.10, 3.11, 3.12, 3.13, and 3.14.

Installation

uv add swarmauri_llm_llamacpp
pip install swarmauri_llm_llamacpp

Usage

Start a llama.cpp server that exposes an OpenAI-compatible API before creating the model.

Basic Chat Completion

from swarmauri_llm_llamacpp import LlamaCppModel
from swarmauri_standard.conversations.Conversation import Conversation
from swarmauri_standard.messages.HumanMessage import HumanMessage

conversation = Conversation()
conversation.add_message(HumanMessage(content="Explain Swarmauri in one paragraph."))

model = LlamaCppModel()
result = model.predict(conversation=conversation, max_tokens=200)

print(result.get_last().content)

Streaming

from swarmauri_llm_llamacpp import LlamaCppModel
from swarmauri_standard.conversations.Conversation import Conversation
from swarmauri_standard.messages.HumanMessage import HumanMessage

conversation = Conversation()
conversation.add_message(HumanMessage(content="Write a short haiku about local inference."))

model = LlamaCppModel()

for token in model.stream(conversation=conversation):
    print(token, end="", flush=True)

Async

import asyncio

from swarmauri_llm_llamacpp import LlamaCppModel
from swarmauri_standard.conversations.Conversation import Conversation
from swarmauri_standard.messages.HumanMessage import HumanMessage


async def main() -> None:
    conversation = Conversation()
    conversation.add_message(HumanMessage(content="List three reasons to self-host an LLM."))

    model = LlamaCppModel()
    result = await model.apredict(conversation=conversation)
    print(result.get_last().content)


# asyncio.run(main())

Examples

  • Use LlamaCppModel when you want Swarmauri agents to run against a local llama.cpp server instead of a remote provider.
  • Use stream or astream when the UI should render tokens as they are produced by the local model.
  • Use enable_json=True when your downstream logic expects structured JSON output from the model.

Related Packages

Foundational Swarmauri Packages

More Documentation

Best Practices

  • Keep your llama.cpp server configuration aligned with the OpenAI-compatible routes this runtime expects.
  • Confirm the local server is reachable before starting agent workflows that depend on it.
  • Use self-hosted model names returned by /models instead of hard-coding assumptions about local model IDs.
  • Tune temperature, max_tokens, stop, and enable_json to match your application contract.

License

Apache-2.0

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

swarmauri_llm_llamacpp-0.11.0.dev1.tar.gz (8.0 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

swarmauri_llm_llamacpp-0.11.0.dev1-py3-none-any.whl (8.8 kB view details)

Uploaded Python 3

File details

Details for the file swarmauri_llm_llamacpp-0.11.0.dev1.tar.gz.

File metadata

  • Download URL: swarmauri_llm_llamacpp-0.11.0.dev1.tar.gz
  • Upload date:
  • Size: 8.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.11.26 {"installer":{"name":"uv","version":"0.11.26","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for swarmauri_llm_llamacpp-0.11.0.dev1.tar.gz
Algorithm Hash digest
SHA256 e004464923197106c9dcce004eaef136356348bb08b8c88ffe9e99b978179e48
MD5 d7a5e616a65cb19435dbbfa4ea2422a8
BLAKE2b-256 e6bfc097814fc33d7bd9b2e67dece60494d0f9cdf864452b9328377a43e97542

See more details on using hashes here.

File details

Details for the file swarmauri_llm_llamacpp-0.11.0.dev1-py3-none-any.whl.

File metadata

  • Download URL: swarmauri_llm_llamacpp-0.11.0.dev1-py3-none-any.whl
  • Upload date:
  • Size: 8.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.11.26 {"installer":{"name":"uv","version":"0.11.26","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for swarmauri_llm_llamacpp-0.11.0.dev1-py3-none-any.whl
Algorithm Hash digest
SHA256 b75f1a57266719df905236058e68cd46d2b61d0ac29764c1446e76e0adab4bc1
MD5 a72b2e16caab0cb52657e019ec0c07a9
BLAKE2b-256 4939e9e95ad36403fde0f16bf249062c2987b5ca7f3905b7603f947e628cf403

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page