Provider-specific Swarmauri import package for local llama.cpp OpenAI-compatible chat, streaming, async, and batch LLM workflows.

These details have not been verified by PyPI

Project description

Swarmauri Logo

Swarmauri llama.cpp LLM

swarmauri_llm_llamacpp provides the provider-specific Swarmauri import package for LlamaCppModel. It is intended for local or self-hosted llama.cpp deployments that expose an OpenAI-compatible API, not a hosted SaaS provider.

The runtime delegates to swarmauri_standard.llms.LlamaCppModel, which talks to a local llama.cpp server at http://localhost:8080/v1 by default, discovers models from /models, and sends chat completion requests to /chat/completions.

Why Use This Package?

Keep local llama.cpp model usage explicit in Swarmauri applications.
Run self-hosted LLM inference through the same Conversation workflow used by other Swarmauri provider packages.
Support sync, async, streaming, and batch execution against an OpenAI-compatible local endpoint.
Avoid coupling local model runtime choices to hosted-provider packages.

FAQ

What does `swarmauri_llm_llamacpp` install?

It installs the LlamaCppModel provider package entry point under swarmauri.llms.

Does this package download or bundle a model?

No. You must run your own llama.cpp server and make at least one model available through its OpenAI-compatible API.

Which endpoint does the runtime call?

By default the underlying runtime targets http://localhost:8080/v1, queries /models for model discovery, and calls /chat/completions for inference.

Does it require an API key?

Usually no for local development. The runtime can include an API key if your self-hosted endpoint is configured to require one.

Does it support streaming and async workflows?

Yes. LlamaCppModel supports predict, apredict, stream, astream, batch, and abatch.

Where should I verify model pricing?

There is no provider pricing surface for llama.cpp itself. Cost depends on your hardware, hosting, and model selection. See docs/LLM_PROVIDER_MODEL_PRICING_LINKS.md for the project-level note on local-runtime pricing.

Features

LlamaCppModel import package for local or self-hosted llama.cpp inference.
OpenAI-compatible chat completion requests against a local llama.cpp server.
Automatic model discovery via the /models endpoint.
Sync, async, streaming, and batch generation workflows.
Optional JSON response mode and stop-sequence support.
Compatibility with Python 3.10, 3.11, 3.12, 3.13, and 3.14.

Installation

uv add swarmauri_llm_llamacpp

pip install swarmauri_llm_llamacpp

Usage

Start a llama.cpp server that exposes an OpenAI-compatible API before creating the model.

Basic Chat Completion

from swarmauri_llm_llamacpp import LlamaCppModel
from swarmauri_standard.conversations.Conversation import Conversation
from swarmauri_standard.messages.HumanMessage import HumanMessage

conversation = Conversation()
conversation.add_message(HumanMessage(content="Explain Swarmauri in one paragraph."))

model = LlamaCppModel()
result = model.predict(conversation=conversation, max_tokens=200)

print(result.get_last().content)

Streaming

from swarmauri_llm_llamacpp import LlamaCppModel
from swarmauri_standard.conversations.Conversation import Conversation
from swarmauri_standard.messages.HumanMessage import HumanMessage

conversation = Conversation()
conversation.add_message(HumanMessage(content="Write a short haiku about local inference."))

model = LlamaCppModel()

for token in model.stream(conversation=conversation):
    print(token, end="", flush=True)

Async

import asyncio

from swarmauri_llm_llamacpp import LlamaCppModel
from swarmauri_standard.conversations.Conversation import Conversation
from swarmauri_standard.messages.HumanMessage import HumanMessage


async def main() -> None:
    conversation = Conversation()
    conversation.add_message(HumanMessage(content="List three reasons to self-host an LLM."))

    model = LlamaCppModel()
    result = await model.apredict(conversation=conversation)
    print(result.get_last().content)


# asyncio.run(main())

Examples

Use LlamaCppModel when you want Swarmauri agents to run against a local llama.cpp server instead of a remote provider.
Use stream or astream when the UI should render tokens as they are produced by the local model.
Use enable_json=True when your downstream logic expects structured JSON output from the model.

Related Packages

Foundational Swarmauri Packages

Best Practices

Keep your llama.cpp server configuration aligned with the OpenAI-compatible routes this runtime expects.
Confirm the local server is reachable before starting agent workflows that depend on it.
Use self-hosted model names returned by /models instead of hard-coding assumptions about local model IDs.
Tune temperature, max_tokens, stop, and enable_json to match your application contract.

License

Apache-2.0

Project details

These details have not been verified by PyPI

Release history Release notifications | RSS feed

This version

0.11.0.dev1 pre-release

Jun 30, 2026

0.1.1.dev3 pre-release

May 20, 2026

0.1.1.dev2 pre-release

May 20, 2026

0.1.0

Mar 24, 2026

0.1.0.dev10 pre-release

Mar 23, 2026

0.1.0.dev8 pre-release

Mar 20, 2026

0.1.0.dev7 pre-release

Mar 20, 2026

0.1.0.dev6 pre-release

Mar 20, 2026

0.1.0.dev5 pre-release

Mar 20, 2026

0.1.0.dev4 pre-release

Mar 20, 2026

0.1.0.dev3 pre-release

Mar 20, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

swarmauri_llm_llamacpp-0.11.0.dev1.tar.gz (8.0 kB view details)

Uploaded Jun 30, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

swarmauri_llm_llamacpp-0.11.0.dev1-py3-none-any.whl (8.8 kB view details)

Uploaded Jun 30, 2026 Python 3

File details

Details for the file swarmauri_llm_llamacpp-0.11.0.dev1.tar.gz.

File metadata

Download URL: swarmauri_llm_llamacpp-0.11.0.dev1.tar.gz
Upload date: Jun 30, 2026
Size: 8.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.11.26 {"installer":{"name":"uv","version":"0.11.26","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for swarmauri_llm_llamacpp-0.11.0.dev1.tar.gz
Algorithm	Hash digest
SHA256	`e004464923197106c9dcce004eaef136356348bb08b8c88ffe9e99b978179e48`
MD5	`d7a5e616a65cb19435dbbfa4ea2422a8`
BLAKE2b-256	`e6bfc097814fc33d7bd9b2e67dece60494d0f9cdf864452b9328377a43e97542`

See more details on using hashes here.

File details

Details for the file swarmauri_llm_llamacpp-0.11.0.dev1-py3-none-any.whl.

File metadata

Download URL: swarmauri_llm_llamacpp-0.11.0.dev1-py3-none-any.whl
Upload date: Jun 30, 2026
Size: 8.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.11.26 {"installer":{"name":"uv","version":"0.11.26","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for swarmauri_llm_llamacpp-0.11.0.dev1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`b75f1a57266719df905236058e68cd46d2b61d0ac29764c1446e76e0adab4bc1`
MD5	`a72b2e16caab0cb52657e019ec0c07a9`
BLAKE2b-256	`4939e9e95ad36403fde0f16bf249062c2987b5ca7f3905b7603f947e628cf403`

See more details on using hashes here.

swarmauri_llm_llamacpp 0.11.0.dev1

Navigation

Verified details

Maintainers

Meta

Unverified details

Meta

Classifiers

Project description

Swarmauri llama.cpp LLM

Why Use This Package?

FAQ

What does swarmauri_llm_llamacpp install?

Does this package download or bundle a model?

Which endpoint does the runtime call?

Does it require an API key?

Does it support streaming and async workflows?

Where should I verify model pricing?

Features

Installation

Usage

Basic Chat Completion

Streaming

Async

Examples

Related Packages

Foundational Swarmauri Packages

More Documentation

Best Practices

License

Project details

Verified details

Maintainers

Meta

Unverified details

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes

What does `swarmauri_llm_llamacpp` install?