Custom CrewAI LLM provider for Ollama's native REST API — no OpenAI shim, NDJSON streaming, tool calling, cloud auth

These details have not been verified by PyPI

Project links

Project description

CrewAI Ollama Cloud Provider

A custom CrewAI LLM provider that speaks native Ollama protocol — POST /api/chat with NDJSON streaming. No OpenAI shim, no LiteLLM, no proxy needed. Works with local Ollama, self-hosted instances, and ollama.com Cloud API.

Why?

CrewAI's built-in Ollama support routes through the OpenAI-compatible shim (/v1/chat/completions). This provider talks the real Ollama protocol — /api/chat with native JSON, NDJSON streaming, and Ollama's native tool calling and thinking formats.

If you're running Ollama Cloud models (gpt-oss:120b-cloud, kimi-k2.6-cloud, etc.) or just want direct API access without translation layers, this is for you.

Features

Feature	Support
Native `/api/chat`	✅ real Ollama protocol, not OpenAI-compatible
NDJSON streaming	✅ token-by-token, thinking/reasoning tokens
Tool calling	✅ native Ollama tool calls (v0.3+)
Structured output	✅ JSON schema via `format` parameter
Thinking models	✅ `think` parameter for DeepSeek-R1, Kimi, etc.
Cloud auth	✅ `Authorization: Bearer` for ollama.com
Model discovery	✅ `list_ollama_models()`
Config overrides	✅ runtime temperature, max_tokens, etc.
Context windows	✅ auto-detection for popular models
Stop words	✅ `options.stop`
Keep alive	✅ `keep_alive` parameter
Multimodal	✅ image support for vision models
CrewAI events	✅ full observability integration

Installation

pip install crewai-ollama-cloud

Requires: Python ≥3.10, CrewAI ≥0.80.0, httpx ≥0.25.0

Environment Setup

# Optional: set your Ollama Cloud API key
export OLLAMA_API_KEY="sk-xxxx"

For local Ollama, no API key is needed.

Quick Start

from crewai import Agent, Task, Crew
from crewai_ollama_cloud import OllamaCloudProvider

# Ollama Cloud
llm = OllamaCloudProvider(
    model="deepseek-v4-flash",
    base_url="https://ollama.com",
    api_key="sk-xxxx",  # or set OLLAMA_API_KEY env var
    temperature=0.7,
    stream=True,
)

# Or local Ollama
# llm = OllamaCloudProvider(model="llama3.1:8b", base_url="http://localhost:11434")

agent = Agent(role="Analyst", goal="Analyze data", backstory="Expert", llm=llm)
task = Task(description="Summarize Q1 report", expected_output="Summary")
crew = Crew(agents=[agent], tasks=[task])

result = crew.kickoff()
print(result)

Configuration Reference

Constructor Parameters

Parameter	Type	Default	Description
`model`	`str`	(required)	Ollama model name (e.g. `"llama3.1:8b"`, `"deepseek-v4-flash"`)
`base_url`	`str`	`"http://localhost:11434"`	Ollama host URL (no trailing `/v1`)
`api_key`	`str` or `None`	env `OLLAMA_API_KEY`	API key for cloud instances
`temperature`	`float` or `None`	`None`	Sampling temperature (0–2)
`max_tokens`	`int` or `None`	`None`	Max tokens to generate
`top_p`	`float` or `None`	`None`	Nucleus sampling
`top_k`	`int` or `None`	`None`	Top-k sampling
`stop`	`list[str]`	`[]`	Stop sequences
`stream`	`bool`	`False`	Enable NDJSON streaming
`timeout`	`float`	`120.0`	HTTP timeout in seconds
`keep_alive`	`str`	`"5m"`	Model keep-alive duration
`think`	`bool`	`False`	Enable thinking/reasoning tokens
`additional_params`	`dict`	`{}`	Extra parameters merged into request body

Ollama Parameter Mapping

When calling the API, CrewAI parameters are mapped to Ollama's native format:

CrewAI field	Ollama request field
`temperature`	`options.temperature`
`max_tokens`	`options.num_predict`
`top_p`	`options.top_p`
`top_k`	`options.top_k`
`stop`	`options.stop`
`think`	`think` (top-level)
`response_model`	`format` (JSON schema)
`keep_alive`	`keep_alive` (top-level)

Runtime Overrides

All configuration fields can be changed at runtime between calls:

llm = OllamaCloudProvider(model="llama3.1:8b", temperature=0.3)

# Warm up: creative mode
llm.temperature = 0.9
result = llm.call("Write a poem")

# Switch to precise mode for next call
llm.temperature = 0.1
llm.top_p = 0.95
result = llm.call("Calculate 2+2")

Model Discovery

from crewai_ollama_cloud import list_ollama_models, OllamaModelInfo

# List models on a local GPU rig
models = list_ollama_models("http://localhost:11434")

# List cloud models
models = list_ollama_models("https://ollama.com", api_key="sk-xxxx")

for m in models:
    print(f"{m.name:35s} | {m.parameter_size:6s} | {m.family:10s} | {m.size_gb:5.1f} GB")
# Output:
# llama3.1:8b                         | 8b     | llama      |  4.7 GB
# mistral:7b                          | 7b     | mistral    |  4.1 GB
# deepseek-v4-flash                   | 70b    | deepseek   | 40.5 GB

The OllamaModelInfo object contains:

Attribute	Type	Description
`name`	`str`	Full model name
`digest`	`str`	SHA256 digest
`size`	`int`	Size in bytes
`modified_at`	`str` or `None`	Last modified timestamp
`family`	`str`	Inferred model family
`parameter_size`	`str`	Parameter count (e.g. `"8b"`, `"70b"`)
`size_gb`	`float`	Size in gigabytes

Environment Variables

Variable	Description
`OLLAMA_API_KEY`	API key for authenticated Ollama instances (e.g. cloud)

Stream Output

When stream=True, the provider uses Ollama's native NDJSON streaming. Tokens are emitted via CrewAI's LLMStreamChunkEvent:

llm = OllamaCloudProvider(model="llama3.1:8b", stream=True)

# Each token triggers a stream chunk event
result = llm.call("Tell me about black holes")
# Events:
#   chunk: "Black"
#   chunk: " holes"
#   chunk: " are"
#   ...

For thinking models (think=True, like deepseek-r1), reasoning tokens are separated from final output and emitted as thinking chunk events.

Tool Calling

Ollama v0.3+ supports native tool calling. The provider converts CrewAI BaseTool objects to Ollama's native tool format:

{
  "type": "function",
  "function": {
    "name": "get_weather",
    "description": "Get weather for a city",
    "parameters": {
      "type": "object",
      "properties": {
        "city": {"type": "string", "description": "City name"}
      },
      "required": ["city"]
    }
  }
}

Tool execution results are returned directly.

Structured Output

To get JSON responses, use response_model:

from pydantic import BaseModel

class Summary(BaseModel):
    key_points: list[str]
    sentiment: str

llm = OllamaCloudProvider(model="llama3.1:8b", temperature=0)
result = llm.call("Analyze Q3 results", response_model=Summary)
# result.key_points = ["Revenue up 15%", ...]
# result.sentiment = "positive"

Context Windows

The provider auto-detects context window sizes for known models:

Model	Context Size
llama3:70b	8,192
llama3.1:8b	131,072
llama3.1:70b	131,072
llama3.1:405b	131,072
llama3.2:1b/3b	131,072
llama3.3:70b	131,072
mistral:7b	8,192
mixtral:8x7b	32,768
qwen2.5:7b/32b	32,768
deepseek-r1:7b/8b	131,072
Unknown models	4,096 (default)

Error Handling

Error	Provider Behavior
HTTP 4xx/5xx	`HTTPStatusError` → `LLMCallFailedEvent`
Context overflow	`LLMContextLengthExceededError` (CrewAI native)
Connection failure	`Exception` → `LLMCallFailedEvent`

Architecture

┌────────────────┐
│  CrewAI Agent  │
└───────┬────────┘
        │ Agent.llm.call(messages, tools, ...)
        ▼
┌─────────────────────────────┐
│  OllamaCloudProvider        │
│  (extends BaseLLM)          │
│                             │
│  call() / acall()           │
│   ├─ _format_messages()     │
│   ├─ _build_body()          │
│   ├─ BEFORE hooks           │
│   ├─ httpx POST /api/chat   │───────┐
│   ├─ _process_response()    │       │
│   ├─ AFTER hooks            │       │
│   └─ event emission         │       │
└─────────────────────────────┘       │
                                      ▼
                            ┌─────────────────┐
                            │  Ollama Instance │
                            │  (local/remote)  │
                            │                 │
                            │  POST /api/chat  │
                            │  ← JSON / NDJSON │
                            └─────────────────┘

Zero translation layers. httpx → /api/chat → Ollama. That's the whole call path.

Testing

# Install dev dependencies
pip install -e ".[dev]"

# Run tests
pytest tests/ -v

39 tests cover: initialization, capabilities, request body building, non-streaming calls, streaming calls with thinking tokens, tool calls, stop words, context overflow handling, auth headers, async call delegation, model discovery.

License

MIT — see LICENSE file.

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

This version

0.2.1

Jun 5, 2026

0.2.0

Jun 5, 2026

0.1.0

May 15, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

crewai_ollama_cloud-0.2.1.tar.gz (300.5 kB view details)

Uploaded Jun 5, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

crewai_ollama_cloud-0.2.1-py3-none-any.whl (14.1 kB view details)

Uploaded Jun 5, 2026 Python 3

File details

Details for the file crewai_ollama_cloud-0.2.1.tar.gz.

File metadata

Download URL: crewai_ollama_cloud-0.2.1.tar.gz
Upload date: Jun 5, 2026
Size: 300.5 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.11.8 {"installer":{"name":"uv","version":"0.11.8","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"macOS","version":null,"id":null,"libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}

File hashes

Hashes for crewai_ollama_cloud-0.2.1.tar.gz
Algorithm	Hash digest
SHA256	`b2feb0203d0f292dd7231ad54fbded34a4dd8d18074f337c1e9e5cc5eceacb10`
MD5	`43c40e3c012371bfbc22facfc2ad51f8`
BLAKE2b-256	`064aa4212360c6fbdb786d3c1332ff87c6805761a87324d9c45a693479c6ff7f`

See more details on using hashes here.

File details

Details for the file crewai_ollama_cloud-0.2.1-py3-none-any.whl.

File metadata

Download URL: crewai_ollama_cloud-0.2.1-py3-none-any.whl
Upload date: Jun 5, 2026
Size: 14.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.11.8 {"installer":{"name":"uv","version":"0.11.8","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"macOS","version":null,"id":null,"libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}

File hashes

Hashes for crewai_ollama_cloud-0.2.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`fd201dc499f829d2adc9dcc4182c12e3892b3da67e6777dd0c7e8303c04d2931`
MD5	`38126ce410d7b6c5fc10073abfe6d9cb`
BLAKE2b-256	`ab585f219d0130ccd06e49b6e6454d6d47133e325ab41799b4f537ead85b3669`

See more details on using hashes here.

crewai-ollama-cloud 0.2.1

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

CrewAI Ollama Cloud Provider

Why?

Features

Installation

Environment Setup

Quick Start

Configuration Reference

Constructor Parameters

Ollama Parameter Mapping

Runtime Overrides

Model Discovery

Environment Variables

Stream Output

Tool Calling

Structured Output

Context Windows

Error Handling

Architecture

Testing

License

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes