Custom CrewAI LLM provider for Ollama's native REST API — no OpenAI shim, NDJSON streaming, tool calling, cloud auth
CrewAI Ollama Cloud Provider
A custom CrewAI LLM provider that speaks the native Ollama protocol:
POST /api/chat with NDJSON streaming. No OpenAI shim, no LiteLLM, no proxy needed.
Works with local Ollama, self-hosted instances, and the ollama.com Cloud API.
Why?
CrewAI's built-in Ollama support routes through the OpenAI-compatible shim
(/v1/chat/completions). This provider talks the real Ollama protocol —
/api/chat with native JSON, NDJSON streaming, and Ollama's native tool
calling and thinking formats.
If you're running Ollama Cloud models (gpt-oss:120b-cloud, kimi-k2.6-cloud,
etc.) or just want direct API access without translation layers, this is for you.
Features
| Feature | Support |
|---|---|
| Native /api/chat | ✅ real Ollama protocol, not OpenAI-compatible |
| NDJSON streaming | ✅ token-by-token, thinking/reasoning tokens |
| Tool calling | ✅ native Ollama tool calls (v0.3+) |
| Structured output | ✅ JSON schema via format parameter |
| Thinking models | ✅ think parameter for DeepSeek-R1, Kimi, etc. |
| Cloud auth | ✅ Authorization: Bearer for ollama.com |
| Model discovery | ✅ list_ollama_models() |
| Config overrides | ✅ runtime temperature, max_tokens, etc. |
| Context windows | ✅ auto-detection for popular models |
| Stop words | ✅ options.stop |
| Keep alive | ✅ keep_alive parameter |
| Multimodal | ✅ image support for vision models |
| CrewAI events | ✅ full observability integration |
Installation
pip install crewai-ollama-cloud
Requires: Python ≥3.10, CrewAI ≥0.80.0, httpx ≥0.25.0
Environment Setup
# Optional: set your Ollama Cloud API key
export OLLAMA_API_KEY="sk-xxxx"
For local Ollama, no API key is needed.
Quick Start
from crewai import Agent, Task, Crew
from crewai_ollama_cloud import OllamaCloudProvider
# Ollama Cloud
llm = OllamaCloudProvider(
    model="deepseek-v4-flash",
    base_url="https://ollama.com",
    api_key="sk-xxxx",  # or set OLLAMA_API_KEY env var
    temperature=0.7,
    stream=True,
)
# Or local Ollama
# llm = OllamaCloudProvider(model="llama3.1:8b", base_url="http://localhost:11434")
agent = Agent(role="Analyst", goal="Analyze data", backstory="Expert", llm=llm)
task = Task(description="Summarize Q1 report", expected_output="Summary")
crew = Crew(agents=[agent], tasks=[task])
result = crew.kickoff()
print(result)
Configuration Reference
Constructor Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| model | str | (required) | Ollama model name (e.g. "llama3.1:8b", "deepseek-v4-flash") |
| base_url | str | "http://localhost:11434" | Ollama host URL (no trailing /v1) |
| api_key | str or None | env OLLAMA_API_KEY | API key for cloud instances |
| temperature | float or None | None | Sampling temperature (0–2) |
| max_tokens | int or None | None | Max tokens to generate |
| top_p | float or None | None | Nucleus sampling |
| top_k | int or None | None | Top-k sampling |
| stop | list[str] | [] | Stop sequences |
| stream | bool | False | Enable NDJSON streaming |
| timeout | float | 120.0 | HTTP timeout in seconds |
| keep_alive | str | "5m" | Model keep-alive duration |
| think | bool | False | Enable thinking/reasoning tokens |
| additional_params | dict | {} | Extra parameters merged into request body |
Ollama Parameter Mapping
When calling the API, CrewAI parameters are mapped to Ollama's native format:
| CrewAI field | Ollama request field |
|---|---|
| temperature | options.temperature |
| max_tokens | options.num_predict |
| top_p | options.top_p |
| top_k | options.top_k |
| stop | options.stop |
| think | think (top-level) |
| response_model | format (JSON schema) |
| keep_alive | keep_alive (top-level) |
Runtime Overrides
All configuration fields can be changed at runtime between calls:
llm = OllamaCloudProvider(model="llama3.1:8b", temperature=0.3)
# Warm up: creative mode
llm.temperature = 0.9
result = llm.call("Write a poem")
# Switch to precise mode for next call
llm.temperature = 0.1
llm.top_p = 0.95
result = llm.call("Calculate 2+2")
Model Discovery
from crewai_ollama_cloud import list_ollama_models, OllamaModelInfo
# List models on a local GPU rig
models = list_ollama_models("http://localhost:11434")
# List cloud models
models = list_ollama_models("https://ollama.com", api_key="sk-xxxx")
for m in models:
    print(f"{m.name:35s} | {m.parameter_size:6s} | {m.family:10s} | {m.size_gb:5.1f} GB")
# Output:
# llama3.1:8b | 8b | llama | 4.7 GB
# mistral:7b | 7b | mistral | 4.1 GB
# deepseek-v4-flash | 70b | deepseek | 40.5 GB
The OllamaModelInfo object contains:
| Attribute | Type | Description |
|---|---|---|
| name | str | Full model name |
| digest | str | SHA256 digest |
| size | int | Size in bytes |
| modified_at | str or None | Last modified timestamp |
| family | str | Inferred model family |
| parameter_size | str | Parameter count (e.g. "8b", "70b") |
| size_gb | float | Size in gigabytes |
Environment Variables
| Variable | Description |
|---|---|
| OLLAMA_API_KEY | API key for authenticated Ollama instances (e.g. cloud) |
Stream Output
When stream=True, the provider uses Ollama's native NDJSON streaming.
Tokens are emitted via CrewAI's LLMStreamChunkEvent:
llm = OllamaCloudProvider(model="llama3.1:8b", stream=True)
# Each token triggers a stream chunk event
result = llm.call("Tell me about black holes")
# Events:
# chunk: "Black"
# chunk: " holes"
# chunk: " are"
# ...
For thinking models (think=True, like deepseek-r1), reasoning tokens
are separated from final output and emitted as thinking chunk events.
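To make the streaming format concrete, here is a minimal sketch of an NDJSON reader that separates reasoning from answer text. It assumes each line is one JSON object with a `message` carrying `content` and, for thinking models, `thinking`, and that the last object is marked `"done": true`. `iter_chunks` is a hypothetical name and does not emit CrewAI events.

```python
import json
from typing import Iterable, Iterator


def iter_chunks(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield ("thinking" | "content", text) pairs from NDJSON lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between objects
        chunk = json.loads(line)  # one complete JSON object per line
        message = chunk.get("message", {})
        if message.get("thinking"):
            yield "thinking", message["thinking"]
        if message.get("content"):
            yield "content", message["content"]
        if chunk.get("done"):
            break  # the final object closes the stream
```

In a real client the `lines` argument would come from the HTTP response body, read line by line.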
Tool Calling
Ollama v0.3+ supports native tool calling. The provider converts CrewAI
BaseTool objects to Ollama's native tool format:
{
  "type": "function",
  "function": {
    "name": "get_weather",
    "description": "Get weather for a city",
    "parameters": {
      "type": "object",
      "properties": {
        "city": {"type": "string", "description": "City name"}
      },
      "required": ["city"]
    }
  }
}
Tool execution results are returned directly.
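A sketch of both directions of the tool round-trip: wrapping a JSON schema in the envelope shown above, and reading tool calls back out of an assistant message. Both helper names are hypothetical. The sketch assumes the native response carries `message.tool_calls[].function.arguments` as a JSON object and also tolerates a JSON-encoded string defensively.

```python
import json
from typing import Any


def to_ollama_tool(name: str, description: str, schema: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper: wrap a JSON schema in Ollama's tool envelope."""
    return {
        "type": "function",
        "function": {"name": name, "description": description, "parameters": schema},
    }


def extract_tool_calls(message: dict[str, Any]) -> list[tuple[str, dict[str, Any]]]:
    """Return (tool_name, arguments) pairs from an assistant message."""
    calls = []
    for call in message.get("tool_calls") or []:
        fn = call.get("function", {})
        args = fn.get("arguments") or {}
        if isinstance(args, str):
            # Defensive: accept a JSON-encoded string as well as an object.
            args = json.loads(args)
        calls.append((fn.get("name", ""), args))
    return calls
```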
Structured Output
To get JSON responses, use response_model:
from pydantic import BaseModel
class Summary(BaseModel):
    key_points: list[str]
    sentiment: str
llm = OllamaCloudProvider(model="llama3.1:8b", temperature=0)
result = llm.call("Analyze Q3 results", response_model=Summary)
# result.key_points = ["Revenue up 15%", ...]
# result.sentiment = "positive"
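Under the hood, structured output amounts to sending a JSON schema in the request's `format` field and decoding the reply's `message.content` as JSON. The sketch below shows that idea with a hand-written schema equivalent to `Summary` (with Pydantic v2, `Summary.model_json_schema()` would produce a comparable one). `with_format` and `parse_structured` are hypothetical helpers, and the required-key check is a simplification of real validation.

```python
import json
from typing import Any

# Hand-written JSON schema equivalent to the Summary model.
SUMMARY_SCHEMA: dict[str, Any] = {
    "type": "object",
    "properties": {
        "key_points": {"type": "array", "items": {"type": "string"}},
        "sentiment": {"type": "string"},
    },
    "required": ["key_points", "sentiment"],
}


def with_format(body: dict[str, Any], schema: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of the request body with the schema under "format"."""
    return {**body, "format": schema}


def parse_structured(response: dict[str, Any], schema: dict[str, Any]) -> dict[str, Any]:
    """Decode message.content as JSON and check the required keys."""
    data = json.loads(response["message"]["content"])
    missing = [k for k in schema.get("required", []) if k not in data]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    return data
```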
Context Windows
The provider auto-detects context window sizes for known models:
| Model | Context Size |
|---|---|
| llama3:70b | 8,192 |
| llama3.1:8b | 131,072 |
| llama3.1:70b | 131,072 |
| llama3.1:405b | 131,072 |
| llama3.2:1b/3b | 131,072 |
| llama3.3:70b | 131,072 |
| mistral:7b | 8,192 |
| mixtral:8x7b | 32,768 |
| qwen2.5:7b/32b | 32,768 |
| deepseek-r1:7b/8b | 131,072 |
| Unknown models | 4,096 (default) |
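One way such a lookup could work is a dictionary keyed on the model name before the `:` tag, falling back to the default. This is a sketch over a subset of the table; `context_window` is a hypothetical function, and matching on the base name alone is an assumption about how the provider resolves sizes.

```python
# Subset of the table; values are context sizes in tokens.
CONTEXT_WINDOWS = {
    "llama3.1": 131_072,
    "llama3": 8_192,
    "mistral": 8_192,
    "mixtral": 32_768,
    "qwen2.5": 32_768,
    "deepseek-r1": 131_072,
}
DEFAULT_CONTEXT_WINDOW = 4_096


def context_window(model: str) -> int:
    """Hypothetical lookup: match on the model name before the ':' tag."""
    base = model.split(":", 1)[0].lower()
    return CONTEXT_WINDOWS.get(base, DEFAULT_CONTEXT_WINDOW)
```

A model that is missing from the dictionary gets the conservative 4,096 default, so prompts for unlisted models may be trimmed earlier than strictly necessary.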
Error Handling
| Error | Provider Behavior |
|---|---|
| HTTP 4xx/5xx | HTTPStatusError → LLMCallFailedEvent |
| Context overflow | LLMContextLengthExceededError (CrewAI native) |
| Connection failure | Exception → LLMCallFailedEvent |
Architecture
┌────────────────┐
│  CrewAI Agent  │
└───────┬────────┘
        │ Agent.llm.call(messages, tools, ...)
        ▼
┌─────────────────────────────┐
│ OllamaCloudProvider         │
│ (extends BaseLLM)           │
│                             │
│ call() / acall()            │
│  ├─ _format_messages()      │
│  ├─ _build_body()           │
│  ├─ BEFORE hooks            │
│  ├─ httpx POST /api/chat    │───────┐
│  ├─ _process_response()     │       │
│  ├─ AFTER hooks             │       │
│  └─ event emission          │       │
└─────────────────────────────┘       │
                                      ▼
                             ┌─────────────────┐
                             │ Ollama Instance │
                             │ (local/remote)  │
                             │                 │
                             │ POST /api/chat  │
                             │ ← JSON / NDJSON │
                             └─────────────────┘
Zero translation layers. httpx → /api/chat → Ollama. That's the whole call path.
Testing
# Install dev dependencies
pip install -e ".[dev]"
# Run tests
pytest tests/ -v
39 tests cover: initialization, capabilities, request body building, non-streaming calls, streaming calls with thinking tokens, tool calls, stop words, context overflow handling, auth headers, async call delegation, model discovery.
License
MIT — see LICENSE file.