Skip to main content

Custom CrewAI LLM provider for Ollama's native REST API — no OpenAI shim, NDJSON streaming, tool calling, cloud auth

Project description

CrewAI Ollama Cloud Provider

CI Ruff Python CrewAI License

A custom CrewAI LLM provider that speaks native Ollama protocolPOST /api/chat with NDJSON streaming. No OpenAI shim, no LiteLLM, no proxy needed. Works with local Ollama, self-hosted instances, and ollama.com Cloud API.

Why?

CrewAI's built-in Ollama support routes through the OpenAI-compatible shim (/v1/chat/completions). This provider talks the real Ollama protocol/api/chat with native JSON, NDJSON streaming, and Ollama's native tool calling and thinking formats.

If you're running Ollama Cloud models (gpt-oss:120b-cloud, kimi-k2.6-cloud, etc.) or just want direct API access without translation layers, this is for you.

Features

Feature Support
Native /api/chat ✅ real Ollama protocol, not OpenAI-compatible
NDJSON streaming ✅ token-by-token, thinking/reasoning tokens
Tool calling ✅ native Ollama tool calls (v0.3+)
Structured output ✅ JSON schema via format parameter
Thinking models think parameter for DeepSeek-R1, Kimi, etc.
Cloud auth Authorization: Bearer for ollama.com
Model discovery list_ollama_models()
Config overrides ✅ runtime temperature, max_tokens, etc.
Context windows ✅ auto-detection for popular models
Stop words options.stop
Keep alive keep_alive parameter
Multimodal ✅ image support for vision models
CrewAI events ✅ full observability integration

Installation

pip install crewai-ollama-cloud

Requires: Python ≥3.10, CrewAI ≥0.80.0, httpx ≥0.25.0

Environment Setup

# Optional: set your Ollama Cloud API key
export OLLAMA_API_KEY="sk-xxxx"

For local Ollama, no API key is needed.

Quick Start

from crewai import Agent, Task, Crew
from crewai_ollama_cloud import OllamaCloudProvider

# Ollama Cloud
llm = OllamaCloudProvider(
    model="deepseek-v4-flash",
    base_url="https://ollama.com",
    api_key="sk-xxxx",  # or set OLLAMA_API_KEY env var
    temperature=0.7,
    stream=True,
)

# Or local Ollama
# llm = OllamaCloudProvider(model="llama3.1:8b", base_url="http://localhost:11434")

agent = Agent(role="Analyst", goal="Analyze data", backstory="Expert", llm=llm)
task = Task(description="Summarize Q1 report", expected_output="Summary")
crew = Crew(agents=[agent], tasks=[task])

result = crew.kickoff()
print(result)

Configuration Reference

Constructor Parameters

Parameter Type Default Description
model str (required) Ollama model name (e.g. "llama3.1:8b", "deepseek-v4-flash")
base_url str "http://localhost:11434" Ollama host URL (no trailing /v1)
api_key str or None env OLLAMA_API_KEY API key for cloud instances
temperature float or None None Sampling temperature (0–2)
max_tokens int or None None Max tokens to generate
top_p float or None None Nucleus sampling
top_k int or None None Top-k sampling
stop list[str] [] Stop sequences
stream bool False Enable NDJSON streaming
timeout float 120.0 HTTP timeout in seconds
keep_alive str "5m" Model keep-alive duration
think bool False Enable thinking/reasoning tokens
additional_params dict {} Extra parameters merged into request body

Ollama Parameter Mapping

When calling the API, CrewAI parameters are mapped to Ollama's native format:

CrewAI field Ollama request field
temperature options.temperature
max_tokens options.num_predict
top_p options.top_p
top_k options.top_k
stop options.stop
think think (top-level)
response_model format (JSON schema)
keep_alive keep_alive (top-level)

Runtime Overrides

All configuration fields can be changed at runtime between calls:

llm = OllamaCloudProvider(model="llama3.1:8b", temperature=0.3)

# Warm up: creative mode
llm.temperature = 0.9
result = llm.call("Write a poem")

# Switch to precise mode for next call
llm.temperature = 0.1
llm.top_p = 0.95
result = llm.call("Calculate 2+2")

Model Discovery

from crewai_ollama_cloud import list_ollama_models, OllamaModelInfo

# List models on a local GPU rig
models = list_ollama_models("http://localhost:11434")

# List cloud models
models = list_ollama_models("https://ollama.com", api_key="sk-xxxx")

for m in models:
    print(f"{m.name:35s} | {m.parameter_size:6s} | {m.family:10s} | {m.size_gb:5.1f} GB")
# Output:
# llama3.1:8b                         | 8b     | llama      |  4.7 GB
# mistral:7b                          | 7b     | mistral    |  4.1 GB
# deepseek-v4-flash                   | 70b    | deepseek   | 40.5 GB

The OllamaModelInfo object contains:

Attribute Type Description
name str Full model name
digest str SHA256 digest
size int Size in bytes
modified_at str or None Last modified timestamp
family str Inferred model family
parameter_size str Parameter count (e.g. "8b", "70b")
size_gb float Size in gigabytes

Environment Variables

Variable Description
OLLAMA_API_KEY API key for authenticated Ollama instances (e.g. cloud)

Stream Output

When stream=True, the provider uses Ollama's native NDJSON streaming. Tokens are emitted via CrewAI's LLMStreamChunkEvent:

llm = OllamaCloudProvider(model="llama3.1:8b", stream=True)

# Each token triggers a stream chunk event
result = llm.call("Tell me about black holes")
# Events:
#   chunk: "Black"
#   chunk: " holes"
#   chunk: " are"
#   ...

For thinking models (think=True, like deepseek-r1), reasoning tokens are separated from final output and emitted as thinking chunk events.

Tool Calling

Ollama v0.3+ supports native tool calling. The provider converts CrewAI BaseTool objects to Ollama's native tool format:

{
  "type": "function",
  "function": {
    "name": "get_weather",
    "description": "Get weather for a city",
    "parameters": {
      "type": "object",
      "properties": {
        "city": {"type": "string", "description": "City name"}
      },
      "required": ["city"]
    }
  }
}

Tool execution results are returned directly.

Structured Output

To get JSON responses, use response_model:

from pydantic import BaseModel

class Summary(BaseModel):
    key_points: list[str]
    sentiment: str

llm = OllamaCloudProvider(model="llama3.1:8b", temperature=0)
result = llm.call("Analyze Q3 results", response_model=Summary)
# result.key_points = ["Revenue up 15%", ...]
# result.sentiment = "positive"

Context Windows

The provider auto-detects context window sizes for known models:

Model Context Size
llama3:70b 8,192
llama3.1:8b 131,072
llama3.1:70b 131,072
llama3.1:405b 131,072
llama3.2:1b/3b 131,072
llama3.3:70b 131,072
mistral:7b 8,192
mixtral:8x7b 32,768
qwen2.5:7b/32b 32,768
deepseek-r1:7b/8b 131,072
Unknown models 4,096 (default)

Error Handling

Error Provider Behavior
HTTP 4xx/5xx HTTPStatusErrorLLMCallFailedEvent
Context overflow LLMContextLengthExceededError (CrewAI native)
Connection failure ExceptionLLMCallFailedEvent

Architecture

┌────────────────┐
│  CrewAI Agent  │
└───────┬────────┘
        │ Agent.llm.call(messages, tools, ...)
        ▼
┌─────────────────────────────┐
│  OllamaCloudProvider        │
│  (extends BaseLLM)          │
│                             │
│  call() / acall()           │
│   ├─ _format_messages()     │
│   ├─ _build_body()          │
│   ├─ BEFORE hooks           │
│   ├─ httpx POST /api/chat   │───────┐
│   ├─ _process_response()    │       │
│   ├─ AFTER hooks            │       │
│   └─ event emission         │       │
└─────────────────────────────┘       │
                                      ▼
                            ┌─────────────────┐
                            │  Ollama Instance │
                            │  (local/remote)  │
                            │                 │
                            │  POST /api/chat  │
                            │  ← JSON / NDJSON │
                            └─────────────────┘

Zero translation layers. httpx → /api/chat → Ollama. That's the whole call path.

Testing

# Install dev dependencies
pip install -e ".[dev]"

# Run tests
pytest tests/ -v

39 tests cover: initialization, capabilities, request body building, non-streaming calls, streaming calls with thinking tokens, tool calls, stop words, context overflow handling, auth headers, async call delegation, model discovery.

License

MIT — see LICENSE file.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

crewai_ollama_cloud-0.1.0.tar.gz (298.4 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

crewai_ollama_cloud-0.1.0-py3-none-any.whl (13.4 kB view details)

Uploaded Python 3

File details

Details for the file crewai_ollama_cloud-0.1.0.tar.gz.

File metadata

  • Download URL: crewai_ollama_cloud-0.1.0.tar.gz
  • Upload date:
  • Size: 298.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.11.11 {"installer":{"name":"uv","version":"0.11.11","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}

File hashes

Hashes for crewai_ollama_cloud-0.1.0.tar.gz
Algorithm Hash digest
SHA256 892123d87d6dccdc9d7026b36b107aef73914423175665ac030675401854aff2
MD5 bbf280c477622d2e3d580c6000e0f1c8
BLAKE2b-256 557d3b76339e38e9e292cc0d5056f372600c4ffd96212014e8413b275d468424

See more details on using hashes here.

File details

Details for the file crewai_ollama_cloud-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: crewai_ollama_cloud-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 13.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.11.11 {"installer":{"name":"uv","version":"0.11.11","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}

File hashes

Hashes for crewai_ollama_cloud-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 ab1ebf3f6c44f870fe4efe2292263d2b6e29dfb702824bb631f6ed89c86311b9
MD5 97494a43d37678bbe7be3a3c3e83864d
BLAKE2b-256 b2091fdfe5cc1fe54483d6ba5cbdb1294535dcdde270ff3e65004573aba6d8b9

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page