TraceAI Ollama Instrumentation
OpenTelemetry instrumentation for Ollama, the local LLM runner.
Installation
pip install traceai-ollama
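The usage examples below also assume the Ollama Python client and the OpenTelemetry SDK are available; if they are not already installed, something like the following should cover them:
pip install ollama opentelemetry-sdk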
Features
- Automatic tracing of Ollama API calls
- Support for chat, generate, and embed endpoints
- Streaming response support
- Token usage tracking
- Performance metrics (total_duration, eval_duration, etc.)
- Full OpenTelemetry semantic conventions compliance
Usage
Basic Setup
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import SimpleSpanProcessor, ConsoleSpanExporter
from traceai_ollama import OllamaInstrumentor
import ollama
# Set up tracing
provider = TracerProvider()
provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)
# Instrument Ollama
OllamaInstrumentor().instrument(tracer_provider=provider)
# Use Ollama - calls are automatically traced
response = ollama.chat(
    model="llama3.2",
    messages=[
        {"role": "user", "content": "Hello!"}
    ]
)
print(response["message"]["content"])
Chat Completions
import ollama
# Simple chat
response = ollama.chat(
    model="llama3.2",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"}
    ]
)
# Multi-turn conversation
messages = [
    {"role": "user", "content": "My name is Alice."},
    {"role": "assistant", "content": "Hello Alice! Nice to meet you."},
    {"role": "user", "content": "What's my name?"}
]
response = ollama.chat(model="llama3.2", messages=messages)
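For longer conversations, the returned message can be appended to the history before the next call. A small sketch building on the example above (the follow-up question is illustrative):
# Append the assistant's reply and continue the conversation
messages.append(response["message"])
messages.append({"role": "user", "content": "Spell it backwards."})
response = ollama.chat(model="llama3.2", messages=messages)
print(response["message"]["content"])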
Streaming Responses
import ollama
# Streaming chat
stream = ollama.chat(
    model="llama3.2",
    messages=[{"role": "user", "content": "Tell me a story."}],
    stream=True
)
for chunk in stream:
    print(chunk["message"]["content"], end="", flush=True)
Text Generation
import ollama
# Simple text generation
response = ollama.generate(
    model="llama3.2",
    prompt="The quick brown fox"
)
print(response["response"])
# With system prompt
response = ollama.generate(
    model="llama3.2",
    prompt="Write a haiku",
    system="You are a poet."
)
Embeddings
import ollama
# Single embedding
response = ollama.embed(
    model="nomic-embed-text",
    input="Hello, world!"
)
print(f"Embedding dimensions: {len(response['embeddings'][0])}")
# Multiple embeddings (batch)
response = ollama.embed(
    model="nomic-embed-text",
    input=["Hello", "World"]
)
Using Client Class
import ollama
# Create a client
client = ollama.Client(host="http://localhost:11434")
# Use the client
response = client.chat(
    model="llama3.2",
    messages=[{"role": "user", "content": "Hello!"}]
)
Async Support
import asyncio
import ollama
async def main():
    client = ollama.AsyncClient()

    # Async chat
    response = await client.chat(
        model="llama3.2",
        messages=[{"role": "user", "content": "Hello!"}]
    )
    print(response["message"]["content"])

    # Async streaming
    async for chunk in await client.chat(
        model="llama3.2",
        messages=[{"role": "user", "content": "Tell me a joke."}],
        stream=True
    ):
        print(chunk["message"]["content"], end="", flush=True)

asyncio.run(main())
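Because AsyncClient calls are coroutines, several requests can also run concurrently with asyncio.gather; a minimal sketch:
import asyncio
import ollama

async def main():
    client = ollama.AsyncClient()
    prompts = ["Define entropy.", "Define enthalpy.", "Define free energy."]

    # Issue the chat requests concurrently
    responses = await asyncio.gather(
        *(client.chat(model="llama3.2", messages=[{"role": "user", "content": p}])
          for p in prompts)
    )
    for prompt, resp in zip(prompts, responses):
        print(prompt, "->", resp["message"]["content"][:60])

asyncio.run(main())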
Multimodal (Vision)
import ollama
# With image (base64 encoded)
response = ollama.chat(
    model="llava",
    messages=[
        {
            "role": "user",
            "content": "What's in this image?",
            "images": ["base64_encoded_image_data"]
        }
    ]
)
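The images field accepts base64-encoded image data; a sketch showing one way to produce it from a local file (the file path is a placeholder):
import base64
import ollama

# Read a local image and base64-encode it (replace the path with your own)
with open("photo.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = ollama.chat(
    model="llava",
    messages=[
        {"role": "user", "content": "What's in this image?", "images": [image_b64]}
    ],
)
print(response["message"]["content"])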
Configuration Options
TraceConfig
from fi_instrumentation import TraceConfig
from traceai_ollama import OllamaInstrumentor
config = TraceConfig(
    hide_inputs=False,  # Set True to hide input content
    hide_outputs=False,  # Set True to hide output content
    base64_image_max_length=100,  # Max length for base64 images in traces
)
OllamaInstrumentor().instrument(
    tracer_provider=provider,
    config=config
)
Captured Attributes
Request Attributes
| Attribute | Description |
|---|---|
| fi.span.kind | "LLM" for chat/generate, "EMBEDDING" for embed |
| llm.system | "ollama" |
| llm.provider | "ollama" |
| llm.model | Model name (llama3.2, etc.) |
| llm.input_messages.{n}.role | Message role |
| llm.input_messages.{n}.content | Message content |
Response Attributes
| Attribute | Description |
|---|---|
| llm.token_count.prompt | Input token count (prompt_eval_count) |
| llm.token_count.completion | Output token count (eval_count) |
| llm.token_count.total | Total token count |
| llm.output_messages.{n}.role | Response role |
| llm.output_messages.{n}.content | Response content |
| ollama.total_duration_ns | Total request duration (nanoseconds) |
| ollama.load_duration_ns | Model load duration (nanoseconds) |
| ollama.prompt_eval_duration_ns | Prompt evaluation duration (nanoseconds) |
| ollama.eval_duration_ns | Response generation duration (nanoseconds) |
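One way to see these attributes during development is to capture spans in memory and inspect them; a sketch using the OpenTelemetry SDK's in-memory exporter, with the attribute names listed above:
import ollama
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import SimpleSpanProcessor
from opentelemetry.sdk.trace.export.in_memory_span_exporter import InMemorySpanExporter
from traceai_ollama import OllamaInstrumentor

exporter = InMemorySpanExporter()
provider = TracerProvider()
provider.add_span_processor(SimpleSpanProcessor(exporter))
trace.set_tracer_provider(provider)
OllamaInstrumentor().instrument(tracer_provider=provider)

ollama.chat(model="llama3.2", messages=[{"role": "user", "content": "Hi"}])

# Print the attributes captured on the most recent span
span = exporter.get_finished_spans()[-1]
for key, value in span.attributes.items():
    print(key, "=", value)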
Supported Models
Any model available in Ollama can be traced, including:
| Model | Description |
|---|---|
| llama3.2 | Meta's Llama 3.2 |
| mistral | Mistral 7B |
| mixtral | Mixtral 8x7B |
| codellama | Code Llama |
| llava | LLaVA (vision) |
| nomic-embed-text | Nomic embeddings |
Real-World Use Cases
RAG Pipeline
import ollama

# Generate an embedding for the query
query = "What is machine learning?"
query_embedding = ollama.embed(
    model="nomic-embed-text",
    input=query
)

# Search a vector database for relevant context (not shown)
# context = search_vector_db(query_embedding["embeddings"][0])
context = "..."  # placeholder for the retrieved context

# Generate a response using the retrieved context
response = ollama.chat(
    model="llama3.2",
    messages=[
        {"role": "system", "content": f"Use this context: {context}"},
        {"role": "user", "content": query}
    ]
)
Code Assistant
import ollama
response = ollama.chat(
    model="codellama",
    messages=[
        {"role": "system", "content": "You are a Python expert."},
        {"role": "user", "content": "Write a function to calculate fibonacci numbers"}
    ]
)
License
Apache-2.0
Download files
Download the file for your platform.
Source Distribution
traceai_ollama-0.1.0.tar.gz (10.0 kB)
Built Distribution
traceai_ollama-0.1.0-py3-none-any.whl (11.4 kB)
File details
Details for the file traceai_ollama-0.1.0.tar.gz.
File metadata
- Download URL: traceai_ollama-0.1.0.tar.gz
- Upload date:
- Size: 10.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.9.21
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | fa40f2b773d7ada3ab59d1af7ee56218280f70cf716bd4b578f407951676fa6d |
| MD5 | 19e77bc830fee51945db1f84955270db |
| BLAKE2b-256 | 75a74d38ba1fe677234902e7b97e55115a86237b881e8ba078d987fda65d13c3 |
File details
Details for the file traceai_ollama-0.1.0-py3-none-any.whl.
File metadata
- Download URL: traceai_ollama-0.1.0-py3-none-any.whl
- Upload date:
- Size: 11.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.9.21
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 446b836b2c396883e5a5ed14cdc579593147322a6b00b7476a7fe064270454b8 |
| MD5 | b4100c36a0d294f45b586f14e7da058a |
| BLAKE2b-256 | 6e59589b86ec408d818bfaceb448addc256b7630d705171293f467964960c857 |