
Exemplar harness eval SDK — ingest framework callbacks to the Exemplar platform API


exemplar-harness-sdk

Python SDK for ingesting framework-native agent telemetry into Exemplar harness eval.

Multiple framework integrations — LangChain, LangGraph, OpenAI, Anthropic, Google GenAI SDK, Google ADK, and more. See Framework integrations.

Install

pip install exemplar-harness-sdk

# One framework
pip install "exemplar-harness-sdk[langchain]"

# Everything
pip install "exemplar-harness-sdk[all]"

Licensed for non-commercial use only. Commercial use requires a separate license from Exemplar Dev LLC. See LICENSE.


Quick start

Follow these steps to send your first agent turn to Exemplar harness eval. The walkthrough uses LangChain; other frameworks follow the same pattern.

Step 1 — Install

pip install "exemplar-harness-sdk[langchain]"

Need a different framework? Install the matching extra instead (e.g. [anthropic], [openai], [google-adk]). See Optional extras under Reference.

Step 2 — Configure credentials

Set your Exemplar org API key:

export EXEMPLAR_API_KEY="eis_your_org_api_key"

| Variable | Purpose |
|---|---|
| EXEMPLAR_API_KEY | Org API key (required) |

The SDK sends requests to https://production-api.exemplar.dev.
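If you prefer to fail fast when the key is missing, you can check the environment before building a client. This is a minimal sketch using only the standard library. The helper name `require_api_key` is hypothetical, and the `eis_` prefix check is an assumption based on the example key above (it only warns, because the key format is not documented as fixed). `Harness.from_env()` may already perform a similar check.

```python
import os
import warnings


def require_api_key(env_var: str = "EXEMPLAR_API_KEY") -> str:
    """Return the Exemplar org API key from the environment or raise a clear error.

    Hypothetical helper; not part of exemplar-harness-sdk.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"{env_var} is not set; export your Exemplar org API key first.")
    # Assumption: org keys look like the "eis_..." example above. Warn only, never reject.
    if not key.startswith("eis_"):
        warnings.warn(f"{env_var} does not start with 'eis_'; check that it is an org API key.")
    return key
```

You could then pass the result explicitly, e.g. `Harness(api_key=require_api_key())`.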

Step 3 — Create a Harness client

from exemplar_harness import Harness

harness = Harness.from_env()

Or pass the API key explicitly:

from exemplar_harness import Harness

harness = Harness(api_key="eis_...")

Pick one session_id per conversation — all turns with the same ID are grouped into a single eval session.
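In an application that serves many chats at once, one approach is to mint a session_id the first time you see a conversation and reuse it for every later turn. This is a minimal in-memory sketch. The helper `session_id_for`, the `sess-` prefix, and the 12-character suffix are illustrative choices, not SDK requirements; the SDK only needs turns of one conversation to share an ID.

```python
import uuid

# Maps your app's own conversation key (e.g. a chat or thread ID) to an eval session_id.
_SESSIONS: dict[str, str] = {}


def session_id_for(conversation_key: str, prefix: str = "sess") -> str:
    """Return the same session_id for every turn of one conversation (hypothetical helper)."""
    if conversation_key not in _SESSIONS:
        # First turn of this conversation: mint a new, random session ID.
        _SESSIONS[conversation_key] = f"{prefix}-{uuid.uuid4().hex[:12]}"
    return _SESSIONS[conversation_key]
```

Pass session_id=session_id_for(chat_id) for every turn of that chat. In a multi-process deployment, keep the mapping in shared storage rather than a module-level dict.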

Step 4 — Wire the LangChain integration

Create a callback handler and pass it to your LLM or chain:

from exemplar_harness.integrations.langchain import make_langchain_callback_handler

SESSION_ID = "support-sess-001"  # reuse for every turn in this chat

handler = make_langchain_callback_handler(
    harness,
    session_id=SESSION_ID,
    chain_name="support-bot",
    source_app="my-app",
)

Each completed LLM call invokes Harness.ingest() automatically — you do not call ingest() yourself for callback-based integrations.

Step 5 — Run your agent

from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI  # or ChatGoogleGenerativeAI, etc.

llm = ChatOpenAI(model="gpt-4o", callbacks=[handler])
chain = ChatPromptTemplate.from_messages([("human", "{question}")]) | llm

chain.invoke({"question": "Summarize our refund policy."})
chain.invoke({"question": "What is the return window?"})  # same SESSION_ID → same eval session

Step 6 — (Optional) Trigger eval on ingest

Pass auto_run=True when you call harness.ingest() directly, or bind it on a session helper:

session = harness.session(
    SESSION_ID,
    agent_name="support-bot",
    source_app="my-app",
    auto_run=True,  # run eval after each ingest through this session
)

Callback-based integrations (LangChain, LangGraph, Agno, etc.) call harness.ingest() internally with platform defaults.

What happens under the hood

flowchart LR
  Agent[Your agent] --> Integration[SDK integration]
  Integration --> Harness[Harness.ingest]
  Harness --> API[Exemplar platform API]
  API --> Eval[Harness eval]

Each ingest POSTs an envelope v1 payload (sourceType + framework-native data). The Exemplar platform API maps it to eval session turns.

A successful ingest returns metadata such as sessionId, turnCount, and evalStatus.
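When you call harness.ingest() directly, you may want to log that metadata. The sketch below assumes the return value is a mapping keyed by the field names above (sessionId, turnCount, evalStatus); the actual return type is not documented here, and `describe_ingest_result` is a hypothetical helper. Missing keys fall back to placeholders rather than raising.

```python
from typing import Any, Mapping


def describe_ingest_result(result: Mapping[str, Any]) -> str:
    """Format ingest metadata for logs (assumes a dict-like result; hypothetical helper)."""
    session_id = result.get("sessionId", "<unknown>")
    turn_count = result.get("turnCount", "?")
    eval_status = result.get("evalStatus", "not reported")
    return f"session={session_id} turns={turn_count} eval={eval_status}"
```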

Complete example (steps 3–5)

from exemplar_harness import Harness
from exemplar_harness.integrations.langchain import make_langchain_callback_handler
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

harness = Harness.from_env()
session_id = "support-sess-001"

handler = make_langchain_callback_handler(
    harness,
    session_id=session_id,
    chain_name="support-bot",
    source_app="my-app",
)

llm = ChatOpenAI(model="gpt-4o", callbacks=[handler])
chain = ChatPromptTemplate.from_messages([("human", "{question}")]) | llm

for question in (
    "Summarize our refund policy.",
    "What is the return window?",
):
    chain.invoke({"question": question})

No framework? Ingest turns directly

For fixtures, batch replay, or custom pipelines:

harness.ingest(
    "generic",
    session_id="sess-abc",
    event="turns",
    data={
        "turns": [
            {
                "input": "What is harness eval?",
                "output": "Automated judge over agent sessions.",
                "model": "gpt-4o",
            }
        ]
    },
    agent_name="my-agent",
    source_app="my-app",
)
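For batch replay, you can build the turns list from existing records. This sketch produces the data shape used above (input, output, and model per turn). It assumes a JSONL fixture whose lines carry `question` and `answer` keys; those field names and the helper `turns_from_jsonl` are hypothetical, so adapt them to your own format.

```python
import json
from typing import Any, Iterable


def turns_from_jsonl(lines: Iterable[str], model: str) -> dict[str, Any]:
    """Convert JSONL fixture lines into a data payload for harness.ingest("generic", ...).

    Assumes each non-blank line is a JSON object with "question" and "answer" keys.
    """
    turns = []
    for line in lines:
        if not line.strip():
            continue  # tolerate blank lines in the fixture file
        record = json.loads(line)
        turns.append({"input": record["question"], "output": record["answer"], "model": model})
    return {"turns": turns}
```

Pass the result as data= together with event="turns", exactly as in the call above.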

Choose a different framework

| Framework | Extra | How to wire it |
|---|---|---|
| LangChain | langchain | make_langchain_callback_handler → callbacks=[handler] |
| LangGraph | langgraph | HarnessLangGraphHandler.make_graph_callback_handler() → invoke(..., config={"callbacks": [...]}) |
| LiteLLM | litellm | register_litellm_handler + metadata={"session_id": ...} |
| OpenAI SDK | openai | HarnessOpenAICallback.on_completion after chat.completions.create |
| Anthropic SDK | anthropic | make_anthropic_middleware → Anthropic(middleware=[...]) |
| Claude Agent SDK | claude-agent | run_query() or record_agent_result after query() |
| Portkey | portkey | HarnessPortkeyCallback.on_completion after chat.completions.create |
| Agno | agno | harness_agno_post_hook → Agent.post_hooks |
| Haystack | haystack | HarnessHaystackHandler.run_and_record or record_generator_run |
| LlamaIndex | llamaindex | register_llamaindex_handler (auto) or record_llm_chat_end |
| AutoGen | autogen | HarnessAutoGenHandler.on_agent_run_complete after agent.run |
| CrewAI | crewai | make_crewai_listener (auto) or on_task_complete |
| Google GenAI SDK | google-genai | instrument_google_genai_client or sdk_generate_content |
| Google ADK | google-adk | ingest_adk_session after session.model_dump() |
| Pydantic AI | pydantic-ai | instrument_pydantic_ai_agent or record_run after agent.run |
| Semantic Kernel | semantic-kernel | register_semantic_kernel_filter on Kernel |
| smolagents | smolagents | harness_smolagents_step_callback → step_callbacks |

Copy-paste snippets for each framework are in Framework integrations below.

Ingest parameters

| Parameter | Purpose |
|---|---|
| session_id | Groups turns into one eval session |
| agent_name | Agent identifier on the ingest body |
| source_app | Your application name (defaults to source_type) |
| auto_run | Run eval immediately after ingest (True / False / omit) |
| harness | Optional eval metadata (scenario tags, categories, etc.) |
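Because auto_run has three states (True, False, or omitted), code that builds ingest arguments programmatically should avoid turning "omit" into an explicit value. This sketch drops unset parameters instead of passing None; whether the SDK treats auto_run=None the same as omitting it is not stated here, which is why the key is left out entirely. The helper `ingest_kwargs` is hypothetical, and its `harness_meta` argument maps to the `harness` parameter (renamed only to avoid clashing with a Harness client variable).

```python
from typing import Any, Optional


def ingest_kwargs(
    session_id: str,
    *,
    agent_name: Optional[str] = None,
    source_app: Optional[str] = None,
    auto_run: Optional[bool] = None,
    harness_meta: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
    """Build keyword arguments for harness.ingest(), omitting anything left unset."""
    candidates = {
        "session_id": session_id,
        "agent_name": agent_name,
        "source_app": source_app,
        "auto_run": auto_run,  # False is kept; only None (unset) is dropped
        "harness": harness_meta,
    }
    return {key: value for key, value in candidates.items() if value is not None}
```

Usage: harness.ingest("generic", event="turns", data=..., **ingest_kwargs("sess-abc", auto_run=True)).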

Framework integrations

The step-by-step Quick start above uses LangChain. Below are equivalent snippets for each supported framework.

LangChain

Already covered in Quick start. Minimal reference:

from exemplar_harness.integrations.langchain import make_langchain_callback_handler
from langchain_openai import ChatOpenAI

handler = make_langchain_callback_handler(harness, session_id="sess-abc", chain_name="support-bot")
llm = ChatOpenAI(model="gpt-4o", callbacks=[handler])

LangGraph

pip install "exemplar-harness-sdk[langgraph]"

Pass make_graph_callback_handler() in your graph invoke config:

from typing import Annotated, TypedDict

from exemplar_harness import Harness
from exemplar_harness.integrations.langgraph import HarnessLangGraphHandler
from langchain_core.messages import HumanMessage
from langchain_openai import ChatOpenAI
from langgraph.graph import END, START, StateGraph
from langgraph.graph.message import add_messages

harness = Harness(api_key="eis_...")
session = HarnessLangGraphHandler(harness, session_id="sess-abc", graph_name="support-bot")
handler = session.make_graph_callback_handler()
config = {"callbacks": [handler], "configurable": {"thread_id": "sess-abc"}}

llm = ChatOpenAI(model="gpt-4o")


class State(TypedDict):
    messages: Annotated[list, add_messages]


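def agent(state: State, config):
    # LangGraph injects the run config only into a node parameter named `config`;
    # forwarding it keeps the harness callbacks attached to the LLM call.
    return {"messages": [llm.invoke(state["messages"], config=config)]}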


graph = StateGraph(State)
graph.add_node("agent", agent)
graph.add_edge(START, "agent")
graph.add_edge("agent", END)
app = graph.compile()

app.invoke(
    {"messages": [HumanMessage(content="Summarize our refund policy.")]},
    config=config,
)

LiteLLM

pip install "exemplar-harness-sdk[litellm]"

Register a custom logger once; pass session_id in completion metadata:

import litellm
from exemplar_harness import Harness
from exemplar_harness.integrations.litellm import register_litellm_handler

harness = Harness(api_key="eis_...")
register_litellm_handler(harness, source_app="my-app", agent_name="support-bot")

litellm.completion(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Summarize our refund policy."}],
    metadata={"session_id": "sess-abc"},
)
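If your calls already pass other metadata, merge session_id into it rather than replacing the dict. A minimal sketch; `with_session_metadata` is a hypothetical helper that only touches the metadata keyword shown above and returns a copy, so the caller's dict is never mutated.

```python
from typing import Any, Optional


def with_session_metadata(
    session_id: str, metadata: Optional[dict[str, Any]] = None
) -> dict[str, Any]:
    """Return a copy of `metadata` with session_id set (hypothetical helper)."""
    merged = dict(metadata or {})  # shallow copy; the original dict stays unchanged
    merged["session_id"] = session_id
    return merged
```

Usage: litellm.completion(..., metadata=with_session_metadata("sess-abc", existing_metadata)).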

Anthropic SDK

pip install "exemplar-harness-sdk[anthropic]"

Create one HarnessAnthropicCallback per session. Prefer client middleware, which ingests automatically:

import anthropic
from exemplar_harness import Harness
from exemplar_harness.integrations.anthropic import (
    HarnessAnthropicCallback,
    make_anthropic_middleware,
)

harness = Harness(api_key="eis_...")
callback = HarnessAnthropicCallback(harness, session_id="sess-abc", agent_name="support-bot")

client = anthropic.Anthropic(middleware=[make_anthropic_middleware(callback)])
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Summarize our refund policy."}],
)

Or call on_completion manually after each messages.create if you cannot use middleware.

Claude Agent SDK

pip install "exemplar-harness-sdk[claude-agent]"

Use run_query to run the agent and ingest automatically:

import asyncio

from exemplar_harness import Harness
from exemplar_harness.integrations.claude_agent import HarnessClaudeAgentHandler

harness = Harness(api_key="eis_...")
handler = HarnessClaudeAgentHandler(harness, session_id="sess-abc", agent_name="research-agent")

asyncio.run(handler.run_query("Summarize harness eval in one sentence."))

Or call record_agent_result manually after query() returns a ResultMessage:

import asyncio

from claude_agent_sdk import ResultMessage, query
from exemplar_harness import Harness
from exemplar_harness.integrations.claude_agent import HarnessClaudeAgentHandler

harness = Harness(api_key="eis_...")
handler = HarnessClaudeAgentHandler(harness, session_id="sess-abc", agent_name="research-agent")


async def main() -> None:
    prompt = "Summarize harness eval in one sentence."
    result = None
    async for message in query(prompt=prompt):
        if isinstance(message, ResultMessage):
            result = message
    handler.record_agent_result(prompt=prompt, result=result)


asyncio.run(main())

OpenAI SDK

pip install "exemplar-harness-sdk[openai]"

Call on_completion after each chat.completions.create:

from openai import OpenAI
from exemplar_harness import Harness
from exemplar_harness.integrations.openai import HarnessOpenAICallback

harness = Harness(api_key="eis_...")
callback = HarnessOpenAICallback(harness, session_id="sess-abc", agent_name="support-bot")

client = OpenAI()
messages = [{"role": "user", "content": "Summarize our refund policy."}]
response = client.chat.completions.create(model="gpt-4o", messages=messages)

callback.on_completion(messages=messages, response=response, model="gpt-4o")

Portkey

pip install "exemplar-harness-sdk[portkey]"

Call on_completion after each chat.completions.create:

import os

from portkey_ai import Portkey
from exemplar_harness import Harness
from exemplar_harness.integrations.portkey import HarnessPortkeyCallback

harness = Harness(api_key="eis_...")
callback = HarnessPortkeyCallback(harness, session_id="sess-abc", agent_name="support-bot")

client = Portkey(api_key=os.environ["PORTKEY_API_KEY"])
messages = [{"role": "user", "content": "Summarize our refund policy."}]
response = client.chat.completions.create(model="@openai/gpt-4o", messages=messages)

callback.on_completion(messages=messages, response=response, model="@openai/gpt-4o")
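Both the OpenAI and Portkey integrations rely on you calling on_completion after every request, so a small wrapper can keep the two steps together. This sketch uses duck typing and assumes only what the snippets above show: a `client.chat.completions.create(model=..., messages=...)` call and a callback exposing `on_completion(messages=..., response=..., model=...)`. The wrapper name `create_and_record` is hypothetical.

```python
from typing import Any


def create_and_record(client: Any, callback: Any, *, model: str, messages: list, **kwargs: Any) -> Any:
    """Run one chat completion, then hand it to the harness callback (hypothetical wrapper)."""
    response = client.chat.completions.create(model=model, messages=messages, **kwargs)
    # Recorded only after the request succeeds; if create() raises, nothing is ingested.
    callback.on_completion(messages=messages, response=response, model=model)
    return response
```

Usage: response = create_and_record(client, callback, model="gpt-4o", messages=messages).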

Agno

pip install "exemplar-harness-sdk[agno]"

Pass harness_agno_post_hook to Agent.post_hooks — each agent.run ingests automatically:

from agno.agent import Agent
from agno.models.openai import OpenAIChat
from exemplar_harness import Harness
from exemplar_harness.integrations.agno import harness_agno_post_hook

harness = Harness(api_key="eis_...")
agent = Agent(
    name="support-bot",
    model=OpenAIChat(id="gpt-4o"),
    post_hooks=[
        harness_agno_post_hook(harness, session_id="sess-abc", agent_name="support-bot")
    ],
)

agent.run("Summarize our refund policy.")

Haystack

pip install "exemplar-harness-sdk[haystack]"

Use run_and_record to run the pipeline and ingest in one call:

from haystack import Pipeline
from haystack.components.builders import ChatPromptBuilder
from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.dataclasses import ChatMessage
from exemplar_harness import Harness
from exemplar_harness.integrations.haystack import HarnessHaystackHandler

harness = Harness(api_key="eis_...")
handler = HarnessHaystackHandler(harness, session_id="sess-abc", pipeline_name="support-bot")

prompt_builder = ChatPromptBuilder(
    template=[ChatMessage.from_user("{{ question }}")],
    required_variables=["question"],
)
llm = OpenAIChatGenerator(model="gpt-4o")
pipe = Pipeline()
pipe.add_component("prompt_builder", prompt_builder)
pipe.add_component("llm", llm)
pipe.connect("prompt_builder.prompt", "llm.messages")

question = "Summarize our refund policy."
handler.run_and_record(pipe, {"prompt_builder": {"question": question}}, question=question)

You can also call record_generator_run manually after pipe.run, or pass handler.make_snapshot_callback(question=...) to pipe.run(..., snapshot_callback=...).

LlamaIndex

pip install "exemplar-harness-sdk[llamaindex]"

Register instrumentation once; LlamaIndex LLM calls ingest automatically:

from exemplar_harness import Harness
from exemplar_harness.integrations.llamaindex import register_llamaindex_handler
from llama_index.core.llms import ChatMessage
from llama_index.llms.openai import OpenAI

harness = Harness(api_key="eis_...")
register_llamaindex_handler(harness, session_id="sess-abc", workflow_name="support-workflow")

llm = OpenAI(model="gpt-4o")
messages = [ChatMessage(role="user", content="Summarize our refund policy.")]
llm.chat(messages)

Or call record_llm_chat_end manually after each llm.chat if you prefer explicit control.

AutoGen

pip install "exemplar-harness-sdk[autogen]"

Call on_agent_run_complete after each agent.run:

import asyncio

from autogen_agentchat.agents import AssistantAgent
from autogen_ext.models.openai import OpenAIChatCompletionClient
from exemplar_harness import Harness
from exemplar_harness.integrations.autogen import HarnessAutoGenHandler

harness = Harness(api_key="eis_...")
handler = HarnessAutoGenHandler(
    harness, session_id="sess-abc", agent_name="assistant", team_name="support-team"
)


async def run() -> None:
    client = OpenAIChatCompletionClient(model="gpt-4o")
    agent = AssistantAgent(name="assistant", model_client=client)
    try:
        task = "Summarize our refund policy."
        result = await agent.run(task=task)
        handler.on_agent_run_complete(task=task, result=result, model="gpt-4o")
    finally:
        await client.close()


asyncio.run(run())

CrewAI

pip install "exemplar-harness-sdk[crewai]"

Register a CrewAI event listener for automatic ingest on TaskCompletedEvent:

from exemplar_harness import Harness
from exemplar_harness.integrations.crewai import make_crewai_listener

harness = Harness(api_key="eis_...")
monitor = make_crewai_listener(
    harness, session_id="sess-abc", crew_name="research-crew", model="gpt-4o"
)

# Run your crew — task completions ingest automatically via the listener.

Or call on_task_complete manually when each task finishes:

from exemplar_harness.integrations.crewai import HarnessCrewAIMonitor

monitor = HarnessCrewAIMonitor(
    harness, session_id="sess-abc", crew_name="research-crew", model="gpt-4o"
)
monitor.on_task_complete(
    agent_role="Researcher",
    task_description="Summarize our refund policy.",
    output="Our policy allows returns within 30 days.",
    prompt_tokens=120,
    completion_tokens=45,
)

Google GenAI SDK

pip install "exemplar-harness-sdk[google-genai]"

Use instrument_google_genai_client so each generate_content call ingests automatically:

from google import genai
from exemplar_harness import Harness
from exemplar_harness.integrations.google_genai import (
    HarnessGoogleGenAICallback,
    instrument_google_genai_client,
)

harness = Harness(api_key="eis_...")
callback = HarnessGoogleGenAICallback(harness, session_id="sess-abc", agent_name="support-bot")

client = genai.Client(api_key="your-google-api-key")
instrument_google_genai_client(callback, client)

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Summarize our refund policy.",
)
print(response.text)

Or wrap a single call with sdk_generate_content if you prefer not to patch the client.
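The difference between the two options is scope: instrumenting patches the client so every generate_content call is recorded, while a per-call wrapper affects only the call you wrap. The general patching technique can be sketched with the standard library by replacing a method with a wrapper that runs a hook after each call. This illustrates the idea only and is not the SDK's implementation; `instrument_method` and the hook signature are hypothetical.

```python
import functools
from typing import Any, Callable


def instrument_method(obj: Any, name: str, after: Callable[[tuple, dict, Any], None]) -> None:
    """Replace obj.<name> with a wrapper that calls after(args, kwargs, result) post-call."""
    original = getattr(obj, name)

    @functools.wraps(original)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        result = original(*args, **kwargs)
        after(args, kwargs, result)  # e.g. forward the prompt and response for ingest
        return result

    # Sets an instance attribute, so only this object is patched, not its class.
    setattr(obj, name, wrapper)
```

Patching a shared client affects every caller that uses it, which is one reason to prefer a per-call wrapper when you want to limit which calls are recorded.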

Google ADK

pip install "exemplar-harness-sdk[google-adk]"

Run your ADK agent, then export and ingest the session:

import asyncio

from exemplar_harness import Harness
from exemplar_harness.integrations.google_adk import ingest_adk_session
from google.adk.agents import LlmAgent
from google.adk.runners import Runner
from google.adk.sessions import InMemorySessionService
from google.genai import types

harness = Harness(api_key="eis_...")
session_id, app_name, user_id = "sess-abc", "my_app", "user-1"

agent = LlmAgent(
    model="gemini-2.5-flash",
    name="my_agent",  # must be a valid Python identifier
    instruction="Answer concisely.",
)
sessions = InMemorySessionService()


async def run_and_ingest() -> None:
    await sessions.create_session(app_name=app_name, user_id=user_id, session_id=session_id)
    runner = Runner(agent=agent, app_name=app_name, session_service=sessions)

    message = types.Content(role="user", parts=[types.Part(text="Hello")])
    async for _ in runner.run_async(user_id=user_id, session_id=session_id, new_message=message):
        pass

    session = await sessions.get_session(app_name=app_name, user_id=user_id, session_id=session_id)
    ingest_adk_session(
        harness,
        session.model_dump(mode="json", exclude_none=True),
        session_id=session_id,
        agent_name="my-agent",
    )


asyncio.run(run_and_ingest())

If you already have a session export dict:

ingest_adk_session(harness, adk_session, session_id="sess-abc", agent_name="my-agent")

Pydantic AI

pip install "exemplar-harness-sdk[pydantic-ai]"

Instrument the agent so each run / run_sync ingests automatically:

from pydantic_ai import Agent
from exemplar_harness import Harness
from exemplar_harness.integrations.pydantic_ai import (
    HarnessPydanticAIHandler,
    instrument_pydantic_ai_agent,
)

harness = Harness(api_key="eis_...")
handler = HarnessPydanticAIHandler(harness, session_id="sess-abc", agent_name="support-bot")
agent = Agent("google:gemini-2.5-flash")
instrument_pydantic_ai_agent(handler, agent)

agent.run_sync("Summarize our refund policy.")

Semantic Kernel

pip install "exemplar-harness-sdk[semantic-kernel]"

Register a function-invocation filter on your kernel:

from semantic_kernel import Kernel
from exemplar_harness import Harness
from exemplar_harness.integrations.semantic_kernel import register_semantic_kernel_filter

harness = Harness(api_key="eis_...")
kernel = Kernel()
register_semantic_kernel_filter(kernel, harness, session_id="sess-abc", plugin_name="my-plugin")

# Each kernel.invoke(...) ingests automatically after the function runs.

smolagents

pip install "exemplar-harness-sdk[smolagents]"

Pass a step callback when creating your agent:

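from smolagents import CodeAgent, LiteLLMModel
from exemplar_harness import Harness
from exemplar_harness.integrations.smolagents import harness_smolagents_step_callback

harness = Harness(api_key="eis_...")
# Any smolagents model works; LiteLLMModel needs the smolagents[litellm] extra.
model = LiteLLMModel(model_id="gpt-4o")
agent = CodeAgent(
    tools=[],
    model=model,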
    step_callbacks=[
        harness_smolagents_step_callback(harness, session_id="sess-abc", agent_name="support-bot")
    ],
)
agent.run("Summarize our refund policy.")

Reference

Optional extras

| Extra | Module |
|---|---|
| langchain | exemplar_harness.integrations.langchain |
| langgraph | exemplar_harness.integrations.langgraph |
| litellm | exemplar_harness.integrations.litellm |
| openai | exemplar_harness.integrations.openai |
| anthropic | exemplar_harness.integrations.anthropic |
| claude-agent | exemplar_harness.integrations.claude_agent |
| portkey | exemplar_harness.integrations.portkey |
| agno | exemplar_harness.integrations.agno |
| haystack | exemplar_harness.integrations.haystack |
| llamaindex | exemplar_harness.integrations.llamaindex |
| autogen | exemplar_harness.integrations.autogen |
| crewai | exemplar_harness.integrations.crewai |
| google-adk | exemplar_harness.integrations.google_adk |
| google-genai | exemplar_harness.integrations.google_genai |
| pydantic-ai | exemplar_harness.integrations.pydantic_ai |
| semantic-kernel | exemplar_harness.integrations.semantic_kernel |
| smolagents | exemplar_harness.integrations.smolagents |

The Vercel AI SDK integration is not included in the Python package; it is covered by the TypeScript SDK instead.

Envelope v1

Each ingest sends a POST request to /api/harness-eval/v1/sessions with schemaVersion: 1, a sourceType, and a framework-native data payload. The Exemplar platform API maps these envelopes to EvalSession turns.
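To inspect an envelope on the wire, you can assemble the request yourself with the standard library. This sketch only constructs the request and never sends it. It assumes several things not stated in this README: the full URL is the base URL from Step 2 joined with the path above, the API key travels as a Bearer token, and any further envelope fields (such as the session ID) are supplied by the caller in `extra` under their wire names, which are not documented here. `build_ingest_request` is a hypothetical helper.

```python
import json
import urllib.request
from typing import Any, Mapping, Optional

BASE_URL = "https://production-api.exemplar.dev"
INGEST_PATH = "/api/harness-eval/v1/sessions"


def build_ingest_request(
    api_key: str,
    source_type: str,
    data: Mapping[str, Any],
    extra: Optional[Mapping[str, Any]] = None,
) -> urllib.request.Request:
    """Construct (but do not send) an envelope v1 ingest request.

    Assumptions: Bearer auth, and caller-supplied wire names for any extra fields.
    """
    envelope: dict[str, Any] = {"schemaVersion": 1, "sourceType": source_type, "data": dict(data)}
    envelope.update(extra or {})
    return urllib.request.Request(
        BASE_URL + INGEST_PATH,
        data=json.dumps(envelope).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
        method="POST",
    )
```

In normal use harness.ingest() builds and sends this for you; a hand-built request is mainly useful for debugging payloads.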



Download files


Source Distribution

exemplar_harness_sdk-0.1.2.tar.gz (31.2 kB)

Built Distribution


exemplar_harness_sdk-0.1.2-py3-none-any.whl (41.9 kB, Python 3)

File details

Details for the file exemplar_harness_sdk-0.1.2.tar.gz.

File metadata

  • Download URL: exemplar_harness_sdk-0.1.2.tar.gz
  • Size: 31.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.13

File hashes

Hashes for exemplar_harness_sdk-0.1.2.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | ad4d7f07d4fdb28bc68d3a8b5dd89c0ddc650c361a8432818a599521cf933156 |
| MD5 | 3266bbe0eca03818d6f2b01963f2fc4d |
| BLAKE2b-256 | 37e0cc1ec097f59bb93f1a90738656f3385809a654ca34be7c2e0affabea63b7 |


File details

Details for the file exemplar_harness_sdk-0.1.2-py3-none-any.whl.


File hashes

Hashes for exemplar_harness_sdk-0.1.2-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | c0ed02545ddde26e34444840e443979d0831ca0f55fe2bdcb3d75200d0f0024e |
| MD5 | ff146a0e627861d0a9e2fc2d59d2c97e |
| BLAKE2b-256 | b7e3e493340e236c92399d3b4085f630e1169926519980d44272dfa355aad5e9 |

