Python SDK for building servers implementing the Azure AI Responses protocol

These details have not been verified by PyPI

Project links

repository

Project description

Azure AI Agent Server Responses client library for Python

The azure-ai-agentserver-responses package provides the Responses protocol endpoints for Azure AI Hosted Agent containers. It plugs into the azure-ai-agentserver-core host framework and adds the full response lifecycle: create, stream (SSE), cancel, delete, replay, and input-item listing.

Getting started

Install the package

pip install azure-ai-agentserver-responses

This automatically installs azure-ai-agentserver-core as a dependency.

Prerequisites

Python 3.10 or later

Key concepts

ResponsesAgentServerHost

ResponsesAgentServerHost is an AgentServerHost subclass that adds Responses protocol endpoints. Register your handler with the @app.response_handler decorator:

@app.response_handler
def my_handler(
    request: CreateResponse, context: ResponseContext, cancellation_signal: asyncio.Event
):
    ...

Protocol endpoints

Method	Route	Description
`POST`	`/responses`	Create a new response
`GET`	`/responses/{response_id}`	Get response state (JSON or SSE replay via `?stream=true`)
`POST`	`/responses/{response_id}/cancel`	Cancel an in-flight response
`DELETE`	`/responses/{response_id}`	Delete a stored response
`GET`	`/responses/{response_id}/input_items`	List input items (paginated)
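To make the route table concrete, the sketch below maps each operation to its HTTP method and path using only the standard library. The helper name `route_for`, the operation labels, and the sample ID `resp_123` are illustrative assumptions and are not part of the SDK.

```python
from urllib.parse import quote

# Operation label -> (HTTP method, path template), copied from the route table.
ROUTES = {
    "create": ("POST", "/responses"),
    "get": ("GET", "/responses/{response_id}"),
    "cancel": ("POST", "/responses/{response_id}/cancel"),
    "delete": ("DELETE", "/responses/{response_id}"),
    "list_input_items": ("GET", "/responses/{response_id}/input_items"),
}


def route_for(operation: str, response_id: str = "", stream: bool = False) -> tuple:
    """Return (method, path) for an operation. Hypothetical helper for illustration."""
    method, template = ROUTES[operation]
    path = template.format(response_id=quote(response_id, safe=""))
    # SSE replay is requested with ?stream=true on the GET route.
    if operation == "get" and stream:
        path += "?stream=true"
    return method, path


print(route_for("get", "resp_123", stream=True))
```

A client could feed the returned pair into any HTTP library; the table remains the authoritative list of routes.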

TextResponse

The simplest way to return text. Handles the full SSE lifecycle automatically (response.created → response.in_progress → message/content events → response.completed):

return TextResponse(context, request, text="Hello!")
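For orientation, here is a minimal sketch of how the lifecycle events named above could look on the wire, using the standard server-sent events framing (an `event:` line, a `data:` line, then a blank line). The helper `format_sse` and the placeholder payloads are assumptions for illustration; the SDK produces the real frames and payloads for you.

```python
import json


def format_sse(event_type: str, payload: dict) -> str:
    """Frame one server-sent event: an event line, a data line, and a blank line."""
    return f"event: {event_type}\ndata: {json.dumps(payload)}\n\n"


# Lifecycle order named in the text; message/content events are omitted here.
LIFECYCLE = ["response.created", "response.in_progress", "response.completed"]

# Placeholder payloads: real events carry the full response object.
frames = [format_sse(name, {"type": name}) for name in LIFECYCLE]
print("".join(frames), end="")
```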

For streaming, pass an async iterable to text:

async def tokens():
    for t in ["Hello", ", ", "world!"]:
        yield t

return TextResponse(context, request, text=tokens())
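The async iterable can also watch the handler's `cancellation_signal` so that it stops producing text once the request is cancelled. The sketch below runs without the SDK: `collect` is a hypothetical stand-in for whatever consumes the stream, and whether the SDK stops consuming on its own is not stated here.

```python
import asyncio
from typing import AsyncIterator


async def tokens(cancellation_signal: asyncio.Event) -> AsyncIterator[str]:
    """Yield text chunks, stopping early once cancellation is signalled."""
    for chunk in ["Hello", ", ", "world!"]:
        if cancellation_signal.is_set():
            break
        yield chunk
        await asyncio.sleep(0)  # give other tasks a chance to run between chunks


async def collect(source: AsyncIterator[str]) -> str:
    """Join every chunk from an async iterable, as a stream consumer would."""
    return "".join([chunk async for chunk in source])


async def main() -> str:
    return await collect(tokens(asyncio.Event()))


print(asyncio.run(main()))
```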

ResponseEventStream

Use ResponseEventStream when you need function calls, reasoning items, multiple output types, or fine-grained event control. Each yield maps 1:1 to an SSE event with zero bookkeeping:

stream = ResponseEventStream(response_id=context.response_id, request=request)
yield stream.emit_created()
yield stream.emit_in_progress()
yield from stream.output_item_message("Hello, world!")
yield stream.emit_completed()

Drop down to the builder API for full control over individual events:

message = stream.add_output_item_message()
yield message.emit_added()
text = message.add_text_content()
yield text.emit_added()
yield text.emit_delta("Hello!")
yield text.emit_text_done()
yield text.emit_done()
yield message.emit_done()
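The builder calls open and close nested scopes: the message item wraps the text content part, which wraps the deltas. The sketch below checks that nesting for a sequence of event types. The mapping from builder calls to event names is an assumption based on the OpenAI Responses streaming event names and is not confirmed by this README.

```python
# (builder call from the snippet above, ASSUMED SSE event type)
ASSUMED_EVENTS = [
    ("message.emit_added()", "response.output_item.added"),
    ("text.emit_added()", "response.content_part.added"),
    ('text.emit_delta("Hello!")', "response.output_text.delta"),
    ("text.emit_text_done()", "response.output_text.done"),
    ("text.emit_done()", "response.content_part.done"),
    ("message.emit_done()", "response.output_item.done"),
]

# Scopes that are opened by ".added" and closed by ".done".
SCOPES = ("response.output_item", "response.content_part")


def is_well_nested(event_types: list) -> bool:
    """Check that every opened scope is closed in reverse order of opening."""
    open_scopes = []
    for event_type in event_types:
        scope, _, suffix = event_type.rpartition(".")
        if scope not in SCOPES:
            continue  # deltas and text-done events do not open or close a scope
        if suffix == "added":
            open_scopes.append(scope)
        elif suffix == "done":
            if not open_scopes or open_scopes.pop() != scope:
                return False
    return not open_scopes


print(is_well_nested([event for _, event in ASSUMED_EVENTS]))
```

A check like this can be handy in unit tests for handlers that use the builder API, since closing the message before its content part is an easy mistake to make.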

ResponseContext

The ResponseContext provides request-scoped state:

Property / Method	Description
`response_id`	Unique ID for this response
`is_shutdown_requested`	Whether the server is draining
`platform_context`	`PlatformContext` with `user_id_key` (from `x-agent-user-id`) and `call_id` (from `x-agent-foundry-call-id`) for multi-tenant state partitioning and per-request caller-context forwarding
`client_headers`	Dictionary of `x-client-*` headers forwarded from the platform (keys normalized to lowercase)
`query_parameters`	Dictionary of query string parameters
`get_input_items()`	Load resolved input items as `Item` subtypes
`get_input_text()`	Extract all text content from input items as a single string
`get_history()`	Load conversation history items
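As an illustration of what the `client_headers` row describes, the sketch below filters a raw header mapping down to `x-client-*` entries and lowercases the keys. The function name `extract_client_headers` and the sample header values are hypothetical; the SDK performs this step itself.

```python
def extract_client_headers(raw_headers: dict) -> dict:
    """Keep only x-client-* headers and lowercase their names."""
    return {
        name.lower(): value
        for name, value in raw_headers.items()
        if name.lower().startswith("x-client-")
    }


incoming = {
    "X-Client-Trace": "abc",            # hypothetical forwarded client header
    "x-agent-user-id": "user-key-1",    # platform header, not a client header
    "Content-Type": "application/json",
}
print(extract_client_headers(incoming))
```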

Streaming and background modes

The SDK automatically handles all combinations of stream and background flags:

Default — Run to completion, return final JSON response
Streaming — Pipe events as SSE in real-time, cancel on client disconnect
Background — Return immediately, handler runs in the background
Streaming + Background — SSE while connected, handler continues after disconnect
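The four combinations above can be read as a lookup on the two flags. The sketch below encodes that reading; `describe_mode` and `cancels_on_disconnect` are hypothetical helpers, and the second one is an interpretation of the descriptions above rather than a documented SDK rule.

```python
# (stream, background) -> behaviour, restating the four modes listed above.
MODES = {
    (False, False): "default: run to completion, return final JSON response",
    (True, False): "streaming: SSE in real time, cancel on client disconnect",
    (False, True): "background: return immediately, handler runs in the background",
    (True, True): "streaming + background: SSE while connected, handler continues after disconnect",
}


def describe_mode(stream: bool = False, background: bool = False) -> str:
    """Return the behaviour for a given combination of request flags."""
    return MODES[(stream, background)]


def cancels_on_disconnect(stream: bool, background: bool) -> bool:
    """Only a streaming, non-background request is tied to the client connection."""
    return stream and not background


print(describe_mode(stream=True, background=True))
```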

Response lifecycle

The library orchestrates the complete response lifecycle: created → in_progress → completed (or failed / cancelled). Cancellation, error handling, and terminal event guarantees are all managed automatically.
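The lifecycle can be pictured as a small state machine with three terminal states. The sketch below is a strict reading of the arrows above, for illustration only; the names `TRANSITIONS` and `is_valid_lifecycle` are hypothetical, and the SDK may allow transitions not shown here (for example, failing before `in_progress`).

```python
# Allowed next states, following created -> in_progress -> terminal.
TRANSITIONS = {
    "created": {"in_progress"},
    "in_progress": {"completed", "failed", "cancelled"},
    "completed": set(),
    "failed": set(),
    "cancelled": set(),
}


def is_terminal(state: str) -> bool:
    """A terminal state has no outgoing transitions."""
    return not TRANSITIONS[state]


def is_valid_lifecycle(states: list) -> bool:
    """Check that a sequence starts at 'created', follows the arrows, and ends terminal."""
    if not states or states[0] != "created":
        return False
    for current, following in zip(states, states[1:]):
        if following not in TRANSITIONS[current]:
            return False
    return is_terminal(states[-1])


print(is_valid_lifecycle(["created", "in_progress", "completed"]))
```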

For detailed handler implementation guidance, see docs/handler-implementation-guide.md.

Examples

Echo handler

import asyncio

from azure.ai.agentserver.responses import (
    CreateResponse,
    ResponseContext,
    ResponsesAgentServerHost,
    TextResponse,
)

app = ResponsesAgentServerHost()


@app.response_handler
async def handler(request: CreateResponse, context: ResponseContext, cancellation_signal: asyncio.Event):
    text = await context.get_input_text()
    return TextResponse(context, request, text=f"Echo: {text}")


app.run()

Multi-user session (per-request call ID)

On container protocol 2.0.0, a single agent session can serve multiple users. Forwarding the per-request x-agent-foundry-call-id header on outbound toolbox calls lets the tool server resolve which user made the request and act on their behalf, so requests from user A and user B to the same session each get a user-scoped result. The x-agent-user-id header is never forwarded; the tool resolves the user from the call ID server-side. Use context.platform_context.user_id_key only for the container's own per-user state.

import asyncio
import os

import httpx
from azure.ai.agentserver.core import get_request_context
from azure.ai.agentserver.responses import (
    CreateResponse,
    ResponseContext,
    ResponsesAgentServerHost,
    TextResponse,
)

app = ResponsesAgentServerHost()


@app.response_handler
async def handler(request: CreateResponse, context: ResponseContext, cancellation_signal: asyncio.Event):
    # platform_headers() echoes x-agent-foundry-call-id only (never x-agent-user-id).
    headers = get_request_context().platform_headers()

    # Toolbox / MCP — attach the call ID PER CALL. The MCP session is long-lived and
    # shared across users/turns, so never bake one call's ID into static client headers.
    async with httpx.AsyncClient() as mcp:
        resp = await mcp.post(
            f"{os.environ['FOUNDRY_PROJECT_ENDPOINT']}/toolboxes/github/mcp",
            # get_agent_token() stands for your own helper that returns the agent's
            # managed-identity token; it is not defined in this snippet.
            headers={"Authorization": f"Bearer {get_agent_token()}", **headers},
            json={"jsonrpc": "2.0", "method": "tools/call",
                  "params": {"name": "list_my_assigned_issues", "arguments": {}}},
        )
        # The toolbox resolved the caller from the call ID and returned THIS user's issues.

    return TextResponse(context, request, text=resp.text)


app.run()
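The "attach the call ID per call" rule can be isolated in a small helper that builds a fresh header dictionary for every outbound request. This is a sketch under assumptions: `build_call_headers` is a hypothetical name, the token and call IDs are placeholder strings, and the extra filter on `x-agent-user-id` is a defensive measure added here rather than something the SDK requires.

```python
def build_call_headers(agent_token: str, platform_headers: dict) -> dict:
    """Merge per-request platform headers into fresh headers for one outbound call."""
    headers = {"Authorization": f"Bearer {agent_token}"}
    for name, value in platform_headers.items():
        # Defensive filter: the user ID header must never leave the container.
        if name.lower() != "x-agent-user-id":
            headers[name] = value
    return headers


# Two requests in the same session get separate header dictionaries.
first = build_call_headers("token", {"x-agent-foundry-call-id": "call-a"})
second = build_call_headers("token", {"x-agent-foundry-call-id": "call-b"})
print(first["x-agent-foundry-call-id"], second["x-agent-foundry-call-id"])
```

Building the dictionary per call, instead of storing it on a shared client, keeps one user's call ID from leaking into another user's request.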

Function calling

import json

from azure.ai.agentserver.responses import ResponseEventStream

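# Runs inside a generator handler registered with @app.response_handler,
# where `request` and `context` are the handler's arguments.
stream = ResponseEventStream(response_id=context.response_id, request=request)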
yield stream.emit_created()
yield stream.emit_in_progress()

arguments = json.dumps({"location": "Seattle", "unit": "fahrenheit"})
yield from stream.output_item_function_call("get_weather", "call_001", arguments)

yield stream.emit_completed()
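Once a function call has been emitted, something has to decode the JSON `arguments` string and run the matching function before the next turn. The sketch below shows one way to do that outside the SDK. The `TOOLS` registry, `run_function_call`, and the stubbed `get_weather` (which returns a placeholder, not real weather data) are all hypothetical.

```python
import json


def get_weather(location: str, unit: str) -> str:
    """Hypothetical tool; returns a placeholder rather than real weather data."""
    return f"(placeholder) weather for {location} in {unit}"


# Function name -> callable, looked up when a function call arrives.
TOOLS = {"get_weather": get_weather}


def run_function_call(name: str, arguments: str) -> str:
    """Decode the JSON argument string and invoke the matching local function."""
    if name not in TOOLS:
        raise KeyError(f"unknown function: {name}")
    return TOOLS[name](**json.loads(arguments))


arguments = json.dumps({"location": "Seattle", "unit": "fahrenheit"})
print(run_function_call("get_weather", arguments))
```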

Reasoning + text message

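# Runs inside a generator handler registered with @app.response_handler,
# where `request` and `context` are the handler's arguments.
stream = ResponseEventStream(response_id=context.response_id, request=request)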
yield stream.emit_created()
yield stream.emit_in_progress()

yield from stream.output_item_reasoning_item("Let me think about this...")
yield from stream.output_item_message("Here is my answer.")

yield stream.emit_completed()

Configuration

from azure.ai.agentserver.responses import ResponsesAgentServerHost, ResponsesServerOptions

options = ResponsesServerOptions(
    default_model="gpt-4o",
    sse_keep_alive_interval_seconds=15,
    shutdown_grace_period_seconds=10,
)

app = ResponsesAgentServerHost(options=options)
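To give a feel for what a keep-alive interval does, the sketch below wraps an async event source and emits an SSE comment line whenever the source stays quiet for longer than the interval. It is not the SDK's implementation, which this README does not describe; `with_keep_alive`, `slow_events`, and the comment text are assumptions. SSE clients ignore lines that start with a colon, so such comments keep the connection active without producing events.

```python
import asyncio
from typing import AsyncIterator

KEEP_ALIVE = ": keep-alive\n\n"  # SSE comment line; clients ignore it


async def with_keep_alive(events: AsyncIterator[str], interval_seconds: float) -> AsyncIterator[str]:
    """Yield items from `events`, inserting a keep-alive comment during idle gaps."""
    queue: asyncio.Queue = asyncio.Queue()
    finished = object()  # sentinel marking the end of the source

    async def pump() -> None:
        try:
            async for item in events:
                await queue.put(item)
        finally:
            await queue.put(finished)

    task = asyncio.create_task(pump())
    try:
        while True:
            try:
                item = await asyncio.wait_for(queue.get(), timeout=interval_seconds)
            except asyncio.TimeoutError:
                yield KEEP_ALIVE  # nothing arrived within the interval
                continue
            if item is finished:
                return
            yield item
    finally:
        task.cancel()


async def slow_events() -> AsyncIterator[str]:
    await asyncio.sleep(0.2)  # simulate a handler that is quiet for a while
    yield "event: response.completed\ndata: {}\n\n"


async def main() -> list:
    return [frame async for frame in with_keep_alive(slow_events(), interval_seconds=0.05)]


frames = asyncio.run(main())
print(frames.count(KEEP_ALIVE) >= 1, frames[-1].startswith("event:"))
```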

Troubleshooting

Common errors

400 Bad Request: The request body failed validation. Check that optional fields such as model (when provided) are valid and that input items are well-formed.
404 Not Found: The response ID does not exist or has expired past the configured TTL.
400 Bad Request (cancel): The response was not created with background=true, or it has already reached a terminal state.
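The cancel rules above can be summarized as a small decision function. This sketch only restates the documented error cases: the record shape, the names `TERMINAL_STATES` and `cancel_status`, and the `200` success code are assumptions rather than SDK behaviour.

```python
from typing import Optional

TERMINAL_STATES = {"completed", "failed", "cancelled"}


def cancel_status(record: Optional[dict]) -> int:
    """Return the HTTP status a cancel request would receive for a stored response."""
    if record is None:
        return 404  # unknown or expired response ID
    if not record["background"] or record["status"] in TERMINAL_STATES:
        return 400  # not a background response, or already finished
    return 200  # assumed success code for an accepted cancel


print(cancel_status({"background": True, "status": "in_progress"}))
```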

Reporting issues

To report an issue with the client library or request additional features, please open a GitHub issue in the project repository.

Next steps

Visit the Samples folder for complete working examples:

Sample	Description
Getting Started	Minimal echo handler using `TextResponse`
Streaming Text Deltas	Token-by-token streaming with `configure` callback
Full Control	Convenience, streaming, and builder — three ways to emit output
Function Calling	Two-turn function calling with convenience and builder variants
Conversation History	Multi-turn study tutor with `context.get_history()`
Multi-Output	Reasoning + message in a single response
Streaming Upstream	Forward to upstream streaming LLM via `openai` SDK
Non-Streaming Upstream	Forward to upstream non-streaming LLM, emit items via builders
Image Generation	Image gen convenience, streaming partials, and full-control builder
Image Input	Receive images via URL, base64 data URL, or file ID
File Inputs	Receive files via base64 data URL, URL, or file ID
Annotations	Attach file_path, file_citation, and url_citation annotations
Structured Outputs	Return structured JSON as a `structured_outputs` item

Handler implementation guide — Detailed reference for building handlers

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.microsoft.com.

When you submit a pull request, a CLA-bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information, see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.

Release history

Version	Release date
1.0.0b8 (pre-release, this version)	Jun 28, 2026
1.0.0b7 (pre-release)	May 25, 2026
1.0.0b6 (pre-release)	May 21, 2026
1.0.0b5 (pre-release)	Apr 23, 2026
1.0.0b4 (pre-release)	Apr 20, 2026
1.0.0b3 (pre-release)	Apr 19, 2026
1.0.0b2 (pre-release)	Apr 19, 2026
1.0.0b1 (pre-release)	Apr 15, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

azure_ai_agentserver_responses-1.0.0b8.tar.gz (450.1 kB)

Uploaded Jun 28, 2026 Source

Built Distribution

azure_ai_agentserver_responses-1.0.0b8-py3-none-any.whl (269.3 kB)

Uploaded Jun 28, 2026 Python 3

File details

Details for the file azure_ai_agentserver_responses-1.0.0b8.tar.gz.

File metadata

Download URL: azure_ai_agentserver_responses-1.0.0b8.tar.gz
Upload date: Jun 28, 2026
Size: 450.1 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: RestSharp/106.13.0.0

File hashes

Hashes for azure_ai_agentserver_responses-1.0.0b8.tar.gz
Algorithm	Hash digest
SHA256	`bc0365fd70b7dabf9c9394dac5bbab08f772b59f865319d401cfde317b6832ef`
MD5	`b57649415cbacdf18eeea6a1da5f4b49`
BLAKE2b-256	`311f7e7563705100f2d21c952a44d5a2e93a8193cc0a6013941bac9f52ad8874`


File details

Details for the file azure_ai_agentserver_responses-1.0.0b8-py3-none-any.whl.

File metadata

Download URL: azure_ai_agentserver_responses-1.0.0b8-py3-none-any.whl
Upload date: Jun 28, 2026
Size: 269.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: RestSharp/106.13.0.0

File hashes

Hashes for azure_ai_agentserver_responses-1.0.0b8-py3-none-any.whl
Algorithm	Hash digest
SHA256	`4243de685ffb3c3a3c11c4f07e0156b20ea7f37aecfe6e8eec53f776734b3808`
MD5	`0e681baf80fd24fa18b88c0fdb7849d5`
BLAKE2b-256	`d30dbb14df13b7d3d57decc72c0908677e229efcb24cad2ccce0c16fa736edce`

