Python SDK for OpenAI-compatible inference endpoints (Courier and friends) with auto tool-call loops, structured outputs, and Whisper.

Project description

Courier — ENCODE

Courier — ENCODE is an SDK specifically designed for developing AI apps and agents with open source LLMs.

Features

Python SDK
A request wrapper that sends messages to v1/chat/completions and v1/responses (configurable for each), formats requests automatically, and returns a Pydantic response with properties the end user can use in their code.
- The request wrapper includes agentic processing through an optional 'tools' prop. Python functions can be passed in and are automatically converted into tools for an agent. The SDK should parse tool calls out of the response, call the functions, and automatically send a follow-up message to the agent. This should be an option that runs until the model has completed its tool calling, and then returns the full list of messages in OpenAI format.
- The request wrapper should optionally accept Pydantic models as a response_format for structured JSON responses and automatically apply the format. The response object should be automatically parsed into that Pydantic model and usable as such.
- The request wrapper should accept a Pydantic model of messages. Images and audio should be supported as well.
- A model name should be selectable.
A Whisper request wrapper that supports transcriptions and translations.
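
The automatic conversion of Python functions into tools, described in the list above, might look roughly like the following. This is a minimal sketch and not the SDK's actual implementation: `function_to_tool` and the annotation-to-type mapping are hypothetical names, and only the output shape follows the function tool format shown in the Courier docs.

```python
import inspect
from typing import Any, Callable, Dict

# Hypothetical mapping from Python annotations to JSON Schema type names.
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def function_to_tool(func: Callable[..., Any]) -> Dict[str, Any]:
    """Build an OpenAI-style function tool definition from a Python function."""
    properties: Dict[str, Any] = {}
    required = []
    for name, param in inspect.signature(func).parameters.items():
        # Unannotated or unknown types fall back to "string" in this sketch.
        properties[name] = {"type": _JSON_TYPES.get(param.annotation, "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)
    return {
        "type": "function",
        "function": {
            "name": func.__name__,
            "description": inspect.getdoc(func) or "",
            "parameters": {
                "type": "object",
                "properties": properties,
                "required": required,
            },
        },
    }


def get_weather(city: str, units: str = "metric") -> str:
    """Get current weather by city"""
    return f"Sunny in {city} ({units})"


tool = function_to_tool(get_weather)
```

Parameters without a default are treated as required here; a real converter would also need to handle containers, optional types, and Pydantic models.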

Notes

encode should be a uv project, structured to be published on PyPI as an open source pip package.

Everywhere that Pydantic is accepted, JSON should be accepted as well.

The tool call loop runs until the model stops making tool calls or a hard cap (which can be specified) is reached. If a tool call is present, the code should execute it, append the result to the messages, and send another API request to the model. Once a response arrives with no actions, execution continues past the relay loop.
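
The loop described above can be sketched without any network access. This is an illustration under stated assumptions, not the SDK's code: `run_tool_loop` and `fake_send` are hypothetical names, `send` stands in for the HTTP request, and the `role: "tool"` result message with `tool_call_id` follows the usual OpenAI message convention.

```python
import json
from typing import Any, Callable, Dict, List

Message = Dict[str, Any]


def run_tool_loop(
    send: Callable[[List[Message]], Message],
    messages: List[Message],
    tools: Dict[str, Callable[..., Any]],
    max_iterations: int = 10,
) -> List[Message]:
    """Call `send` until the assistant stops requesting tools or the cap is hit."""
    messages = list(messages)
    for _ in range(max_iterations):
        reply = send(messages)
        messages.append(reply)
        tool_calls = reply.get("tool_calls") or []
        if not tool_calls:
            break  # no tool call in the reply: leave the loop
        for call in tool_calls:
            func = tools[call["function"]["name"]]
            args = json.loads(call["function"]["arguments"])
            result = func(**args)
            messages.append(
                {"role": "tool", "tool_call_id": call["id"], "content": json.dumps(result)}
            )
    return messages


def fake_send(history: List[Message]) -> Message:
    """Stand-in model: requests one tool call, then answers in plain text."""
    if not any(m["role"] == "tool" for m in history):
        return {
            "role": "assistant",
            "tool_calls": [
                {
                    "id": "call_abc123",
                    "type": "function",
                    "function": {"name": "get_weather", "arguments": "{\"city\":\"Denver\"}"},
                }
            ],
        }
    return {"role": "assistant", "content": "It is sunny in Denver."}


history = run_tool_loop(
    fake_send,
    [{"role": "user", "content": "What is the weather in Denver?"}],
    {"get_weather": lambda city: {"city": city, "forecast": "sunny"}},
)
```

The returned list holds the user message, the assistant's tool request, the tool result, and the final assistant reply, in that order.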

Web search should be a boolean to enable. If enabled it should automatically be appended using the shorthand schema.
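
One way the boolean could translate into a request is shown below. This is a sketch only: `with_web_search` is a hypothetical helper name, and the appended entry is the shorthand `{"type": "web_search"}` schema from the Courier web search docs.

```python
from typing import Any, Dict, List, Optional

# Shorthand web search tool entry, as documented by Courier.
WEB_SEARCH_TOOL = {"type": "web_search"}


def with_web_search(
    tools: Optional[List[Dict[str, Any]]], web_search: bool
) -> List[Dict[str, Any]]:
    """Return a copy of `tools`, adding the shorthand entry when enabled."""
    result = list(tools or [])
    if web_search and WEB_SEARCH_TOOL not in result:
        result.append(dict(WEB_SEARCH_TOOL))
    return result
```

The helper copies its input and skips the append when the entry is already present, so calling it twice does not duplicate the tool.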

Function names

encode.relay(): the chat completions and responses wrapper
- .intercept(): a listener that can be attached to a relay() call to execute code while a tool call loop is running. Intercept runs every time a tool call finishes, even if the model continues the loop.
encode.whisper(): the Whisper function. Accepts audio and can translate or transcribe.

Courier Docs

API Docs

Courier Inference API

Courier provides a custom inference API optimized for n8n and other workflows.

POST /inference/

{
  "model_name": "Solar Open 100B",
  "model_id": "Model_UUID",
  "model_type": "text-text",
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant"
    },
    {
      "role": "user",
      "content": "hello"
    }
  ],
  "temperature": 0.7
}

Authentication

For the Courier /inference/ endpoint, use token authentication:

Authorization: API_KEY

OpenAI Compatible Endpoints

Courier supports OpenAI-compatible APIs for completions and responses workflows.

POST /v1/chat/completions

{
  "model": "Solar Open 100B",
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant"
    },
    {
      "role": "user",
      "content": "hello"
    }
  ],
  "temperature": 0.7
}

GET /v1/models

{
  "object": "list",
  "data": [
    {
      "id": "model_name",
      "object": "model",
      "created": 1686935092,
      "owned_by": "recursion-ai"
    }
  ]
}

POST /v1/responses

{
  "model": "my-shared-model",
  "input": "Summarize this in one sentence.",
  "instructions": "optional system/developer instruction",
  "tools": [],
  "tool_choice": "auto",
  "text": {
    "format": {
      "type": "text"
    }
  },
  "stream": false,
  "max_output_tokens": 256
}

Authentication (OpenAI Endpoints)

Use Bearer authentication:

Authorization: Bearer API_KEY
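
Because the two endpoint families expect different Authorization values, a client has to pick the header per request. A minimal sketch, assuming a hypothetical helper named `auth_headers` that decides by path prefix:

```python
from typing import Dict


def auth_headers(api_key: str, path: str) -> Dict[str, str]:
    """Pick the Authorization header format based on the endpoint path."""
    if path.startswith("/v1/"):
        # OpenAI-compatible endpoints use Bearer authentication.
        return {"Authorization": f"Bearer {api_key}"}
    # The Courier /inference/ endpoint takes the raw key.
    return {"Authorization": api_key}
```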

Tool Calling

Tool Calling API

Industry-leading tool calling for self-hosted AI stacks, with production-ready reliability for text and fused modality (vision) models. One of the most robust OpenAI-compatible tool-calling implementations available on an API platform you can own.

Global OpenAI Compatibility

Auth header: Authorization: Bearer <api_key> is required.
Model matching is case-insensitive against workbench name or nickname, and only models available to the API key are usable.

Error Envelope

{
  "error": {
    "message": "....",
    "type": "invalid_request_error",
    "code": "...."
  }
}
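
A client can turn this envelope into a typed exception. The sketch below is illustrative only: `CourierAPIError` and `raise_for_error` are hypothetical names, and the HTTP status is passed in by the caller rather than read from a real response object.

```python
import json
from typing import Any, Dict


class CourierAPIError(Exception):
    """Hypothetical exception carrying the fields of the error envelope."""

    def __init__(self, status: int, message: str, type_: str, code: str) -> None:
        super().__init__(f"{status} {code}: {message}")
        self.status = status
        self.message = message
        self.type = type_
        self.code = code


def raise_for_error(status: int, body: str) -> Dict[str, Any]:
    """Return the parsed body, or raise if it holds an error envelope."""
    payload = json.loads(body)
    error = payload.get("error") if isinstance(payload, dict) else None
    if error:
        raise CourierAPIError(
            status,
            error.get("message", ""),
            error.get("type", ""),
            error.get("code", ""),
        )
    return payload
```

Callers can then branch on `exc.code`, for example to distinguish invalid_tool_choice from invalid_tool_call.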

POST /v1/chat/completions

Supported Request Fields

model, messages, tools, tool_choice, stream, response_format, stop, max_tokens, temperature, top_p, presence_penalty, frequency_penalty, user. n is accepted but currently returns one choice (index: 0).

Request Body Example

{
  "model": "Solar Open 100B",
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant."
    },
    {
      "role": "user",
      "content": "What is the weather in Denver?"
    }
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get current weather by city",
        "parameters": {
          "type": "object",
          "properties": {
            "city": {
              "type": "string"
            }
          },
          "required": [
            "city"
          ]
        }
      }
    }
  ],
  "tool_choice": "auto",
  "stream": false
}

Tool Support Rules

Only tools with type: "function" are used.
Forced tool_choice names must exist in tools, or requests fail with 400 invalid_tool_choice.
Text and fused modality (image-text-text) models support the tool-calling pipeline. Audio models do not support tools or streaming.
Tool arguments are normalized to JSON strings; strict invalid arguments fail with 400 invalid_tool_call.

Response Body Example

{
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "tool_calls": [
          {
            "id": "call_abc123",
            "type": "function",
            "function": {
              "name": "get_weather",
              "arguments": "{\"city\":\"Denver\"}"
            }
          }
        ]
      },
      "finish_reason": "tool_calls"
    }
  ]
}
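
Since tool arguments arrive as a JSON string, a client has to decode them before calling the matching function. A small sketch of that step, using the response body above as sample data (`extract_tool_calls` is a hypothetical helper name):

```python
import json
from typing import Any, Dict, List, Tuple


def extract_tool_calls(response: Dict[str, Any]) -> List[Tuple[str, str, Dict[str, Any]]]:
    """Return (call_id, function_name, parsed_arguments) for each tool call."""
    message = response["choices"][0]["message"]
    calls = []
    for call in message.get("tool_calls") or []:
        # Arguments are a JSON string and need decoding before use.
        arguments = json.loads(call["function"]["arguments"])
        calls.append((call["id"], call["function"]["name"], arguments))
    return calls


response = {
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "tool_calls": [
                    {
                        "id": "call_abc123",
                        "type": "function",
                        "function": {
                            "name": "get_weather",
                            "arguments": "{\"city\":\"Denver\"}",
                        },
                    }
                ],
            },
            "finish_reason": "tool_calls",
        }
    ]
}

calls = extract_tool_calls(response)
```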

Streaming Behavior (SSE)

Emits chat.completion.chunk events.
Streams content via choices[0].delta.content.
Tool calls stream incrementally through delta.tool_calls argument chunks.
Final chunk sets finish_reason to tool_calls or stop, then emits [DONE].
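
Reassembling a streamed tool call means concatenating the argument fragments until the final chunk arrives. The sketch below shows one way to do that over raw SSE lines. It assumes each fragment carries an OpenAI-style `index` field identifying its tool call, which the Courier docs do not spell out; `accumulate_stream` and the sample lines are illustrative.

```python
import json
from typing import Dict, Iterable, Tuple


def accumulate_stream(lines: Iterable[str]) -> Tuple[str, Dict[int, Dict[str, str]], str]:
    """Fold SSE `data:` lines into (content, tool calls by index, finish_reason)."""
    content = ""
    tool_calls: Dict[int, Dict[str, str]] = {}
    finish_reason = ""
    for line in lines:
        if not line.startswith("data:"):
            continue
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        choice = json.loads(data)["choices"][0]
        delta = choice.get("delta", {})
        content += delta.get("content") or ""
        for part in delta.get("tool_calls") or []:
            # Assumed: `index` says which tool call this fragment belongs to.
            slot = tool_calls.setdefault(
                part.get("index", 0), {"id": "", "name": "", "arguments": ""}
            )
            slot["id"] = part.get("id") or slot["id"]
            func = part.get("function", {})
            slot["name"] = func.get("name") or slot["name"]
            slot["arguments"] += func.get("arguments") or ""
        finish_reason = choice.get("finish_reason") or finish_reason
    return content, tool_calls, finish_reason


sample = [
    'data: {"choices":[{"index":0,"delta":{"tool_calls":[{"index":0,"id":"call_abc123","function":{"name":"get_weather","arguments":"{\\"city\\":"}}]}}]}',
    'data: {"choices":[{"index":0,"delta":{"tool_calls":[{"index":0,"function":{"arguments":"\\"Denver\\"}"}}]}}]}',
    'data: {"choices":[{"index":0,"delta":{},"finish_reason":"tool_calls"}]}',
    "data: [DONE]",
]
content, calls, reason = accumulate_stream(sample)
```

The accumulated `arguments` string should only be parsed as JSON once the stream has finished, since intermediate fragments are not valid JSON on their own.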

POST /v1/responses

Supported Request Fields

model, input, messages, input_content, instructions, text, tools, tool_choice, stream, stop, max_tokens, max_output_tokens, temperature, top_p, presence_penalty, frequency_penalty, user. n, logit_bias, and input_type are accepted but not used in generation logic.

Request Body Example

{
  "model": "Solar Open 100B",
  "input": [
    {
      "type": "message",
      "role": "user",
      "content": "Find today's top headline"
    }
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "search_news",
        "description": "Search current headlines",
        "parameters": {
          "type": "object",
          "properties": {
            "query": {
              "type": "string"
            }
          },
          "required": [
            "query"
          ]
        }
      }
    }
  ],
  "tool_choice": "auto",
  "stream": false
}

Input Normalization

input can be a string or a list of typed items.
Supported item types: message, input_text, function_call, function_call_output.
reasoning input items are rejected with 400 invalid_input.

Response Body Example

{
  "object": "response",
  "output": [
    {
      "id": "msg_123",
      "type": "message",
      "status": "completed",
      "role": "assistant",
      "content": [
        {
          "type": "output_text",
          "text": "Searching now...",
          "annotations": []
        }
      ]
    },
    {
      "id": "fc_123",
      "type": "function_call",
      "status": "completed",
      "call_id": "call_abc123",
      "name": "search_news",
      "arguments": "{\"query\":\"top headline today\"}"
    }
  ]
}
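
A client reading this body usually wants the assistant text and the function calls separately, and then needs to send each result back. The following sketch shows both steps. `split_output` and `function_call_output` are hypothetical helpers, and the `call_id` and `output` fields on the follow-up item follow the OpenAI Responses convention, which is an assumption here because the Courier docs only list the item type.

```python
import json
from typing import Any, Dict, List, Tuple


def split_output(response: Dict[str, Any]) -> Tuple[str, List[Dict[str, Any]]]:
    """Separate assistant text from function calls in a /v1/responses body."""
    text_parts: List[str] = []
    calls: List[Dict[str, Any]] = []
    for item in response.get("output", []):
        if item["type"] == "message":
            text_parts += [c["text"] for c in item["content"] if c["type"] == "output_text"]
        elif item["type"] == "function_call":
            calls.append(
                {
                    "call_id": item["call_id"],
                    "name": item["name"],
                    "arguments": json.loads(item["arguments"]),
                }
            )
    return "".join(text_parts), calls


def function_call_output(call_id: str, result: Any) -> Dict[str, Any]:
    """Build the input item that reports a tool result (assumed field names)."""
    return {"type": "function_call_output", "call_id": call_id, "output": json.dumps(result)}


response = {
    "object": "response",
    "output": [
        {
            "id": "msg_123",
            "type": "message",
            "status": "completed",
            "role": "assistant",
            "content": [{"type": "output_text", "text": "Searching now...", "annotations": []}],
        },
        {
            "id": "fc_123",
            "type": "function_call",
            "status": "completed",
            "call_id": "call_abc123",
            "name": "search_news",
            "arguments": "{\"query\":\"top headline today\"}",
        },
    ],
}

text, calls = split_output(response)
follow_up = function_call_output(calls[0]["call_id"], {"headline": "example"})
```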

Streaming Behavior (Responses SSE Events)

response.created, response.in_progress, response.output_item.added, response.content_part.added, response.output_text.delta, response.output_text.done, response.content_part.done, response.function_call_arguments.delta, response.function_call_arguments.done, response.output_item.done, response.completed, error

Streams end with data: [DONE].

Multimodal Restrictions on /v1/responses

For audio-* models, stream and tools are not supported.
When provided for audio models, text.format must be plain text.

Tool Calling Parity Notes

OpenAI-style behavior with function tools only.
Tool arguments must normalize to valid JSON strings.
Audio models do not support tools or streaming. Text and fused modality (image-text-text) models have full tool-calling parity.

Whisper API

OpenAI-compatible Whisper transcription and translation endpoints built for production automation workflows.

Implemented Endpoints

POST /v1/audio/transcriptions
POST /v1/audio/translations
/v1/audio/speech is not currently implemented.

Common Request Behavior

Multipart upload with required file and model.
Allowed extensions: .mp3, .mp4, .mpeg, .mpga, .m4a, .wav, .webm. Max size: 25 MB.
Error mapping: unsupported format → invalid_audio, too large → invalid_request_error, invalid format value → invalid_response_format.
model=whisper-1 maps to UCE_WHISPER_MODEL (default mlx-community/whisper-large-v3-turbo); other model names pass through unchanged.
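
Checking the extension and size on the client avoids an upload that the server would reject anyway. A minimal sketch, assuming a hypothetical `check_audio` helper and treating 25 MB as 25 × 1024 × 1024 bytes (the docs do not say which definition applies):

```python
import os

ALLOWED_EXTENSIONS = {".mp3", ".mp4", ".mpeg", ".mpga", ".m4a", ".wav", ".webm"}
MAX_BYTES = 25 * 1024 * 1024  # assumption: 25 MB interpreted as 25 MiB


def check_audio(filename: str, size_bytes: int) -> None:
    """Raise ValueError if the file would be rejected by the documented rules."""
    extension = os.path.splitext(filename)[1].lower()
    if extension not in ALLOWED_EXTENSIONS:
        # The server reports this case as invalid_audio.
        raise ValueError(f"invalid_audio: unsupported extension {extension!r}")
    if size_bytes > MAX_BYTES:
        # The server reports this case as invalid_request_error.
        raise ValueError("invalid_request_error: file exceeds 25 MB")
```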

POST /v1/audio/transcriptions

Request Example (multipart)

curl -X POST "$BASE_URL/v1/audio/transcriptions" \
  -H "Authorization: Bearer $API_KEY" \
  -F "file=@audio.wav" \
  -F "model=whisper-1" \
  -F "response_format=verbose_json" \
  -F "timestamp_granularities[]=word"

Response Example (json)

{
  "text": "Hello from Courier Whisper."
}

Response Example (verbose_json)

{
  "text": "Hello from Courier Whisper.",
  "language": "en",
  "segments": [
    {
      "id": 0,
      "start": 0.0,
      "end": 1.8,
      "text": "Hello from Courier Whisper."
    }
  ]
}

timestamp_granularities values segment and word are only allowed when response_format=verbose_json.
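
The rule above can be enforced when the multipart form fields are assembled. In this sketch, `transcription_fields` is a hypothetical helper that returns the non-file fields as (name, value) pairs, mirroring the `-F` flags in the curl example:

```python
from typing import List, Optional, Sequence, Tuple

VALID_GRANULARITIES = {"segment", "word"}


def transcription_fields(
    model: str = "whisper-1",
    response_format: str = "json",
    timestamp_granularities: Optional[Sequence[str]] = None,
) -> List[Tuple[str, str]]:
    """Build form fields for /v1/audio/transcriptions, validating granularities."""
    granularities = list(timestamp_granularities or [])
    for value in granularities:
        if value not in VALID_GRANULARITIES:
            raise ValueError(f"unknown timestamp granularity: {value!r}")
    if granularities and response_format != "verbose_json":
        raise ValueError("timestamp_granularities requires response_format=verbose_json")
    fields = [("model", model), ("response_format", response_format)]
    # Repeated field name, one entry per requested granularity.
    fields += [("timestamp_granularities[]", value) for value in granularities]
    return fields
```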

POST /v1/audio/translations

Request Example (multipart)

curl -X POST "$BASE_URL/v1/audio/translations" \
  -H "Authorization: Bearer $API_KEY" \
  -F "file=@audio-es.mp3" \
  -F "model=whisper-1" \
  -F "response_format=json"

Response Example

{
  "text": "This audio was translated into English."
}

Behavior Notes

Uses translate operation internally.
word_timestamps is disabled for translations.
For verbose_json, language defaults to en if upstream language is missing.

JSON Response Formatting

Structured JSON Outputs with Outlines

Courier supports guaranteed structured JSON outputs using the Outlines library. This feature enables models to generate responses that strictly adhere to a provided JSON schema through FSM-based logit masking.

Overview

Courier's structured JSON output feature uses Outlines to ensure models generate responses that strictly follow your JSON schema. This is achieved through Finite State Machine (FSM) based logit masking that constrains token generation to only produce valid JSON matching your schema.

Technical Architecture

FSM-Based Logit Masking: Outlines builds a Finite State Machine from your JSON schema that constrains token generation to only produce valid JSON matching the schema.
Generator Caching: The first time a schema is used, there's a 0.1-1s cold start while the FSM is compiled. Subsequent uses are instant (cached in memory per worker).
Thought Field Pattern: To prevent "probability tunneling", schemas are automatically enhanced with a "thought" or "reasoning" field if one isn't present, allowing natural language processing before data constraints.
Zero-Copy Integration: The Outlines wrapper shares the same MLX model weights in memory, providing minimal overhead when structured output is requested.
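
To make the thought field pattern concrete, here is a rough illustration of what such a schema enhancement could look like. It is not Courier's server code: `ensure_thought_field` is a hypothetical name, and placing the field first and marking it required are assumptions.

```python
import copy
from typing import Any, Dict


def ensure_thought_field(schema: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of an object schema that has a "thought" or "reasoning" field."""
    enhanced = copy.deepcopy(schema)
    properties = enhanced.get("properties", {})
    if "thought" in properties or "reasoning" in properties:
        return enhanced  # already has a free-text field
    # Assumed: the free-text field goes first so it is generated before the data.
    enhanced["properties"] = {"thought": {"type": "string"}, **properties}
    enhanced["required"] = ["thought", *enhanced.get("required", [])]
    return enhanced


schema = {
    "type": "object",
    "properties": {"answer": {"type": "number"}},
    "required": ["answer"],
}
enhanced = ensure_thought_field(schema)
```

The input schema is left untouched, so the caller's original definition can still be used to validate the data fields.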

Usage

Both the /v1/chat/completions and /inference/ endpoints accept the OpenAI-compatible response_format parameter:

POST /v1/chat/completions

{
  "model": "Solar Open 100B",
  "messages": [
    {
      "role": "user",
      "content": "What is 123 * 456?"
    }
  ],
  "response_format": {
    "type": "json_schema",
    "json_schema": {
      "schema": {
        "type": "object",
        "properties": {
          "thought": {
            "type": "string"
          },
          "answer": {
            "type": "number"
          }
        },
        "required": [
          "thought",
          "answer"
        ]
      }
    }
  }
}

POST /inference/

{
  "model_id": "uuid-here",
  "model_name": "your-model",
  "model_type": "text-text",
  "messages": [
    {
      "role": "user",
      "content": "Classify: 'Great product!'"
    }
  ],
  "temperature": 0.7,
  "response_format": {
    "type": "json_schema",
    "json_schema": {
      "schema": {
        "type": "object",
        "properties": {
          "reasoning": {
            "type": "string"
          },
          "sentiment": {
            "type": "string",
            "enum": [
              "positive",
              "negative",
              "neutral"
            ]
          }
        },
        "required": [
          "reasoning",
          "sentiment"
        ]
      }
    }
  }
}

Authentication

For the Courier /inference/ endpoint, use token authentication:

Authorization: API_KEY

For the OpenAI /v1/chat/completions endpoint, use bearer authentication:

Authorization: Bearer API_KEY

Example Structured Response

{
  "content": "{\n    \"reasoning\": \"The user sent a positive sentiment message: 'Great product!'. I need to classify this as positive sentiment.\",\n    \"sentiment\": \"positive\"\n  }"
}

As you can see, the LLM responded in the specified JSON structure with both reasoning and sentiment fields. You can parse this response using JSON.parse() without any extra formatting or validation needed.
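
In Python, the equivalent of JSON.parse() is json.loads(). A short sketch of decoding the response above; `parse_sentiment` is a hypothetical helper, the enum check is an optional safeguard, and the reasoning text in the sample is shortened.

```python
import json
from typing import Any, Dict

ALLOWED_SENTIMENTS = {"positive", "negative", "neutral"}


def parse_sentiment(body: Dict[str, Any]) -> Dict[str, str]:
    """Decode the JSON string in `content` and check the enum value."""
    result = json.loads(body["content"])
    if result["sentiment"] not in ALLOWED_SENTIMENTS:
        raise ValueError(f"unexpected sentiment: {result['sentiment']!r}")
    return result


body = {
    "content": "{\n    \"reasoning\": \"The message praises the product.\",\n    \"sentiment\": \"positive\"\n  }"
}
parsed = parse_sentiment(body)
```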

Schema Examples

Simple Classification

{
  "type": "object",
  "properties": {
    "thought": {
      "type": "string"
    },
    "classification": {
      "type": "string",
      "enum": [
        "urgent",
        "normal",
        "low_priority"
      ]
    }
  },
  "required": [
    "thought",
    "classification"
  ]
}

Nested Objects

{
  "type": "object",
  "properties": {
    "analysis": {
      "type": "string"
    },
    "person": {
      "type": "object",
      "properties": {
        "name": {
          "type": "string"
        },
        "age": {
          "type": "integer"
        },
        "email": {
          "type": "string"
        }
      },
      "required": [
        "name",
        "age"
      ]
    }
  },
  "required": [
    "analysis",
    "person"
  ]
}

Arrays and Lists

{
  "type": "object",
  "properties": {
    "reasoning": {
      "type": "string"
    },
    "items": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "name": {
            "type": "string"
          },
          "quantity": {
            "type": "integer"
          }
        },
        "required": [
          "name",
          "quantity"
        ]
      }
    }
  },
  "required": [
    "reasoning",
    "items"
  ]
}

Enums and Constraints

{
  "type": "object",
  "properties": {
    "thought": {
      "type": "string"
    },
    "rating": {
      "type": "integer",
      "minimum": 1,
      "maximum": 5
    },
    "category": {
      "type": "string",
      "enum": [
        "electronics",
        "clothing",
        "food",
        "other"
      ]
    }
  },
  "required": [
    "thought",
    "rating",
    "category"
  ]
}

Limitations

No Streaming: Structured output currently doesn't support streaming. The response is returned as a complete JSON object.
Schema Complexity: Extremely complex schemas with deep nesting may take longer to compile or potentially fail.
Model Capabilities: The underlying model must be capable of understanding and following instructions. Smaller models may struggle with complex schemas.
Text & Vision Models: Supported for text-text and image-text-text (fused modality) model types. Audio and image generation models use standard unconstrained generation.

Backward Compatibility

When no response_format is provided, the system works exactly as it did before. There is zero impact on existing inference flows. This feature is purely opt-in.

// This still works exactly as before
{
  "model": "your-model",
  "messages": [
    {
      "role": "user",
      "content": "Hello!"
    }
  ]
  // No response_format = standard unconstrained generation
}

Web Search

Built-in web search that lets models automatically ground responses with real-time information from the web. Powered by the Brave Search API.

When enabled, models can decide to search the web mid-inference. The server handles the search transparently and returns a grounded response - no extra client-side logic required.

Setup

1. Get a Brave Search API Key

2. Configure the Key

Option A: Courier TUI Installer - Run courier and enter your key in the "Brave Search API Key" field (in the System Configuration section, below the ngrok fields).
Option B: Manual - Add to your ~/.courier/.env:

BRAVE_SEARCH_API_KEY=your_key_here

Then restart Courier.

Usage

Include web_search in your request's tools array. Two formats are supported:

Shorthand Format

{
  "model": "Qwen3 30B",
  "messages": [
    {
      "role": "user",
      "content": "What happened in the news today?"
    }
  ],
  "tools": [
    {
      "type": "web_search"
    }
  ]
}

Standard Function Format

{
  "model": "Qwen3 30B",
  "messages": [
    {
      "role": "user",
      "content": "What is the current price of Bitcoin?"
    }
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "web_search",
        "description": "Search the web for current information",
        "parameters": {
          "type": "object",
          "properties": {
            "query": {
              "type": "string"
            }
          },
          "required": [
            "query"
          ]
        }
      }
    }
  ]
}

cURL Example

curl -X POST http://localhost:9100/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "Qwen3 30B",
    "messages": [{"role": "user", "content": "What are the latest developments in AI?"}],
    "tools": [{"type": "web_search"}]
  }'

Streaming

curl -X POST http://localhost:9100/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "Qwen3 30B",
    "messages": [{"role": "user", "content": "What are the latest developments in AI?"}],
    "tools": [{"type": "web_search"}],
    "stream": true
  }'

Streaming works with web search enabled. Add "stream": true to your request. The server resolves all searches before streaming the final grounded response.

Mixing Web Search with Other Tools

Web search works alongside your own function tools. The server executes web_search calls automatically while returning your custom tool calls to the client as normal.

Example

{
  "model": "Qwen3 30B",
  "messages": [
    {
      "role": "user",
      "content": "What's the weather in Denver and what's trending on Hacker News?"
    }
  ],
  "tools": [
    {
      "type": "web_search"
    },
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get current weather by city",
        "parameters": {
          "type": "object",
          "properties": {
            "city": {
              "type": "string"
            }
          },
          "required": [
            "city"
          ]
        }
      }
    }
  ]
}

In this case, the model may call both tools. web_search is resolved server-side and get_weather is returned to the client for execution.

How It Works

1. The model decides whether a search is needed based on the user's question.
2. If the model calls web_search, the server intercepts the call and queries the Brave Search API.
3. The top 5 results (title, URL, description) are injected back as context.
4. The model generates a final grounded response using the search results.
5. The client receives only the final answer; the search loop is invisible.

The server caps search iterations at 3 per request to prevent runaway loops.

Behavior Without a Key

If BRAVE_SEARCH_API_KEY is not configured and a request includes web_search, the tool is silently dropped. The request proceeds normally as if no tools were provided. A warning is logged server-side.

Pricing

Brave Search API charges $5 per 1,000 queries. New accounts get $5/month in free credits. There is no markup from Courier — you pay Brave directly for what you use.

Release history

0.1.41

May 12, 2026

0.1.4

May 12, 2026

0.1.3

May 12, 2026

This version

0.1.2

May 10, 2026

0.1.1

May 9, 2026

0.1.0

May 9, 2026

Download files

Source Distribution

courier_encode-0.1.2.tar.gz (110.0 kB)

Uploaded May 10, 2026 Source

Built Distribution

courier_encode-0.1.2-py3-none-any.whl (43.9 kB)

Uploaded May 10, 2026 Python 3

File details

Details for the file courier_encode-0.1.2.tar.gz.

File metadata

Download URL: courier_encode-0.1.2.tar.gz
Upload date: May 10, 2026
Size: 110.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.11.12 {"installer":{"name":"uv","version":"0.11.12","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"macOS","version":null,"id":null,"libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}

File hashes

Hashes for courier_encode-0.1.2.tar.gz
Algorithm	Hash digest
SHA256	`5abe62be57d41774e822939f19deffe85207f5244e22ad8cb99d8d3c09c2f79e`
MD5	`affc63c3004def173005e725d10447de`
BLAKE2b-256	`4c5b71f11f00644f7ab1cba305c7551f3412b76a6e7d036a39795978d0860a05`

File details

Details for the file courier_encode-0.1.2-py3-none-any.whl.

File metadata

Download URL: courier_encode-0.1.2-py3-none-any.whl
Upload date: May 10, 2026
Size: 43.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.11.12 {"installer":{"name":"uv","version":"0.11.12","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"macOS","version":null,"id":null,"libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}

File hashes

Hashes for courier_encode-0.1.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`acd76d755136ae1003294856a47fedcc77c3dd91281679da4eb9fe49dbaec845`
MD5	`ba42be60ca26b94bf4aa08a6253af06e`
BLAKE2b-256	`e14854d46d6df4f29f0385a4b8d128aae377f50dec034fcac115c3af745df46d`

courier-encode 0.1.2
