azure-ai-inference-plus
The easier way to use Azure AI Inference SDK ✨
Enhanced wrapper that makes Azure AI Inference SDK simple and reliable with automatic retry, JSON validation, and reasoning separation.
Why Use This Instead?
✅ Reasoning separation - automatically splits thinking from output (.content and .reasoning)
✅ Automatic retries - never lose requests to transient failures
✅ JSON that works - guaranteed valid JSON or automatic retry
✅ One import - no need for multiple Azure SDK imports
✅ 100% compatible - drop-in replacement for Azure AI Inference SDK
🛡️ Handles Real-World LLM Issues
Automatic retries for the errors you actually encounter in production:
🔄 Service overloaded (timeouts) → Auto-retry with backoff
🔄 Rate limits (429) → Smart retry timing
🔄 Azure service hiccups (5xx) → Exponential backoff
🔄 Invalid JSON responses → Re-request clean JSON
🔄 Network timeouts → Multiple quick attempts
Just works. No manual error handling needed.
Installation
```
pip install azure-ai-inference-plus
```
Supports Python 3.11+
Quick Start
```python
from azure_ai_inference_plus import ChatCompletionsClient, SystemMessage, UserMessage

# Uses environment variables: AZURE_AI_ENDPOINT, AZURE_AI_API_KEY
client = ChatCompletionsClient()

response = client.complete(
    messages=[
        SystemMessage(content="You are a helpful assistant."),
        UserMessage(content="What's the capital of France?"),
    ],
    max_tokens=100,
    model="Codestral-2501",
)

print(response.choices[0].message.content)
# "The capital of France is Paris..."
```
Or with manual credentials (everything from one import!):
```python
from azure_ai_inference_plus import ChatCompletionsClient, SystemMessage, UserMessage, AzureKeyCredential

client = ChatCompletionsClient(
    endpoint="https://your-resource.services.ai.azure.com/models",
    credential=AzureKeyCredential("your-api-key"),
)
```
🎯 Key Features
🧠 Automatic Reasoning Separation
Game changer for reasoning models like DeepSeek-R1 - automatically separates thinking from output:
```python
response = client.complete(
    messages=[
        SystemMessage(content="You are a helpful assistant."),
        UserMessage(content="What's 2+2? Think step by step."),
    ],
    model="DeepSeek-R1",
    reasoning_tags=["<think>", "</think>"],  # ✨ Auto-separation
)

# Clean output without reasoning clutter
print(response.choices[0].message.content)
# "2 + 2 equals 4."

# Access the reasoning separately
print(response.choices[0].message.reasoning)
# "Let me think about this step by step. 2 + 2 is a basic addition..."
```
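Conceptually, the separation amounts to splitting the raw completion on the configured tags. A minimal sketch of that idea (not the library's actual implementation):

```python
import re


def split_reasoning(text: str, open_tag: str = "<think>", close_tag: str = "</think>"):
    """Split a raw completion into (content, reasoning) on the given tags."""
    pattern = re.escape(open_tag) + r"(.*?)" + re.escape(close_tag)
    match = re.search(pattern, text, flags=re.DOTALL)
    if not match:
        return text.strip(), None
    reasoning = match.group(1).strip()
    content = re.sub(pattern, "", text, flags=re.DOTALL).strip()
    return content, reasoning


raw = "<think>2 + 2 is basic addition.</think>2 + 2 equals 4."
content, reasoning = split_reasoning(raw)
# content   -> "2 + 2 equals 4."
# reasoning -> "2 + 2 is basic addition."
```

With the library you never do this yourself; `.content` and `.reasoning` arrive pre-separated.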
✅ Guaranteed Valid JSON
No more JSON parsing errors - automatic validation and retry.
Simple JSON (standard models like GPT-4o):
```python
response = client.complete(
    messages=[
        SystemMessage(content="You are a helpful assistant that returns JSON."),
        UserMessage(content="Give me Tokyo info as JSON with keys: name, country, population"),
    ],
    max_tokens=500,
    model="gpt-4o",
    response_format="json_object",  # ✨ Auto-validation + retry
)

# Always valid JSON, no try/except needed!
import json
data = json.loads(response.choices[0].message.content)  # ✅ Works perfectly
```
JSON with reasoning models (like DeepSeek-R1):
```python
response = client.complete(
    messages=[
        SystemMessage(content="You are a helpful assistant that returns JSON."),
        UserMessage(content="Give me Paris info as JSON with keys: name, country, population"),
    ],
    max_tokens=2000,  # More tokens needed for reasoning + JSON
    model="DeepSeek-R1",
    response_format="json_object",  # ✨ Clean JSON guaranteed
    reasoning_tags=["<think>", "</think>"],  # Required for reasoning separation
)

# Pure JSON - reasoning automatically stripped
data = json.loads(response.choices[0].message.content)  # {"name": "Paris", ...}

# But the reasoning is still accessible
thinking = response.choices[0].message.reasoning  # "Let me think about Paris..."
```
Note: JSON responses are automatically cleaned of markdown wrappers (like ```json blocks) for reliable parsing.
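That cleanup step can be sketched roughly as follows (a simplified stand-in for illustration, not the library's exact logic):

```python
import json


def strip_json_fences(text: str) -> str:
    """Remove a surrounding ```json ... ``` (or plain ```) markdown fence, if present."""
    cleaned = text.strip()
    if cleaned.startswith("```"):
        lines = cleaned.splitlines()
        # Drop the opening fence line (e.g. "```json") and the closing fence.
        if lines[-1].strip() == "```":
            lines = lines[1:-1]
        else:
            lines = lines[1:]
        cleaned = "\n".join(lines).strip()
    return cleaned


wrapped = '```json\n{"name": "Tokyo", "country": "Japan"}\n```'
data = json.loads(strip_json_fences(wrapped))
# data -> {"name": "Tokyo", "country": "Japan"}
```

Again, the library does this for you whenever `response_format="json_object"` is set.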
🔄 Smart Automatic Retries
Built-in retry with exponential backoff - no configuration needed:
```python
# Automatically retries on failures (including timeouts) - just works!
response = client.complete(
    messages=[UserMessage(content="Tell me a joke")],
    model="Phi-4",
)
```
⚙️ Custom Retry Configuration
```python
from azure_ai_inference_plus import RetryConfig

# Override the defaults - a shorter 100s timeout plus retries beats one long 300s timeout
client = ChatCompletionsClient(
    connection_timeout=100.0,
    retry_config=RetryConfig(max_retries=5, delay_seconds=2.0),
)
```
📢 Retry Callbacks (Optional Observability)
Get notified when retries happen - perfect for logging and monitoring:
```python
from azure_ai_inference_plus import RetryConfig

def on_chat_retry(attempt, max_retries, exception, delay):
    print(f"🔄 Chat retry {attempt}/{max_retries}: {type(exception).__name__} - waiting {delay:.1f}s")

def on_json_retry(attempt, max_retries, message):
    print(f"📝 JSON retry {attempt}/{max_retries}: {message}")

# Add callbacks to your retry config
client = ChatCompletionsClient(
    retry_config=RetryConfig(
        max_retries=3,
        on_chat_retry=on_chat_retry,  # Called for general failures
        on_json_retry=on_json_retry,  # Called for JSON validation failures
    )
)

# Now you'll see retry notifications:
# 🔄 Chat retry 1/3: HttpResponseError - waiting 1.0s
# 📝 JSON retry 2/3: Retry 2 after JSON validation failed
```
Why callbacks? The library doesn't print anything by default (clean for production), but callbacks let you add your own logging, metrics, or notifications exactly how you want them.
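For instance, instead of printing, the same callbacks can forward straight into the standard `logging` module (callback signatures as shown above; the logger name here is arbitrary):

```python
import logging

logger = logging.getLogger("azure_ai_inference_plus.retries")

def on_chat_retry(attempt, max_retries, exception, delay):
    logger.warning(
        "Chat retry %d/%d after %s; waiting %.1fs",
        attempt, max_retries, type(exception).__name__, delay,
    )

def on_json_retry(attempt, max_retries, message):
    logger.info("JSON retry %d/%d: %s", attempt, max_retries, message)
```

Pass these to `RetryConfig` exactly as in the example above, and retry events flow into whatever logging backend your application already uses.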
🚀 Embeddings Too
```python
from azure_ai_inference_plus import EmbeddingsClient

client = EmbeddingsClient()

response = client.embed(
    input=["Hello world", "Python is great"],
    model="text-embedding-3-large",
)
```
Environment Setup
Create a .env file:
```
AZURE_AI_ENDPOINT=https://your-resource.services.ai.azure.com/models
AZURE_AI_API_KEY=your-api-key-here
```
Migration from Azure AI Inference SDK
Two simple steps:

1. Install the package:

```
pip install azure-ai-inference-plus
```

2. Change your imports:

```python
# Before
from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import SystemMessage, UserMessage
from azure.core.credentials import AzureKeyCredential

# After
from azure_ai_inference_plus import ChatCompletionsClient, SystemMessage, UserMessage, AzureKeyCredential
```
That's it! Your existing code works unchanged with automatic retries and JSON validation.
Manual Credential Setup
```python
from azure_ai_inference_plus import ChatCompletionsClient, AzureKeyCredential

client = ChatCompletionsClient(
    endpoint="https://your-resource.services.ai.azure.com/models",
    credential=AzureKeyCredential("your-api-key"),
)
```
Examples
Check out the examples/ directory for complete demonstrations:
- basic_usage.py - Reasoning separation, JSON validation, retry features, and timeout strategy
- embeddings_example.py - Embeddings with retry and credential setup
- callbacks_example.py - Retry callbacks for logging and monitoring
All examples show real-world usage patterns and advanced features.
License
Contributing
Contributions are welcome! Whether it's bug fixes, feature additions, or documentation improvements, we appreciate your help in making this project better. For major changes or new features, please open an issue first to discuss what you would like to change.
Related Projects
- langchain-azure-ai-inference-plus - The easier way to use Azure AI Inference SDK with LangChain ✨