azure-ai-inference-plus

The easier way to use Azure AI Inference SDK

Enhanced wrapper that makes Azure AI Inference SDK simple and reliable with automatic retry, JSON validation, and reasoning separation.

Why Use This Instead?

Reasoning separation - automatically splits thinking from output (.content and .reasoning)
Automatic retries - never lose requests to transient failures
JSON that works - guaranteed valid JSON or automatic retry
One import - no need for multiple Azure SDK imports
100% compatible - drop-in replacement for Azure AI Inference SDK

Installation

pip install azure-ai-inference-plus

Supports Python 3.10+

Quick Start

from azure_ai_inference_plus import ChatCompletionsClient, SystemMessage, UserMessage

# Uses environment variables: AZURE_AI_ENDPOINT, AZURE_AI_API_KEY
client = ChatCompletionsClient()

response = client.complete(
    messages=[
        SystemMessage(content="You are a helpful assistant."),
        UserMessage(content="What's the capital of France?"),
    ],
    max_tokens=100,
    model="Codestral-2501"
)

print(response.choices[0].message.content)
# "The capital of France is Paris..."

Or with manual credentials (everything from one import!):

from azure_ai_inference_plus import ChatCompletionsClient, SystemMessage, UserMessage, AzureKeyCredential

client = ChatCompletionsClient(
    endpoint="https://your-resource.services.ai.azure.com/models",
    credential=AzureKeyCredential("your-api-key")
)

🎯 Key Features

🧠 Automatic Reasoning Separation

Game changer for reasoning models like DeepSeek-R1 - automatically separates thinking from output:

response = client.complete(
    messages=[
        SystemMessage(content="You are a helpful assistant."),
        UserMessage(content="What's 2+2? Think step by step."),
    ],
    model="DeepSeek-R1",
    reasoning_tags=["<think>", "</think>"]  # ✨ Auto-separation
)

# Clean output without reasoning clutter
print(response.choices[0].message.content)
# "2 + 2 equals 4."

# Access the reasoning separately
print(response.choices[0].message.reasoning)
# "Let me think about this step by step. 2 + 2 is a basic addition..."

For JSON mode, reasoning is automatically removed so you get clean JSON:

response = client.complete(
    messages=[
        SystemMessage(content="You are a helpful assistant that returns JSON."),
        UserMessage(content="Give me Paris info as JSON with keys: name, country, population"),
    ],
    max_tokens=2000,
    model="DeepSeek-R1",
    response_format="json_object",  # ✨ Clean JSON guaranteed
    reasoning_tags=["<think>", "</think>"]
)

# Pure JSON - reasoning automatically stripped
data = response.choices[0].message.content  # '{"name": "Paris", ...}' (a JSON string)

# But reasoning is still accessible
thinking = response.choices[0].message.reasoning  # "Let me think about Paris..."

✅ Guaranteed Valid JSON

No more JSON parsing errors - automatic validation and retry:

response = client.complete(
    messages=[UserMessage(content="Give me a JSON response")],
    model="Codestral-2501",
    response_format="json_object"  # ✨ Auto-validation + retry
)

# Always valid JSON, no try/except needed!
data = response.choices[0].message.content
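One thing to keep in mind: even in JSON mode, `.message.content` arrives as a string (as in the underlying SDK), so parse it with the stdlib `json` module. A minimal sketch, with a sample payload standing in for a live response:

```python
import json

# Sample payload standing in for response.choices[0].message.content
content = '{"name": "Paris", "country": "France"}'

# Guaranteed to parse, since the library already validated it
data = json.loads(content)
print(data["country"])  # France
```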

🔄 Smart Automatic Retries

Built-in retry with exponential backoff - no configuration needed:

# Automatically retries on failures - just works!
response = client.complete(
    messages=[UserMessage(content="Tell me a joke")],
    model="Phi-4"
)

⚙️ Custom Retry (If Needed)

from azure_ai_inference_plus import RetryConfig

# Override default behavior
client = ChatCompletionsClient(
    retry_config=RetryConfig(max_retries=5, delay_seconds=2.0)
)

📢 Retry Callbacks (Optional Observability)

Get notified when retries happen - perfect for logging and monitoring:

from azure_ai_inference_plus import RetryConfig

def on_chat_retry(attempt, max_retries, exception, delay):
    print(f"🔄 Chat retry {attempt}/{max_retries}: {type(exception).__name__} - waiting {delay:.1f}s")

def on_json_retry(attempt, max_retries, message):
    print(f"📝 JSON retry {attempt}/{max_retries}: {message}")

# Add callbacks to your retry config
client = ChatCompletionsClient(
    retry_config=RetryConfig(
        max_retries=3,
        on_chat_retry=on_chat_retry,    # Called for general failures
        on_json_retry=on_json_retry     # Called for JSON validation failures
    )
)

# Now you'll see retry notifications:
# 🔄 Chat retry 1/3: HttpResponseError - waiting 1.0s
# 📝 JSON retry 2/3: Retry 2 after JSON validation failed

Why callbacks? The library doesn't print anything by default (clean for production), but callbacks let you add your own logging, metrics, or notifications exactly how you want them.
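The print-based callbacks above are illustrative; in production you would more likely route retries through the logging module. A sketch reusing the same callback signatures (the logger name is an arbitrary choice):

```python
import logging

logger = logging.getLogger("azure_ai_inference_plus")

def on_chat_retry(attempt, max_retries, exception, delay):
    # Same signature as the print-based callback above
    logger.warning(
        "chat retry %d/%d: %s - waiting %.1fs",
        attempt, max_retries, type(exception).__name__, delay,
    )

def on_json_retry(attempt, max_retries, message):
    logger.warning("JSON retry %d/%d: %s", attempt, max_retries, message)

# Wire them up exactly as before:
# RetryConfig(max_retries=3, on_chat_retry=on_chat_retry, on_json_retry=on_json_retry)
```

Because nothing is printed by default, this keeps retry noise out of stdout while still feeding your existing log pipeline.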

🚀 Embeddings Too

from azure_ai_inference_plus import EmbeddingsClient

client = EmbeddingsClient()
response = client.embed(
    input=["Hello world", "Python is great"],
    model="text-embedding-3-large"
)
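The response mirrors the underlying SDK's `EmbeddingsResult`, so each input string gets a vector of floats (e.g. `response.data[0].embedding`; the attribute names follow the base SDK and are worth verifying against its docs). A common next step is comparing two vectors with cosine similarity; a stdlib-only sketch:

```python
import math

def cosine_similarity(a, b):
    # dot(a, b) / (|a| * |b|), with vectors as plain float lists,
    # e.g. response.data[0].embedding and response.data[1].embedding
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # 1.0
```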

Environment Setup

Create a .env file:

AZURE_AI_ENDPOINT=https://your-resource.services.ai.azure.com/models
AZURE_AI_API_KEY=your-api-key-here
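If your process does not already export these variables, something has to load the `.env` file (python-dotenv is the usual choice; whether the library loads `.env` on its own is not stated here). A minimal stdlib-only loader, purely as an illustration:

```python
import os

def load_env(path=".env"):
    # Minimal .env parser: KEY=VALUE lines and '#' comments; no quoting rules.
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip())
```

Using `os.environ.setdefault` keeps variables already exported in the real environment ahead of `.env` values, which is the usual precedence.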

Migration from Azure AI Inference SDK

Two simple steps:

  1. pip install azure-ai-inference-plus

  2. Change your import:

    # Before
    from azure.ai.inference import ChatCompletionsClient
    from azure.ai.inference.models import SystemMessage, UserMessage
    from azure.core.credentials import AzureKeyCredential
    
    # After
    from azure_ai_inference_plus import ChatCompletionsClient, SystemMessage, UserMessage, AzureKeyCredential
    

That's it! Your existing code works unchanged with automatic retries and JSON validation.

Manual Credential Setup

from azure_ai_inference_plus import ChatCompletionsClient, AzureKeyCredential

client = ChatCompletionsClient(
    endpoint="https://your-resource.services.ai.azure.com/models",
    credential=AzureKeyCredential("your-api-key")
)

Examples

Check out the examples/ directory for complete demonstrations.

All examples show real-world usage patterns and advanced features.

License

MIT License - see LICENSE file for details.
