Drop-in replacement for ibm-watsonx-ai that fixes vLLM bugs with gpt-oss-120b/20b models

ibm-watsonx-ai-120b

A drop-in replacement for ibm-watsonx-ai that fixes all known issues with IBM's vLLM-hosted openai/gpt-oss-120b and openai/gpt-oss-20b models.

The Problem

IBM hosts OpenAI's gpt-oss models on WatsonX using vLLM, but the deployment has numerous bugs:

  • Tool calling doesn't work - the tool_calls array is always empty
  • JSON schema mode is ignored - the model returns free text instead of JSON
  • Thinking leaks into output - reasoning_content is populated while content is empty
  • Streaming breaks with tools - tool calls appear in the wrong fields
  • Harmony tokens leak - special tokens like <|channel|> appear in the output

The Solution

Change one import and everything works:

# Before (broken)
from ibm_watsonx_ai.foundation_models import ModelInference

# After (fixed!)
from ibm_watsonx_ai_120b.foundation_models import ModelInference

# Your code stays exactly the same
model = ModelInference(
    model_id="openai/gpt-oss-120b",
    credentials=credentials,
    project_id=project_id
)

# Tool calling now works!
response = model.chat(
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get weather for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string"}
                },
                "required": ["location"]
            }
        }
    }]
)
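With tool calling restored, the response should carry tool_calls entries in the usual OpenAI chat-completions shape. A minimal sketch of dispatching them - the response dict below is illustrative, not live output from the package:

```python
import json

# Hypothetical response in the OpenAI chat-completions shape that
# working tool calling is expected to produce.
response = {
    "choices": [{
        "message": {
            "role": "assistant",
            "content": None,
            "tool_calls": [{
                "id": "call_0",
                "type": "function",
                "function": {
                    "name": "get_weather",
                    # Arguments arrive as a JSON-encoded string.
                    "arguments": '{"location": "Tokyo"}',
                },
            }],
        }
    }]
}

# Dispatch each tool call by name, decoding its arguments first.
for call in response["choices"][0]["message"]["tool_calls"]:
    if call["function"]["name"] == "get_weather":
        args = json.loads(call["function"]["arguments"])
        print(args["location"])  # → Tokyo
```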

# JSON schema mode now works!
response = model.chat(
    messages=[{"role": "user", "content": "List 3 colors"}],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "colors",
            "schema": {
                "type": "object",
                "properties": {
                    "colors": {"type": "array", "items": {"type": "string"}}
                },
                "required": ["colors"]
            }
        }
    }
)
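When JSON schema mode is enforced, the message content should parse as JSON matching the schema. A quick validation sketch - the content string here is illustrative, not real model output:

```python
import json

# Illustrative reply; with schema enforcement the model's content
# should decode to an object matching the "colors" schema above.
content = '{"colors": ["red", "green", "blue"]}'

data = json.loads(content)
# Check the shape the schema requires: an object with a string array.
assert isinstance(data["colors"], list)
assert all(isinstance(c, str) for c in data["colors"])
print(data["colors"])
```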

Installation

pip install ibm-watsonx-ai-120b

When IBM Fixes Their vLLM

Just change your import back:

# Fixed by IBM - just use the original!
from ibm_watsonx_ai.foundation_models import ModelInference

Your code stays exactly the same because we maintained full API compatibility.
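If you would rather not edit imports at all, one option is a small resolver that tries packages in order of preference. This helper is a hypothetical convention, not part of either package:

```python
import importlib

# Hypothetical helper: resolve ModelInference from whichever package
# is preferred/installed, so switching back needs no code edits.
ORIGINAL = "ibm_watsonx_ai.foundation_models"
PATCHED = "ibm_watsonx_ai_120b.foundation_models"

def candidates(prefer_original: bool) -> list:
    """Module names to try, in order of preference."""
    return [ORIGINAL, PATCHED] if prefer_original else [PATCHED, ORIGINAL]

def load_model_inference(prefer_original: bool = False):
    """Return the first importable ModelInference class."""
    for name in candidates(prefer_original):
        try:
            return importlib.import_module(name).ModelInference
        except ImportError:
            continue
    raise ImportError(
        "neither ibm-watsonx-ai nor ibm-watsonx-ai-120b is installed"
    )
```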

What Gets Fixed

Feature             Original Behavior               With This Package
------------------  ------------------------------  --------------------------
Tool Calling        tool_calls=[] always            Works correctly
JSON Schema         Ignored, returns free text      Enforced and validated
Thinking Responses  Empty content, reasoning only   Handled automatically
Streaming + Tools   Tool calls in the wrong field   Falls back to sync
Harmony Tokens      Leak into output                Stripped automatically
Null Content        Crashes vLLM                    Converted to empty string

Configuration

from ibm_watsonx_ai_120b import Config

# Adjust retry behavior
Config.max_retries = 5

# Force non-streaming for tools (most reliable)
Config.streaming_tool_strategy = "fallback"

# Enable debug logging
Config.debug = True

Or via environment variables:

export WATSONX_120B_MAX_RETRIES=5
export WATSONX_120B_DISABLE_STREAMING=true
export WATSONX_120B_DEBUG=true
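The mapping from environment variables to Config fields might look like the sketch below. The defaults and the boolean-parsing helper are assumptions for illustration, not the package's internal code:

```python
import os

def _bool(value: str) -> bool:
    """Parse common truthy strings from environment variables."""
    return value.strip().lower() in {"1", "true", "yes", "on"}

class Config:
    # Defaults here are assumptions, not the package's documented values.
    max_retries = 3
    streaming_tool_strategy = "fallback"
    debug = False

    @classmethod
    def from_env(cls, env=None):
        """Apply WATSONX_120B_* variables, mirroring the ones above."""
        env = os.environ if env is None else env
        if "WATSONX_120B_MAX_RETRIES" in env:
            cls.max_retries = int(env["WATSONX_120B_MAX_RETRIES"])
        if _bool(env.get("WATSONX_120B_DISABLE_STREAMING", "")):
            cls.streaming_tool_strategy = "fallback"
        if "WATSONX_120B_DEBUG" in env:
            cls.debug = _bool(env["WATSONX_120B_DEBUG"])
        return cls
```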

How It Works

The package wraps ibm-watsonx-ai and applies fixes through an adapter pipeline:

  1. MessageAdapter - Fixes null content and tool role issues
  2. ToolAdapter - Emulates tool calling via prompt injection
  3. JSONAdapter - Emulates JSON schema via prompt injection
  4. ThinkingAdapter - Handles reasoning-only responses
  5. HarmonyAdapter - Strips leaked special tokens
  6. StreamAdapter - Handles streaming quirks

Everything else passes through unchanged to the original library.
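The pipeline idea can be sketched as a chain of small transforms over requests and responses. The class names echo the list above, but the interfaces are illustrative, not the package's real adapters:

```python
# Illustrative pipeline skeleton; the real adapters are more involved.
class MessageAdapter:
    """Replace null content, which crashes vLLM, with an empty string."""
    def process_request(self, messages):
        return [{**m, "content": m.get("content") or ""} for m in messages]

class HarmonyAdapter:
    """Strip leaked Harmony control tokens such as <|channel|>."""
    TOKENS = ("<|channel|>", "<|message|>", "<|end|>")

    def process_response(self, text: str) -> str:
        for tok in self.TOKENS:
            text = text.replace(tok, "")
        return text

def run_pipeline(messages, raw_text):
    """Apply request adapters on the way in, response adapters on the way out."""
    messages = MessageAdapter().process_request(messages)
    return messages, HarmonyAdapter().process_response(raw_text)
```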

Requirements

  • Python 3.9+
  • ibm-watsonx-ai >= 1.0.0

Documentation

Contributing

Contributions welcome! See CONTRIBUTING.md for guidelines.

License

MIT License - See LICENSE for details.

Acknowledgments

This package was developed to centralize workarounds originally implemented in:

Links
