
ibm-watsonx-ai-120b

A drop-in replacement for ibm-watsonx-ai that fixes all known issues with IBM's vLLM-hosted openai/gpt-oss-120b and openai/gpt-oss-20b models.

The Problem

IBM hosts OpenAI's gpt-oss models on WatsonX using vLLM, but the deployment has numerous bugs:

  • Tool calling doesn't work - the tool_calls array is always empty
  • JSON schema mode is ignored - the model returns free text instead of JSON
  • Thinking leaks into output - reasoning_content appears without any actual content
  • Streaming breaks with tools - tool calls appear in the wrong fields
  • Harmony tokens leak - special tokens like <|channel|> appear in the output

The Solution

Change one import and everything works:

# Before (broken)
from ibm_watsonx_ai.foundation_models import ModelInference

# After (fixed!)
from ibm_watsonx_ai_120b.foundation_models import ModelInference

# Your code stays exactly the same
model = ModelInference(
    model_id="openai/gpt-oss-120b",
    credentials=credentials,
    project_id=project_id
)

# Tool calling now works!
response = model.chat(
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get weather for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string"}
                },
                "required": ["location"]
            }
        }
    }]
)

# JSON schema mode now works!
response = model.chat(
    messages=[{"role": "user", "content": "List 3 colors"}],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "colors",
            "schema": {
                "type": "object",
                "properties": {
                    "colors": {"type": "array", "items": {"type": "string"}}
                },
                "required": ["colors"]
            }
        }
    }
)
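Once JSON schema mode is enforced, the model's reply arrives as a JSON string in the message content. As a minimal sketch, assuming the response is a dict shaped like an OpenAI-style chat completion (check your installed version's actual return type), you can decode it like this:

```python
import json

# Hypothetical response shaped like an OpenAI-style chat completion
# (an assumption for this sketch, not a guaranteed return type).
sample_response = {
    "choices": [{
        "message": {
            "role": "assistant",
            "content": '{"colors": ["red", "green", "blue"]}',
        }
    }]
}

def parse_schema_output(response: dict) -> dict:
    """Extract and decode the JSON payload from a chat response."""
    content = response["choices"][0]["message"]["content"]
    return json.loads(content)

print(parse_schema_output(sample_response)["colors"])
# ['red', 'green', 'blue']
```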

Installation

pip install ibm-watsonx-ai-120b

When IBM Fixes Their vLLM

Just change your import back:

# Fixed by IBM - just use the original!
from ibm_watsonx_ai.foundation_models import ModelInference

Your code stays exactly the same because the package maintains full API compatibility.
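If you want the swap to happen automatically, one possible pattern is to prefer the patched module and fall back to the upstream one. This is a sketch, not part of the package; the demo stubs the module in sys.modules so the snippet runs even with neither library installed:

```python
import importlib
import sys
import types

def resolve_model_inference():
    """Prefer the patched package while the vLLM bugs persist; fall back
    to the upstream library once you uninstall the shim."""
    for name in ("ibm_watsonx_ai_120b.foundation_models",
                 "ibm_watsonx_ai.foundation_models"):
        try:
            return importlib.import_module(name).ModelInference
        except ImportError:
            continue
    raise ImportError("install ibm-watsonx-ai or ibm-watsonx-ai-120b")

# Demo with stub modules so the snippet runs without either package:
parent = types.ModuleType("ibm_watsonx_ai_120b")
child = types.ModuleType("ibm_watsonx_ai_120b.foundation_models")
child.ModelInference = type("ModelInference", (), {})
sys.modules["ibm_watsonx_ai_120b"] = parent
sys.modules["ibm_watsonx_ai_120b.foundation_models"] = child

ModelInference = resolve_model_inference()
print(ModelInference.__name__)  # ModelInference
```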

What Gets Fixed

| Feature            | Original Behavior             | With This Package         |
|--------------------|-------------------------------|---------------------------|
| Tool Calling       | tool_calls=[] always          | Works correctly           |
| JSON Schema        | Ignored, returns text         | Enforced and validated    |
| Thinking Responses | Empty content, only reasoning | Automatically handled     |
| Streaming + Tools  | Tools in wrong field          | Falls back to sync        |
| Harmony Tokens     | Leak into output              | Stripped automatically    |
| Null Content       | Crashes vLLM                  | Converted to empty string |
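To make "stripped automatically" concrete: Harmony special tokens look like <|channel|> or <|end|>, sometimes followed by a channel label such as "final". A minimal regex sketch of the idea (not the package's actual HarmonyAdapter, whose implementation is not shown here):

```python
import re

def strip_harmony_tokens(text: str) -> str:
    """Remove leaked Harmony markers and the channel labels that
    precede message text (illustrative only)."""
    text = re.sub(r"<\|channel\|>(analysis|final|commentary)", "", text)
    return re.sub(r"<\|[a-z_]+\|>", "", text)

leaked = "<|channel|>final<|message|>Tokyo is sunny today.<|end|>"
print(strip_harmony_tokens(leaked))  # Tokyo is sunny today.
```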

Configuration

from ibm_watsonx_ai_120b import Config

# Adjust retry behavior
Config.max_retries = 5

# Force non-streaming for tools (most reliable)
Config.streaming_tool_strategy = "fallback"

# Enable debug logging
Config.debug = True

Or via environment variables:

export WATSONX_120B_MAX_RETRIES=5
export WATSONX_120B_DISABLE_STREAMING=true
export WATSONX_120B_DEBUG=true
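One plausible way these environment variables could map onto the Config attributes, shown purely as an illustration (the package's real loader, and the defaults used below, are assumptions):

```python
import os

class Config:
    """Hypothetical env-var mapping; defaults here are guesses."""
    max_retries = int(os.environ.get("WATSONX_120B_MAX_RETRIES", "3"))
    streaming_tool_strategy = (
        "fallback"
        if os.environ.get("WATSONX_120B_DISABLE_STREAMING", "").lower() == "true"
        else "stream"
    )
    debug = os.environ.get("WATSONX_120B_DEBUG", "").lower() == "true"
```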

How It Works

The package wraps ibm-watsonx-ai and applies fixes through an adapter pipeline:

  1. MessageAdapter - Fixes null content and tool role issues
  2. ToolAdapter - Emulates tool calling via prompt injection
  3. JSONAdapter - Emulates JSON schema via prompt injection
  4. ThinkingAdapter - Handles reasoning-only responses
  5. HarmonyAdapter - Strips leaked special tokens
  6. StreamAdapter - Handles streaming quirks

Everything else passes through unchanged to the original library.
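The pipeline idea above can be rendered as a toy sketch: each adapter takes a response dict and returns a corrected one, and the pipeline folds them in order. The function bodies below are hypothetical stand-ins for the real adapters, not their actual implementations:

```python
import re

def fix_null_content(resp):  # MessageAdapter's job, simplified
    msg = resp["message"]
    if msg.get("content") is None:
        msg["content"] = ""
    return resp

def strip_harmony(resp):  # HarmonyAdapter's job, simplified
    msg = resp["message"]
    msg["content"] = re.sub(r"<\|[a-z_]+\|>", "", msg["content"])
    return resp

PIPELINE = [fix_null_content, strip_harmony]

def apply_pipeline(resp):
    for adapter in PIPELINE:
        resp = adapter(resp)
    return resp

broken = {"message": {"content": "Hello<|end|>"}}
print(apply_pipeline(broken)["message"]["content"])  # Hello
```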

Requirements

  • Python 3.9+
  • ibm-watsonx-ai >= 1.0.0

Contributing

Contributions welcome! See CONTRIBUTING.md for guidelines.

License

MIT License - See LICENSE for details.

Acknowledgments

This package was developed to centralize workarounds that were previously implemented ad hoc in individual projects.
