# ibm-watsonx-ai-120b
A drop-in replacement for ibm-watsonx-ai that fixes all known issues with IBM's vLLM-hosted openai/gpt-oss-120b and openai/gpt-oss-20b models.
## The Problem

IBM hosts OpenAI's gpt-oss models on WatsonX using vLLM, but the deployment has numerous bugs:

- **Tool calling doesn't work** - the `tool_calls` array is always empty
- **JSON schema mode is ignored** - the model returns free text instead of JSON
- **Thinking leaks into output** - `reasoning_content` appears without any actual `content`
- **Streaming breaks with tools** - tool calls appear in the wrong fields
- **Harmony tokens leak** - special tokens like `<|channel|>` appear in the output
## The Solution

Change one import and everything works:

```python
# Before (broken)
from ibm_watsonx_ai.foundation_models import ModelInference

# After (fixed!)
from ibm_watsonx_ai_120b.foundation_models import ModelInference

# Your code stays exactly the same
model = ModelInference(
    model_id="openai/gpt-oss-120b",
    credentials=credentials,
    project_id=project_id,
)
```
```python
# Tool calling now works!
response = model.chat(
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get weather for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string"}
                },
                "required": ["location"]
            }
        }
    }]
)
```
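Once tool calling works, the tool calls still have to be pulled out of the response. As a minimal sketch (assuming the response follows the OpenAI-style chat-completions shape, with `function.arguments` as a JSON string — `extract_tool_calls` is a hypothetical helper, not part of this package):

```python
import json

def extract_tool_calls(response: dict) -> list:
    """Return (name, parsed_arguments) pairs from an OpenAI-style response."""
    calls = []
    message = response["choices"][0]["message"]
    for call in message.get("tool_calls") or []:
        fn = call["function"]
        # Arguments arrive as a JSON string and must be parsed.
        calls.append((fn["name"], json.loads(fn["arguments"])))
    return calls

# Demonstration with a mocked response dict:
mock = {
    "choices": [{
        "message": {
            "tool_calls": [{
                "function": {
                    "name": "get_weather",
                    "arguments": '{"location": "Tokyo"}',
                }
            }]
        }
    }]
}
print(extract_tool_calls(mock))  # [('get_weather', {'location': 'Tokyo'})]
```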
```python
# JSON schema mode now works!
response = model.chat(
    messages=[{"role": "user", "content": "List 3 colors"}],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "colors",
            "schema": {
                "type": "object",
                "properties": {
                    "colors": {"type": "array", "items": {"type": "string"}}
                },
                "required": ["colors"]
            }
        }
    }
)
```
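With the schema enforced, the message content should be valid JSON matching it, so it can be parsed directly. A small sketch (the `content` string below is a stand-in for the response's message content, whose exact access path is assumed):

```python
import json

# Stand-in for response["choices"][0]["message"]["content"] (shape assumed).
content = '{"colors": ["red", "green", "blue"]}'

data = json.loads(content)

# Sanity-check the schema's "required" keys by hand.
assert "colors" in data and isinstance(data["colors"], list)
print(data["colors"])  # ['red', 'green', 'blue']
```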
## Installation

```bash
pip install ibm-watsonx-ai-120b
```
## When IBM Fixes Their vLLM

Just change your import back:

```python
# Fixed by IBM - just use the original!
from ibm_watsonx_ai.foundation_models import ModelInference
```

Your code stays exactly the same because we maintained full API compatibility.
## What Gets Fixed

| Feature | Original Behavior | With This Package |
|---|---|---|
| Tool Calling | `tool_calls=[]` always | Works correctly |
| JSON Schema | Ignored, returns text | Enforced and validated |
| Thinking Responses | Empty content, only reasoning | Automatically handled |
| Streaming + Tools | Tools in wrong field | Falls back to sync |
| Harmony Tokens | Leak into output | Stripped automatically |
| Null Content | Crashes vLLM | Converted to empty string |
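To illustrate the Harmony-token fix in the table above, stripping can be done with a single regular expression. The exact token set the package strips is an assumption here; `strip_harmony_tokens` is an illustrative helper, not the package's actual code:

```python
import re

# Harmony control tokens look like <|channel|>, <|message|>, <|end|>.
HARMONY_TOKEN = re.compile(r"<\|[a-z_]+\|>")

def strip_harmony_tokens(text: str) -> str:
    """Remove leaked Harmony special tokens from model output."""
    return HARMONY_TOKEN.sub("", text)

print(strip_harmony_tokens("Hello<|channel|> world<|end|>"))  # Hello world
```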
## Configuration

```python
from ibm_watsonx_ai_120b import Config

# Adjust retry behavior
Config.max_retries = 5

# Force non-streaming for tools (most reliable)
Config.streaming_tool_strategy = "fallback"

# Enable debug logging
Config.debug = True
```

Or via environment variables:

```bash
export WATSONX_120B_MAX_RETRIES=5
export WATSONX_120B_DISABLE_STREAMING=true
export WATSONX_120B_DEBUG=true
```
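A sketch of how these environment variables might map onto `Config` fields — the variable names come from above, but the parsing logic and defaults are assumptions, not the package's implementation:

```python
import os

class Config:
    """Illustrative config loader: env vars win, with assumed defaults."""
    max_retries = int(os.environ.get("WATSONX_120B_MAX_RETRIES", "3"))
    disable_streaming = os.environ.get("WATSONX_120B_DISABLE_STREAMING", "").lower() == "true"
    debug = os.environ.get("WATSONX_120B_DEBUG", "").lower() == "true"

print(Config.max_retries, Config.disable_streaming, Config.debug)
```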
## How It Works

The package wraps ibm-watsonx-ai and applies fixes through an adapter pipeline:

- **MessageAdapter** - fixes null content and tool-role issues
- **ToolAdapter** - emulates tool calling via prompt injection
- **JSONAdapter** - emulates JSON schema via prompt injection
- **ThinkingAdapter** - handles reasoning-only responses
- **HarmonyAdapter** - strips leaked special tokens
- **StreamAdapter** - handles streaming quirks

Everything else passes through unchanged to the original library.
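The pipeline idea above can be sketched as a chain of functions, each taking a request payload and returning a fixed copy. The adapter names mirror the list; their internals here are illustrative stand-ins, not the package's actual code:

```python
def message_adapter(payload: dict) -> dict:
    # Replace null message content with an empty string (the vLLM crash fix).
    for msg in payload.get("messages", []):
        if msg.get("content") is None:
            msg["content"] = ""
    return payload

def harmony_adapter(payload: dict) -> dict:
    # Stub: in the real package this would strip leaked special tokens.
    return payload

# Adapters run in order; each sees the previous adapter's output.
PIPELINE = [message_adapter, harmony_adapter]

def apply_pipeline(payload: dict) -> dict:
    for adapter in PIPELINE:
        payload = adapter(payload)
    return payload

fixed = apply_pipeline({"messages": [{"role": "tool", "content": None}]})
assert fixed["messages"][0]["content"] == ""
```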
## Requirements

- Python 3.9+
- ibm-watsonx-ai >= 1.0.0
## Documentation

- ARCHITECTURE.md - Technical design and issue catalog
- TASKS.md - Development roadmap
## Contributing

Contributions welcome! See CONTRIBUTING.md for guidelines.

## License

MIT License - See LICENSE for details.

## Acknowledgments

This package was developed to centralize workarounds originally implemented in:
## File details

Details for the file `ibm_watsonx_ai_120b-0.1.0.tar.gz`.

### File metadata

- Download URL: ibm_watsonx_ai_120b-0.1.0.tar.gz
- Upload date:
- Size: 67.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.9

### File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | 970bed884c55314536a12b083445f70dfcd07f71c9455c14e27c7aa507a5584a |
| MD5 | 7f21ff59766e0cce03a68efd04ce4556 |
| BLAKE2b-256 | 646e5a7738f5c8bf1d7111fc113102c3819c83b19a5024d170c8fc188f7e8702 |
## File details

Details for the file `ibm_watsonx_ai_120b-0.1.0-py3-none-any.whl`.

### File metadata

- Download URL: ibm_watsonx_ai_120b-0.1.0-py3-none-any.whl
- Upload date:
- Size: 29.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.9

### File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | 0966a7ef01dafd1b4e1f33d47bbcbc8a4d9fb9770caf9c59f2ac998ee544bf41 |
| MD5 | 005a5edad1cfe36f057dec4ac719d1fb |
| BLAKE2b-256 | a4382253ff110b15cfa28005998538af8c60f1462612c3b4bff6e1661094200f |