
Reinforcement Learning Bridge - connects LLMs to RL environments, automates language-based problem construction, and derives instructions for long-term goals without user supervision


Reinforcement Learning Bridge (RL Bridge)

Reinforcement Learning Bridge connects LLMs to reinforcement learning environments. It ships as an MCP plugin compatible with Claude Code, Claude Desktop, LM Studio, Cursor, Windsurf, Codex CLI, and OpenCode. It also automates how LLMs construct RL problems in language: building environments, translating observations, matching goals to sub-goals, and training agents to complete instructions without user supervision.


Architecture

┌──────────────────────────────────────────────────────┐
│  Claude Code                                         │
│                                                      │
│  "Run 100 steps of CartPole with a random policy"    │
│         │                                            │
│   MCP layer (stdio)                                  │
└──────────┼───────────────────────────────────────────┘
           │
┌──────────▼───────────────────────────────────────────┐
│  RL Bridge MCP Plugin  (rlip.mcp_plugin)             │
│  FastMCP tools:  rl_create · rl_reset · rl_step      │
│                 rl_render · rl_close · rl_run_episode│
│         │                                            │
│   In-process dispatcher                              │
└──────────┼───────────────────────────────────────────┘
           │           ╌╌ OR ╌╌ (RLIP_SERVER_URL)
┌──────────▼───────────────────────────────────────────┐
│  RL Bridge Server (HTTP/JSON-RPC 2.0)                │
│  POST /rpc  ·  GET /environments  ·  GET /health     │
└──────────┬───────────────────────────────────────────┘
           │
┌──────────▼───────────────────────────────────────────┐
│  Environment Registry + Session Manager              │
│  ┌───────────────┐  ┌───────────────┐                │
│  │ CartPole-v1   │  │ LunarLander   │  + any custom  │
│  │ (Gymnasium)   │  │ (Gymnasium)   │    env …       │
│  └───────────────┘  └───────────────┘                │
└──────────────────────────────────────────────────────┘

Quick Start

1. Install

From PyPI (recommended):

pip install rlbridge

# Classic Gymnasium envs for examples (CartPole, etc.)
pip install "rlbridge[examples]"

With uv:

uv pip install rlbridge
# or, in a uv-managed project:
uv add rlbridge
uv add "rlbridge[examples]"

From source (development):

git clone https://github.com/pdfosborne/RL-IP
cd RL-IP
pip install -e ".[dev,examples]"
# or: uv sync --extra dev --extra examples

2. Add to your AI tool

Run the command for whichever tool(s) you use, then restart the client:

# Claude Code (CLI)  → ~/.claude.json
rlip install-claude

# Claude Desktop (GUI app)
#   macOS   → ~/Library/Application Support/Claude/claude_desktop_config.json
#   Windows → %APPDATA%\Claude\claude_desktop_config.json
#   Linux   → ~/.config/Claude/claude_desktop_config.json
rlip install-claude-desktop

# LM Studio  → ~/.lmstudio/mcp.json  (Linux)
#              ~/Library/Application Support/LM Studio/mcp.json  (macOS)
#              %APPDATA%\LM Studio\mcp.json  (Windows)
rlip install-lmstudio

# Cursor  → ~/.cursor/mcp.json
rlip install-cursor

# Windsurf  → ~/.codeium/windsurf/mcp_config.json
rlip install-windsurf

# Codex CLI  → ~/.codex/config.toml
rlip install-codex

# OpenCode   → ~/.config/opencode/config.json
rlip install-opencode

All commands accept --use-script (uses the rlip-mcp console script instead of python -m) and --config-path to override the default config location.

3. Use it

Open Claude Code and ask:

"Run a CartPole episode with a random policy and show me the total reward."

Claude will call rl_create, rl_reset, rl_step (in a loop), and rl_close automatically.
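
The tool sequence above follows the usual Gymnasium-style loop. A minimal sketch of that loop is shown here with a stand-in environment so that it runs without any dependencies; CoinFlipEnv and run_random_episode are hypothetical names invented for illustration and are not part of RL Bridge:

```python
import random


class CoinFlipEnv:
    """Hypothetical stand-in environment: the episode ends after `horizon` steps."""

    def __init__(self, horizon=10):
        self.horizon = horizon
        self.t = 0

    def reset(self, seed=None):
        self.t = 0
        return 0, {}  # (observation, info)

    def step(self, action):
        self.t += 1
        reward = 1.0 if action == 1 else 0.0
        terminated = False                   # this toy task has no terminal state
        truncated = self.t >= self.horizon   # time limit reached
        return self.t, reward, terminated, truncated, {}

    def close(self):
        pass


def run_random_episode(env, n_actions, seed=0):
    """Reset, step with random actions until the episode ends, then close."""
    rng = random.Random(seed)
    obs, info = env.reset(seed=seed)
    total_reward, steps, done = 0.0, 0, False
    while not done:
        action = rng.randrange(n_actions)  # random policy
        obs, reward, terminated, truncated, info = env.step(action)
        total_reward += reward
        steps += 1
        done = terminated or truncated
    env.close()
    return total_reward, steps


total, steps = run_random_episode(CoinFlipEnv(horizon=10), n_actions=2)
print(f"steps={steps} total_reward={total}")
```

The loop stops on either terminated or truncated, matching the five-value result that rl_step reports.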


Available MCP Tools

Environment control

Tool                    Description
rl_list_environments    Browse all registered RL environments
rl_create               Create an environment instance
rl_reset                Reset an instance, get the initial observation
rl_step                 Execute one action, get (obs, reward, terminated, truncated, info)
rl_sample_action        Sample a random valid action
rl_spaces               Inspect observation and action space details
rl_render               Render current state (PNG or ASCII)
rl_close                Destroy an instance
rl_list_instances       List all active instances
rl_run_episode          Run a complete episode in one call

Custom environment builder

Build and register new environments by wrapping any Gymnasium env with custom metadata, local caching, and language translation.

Tool                           Description
rl_build_environment           Wrap a Gymnasium env with custom ID, description, and tags; cache to ~/.rlip/envs/ and write to user catalog
rl_list_cached_environments    Browse previously built environments stored in the local cache
rl_load_cached_environments    Re-register all cached environments at session start

Example workflow in Claude Code:

"Build me a FrozenLake environment called FrozenLake-Custom-v0 with tags grid and discrete."

rl_build_environment("FrozenLake-Custom-v0", "FrozenLake-v1",
                     description="Custom FrozenLake.", tags="grid,discrete")

Language translation

Map raw environment observations to natural-language descriptions. Required for instruction-following and sub-goal reward shaping.

Tool                                Description
rl_sample_states_for_translation    Randomly explore an environment and display raw observed states so a translate() function can be written
rl_set_translator_code              Compile, validate, and install a Python translate() function; saves it to ~/.rlip/envs/<env_id>/translator.py so it reloads automatically
rl_translate_state                  Test a single state → natural-language description round-trip

Example workflow in Claude Code:

"Sample some states from FrozenLake-Custom-v0 so I can write a translator."

rl_sample_states_for_translation("FrozenLake-Custom-v0", n_samples=20)

"Install this translator:"

def translate(state, *, legal_moves=None, action_history=None):
    idx = int(state)  # accept ints as well as numeric strings such as "15"
    row, col = divmod(idx, 4)
    labels = {0: "start", 15: "goal"}
    return f"Agent at row {row}, column {col}. {labels.get(idx, '')}".strip()

rl_set_translator_code("FrozenLake-Custom-v0", python_code="...")
rl_translate_state("FrozenLake-Custom-v0", state="15")

The installed translator is automatically used by rl_match_instruction and rl_train_agent with sub-goal shaping.
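
As a rough illustration of what compiling and validating a translator can involve, the sketch below loads a translate() function from source text and checks that it returns non-empty text for a few sample states. The helper names load_translator and validate_translator are hypothetical, and the checks that rl_set_translator_code actually performs may differ.

```python
TRANSLATOR_CODE = '''
def translate(state, *, legal_moves=None, action_history=None):
    idx = int(state)
    row, col = divmod(idx, 4)
    return f"Agent at row {row}, column {col}."
'''


def load_translator(python_code):
    """Compile source text and return the translate() function it defines."""
    namespace = {}
    exec(compile(python_code, "<translator>", "exec"), namespace)
    fn = namespace.get("translate")
    if not callable(fn):
        raise ValueError("code must define a callable named 'translate'")
    return fn


def validate_translator(fn, sample_states):
    """Check that every sample state maps to a non-empty string."""
    for state in sample_states:
        text = fn(state)
        if not isinstance(text, str) or not text.strip():
            raise ValueError(f"translate({state!r}) returned no text")
    return True


translate = load_translator(TRANSLATOR_CODE)
ok = validate_translator(translate, sample_states=[0, 5, "15"])
print(translate("15"))
```

exec runs arbitrary code, so a loader like this should only be given translator source that you trust.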

Instruction-following

Tool                          Description
rl_match_instruction          Explore an environment, translate states to language, and find the state best matching a natural-language goal
rl_instruction_run_episode    Run a shaped episode where the matched state provides a bonus reward signal

For semantic matching with Hugging Face sentence-transformers models:

pip install "rlbridge[sentence-transformers]"
# or, from a source checkout:
pip install -e ".[sentence-transformers]"

Then choose a model directly in the tool call:

# Uses the default model for sentence-transformers
rl_match_instruction("Sailing-v0", "sail toward the beach", encoder="sentence-transformers")

# Uses a specific Hugging Face model id
rl_match_instruction(
    "Sailing-v0",
    "sail toward the beach",
    encoder="sentence",
    encoder_model="BAAI/bge-small-en-v1.5",
)
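
To make the matching step more concrete, here is a minimal sketch of scoring translated state descriptions against an instruction with TF-IDF weights and cosine similarity. It is a generic illustration of the technique, not the library's TFIDFEncoder; the tokenizer, the smoothed IDF formula, and the function names are all assumptions.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokens with surrounding punctuation stripped."""
    return [w.strip(".,()").lower() for w in text.split() if w.strip(".,()")]


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document, using a smoothed IDF."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    df = Counter(term for toks in tokenized for term in set(toks))
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    return [{t: tf * idf[t] for t, tf in Counter(toks).items()} for toks in tokenized]


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def best_match(instruction, descriptions):
    """Return (index, score) of the description most similar to the instruction."""
    vectors = tfidf_vectors(descriptions + [instruction])
    query = vectors[-1]
    scores = [cosine(query, v) for v in vectors[:-1]]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


descriptions = [
    "Agent is at row 0, column 0 - start (S).",
    "Agent is at row 1, column 1 - hole (H).",
    "Agent is at row 3, column 3 - goal (G).",
]
index, score = best_match("reach the goal", descriptions)
print(index, round(score, 3))
```

Lexical scoring like this only matches shared words, which is why a sentence-transformers encoder can help when the instruction and the translated states use different vocabulary.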

RL agent training

Tool                    Description
rl_list_agents          See available agent types (tabular_q, dqn, ppo) with guidance on when to use each
rl_train_agent          Train an agent; optionally combine with a match_id for instruction-shaped rewards
rl_run_agent_episode    Evaluate a trained agent for one greedy episode
rl_render_policy        Render the best training episode as an animated GIF
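
The idea of instruction-shaped rewards can be illustrated with a small tabular example. The sketch below is not RL Bridge's training code. It assumes a deterministic 4×4 grid with the default FrozenLake-v1 hole layout, treats state 15 as a hypothetical matched state, and replaces sampled exploration with repeated sweeps over every state-action pair so that the result is reproducible. The bonus value and discount factor are arbitrary.

```python
N = 4
HOLES = {5, 7, 11, 12}    # terminal cells in the default FrozenLake-v1 4x4 map
GOAL = 15                 # terminal cell
MATCHED_STATE = 15        # hypothetical result of instruction matching
BONUS, GAMMA, ALPHA = 1.0, 0.9, 1.0  # ALPHA=1 is safe because moves are deterministic


def step(state, action):
    """Return (next_state, env_reward, terminated). Actions: 0=left, 1=down, 2=right, 3=up."""
    row, col = divmod(state, N)
    if action == 0:
        col = max(col - 1, 0)
    elif action == 1:
        row = min(row + 1, N - 1)
    elif action == 2:
        col = min(col + 1, N - 1)
    else:
        row = max(row - 1, 0)
    nxt = row * N + col
    return nxt, (1.0 if nxt == GOAL else 0.0), (nxt == GOAL or nxt in HOLES)


def shaped(env_reward, next_state):
    """Add a bonus whenever the transition lands on the matched state."""
    return env_reward + (BONUS if next_state == MATCHED_STATE else 0.0)


# Q-learning updates applied in deterministic sweeps over all (state, action) pairs.
q = [[0.0] * 4 for _ in range(N * N)]
for _ in range(50):
    for s in range(N * N):
        if s == GOAL or s in HOLES:
            continue
        for a in range(4):
            nxt, r, done = step(s, a)
            target = shaped(r, nxt) + (0.0 if done else GAMMA * max(q[nxt]))
            q[s][a] += ALPHA * (target - q[s][a])


def greedy_rollout(start=0, max_steps=20):
    """Follow the highest-valued action from `start` and return the visited states."""
    state, path = start, [start]
    for _ in range(max_steps):
        action = max(range(4), key=lambda a: q[state][a])
        state, _, done = step(state, action)
        path.append(state)
        if done:
            break
    return path


path = greedy_rollout()
print(path)
```

A bonus paid on every visit to a non-terminal matched state can encourage an agent to loop around that state, so the choice of matched state and bonus size matters in practice.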

Local LLM policy agent (Python API)

RL Bridge also includes a direct local_llm policy agent for action selection without gradient training.

from rlip.environments.registry import registry
from rlip.language_translation import get_translator
from rlip.rl_agents import LocalLLMAgent

env = registry.get("Sailing-v0").create()
translator = get_translator("Sailing-v0")

agent = LocalLLMAgent(base_url="http://localhost:11434/v1", model="llama3.1")

obs = env.reset().observation
obs_text = translator.translate(obs) if translator else str(obs)
action = agent.choose_action(obs_text, action_space=env.action_space)
step = env.step(action)

Use this agent primarily with language-translated observations. Calling a model for every environment action can be expensive in long episodes, so account for per-step latency and token cost.
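
A back-of-the-envelope estimate makes the per-step cost concrete. The helper below is hypothetical, and every number passed to it is a placeholder rather than a measurement of any model or provider:

```python
def episode_llm_cost(steps, tokens_per_step, usd_per_1k_tokens, seconds_per_call):
    """Estimate total tokens, spend, and wall-clock time for one episode."""
    total_tokens = steps * tokens_per_step
    return {
        "tokens": total_tokens,
        "usd": total_tokens / 1000 * usd_per_1k_tokens,
        "seconds": steps * seconds_per_call,
    }


# Placeholder inputs: 500 steps, 400 tokens per call, $0.002 per 1k tokens, 0.8 s per call.
estimate = episode_llm_cost(
    steps=500, tokens_per_step=400, usd_per_1k_tokens=0.002, seconds_per_call=0.8
)
print(estimate)
```

Because all three totals scale linearly with episode length, long-horizon environments are where a trained agent tends to be the cheaper option.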


Full Example: New Environment from Scratch

# 1. Build and cache the environment
rl_build_environment(
    env_id="FrozenLake-Custom-v0",
    gym_env_id="FrozenLake-v1",
    description="4×4 frozen lake grid-world.",
    tags="grid,discrete",
    namespace="custom",
)

# 2. Sample states to understand the observation format
rl_sample_states_for_translation("FrozenLake-Custom-v0", n_samples=16)
# → STATE 1: 0  STATE 2: 1  STATE 3: 5  …

# 3. Install a translator
rl_set_translator_code("FrozenLake-Custom-v0", python_code="""
def translate(state, *, legal_moves=None, action_history=None):
    idx = int(state)
    row, col = divmod(idx, 4)
    # Default FrozenLake-v1 4x4 map: SFFF / FHFH / FFFH / HFFG
    cells = {0: "start (S)", 5: "hole (H)", 7: "hole (H)", 11: "hole (H)", 12: "hole (H)", 15: "goal (G)"}
    cell = cells.get(idx, "frozen (F)")
    return f"Agent is at row {row}, column {col} - {cell}."
""")

# 4. Match an instruction to a goal state
rl_match_instruction("FrozenLake-Custom-v0", "reach the goal")
# → match_id: abc123

# 5. Train an agent with sub-goal shaping
rl_train_agent("dqn", "FrozenLake-Custom-v0", n_episodes=500, match_id="abc123")
# → agent_id: def456

# 6. Render the result
rl_render_policy("FrozenLake-Custom-v0", agent_id="def456")

Standalone HTTP Server

Run RL Bridge as a standalone service (useful for multi-process workflows or connecting non-Python agents):

rlip server --port 8765

Then send JSON-RPC requests:

curl -X POST http://localhost:8765/rpc \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc": "2.0",
    "id": "1",
    "method": "rlip/environment/create",
    "params": {"env_id": "CartPole-v1"}
  }'

Browse the auto-generated API docs at http://localhost:8765/docs.
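
The same request can be sent from Python using only the standard library. This is a minimal sketch: build_rpc_request and call_rpc are hypothetical helper names, and the shape of the server's response is not assumed here (see the protocol specification for that).

```python
import json
import urllib.request


def build_rpc_request(method, params, request_id="1"):
    """Build a JSON-RPC 2.0 request body as a plain dict."""
    return {"jsonrpc": "2.0", "id": request_id, "method": method, "params": params}


def call_rpc(url, method, params, request_id="1", timeout=10.0):
    """POST the request and return the decoded JSON response (needs a running server)."""
    body = json.dumps(build_rpc_request(method, params, request_id)).encode("utf-8")
    req = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


payload = build_rpc_request("rlip/environment/create", {"env_id": "CartPole-v1"})
print(json.dumps(payload, indent=2))

# With `rlip server --port 8765` running, the call would look like:
# response = call_rpc("http://localhost:8765/rpc",
#                     "rlip/environment/create", {"env_id": "CartPole-v1"})
```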


Remote RL Bridge Server from the MCP Plugin

# Point the plugin at a remote server instead of running envs in-process
RLIP_SERVER_URL=http://my-gpu-machine:8765 rlip mcp

Custom Environments (Python API)

For programmatic use, the EnvironmentBuilder Python API mirrors the MCP tools:

from rlip.environments.builder import EnvironmentBuilder, load_cached_environments

# Build, cache, and register in one call
built = (
    EnvironmentBuilder("FrozenLake-Custom-v0")
    .from_gymnasium("FrozenLake-v1")
    .with_metadata(description="Custom FrozenLake.", tags=["grid", "discrete"])
    .build()   # auto_register=True, update_catalog=True by default
)

# Attach a hand-written translator
built.translator = ...   # any LanguageTranslator instance
built.register(register_translator=True)

# Reload all cached environments at startup
load_cached_environments()

For a fully custom (non-Gymnasium) environment, subclass the ABCs directly:

from rlip.environments.base import RLIPEnvironment, RLIPEnvironmentFactory
from rlip.environments.registry import registry
from rlip.protocol.messages import DiscreteSpace, EnvironmentInfo, ResetResult, StepResult, RenderResult

class MyEnv(RLIPEnvironment):
    def reset(self, seed=None, options=None) -> ResetResult: ...
    def step(self, action) -> StepResult: ...
    def close(self): ...
    @property
    def observation_space(self): return DiscreteSpace(n=10)
    @property
    def action_space(self): return DiscreteSpace(n=4)
    def render(self) -> RenderResult: ...

class MyFactory(RLIPEnvironmentFactory):
    @property
    def env_info(self) -> EnvironmentInfo:
        return EnvironmentInfo(env_id="MyEnv-v0", description="My custom env", namespace="custom")
    def create(self, render_mode=None, **kwargs) -> MyEnv:
        return MyEnv()

registry.register(MyFactory())

Third-Party Environment Plugins

RL Bridge can load additional environments from separately installed pip packages. Packages register factories via the rlip.environments entry-point group and optional MCP tools via rlip.environment_mcp_tools.

Example: Flesh and Blood TCG environments live in the flesh-and-blood-rlip package (not bundled with RL Bridge):

# Install RL Bridge from PyPI
pip install rlbridge

# Then install the FaB plugin from GitHub
pip install git+https://github.com/pdfosborne/flesh-and-blood-rlip.git

After installation, environments appear in the registry automatically:

from rlip.environments.registry import registry

env = registry.create("FleshAndBlood-Talishar-v0", format="silver_age")

To publish your own plugin, add to pyproject.toml:

[project.entry-points."rlip.environments"]
my-env = "my_package:register_environments"

[project.entry-points."rlip.environment_mcp_tools"]
my-env = "my_package:register_mcp_tools"

Each callable receives registry= (environments) or mcp=, registry=, log= (MCP tools) and returns the number of items registered.
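
A minimal sketch of the environments callable, following the contract described above. FakeRegistry and MyEnvFactory are stand-ins invented so the example runs on its own; a real plugin would receive the RL Bridge registry and register RLIPEnvironmentFactory subclasses.

```python
class FakeRegistry:
    """Stand-in for the RL Bridge registry, for illustration only."""

    def __init__(self):
        self.factories = []

    def register(self, factory):
        self.factories.append(factory)


class MyEnvFactory:
    """Placeholder; a real plugin would subclass RLIPEnvironmentFactory."""


def register_environments(registry):
    """Entry-point callable: register each factory and return how many were added."""
    factories = [MyEnvFactory()]
    for factory in factories:
        registry.register(factory)
    return len(factories)


reg = FakeRegistry()
count = register_environments(registry=reg)
print(count)
```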


Language Translation (Python API)

from rlip.language_translation.generator import TranslatorGenerator, build_translator

def my_llm(prompt: str) -> str:
    ...  # wrap any LLM provider

# Option A: two-stage LLM pipeline (describe → synthesise rules)
# `env` is an environment instance created beforehand
translator = build_translator(
    env, llm_fn=my_llm,
    env_context="4×4 grid, state = integer 0–15.",
    n_samples=20,
)

# Option B: hand-written rules with LLM fallback
from rlip.language_translation.generator import GeneratedTranslator

translator = GeneratedTranslator(llm_fn=my_llm, env_id="MyEnv-v0")
translator.rule_code = """
def translate(state, *, legal_moves=None, action_history=None):
    row, col = divmod(int(state), 4)
    return f"row {row}, col {col}"
"""

# Save as a standalone module (no LLM dependency at runtime)
translator.save_code("my_env_translator.py")

Generated translators fall back to live LLM calls for states the rule function cannot handle, cache those answers, and auto-refine the rules once the cache grows past refine_threshold.
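
The fallback behaviour can be pictured with a small self-contained sketch. FallbackTranslator below is a hypothetical class written for illustration, not the GeneratedTranslator implementation: it tries the rule function first, asks a stubbed LLM only for states the rules reject, caches those answers, and merely signals when a refinement would be due rather than rewriting the rules.

```python
class FallbackTranslator:
    """Rules first, then a cached LLM answer for states the rules cannot handle."""

    def __init__(self, rule_fn, llm_fn, refine_threshold=3):
        self.rule_fn = rule_fn
        self.llm_fn = llm_fn
        self.refine_threshold = refine_threshold
        self.cache = {}
        self.llm_calls = 0

    def translate(self, state):
        try:
            return self.rule_fn(state)
        except Exception:
            pass  # the rule function cannot handle this state
        key = repr(state)
        if key not in self.cache:
            self.cache[key] = self.llm_fn(f"Describe this environment state: {state!r}")
            self.llm_calls += 1
        return self.cache[key]

    @property
    def should_refine(self):
        """True once the cache has grown past the refinement threshold."""
        return len(self.cache) > self.refine_threshold


def grid_rule(state):
    idx = int(state)
    if not 0 <= idx < 16:
        raise ValueError("state outside the 4x4 grid")
    row, col = divmod(idx, 4)
    return f"row {row}, col {col}"


def stub_llm(prompt):
    return "unrecognised state"  # stand-in for a real LLM call


translator = FallbackTranslator(grid_rule, stub_llm, refine_threshold=1)
first = translator.translate(6)    # handled by the rule
second = translator.translate(99)  # falls back to the stub LLM
third = translator.translate(99)   # served from the cache, no second LLM call
print(first, "|", second, "|", translator.llm_calls)
```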


Proxy Mode (MCP ↔ Remote RL Bridge)

┌─────────────┐   stdio/MCP   ┌───────────────────┐   HTTP/JSON-RPC   ┌──────────────┐
│ Claude Code │ ────────────► │  RL Bridge Plugin │ ────────────────► │ RL Bridge    │
│             │               │ (thin proxy)      │                   │ Server       │
└─────────────┘               └───────────────────┘                   └──────────────┘

Set RLIP_SERVER_URL to enable proxy mode.


Protocol

See docs/protocol_spec.md for the full JSON-RPC protocol specification, including all method signatures, space encodings, error codes, and the session lifecycle diagram.


Project Layout

src/rlip/
├── __init__.py
├── __main__.py              # CLI (rlip server / rlip install-claude / rlip install-claude-desktop / rlip install-lmstudio / rlip install-cursor / rlip install-windsurf / rlip install-codex / rlip install-opencode / …)
├── protocol/
│   ├── constants.py         # Method names, error codes
│   └── messages.py          # Pydantic message models
├── environments/
│   ├── base.py              # RLIPEnvironment / RLIPEnvironmentFactory ABCs
│   ├── builder.py           # EnvironmentBuilder - fluent API for custom envs
│   ├── gymnasium_adapter.py # Gymnasium wrapper
│   ├── registry.py          # EnvironmentRegistry singleton
│   └── utils.py             # NumPy ↔ JSON helpers, space serialisation
├── instruction_matching/
│   ├── base.py              # BaseEncoder ABC
│   ├── tfidf.py             # TFIDFEncoder (default)
│   ├── bm25.py              # BM25Encoder
│   └── sentence_transformer.py  # SentenceEncoder (optional dep)
├── language_translation/
│   ├── base.py              # LanguageTranslator ABC
│   ├── caching.py           # CachingTranslator + translation cache
│   ├── generator.py         # TranslatorGenerator / GeneratedTranslator (LLM)
│   └── sailing.py           # Built-in Sailing translator
├── server/
│   ├── dispatcher.py        # Transport-agnostic JSON-RPC dispatcher
│   ├── rlip_server.py       # FastAPI HTTP server
│   ├── session.py           # Session / instance manager
│   └── exceptions.py        # RLIPError
├── transport/
│   ├── stdio_transport.py   # Stdio (newline-delimited JSON-RPC)
│   └── http_client.py       # Sync + async HTTP clients
└── mcp_plugin/
    └── plugin.py            # FastMCP MCP plugin for Claude Code
examples/
├── cartpole_http.py         # HTTP client episode example
├── custom_env.py            # Registering a custom environment
└── in_process_usage.py      # Using RL Bridge without a server
docs/
└── protocol_spec.md         # Full JSON-RPC protocol specification

Console scripts

Command        Entry point                    Purpose
rlip           rlip.__main__:app              CLI (server, MCP, install-*, catalog, …)
rlip-mcp       rlip.mcp_plugin.plugin:main    MCP stdio plugin for AI tools
rlip-server    rlip.__main__:server_app       Standalone HTTP JSON-RPC server

Optional dependency extras

Extra                      Install                                          Use case
examples / envs-classic    pip install "rlbridge[examples]"                 CartPole and other classic-control envs
envs-box2d                 pip install "rlbridge[envs-box2d]"               LunarLander, etc.
envs-atari                 pip install "rlbridge[envs-atari]"               Atari games
envs-mujoco                pip install "rlbridge[envs-mujoco]"              MuJoCo envs
envs-all                   pip install "rlbridge[envs-all]"                 All Gymnasium env groups
torch                      pip install "rlbridge[torch]"                    DQN / PPO training agents
sentence-transformers      pip install "rlbridge[sentence-transformers]"    Semantic instruction matching
openai-sdk                 pip install "rlbridge[openai-sdk]"               Official OpenAI client (optional)
dev                        pip install "rlbridge[dev]"                      pytest, ruff, mypy, build, twine

Publishing (maintainers)

Build and upload to PyPI (test first on TestPyPI):

# Install build tools
pip install "rlbridge[dev]"
# or: uv sync --extra dev

# Bump version in pyproject.toml (single source of truth; __version__ reads it at runtime)

# Build sdist + wheel
python -m build
# or: uv build

# Upload (requires PyPI account + API token)
twine upload dist/*
# or: uv publish

User install after release:

pip install rlbridge
uv add rlbridge

Note: The PyPI distribution name is rlbridge. The Python import package is rlip. Console commands remain rlip, rlip-mcp, and rlip-server after install.
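
Because the distribution name and the import name differ, a quick way to confirm an install is to look up the distribution's metadata by its PyPI name. The helper below is a hypothetical convenience built on the standard library's importlib.metadata:

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Look up by the PyPI distribution name, not the import package name.
version = installed_version("rlbridge")
print("rlbridge is not installed" if version is None else f"rlbridge {version}")
```

Once the distribution is present, the code itself is imported as rlip.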

Manual steps before first release: create a PyPI account, generate an API token, and tag releases on GitHub.


License

Apache License 2.0 - see LICENSE.

The framework used in this work is currently patent pending with the US Patent and Trademark Office (18/955718).

Cite

Please use the following to cite this work:

@phdthesis{OsborneThesis2024,
  title        = {Improving Real-World Reinforcement Learning by Self Completing Human Instructions on Rule Defined Language},  
  author       = {Philip Osborne},  
  year         = 2024,  
  month        = {August},  
  address      = {Manchester, UK},  
  note         = {Available at \url{https://research.manchester.ac.uk/en/studentTheses/improving-real-world-reinforcement-learning-by-self-completing-hu}},  
  school       = {The University of Manchester},  
  type         = {PhD thesis}
}
