Skip to main content

Drop-in gymnasium replacement for training hybrid LLM+ML agents through games and neuroevolution

Project description

npc-gym

A drop-in gymnasium replacement for training hybrid LLM+ML agents through games, RL environments, and neuroevolution.

import npc_gym as gym

# Use npc-gym environments
env = gym.make("SlimeVolley-v1")
obs = env.reset()
obs, reward, done, info = env.step([1, 0, 1])  # forward + jump

# Or use any existing gymnasium environment — falls through automatically
env = gym.make("CartPole-v1")
pip install -e .

What's Included

Everything gymnasium has, plus multi-agent games, partial information, LLM agents, and evolutionary training.

Spaces

import npc_gym as gym

gym.Box(-1, 1, shape=(12,))      # Continuous
gym.Discrete(n=5)                 # Discrete
gym.MultiBinary(3)                # Binary vectors
gym.MultiDiscrete([3, 4, 5])      # Multi-dim discrete
gym.Text(max_length=1024)         # Natural language
gym.Dict({"obs": gym.Box(0, 1, shape=(4,)), "text": gym.Text()})

Wrappers

env = gym.make("SlimeVolley-v1")
env = gym.TimeLimit(env, max_episode_steps=3000)
env = gym.ClipReward(env, min_r=-1, max_r=1)
env = gym.FlattenObservation(env)

# Base classes for custom wrappers
class MyWrapper(gym.ObservationWrapper):
    def observation(self, obs):
        return normalize(obs)

Registration

# Register custom environments
gym.register("MyEnv-v0", entry_point="my_module:MyEnv", max_steps=1000)
env = gym.make("MyEnv-v0")

# List all available
print(gym.list_envs())

Environments

SlimeVolley-v1 — Physics-Based RL

Direct port of hardmaru/slimevolleygym with identical physics, coordinate system, and baseline AI. Train agents via NEAT neuroevolution or any RL algorithm.

from npc_gym.envs.slime_volleyball import SlimeVolleyEnv, BaselinePolicy
from npcpy.ft.neat import NEATEvolver, NEATConfig, NEATNetwork
from npcpy.ft.engine import get_engine

# Play against the original 120-param RNN baseline
env = SlimeVolleyEnv()  # default opponent = BaselinePolicy

# Or self-play
env = SlimeVolleyEnv(self_play=True)
obs, reward, done, info = env.step(action_right, action_left)

# Train with NEAT (npcpy) — specify compute backend
evolver = NEATEvolver(
    input_size=12, output_size=3,
    config=NEATConfig(population_size=200),
    engine="numpy",  # or "jax", "mlx", "cuda"
)

def fitness(network):
    obs = env.reset(); done = False; total = 0
    while not done:
        raw = network.activate(obs)
        action = [1 if raw[i] > 0 else 0 for i in range(3)]
        obs, reward, done, info = env.step(action)
        total += reward
    return total

best_genome = evolver.run(fitness, generations=500)

InfoPoker-v1 — Partial Information Decomposition

Text chunked into "cards" dealt to players. Players form hypotheses from partial info and bet on confidence.

env = gym.make("InfoPoker-v1",
    source_text="The transformer architecture uses self-attention to process "
                "sequences in parallel. Multi-head attention allows attending "
                "to different representation subspaces...",
    num_players=4,
)
observations, info = env.reset()

More Environments

Environment Type Description
SlimeVolley-v1 Physics/RL 2D volleyball, NEAT/RL training
InfoPoker-v1 Multi-agent Text decomposition poker
HypothesisBJ-v1 Multi-agent Hypothesis blackjack
Synthesis-v1 Multi-agent Debate tournament with synthesis
GridWorld-v1 Navigation Spatial nav with partial observability
Maze-v1 Navigation Limited-visibility maze
TicTacToe-v1 Competitive Classic board game
ConnectFour-v1 Competitive 7x6 board game
Pokemon-v1 Emulator Pokemon via PyBoy with vision

Training

NEAT Neuroevolution (via npcpy)

from npcpy.ft.neat import NEATEvolver, NEATConfig

# Multi-backend: numpy, jax, mlx, cuda
evolver = NEATEvolver(
    input_size=12, output_size=3,
    config=NEATConfig(
        population_size=200,
        add_node_rate=0.05,
        add_connection_rate=0.08,
        species_threshold=3.0,
    ),
    engine="mlx",  # Apple Silicon acceleration
)

best = evolver.run(fitness_fn, generations=500)

Genetic Model Evolution

Evolve ensembles of specialized LLM models through gameplay:

from npc_gym.training import TrainingLoop, TrainingConfig

config = TrainingConfig(
    env_class=InfoPoker,
    env_kwargs={"source_text": corpus},
    agent_class=HybridAgent,
    num_agents=4,
    num_epochs=10,
    games_per_epoch=100,
)
loop = TrainingLoop(config)
loop.run()

Trace Collection for DPO

Games produce traces that convert to preference pairs for DPO fine-tuning:

trace = env.get_trace()
pairs = trace.to_preference_pairs(min_reward_gap=0.2)
# [{"prompt": ..., "chosen": ..., "rejected": ..., "reward_gap": ...}]

Architecture

npc_gym/
├── core/
│   ├── env.py           # Base Environment (gymnasium-compatible)
│   ├── spaces.py        # Box, Discrete, MultiBinary, MultiDiscrete, Text, Card, Deck
│   ├── compat.py        # Gymnasium compatibility layer (make, register, wrappers)
│   ├── info.py          # Information structures (PID)
│   └── agent.py         # Agent classes (Random, LLM, Hybrid, NPC)
├── envs/
│   ├── slime_volleyball.py  # SlimeVolley with original physics + baseline AI
│   ├── card_game.py         # Base card game
│   ├── info_poker.py        # InfoPoker
│   ├── hypothesis_bj.py     # HypothesisBlackjack
│   ├── synthesis.py         # SynthesisTournament
│   ├── grid_world.py        # GridWorld, Maze, ItemCollector
│   ├── tictactoe.py         # TicTacToe, ConnectFour
│   └── emulator/            # Game emulator environments
├── training/
│   ├── loop.py          # Training orchestrator
│   ├── traces.py        # Trace collection
│   └── evolution.py     # Genetic model evolution
├── streaming/           # Real-time text processing
├── analytics/           # Metrics and visualization
└── rendering/           # Web visualization server

Integration with npcpy

npc-gym builds on npcpy:

  • NEAT neuroevolution (npcpy.ft.neat) — evolve neural network topologies
  • Compute engines (npcpy.ft.engine) — numpy, JAX, MLX, CUDA backends
  • LLM interactions (npcpy.llm_funcs) — multi-provider LLM calls
  • NPCArray mixtures (npcpy.npc_array) — ensemble inference, voting, consensus
  • Fine-tuning (npcpy.ft) — SFT, DPO, diffusion, genetic algorithms
  • NPC agents (npcpy.npc_compiler) — agent personas with tools and memory

License

MIT

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

npc_gym-0.1.1.tar.gz (113.7 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

npc_gym-0.1.1-py3-none-any.whl (132.9 kB view details)

Uploaded Python 3

File details

Details for the file npc_gym-0.1.1.tar.gz.

File metadata

  • Download URL: npc_gym-0.1.1.tar.gz
  • Upload date:
  • Size: 113.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.9.25

File hashes

Hashes for npc_gym-0.1.1.tar.gz
Algorithm Hash digest
SHA256 78352bcbd022496f808fc9301248b441cf0f53f7aa6949a2e95bd5664e7a1bbd
MD5 720d776f16ae6109167d899f2a66efb2
BLAKE2b-256 d15ec7702dab35483279cdc510c27c84b278cba6096644123c673ab23f0f8006

See more details on using hashes here.

File details

Details for the file npc_gym-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: npc_gym-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 132.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.9.25

File hashes

Hashes for npc_gym-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 49105b703121d95773d86706fbe1e54f0b7904fc4ac1e377cbb4e618ae65b0e0
MD5 104795549e95d0015c6e6ee685e521ed
BLAKE2b-256 dbc2a51ff68e5bf9c3d94d4a6f7bd840133d13cab4ee15b344cc30a5b24c9975

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page