Drop-in gymnasium replacement for training hybrid LLM+ML agents through games and neuroevolution

npc-gym

A drop-in gymnasium replacement for training hybrid LLM+ML agents through games, RL environments, and neuroevolution.

import npc_gym as gym

# Use npc-gym environments
env = gym.make("SlimeVolley-v1")
obs = env.reset()
obs, reward, done, info = env.step([1, 0, 1])  # forward + jump

# Or use any existing gymnasium environment — falls through automatically
env = gym.make("CartPole-v1")

Install from a source checkout (editable mode):

pip install -e .

What's Included

Everything gymnasium has, plus multi-agent games, partial information, LLM agents, and evolutionary training.

Spaces

import npc_gym as gym

gym.Box(-1, 1, shape=(12,))      # Continuous
gym.Discrete(n=5)                 # Discrete
gym.MultiBinary(3)                # Binary vectors
gym.MultiDiscrete([3, 4, 5])      # Multi-dim discrete
gym.Text(max_length=1024)         # Natural language
gym.Dict({"obs": gym.Box(0, 1, shape=(4,)), "text": gym.Text()})

Wrappers

env = gym.make("SlimeVolley-v1")
env = gym.TimeLimit(env, max_episode_steps=3000)
env = gym.ClipReward(env, min_r=-1, max_r=1)
env = gym.FlattenObservation(env)

# Base classes for custom wrappers.
# normalize() below stands in for a user-supplied function.
class MyWrapper(gym.ObservationWrapper):
    def observation(self, obs):
        return normalize(obs)
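
As a rough illustration of what a reward-clipping wrapper does, the sketch below wraps a toy environment and clamps each reward into [min_r, max_r]. It is self-contained and does not import npc-gym: ToyEnv and ClipRewardSketch are hypothetical names, and the four-value step return simply mirrors the examples above rather than any confirmed npc-gym internals.

```python
class ToyEnv:
    """Hypothetical stand-in environment that emits one large reward per step."""

    def reset(self):
        return 0

    def step(self, action):
        # Returns (observation, reward, done, info), as in the examples above.
        return 0, 5.0 * action, True, {}


class ClipRewardSketch:
    """Illustrative wrapper: forwards calls to the inner env and clamps the reward."""

    def __init__(self, env, min_r=-1.0, max_r=1.0):
        self.env = env
        self.min_r = min_r
        self.max_r = max_r

    def reset(self):
        return self.env.reset()

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        clipped = max(self.min_r, min(self.max_r, reward))
        return obs, clipped, done, info


env = ClipRewardSketch(ToyEnv(), min_r=-1, max_r=1)
env.reset()
_, reward, _, _ = env.step(1)  # raw reward 5.0 is clamped to the upper bound
print(reward)
```

Because the wrapper exposes the same reset/step surface as the environment it wraps, wrappers of this shape can be stacked in any order, which is what the chained calls above rely on.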

Registration

# Register custom environments
gym.register("MyEnv-v0", entry_point="my_module:MyEnv", max_steps=1000)
env = gym.make("MyEnv-v0")

# List all available
print(gym.list_envs())
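
To make the registration and fall-through behaviour more concrete, here is a minimal sketch of an id-to-entry-point registry with a fallback hook. It is an assumption about how such a layer could work, not npc-gym's actual code: the _REGISTRY dict, the fallback parameter, and the treatment of extra keyword arguments as constructor defaults are all hypothetical. A standard-library class stands in for an environment so the block runs without npc-gym installed.

```python
import importlib

_REGISTRY = {}  # env id -> (entry_point string, default constructor kwargs)


def register(env_id, entry_point, **kwargs):
    """Record where an environment class lives, as a "module:ClassName" string."""
    _REGISTRY[env_id] = (entry_point, kwargs)


def make(env_id, fallback=None, **overrides):
    """Build a registered environment, or defer to `fallback` for unknown ids."""
    if env_id not in _REGISTRY:
        if fallback is None:
            raise KeyError(f"unknown environment id: {env_id}")
        # Hypothetical hook: a function such as gymnasium.make could be passed here.
        return fallback(env_id, **overrides)
    entry_point, defaults = _REGISTRY[env_id]
    module_name, _, class_name = entry_point.partition(":")
    cls = getattr(importlib.import_module(module_name), class_name)
    return cls(**{**defaults, **overrides})


def list_envs():
    """Return all registered ids in sorted order."""
    return sorted(_REGISTRY)


# Demo: fractions.Fraction stands in for an environment class.
register("Half-v0", entry_point="fractions:Fraction", numerator=1)
half = make("Half-v0", denominator=2)
print(half, list_envs())
```

Resolving the entry point lazily inside make means that registering an id does not import the module that implements it.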

Environments

SlimeVolley-v1 — Physics-Based RL

Direct port of hardmaru/slimevolleygym with identical physics, coordinate system, and baseline AI. Train agents via NEAT neuroevolution or any RL algorithm.

from npc_gym.envs.slime_volleyball import SlimeVolleyEnv, BaselinePolicy
from npcpy.ft.neat import NEATEvolver, NEATConfig, NEATNetwork
from npcpy.ft.engine import get_engine

# Play against the original 120-param RNN baseline
env = SlimeVolleyEnv()  # default opponent = BaselinePolicy

# Or self-play (action_right and action_left are the two sides' actions, supplied by the caller)
env = SlimeVolleyEnv(self_play=True)
obs, reward, done, info = env.step(action_right, action_left)

# Train with NEAT (npcpy) — specify compute backend
evolver = NEATEvolver(
    input_size=12, output_size=3,
    config=NEATConfig(population_size=200),
    engine="numpy",  # or "jax", "mlx", "cuda"
)

def fitness(network):
    obs = env.reset()
    done = False
    total = 0
    while not done:
        raw = network.activate(obs)
        action = [1 if raw[i] > 0 else 0 for i in range(3)]
        obs, reward, done, info = env.step(action)
        total += reward
    return total

best_genome = evolver.run(fitness, generations=500)
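
The fitness function above can be exercised without npc-gym or npcpy. The sketch below reproduces its two ingredients, thresholding raw network outputs into a binary action and summing reward over an episode, against stub objects. StubEnv, StubNetwork, and mean_fitness are hypothetical names introduced here. Averaging over several episodes is a common way to reduce noise in a fitness estimate and is an addition of this sketch, not something the NEAT example above does.

```python
def to_binary_action(raw_outputs, threshold=0.0):
    """Map each raw network output to 1 if above the threshold, else 0."""
    return [1 if value > threshold else 0 for value in raw_outputs]


class StubEnv:
    """Hypothetical environment: rewards +1 per step when the jump bit is set."""

    def __init__(self, max_steps=3):
        self.max_steps = max_steps
        self.steps = 0

    def reset(self):
        self.steps = 0
        return [0.0] * 12  # 12 observation values, matching input_size=12

    def step(self, action):
        self.steps += 1
        reward = 1.0 if action[2] == 1 else 0.0
        done = self.steps >= self.max_steps
        return [0.0] * 12, reward, done, {}


class StubNetwork:
    """Hypothetical policy with fixed outputs, standing in for an evolved network."""

    def activate(self, obs):
        return [0.7, -0.2, 0.1]


def mean_fitness(network, env, episodes=5):
    """Average the episode return over several rollouts."""
    total = 0.0
    for _ in range(episodes):
        obs, done = env.reset(), False
        while not done:
            action = to_binary_action(network.activate(obs))
            obs, reward, done, _ = env.step(action)
            total += reward
    return total / episodes


print(to_binary_action([0.7, -0.2, 0.1]))
print(mean_fitness(StubNetwork(), StubEnv()))
```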

InfoPoker-v1 — Partial Information Decomposition

Source text is chunked into "cards" that are dealt to players. Each player forms hypotheses from the partial information in their hand and bets according to their confidence.

env = gym.make("InfoPoker-v1",
    source_text="The transformer architecture uses self-attention to process "
                "sequences in parallel. Multi-head attention allows attending "
                "to different representation subspaces...",
    num_players=4,
)
observations, info = env.reset()
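
The chunk-and-deal idea can be sketched in a few lines. This is an illustration under assumptions, not InfoPoker's implementation: the fixed words-per-card chunking, the round-robin deal, and the function names chunk_into_cards and deal are all hypothetical, and the real environment may chunk or shuffle differently.

```python
def chunk_into_cards(text, words_per_card=5):
    """Split text into consecutive chunks of at most `words_per_card` words."""
    words = text.split()
    return [
        " ".join(words[i:i + words_per_card])
        for i in range(0, len(words), words_per_card)
    ]


def deal(cards, num_players):
    """Deal cards round-robin so each player holds a disjoint part of the text."""
    hands = [[] for _ in range(num_players)]
    for index, card in enumerate(cards):
        hands[index % num_players].append(card)
    return hands


source_text = (
    "The transformer architecture uses self-attention to process "
    "sequences in parallel."
)
cards = chunk_into_cards(source_text, words_per_card=3)
hands = deal(cards, num_players=2)
for player, hand in enumerate(hands):
    print(player, hand)
```

No single hand contains the whole passage, so each player has to reason from an incomplete view.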

More Environments

Environment	Type	Description
`SlimeVolley-v1`	Physics/RL	2D volleyball, NEAT/RL training
`InfoPoker-v1`	Multi-agent	Text decomposition poker
`HypothesisBJ-v1`	Multi-agent	Hypothesis blackjack
`Synthesis-v1`	Multi-agent	Debate tournament with synthesis
`GridWorld-v1`	Navigation	Spatial nav with partial observability
`Maze-v1`	Navigation	Limited-visibility maze
`TicTacToe-v1`	Competitive	Classic board game
`ConnectFour-v1`	Competitive	7x6 board game
`Pokemon-v1`	Emulator	Pokemon via PyBoy with vision

Training

NEAT Neuroevolution (via npcpy)

from npcpy.ft.neat import NEATEvolver, NEATConfig

# Multi-backend: numpy, jax, mlx, cuda
evolver = NEATEvolver(
    input_size=12, output_size=3,
    config=NEATConfig(
        population_size=200,
        add_node_rate=0.05,
        add_connection_rate=0.08,
        species_threshold=3.0,
    ),
    engine="mlx",  # Apple Silicon acceleration
)

best = evolver.run(fitness_fn, generations=500)

Genetic Model Evolution

Evolve ensembles of specialized LLM models through gameplay:

from npc_gym.training import TrainingLoop, TrainingConfig

# InfoPoker, HybridAgent, and corpus are assumed to be imported or defined elsewhere.

config = TrainingConfig(
    env_class=InfoPoker,
    env_kwargs={"source_text": corpus},
    agent_class=HybridAgent,
    num_agents=4,
    num_epochs=10,
    games_per_epoch=100,
)
loop = TrainingLoop(config)
loop.run()

Trace Collection for DPO

Games produce traces that convert to preference pairs for DPO fine-tuning:

trace = env.get_trace()
pairs = trace.to_preference_pairs(min_reward_gap=0.2)
# [{"prompt": ..., "chosen": ..., "rejected": ..., "reward_gap": ...}]
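
One plausible way to derive such pairs is sketched below: take several scored responses to the same prompt, compare them pairwise, and keep only the pairs whose reward difference reaches the minimum gap. This standalone function only borrows the name and output keys shown above. Its input format (a list of response and reward tuples) and its pairing strategy are assumptions, and the actual trace object may work differently.

```python
from itertools import combinations


def to_preference_pairs(prompt, scored_responses, min_reward_gap=0.2):
    """Pair up responses to one prompt whose rewards differ by at least the gap."""
    pairs = []
    for (resp_a, reward_a), (resp_b, reward_b) in combinations(scored_responses, 2):
        gap = abs(reward_a - reward_b)
        if gap < min_reward_gap:
            continue  # rewards too close; skip the ambiguous comparison
        if reward_a > reward_b:
            chosen, rejected = resp_a, resp_b
        else:
            chosen, rejected = resp_b, resp_a
        pairs.append({
            "prompt": prompt,
            "chosen": chosen,
            "rejected": rejected,
            "reward_gap": gap,
        })
    return pairs


responses = [("hypothesis A", 0.9), ("hypothesis B", 0.8), ("hypothesis C", 0.1)]
pairs = to_preference_pairs("What is the text about?", responses)
for pair in pairs:
    print(pair["chosen"], ">", pair["rejected"])
```

Filtering on the gap drops near-ties such as A versus B here, which would otherwise add noisy preference labels to the fine-tuning set.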

Architecture

npc_gym/
├── core/
│   ├── env.py           # Base Environment (gymnasium-compatible)
│   ├── spaces.py        # Box, Discrete, MultiBinary, MultiDiscrete, Text, Card, Deck
│   ├── compat.py        # Gymnasium compatibility layer (make, register, wrappers)
│   ├── info.py          # Information structures (PID)
│   └── agent.py         # Agent classes (Random, LLM, Hybrid, NPC)
├── envs/
│   ├── slime_volleyball.py  # SlimeVolley with original physics + baseline AI
│   ├── card_game.py         # Base card game
│   ├── info_poker.py        # InfoPoker
│   ├── hypothesis_bj.py     # HypothesisBlackjack
│   ├── synthesis.py         # SynthesisTournament
│   ├── grid_world.py        # GridWorld, Maze, ItemCollector
│   ├── tictactoe.py         # TicTacToe, ConnectFour
│   └── emulator/            # Game emulator environments
├── training/
│   ├── loop.py          # Training orchestrator
│   ├── traces.py        # Trace collection
│   └── evolution.py     # Genetic model evolution
├── streaming/           # Real-time text processing
├── analytics/           # Metrics and visualization
└── rendering/           # Web visualization server

Integration with npcpy

npc-gym builds on npcpy:

NEAT neuroevolution (npcpy.ft.neat) — evolve neural network topologies
Compute engines (npcpy.ft.engine) — numpy, JAX, MLX, CUDA backends
LLM interactions (npcpy.llm_funcs) — multi-provider LLM calls
NPCArray mixtures (npcpy.npc_array) — ensemble inference, voting, consensus
Fine-tuning (npcpy.ft) — SFT, DPO, diffusion, genetic algorithms
NPC agents (npcpy.npc_compiler) — agent personas with tools and memory

License

MIT

Version 0.1.1, released Apr 6, 2026
