
A tangible Reinforcement Learning engine with Matchboxes and colored Beads


Matchbox-RL

A Tangible Reinforcement Learning Engine for Python.

Matchbox-RL is a visualization-first AI library based on Donald Michie's 1961 MENACE (Matchbox Educable Noughts And Crosses Engine).

It implements tabular Q-Learning using Michie's original physical metaphor: Matchboxes represent States, and Colored Beads represent Actions. This approach makes the probability distributions of the agent tangible, inspectable, and easy to visualize.

Ideal for:

  • Education: Demonstrating Reinforcement Learning concepts with concrete examples.
  • Discrete Games: Tic-Tac-Toe, Nim, Hexapawn, Grid Worlds.
  • Visualization: Inspecting the internal state of simple RL agents as bead distributions.

Key Features

  • Physical Metaphor: Operates on Matchbox, Bead, and pick() logic to simulate the probabilistic selection of actions.
  • Inspectable State: Built-in render() method prints color-coded bead histograms to the terminal, visualizing the agent's confidence.
  • Smart Colors: Supports standard names ("red") and Hex Codes ("#FF00AA") for TrueColor terminal visualization.
  • Configurable Learning: Adjust initial bead counts, max beads, and reward/punishment values via LearningConfig.
  • Zero Dependencies: Pure Python. Runs anywhere.
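Under the hood, "picking a bead" is weighted random selection: the more beads an action has in a box, the more likely it is drawn. Here is a self-contained sketch of that metaphor in plain Python (illustrative only, not the library's internals):

```python
import random

# A "matchbox" for one state: action -> bead count.
# More beads behind an action means a higher chance it is picked.
box = {0: 4, 1: 4, 4: 12}  # this box has learned to favor action 4

def pick(box):
    """Draw one bead at random; probability proportional to count."""
    actions = list(box)
    weights = [box[a] for a in actions]
    return random.choices(actions, weights=weights, k=1)[0]

action = pick(box)
```

With 12 of the 20 beads behind action 4, that action is drawn 60% of the time; reinforcement shifts these counts, and with them the policy.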

Installation

pip install matchbox-rl

Quick Start

1. Define your Actions as Beads

Each bead represents an action the agent can take. For Tic-Tac-Toe, each position (0-8) is a possible move.

from matchbox import Engine, Bead, LearningConfig

# Define the 9 board positions as beads
beads = [
    Bead(name="Cell0", action=0, color="red"),
    Bead(name="Cell1", action=1, color="blue"),
    Bead(name="Cell2", action=2, color="green"),
    # ... cells 3-8
]
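The elided cells 3-8 follow the same pattern. A small sketch that generates all nine specifications in a loop (the palette below is an arbitrary choice, not prescribed by the library):

```python
from itertools import cycle

# An arbitrary nine-color palette (names and hex codes both work).
palette = cycle(["red", "blue", "green", "yellow", "magenta",
                 "cyan", "white", "#FF8800", "#8844FF"])

# One specification per board cell.
bead_specs = [
    {"name": f"Cell{i}", "action": i, "color": color}
    for i, color in zip(range(9), palette)
]
```

Each spec could then be expanded into a bead, e.g. `beads = [Bead(**spec) for spec in bead_specs]`.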

2. Create the Engine

# Initialize with Michie's original MENACE settings
config = LearningConfig(
    initial_beads=4,      # Michie's original
    win_reward=3,         # +3 beads on win
    draw_reward=1,        # +1 bead on draw
    lose_punishment=1,    # -1 bead on loss
)

engine = Engine(beads=beads, config=config)

3. Play and Train

# X plays corner (position 0), what should O play?
state_id = "X        "  # Board state as string

action = engine.get_move(state_id)
print(f"Agent chose: {action}")  # e.g., "Agent chose: 4" (center)

# After the game ends, reinforce the behavior by calling
# exactly one of these, depending on the outcome:
engine.train(result='win')   # +3 beads on a win
engine.train(result='draw')  # +1 bead on a draw
engine.train(result='lose')  # -1 bead on a loss (confiscate)
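Training adjusts the bead counts behind the actions played during the game. As a rough, standalone sketch of that MENACE-style update (assuming counts are clamped to at least 1 bead and at most max_beads; the library's exact clamping behavior may differ):

```python
def reinforce(box, action, delta, max_beads=20):
    """Adjust the bead count for an action played in this state.
    delta: +3 on a win, +1 on a draw, -1 on a loss. Keep at least
    one bead so no action becomes impossible; cap at max_beads."""
    box[action] = max(1, min(max_beads, box[action] + delta))

box = {0: 4, 4: 4}
history = [4]                # actions the agent played this game
for a in history:
    reinforce(box, a, +3)    # the game was a win: +3 beads
# box[4] is now 7; action 4 is more likely next game
```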

Visualizing the Agent

Matchbox-RL lets you view the learned policy as a physical collection of beads.

# After 100,000 games of training:
print(engine.render_box("X        "))

Example:

(MENACE Tic-Tac-Toe visualization image)
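For intuition about what the histograms convey, here is a minimal, hypothetical ASCII renderer over plain bead counts; the library's own render() additionally applies each bead's color:

```python
def render_counts(box, width=20):
    """Return one bar per action, scaled to the largest bead count."""
    top = max(box.values())
    lines = []
    for action, count in sorted(box.items()):
        bar = "#" * round(width * count / top)
        lines.append(f"action {action}: {bar} ({count})")
    return "\n".join(lines)

print(render_counts({0: 4, 4: 12, 8: 2}))
```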

Configuration

Adjust the reinforcement schedule by passing a LearningConfig object.

from matchbox import LearningConfig

config = LearningConfig(
    initial_beads=4,      # Starting beads per action
    max_beads=20,         # Cap on beads per action
    win_reward=3,         # Beads added on WIN
    draw_reward=0,        # Beads added on DRAW
    lose_punishment=1     # Beads removed on LOSS
)

engine = Engine(beads=beads, config=config)
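A worked example of how a single reinforcement shifts the policy, assuming (as in the Quick Start) 9 actions with 4 initial beads each:

```python
from fractions import Fraction

n_actions, initial, win_reward = 9, 4, 3
total = n_actions * initial                      # 36 beads in the box

p_before = Fraction(initial, total)              # 4/36 = 1/9
p_after = Fraction(initial + win_reward,
                   total + win_reward)           # 7/39 after one win
# p_before ~ 0.111, p_after ~ 0.179
```

One win raises the chosen action's selection probability from 1/9 to 7/39; losses shrink it the same way.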

The History

In 1961, Donald Michie, a British AI researcher, did not have a computer powerful enough to test his theories on reinforcement learning. So, he built one out of 304 matchboxes and thousands of colored beads.

He called it MENACE (Matchbox Educable Noughts And Crosses Engine). He physically played Tic-Tac-Toe against it, removing beads when it lost and adding beads when it won. Over time, the pile of matchboxes learned to play a strong game.

Matchbox-RL is a faithful software recreation of that physical mechanism.

Examples

See the examples/ directory:

  • simple_walk.py - 2D Grid World with walls and treasure
  • tic_tac_toe.py - Learning agent vs a random opponent

python examples/simple_walk.py
python examples/tic_tac_toe.py

License

MIT
