generating and working with datasets of mazes
maze-dataset
This package provides utilities for generation, filtering, solving, visualizing, and processing of mazes for training or evaluating ML systems. Primarily built for the maze-transformer interpretability project. You can find our paper on it here: http://arxiv.org/abs/2309.10498
This package includes a variety of maze generation algorithms, including randomized depth first search, Wilson's algorithm for uniform spanning trees, and percolation. Datasets can be filtered to select mazes of a certain length or complexity, remove duplicates, and satisfy custom properties. A variety of output formats for visualization and training ML models are provided.
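To make the generate-then-filter idea concrete, here is a minimal standard-library sketch of randomized depth-first search on a square lattice, followed by a filter on solution length. It is an illustration only, not the package's implementation: the names `gen_dfs_sketch`, `shortest_path`, and `min_cells` are hypothetical, and the adjacency-map representation is an assumption chosen for brevity.

```python
import random
from collections import deque

Coord = tuple[int, int]


def neighbors(cell: Coord, n: int) -> list[Coord]:
    """Lattice neighbors of `cell` that lie inside an n x n grid."""
    r, c = cell
    candidates = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
    return [(a, b) for a, b in candidates if 0 <= a < n and 0 <= b < n]


def gen_dfs_sketch(n: int, rng: random.Random) -> dict[Coord, set[Coord]]:
    """Carve a spanning tree of the n x n lattice with randomized DFS.

    Returns an adjacency map: maze[cell] holds the cells reachable from
    `cell` in one step (no wall in between).
    """
    maze: dict[Coord, set[Coord]] = {
        (r, c): set() for r in range(n) for c in range(n)
    }
    start = (rng.randrange(n), rng.randrange(n))
    visited = {start}
    stack = [start]
    while stack:
        cell = stack[-1]
        unvisited = [nb for nb in neighbors(cell, n) if nb not in visited]
        if not unvisited:
            stack.pop()  # dead end: backtrack
            continue
        nxt = rng.choice(unvisited)
        maze[cell].add(nxt)  # open the wall in both directions
        maze[nxt].add(cell)
        visited.add(nxt)
        stack.append(nxt)
    return maze


def shortest_path(
    maze: dict[Coord, set[Coord]], start: Coord, end: Coord
) -> list[Coord]:
    """Breadth-first search; assumes `end` is reachable from `start`."""
    parent: dict[Coord, Coord | None] = {start: None}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == end:
            break
        for nb in maze[cell]:
            if nb not in parent:
                parent[nb] = cell
                queue.append(nb)
    path = []
    node = end
    while node is not None:  # walk back from end to start
        path.append(node)
        node = parent[node]
    return path[::-1]


rng = random.Random(0)  # fixed seed so the run is repeatable
mazes = [gen_dfs_sketch(5, rng) for _ in range(4)]
solved = [(m, shortest_path(m, (0, 0), (4, 4))) for m in mazes]

# Keep only mazes whose solution visits at least `min_cells` cells.
min_cells = 11
kept = [(m, p) for m, p in solved if len(p) >= min_cells]
```

Because randomized DFS produces a spanning tree, every pair of cells is joined by exactly one path, so the breadth-first search above always finds the unique solution.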
You can view and search through a wide variety of example mazes here: understanding-search.github.io/maze-dataset/examples/maze_examples
Citing
If you use this code in your research, please cite our paper:
@misc{maze-dataset,
title={A Configurable Library for Generating and Manipulating Maze Datasets},
author={Michael Igorevich Ivanitskiy and Rusheb Shah and Alex F. Spies and Tilman Räuker and Dan Valentine and Can Rager and Lucia Quirke and Chris Mathwin and Guillaume Corlouer and Cecilia Diniz Behn and Samy Wu Fung},
year={2023},
eprint={2309.10498},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={http://arxiv.org/abs/2309.10498}
}
Installation
This package is available on PyPI, and can be installed via
pip install maze-dataset
Please note that due to an issue with the rust-fst package, some tokenization features are not available on macOS. Please see #57.
Docs
The full hosted documentation is available at https://understanding-search.github.io/maze-dataset/.
Additionally, our notebooks serve as a good starting point for understanding the package.
Usage
Creating a dataset
To create a MazeDataset, you first create a MazeDatasetConfig:
from maze_dataset import MazeDataset, MazeDatasetConfig
from maze_dataset.generation import LatticeMazeGenerators
cfg: MazeDatasetConfig = MazeDatasetConfig(
    name="test",  # name is only for you to keep track of things
    grid_n=5,  # number of rows/columns in the lattice
    n_mazes=4,  # number of mazes to generate
    maze_ctor=LatticeMazeGenerators.gen_dfs,  # algorithm to generate the maze
    maze_ctor_kwargs=dict(do_forks=False),  # additional parameters to pass to the maze generation algorithm
)
and then pass this config to the MazeDataset.from_config method:
dataset: MazeDataset = MazeDataset.from_config(cfg)
This method can check whether a dataset with a matching config hash already exists on your filesystem in the expected location, and load it if so. It can also generate a dataset on the fly if needed.
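The load-or-generate behavior described above can be pictured with a small standard-library sketch. Everything here is an assumption made for illustration (the `config_hash` and `load_or_generate` helpers, the JSON file layout, the file naming scheme); the package's own hashing and on-disk format are not shown in this README and may differ.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def config_hash(cfg: dict) -> str:
    """Stable hash of a config: serialize with sorted keys, then SHA-256."""
    payload = json.dumps(cfg, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:16]


def load_or_generate(cfg: dict, base_path: Path, generate) -> tuple[list, bool]:
    """Return (data, loaded_from_disk) for the given config."""
    path = base_path / f"{cfg['name']}-{config_hash(cfg)}.json"
    if path.exists():
        # A file for this exact config already exists: reuse it.
        return json.loads(path.read_text()), True
    data = generate(cfg)
    base_path.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(data))
    return data, False


def toy_generate(cfg: dict) -> list:
    """Stand-in for real maze generation: one placeholder record per maze."""
    return [{"index": i, "grid_n": cfg["grid_n"]} for i in range(cfg["n_mazes"])]


cfg = {"name": "test", "grid_n": 5, "n_mazes": 4}
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    first, first_loaded = load_or_generate(cfg, base, toy_generate)
    second, second_loaded = load_or_generate(cfg, base, toy_generate)
    # Changing any field changes the hash, so this config is generated anew.
    changed, changed_loaded = load_or_generate(
        {**cfg, "grid_n": 6}, base, toy_generate
    )
```

Keying the file name on a hash of the full config means that two configs differing in any parameter never share a cached file.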
Conversions to useful formats
The elements of the dataset are SolvedMaze objects:
>>> m = dataset[0]
>>> type(m)
maze_dataset.maze.lattice_maze.SolvedMaze
These can be converted to a variety of formats:
# visual representation as ascii art
print(m.as_ascii())
# RGB image, optionally without solution or endpoints, suitable for CNNs
import matplotlib.pyplot as plt
plt.imshow(m.as_pixels())
# text format for autoregressive transformers
from maze_dataset.tokenization import MazeTokenizerModular, TokenizationMode, PromptSequencers
m.as_tokens(maze_tokenizer=MazeTokenizerModular(
    prompt_sequencer=PromptSequencers.AOTP(),  # many options here
))
# advanced visualization with many features
from maze_dataset.plotting import MazePlot
MazePlot(m).plot()
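For intuition about what a grid-style rendering involves, the following is a rough sketch of rasterizing an n x n lattice maze onto a (2n+1) x (2n+1) grid, where cells and open connections become free squares and everything else is wall. The `rasterize` and `to_ascii` helpers and the list-of-pairs connection format are hypothetical; the package's own `m.as_pixels()` and `m.as_ascii()` may use a different layout, colors, and markers for the path and endpoints.

```python
Coord = tuple[int, int]


def rasterize(n: int, connections: list[tuple[Coord, Coord]]) -> list[list[int]]:
    """Return a (2n+1) x (2n+1) grid with 1 for wall and 0 for open space."""
    size = 2 * n + 1
    grid = [[1] * size for _ in range(size)]
    # Each lattice cell (r, c) sits at odd coordinates (2r+1, 2c+1).
    for r in range(n):
        for c in range(n):
            grid[2 * r + 1][2 * c + 1] = 0
    # A connection between adjacent cells opens the square between them.
    for (r1, c1), (r2, c2) in connections:
        grid[r1 + r2 + 1][c1 + c2 + 1] = 0
    return grid


def to_ascii(grid: list[list[int]]) -> str:
    """Render walls as '#' and open squares as spaces."""
    return "\n".join("".join("#" if v else " " for v in row) for row in grid)


# A 2 x 2 maze whose corridor runs (0,0) -> (0,1) -> (1,1) -> (1,0).
connections = [((0, 0), (0, 1)), ((0, 1), (1, 1)), ((1, 1), (1, 0))]
grid = rasterize(2, connections)
print(to_ascii(grid))
```

The same integer grid can be passed directly to an image viewer or stacked into channels for a CNN, which is why a raster form is convenient for both display and training.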
Development
We use this makefile template with slight modifications for our development workflow. This project uses uv for dependency and virtual environment management.
- clone with git clone https://github.com/understanding-search/maze-dataset
- if you don't already have uv, install it. We only guarantee compatibility with uv newer than 0.8.0
- make dep to install all dependencies
- make help will print all available commands
- make test will run basic tests to ensure the package is working
  - run just the unit tests with make test-unit
  - see all tests with explanations using make help or make help | grep test
- make format will run ruff to format and check the code
Note: due to compatibility issues between the rust_fst package and Darwin/macOS systems, not all tests will pass on these systems. However, make test-unit and make test-notebooks-muutils should still pass. Please see #57 for updates on resolving this problem.
Contributing
We welcome contributions! We use GitHub issues to track bugs and feature requests. If you have a bug fix or a new feature to contribute, please open a pull request. We are also happy to provide usage support and answer questions about the package via issues!
While we expect the core interface of the package to remain stable, we are very open to adding new features. We're particularly excited about adding new maze generation algorithms and new output formats. Please feel free both to suggest new formats or algorithms and to implement them and open PRs! For more info on how to add a new maze generation algorithm, see the documentation on generators.
We are also aware that like any piece of software, maze-dataset is not without bugs. If something isn't working as expected, please open an issue and we will do our best to fix it. It helps us keep things tidy if you first search existing bug reports to see if your issue has already been reported.
Download files
File details
Details for the file maze_dataset-1.4.2.tar.gz.
File metadata
- Download URL: maze_dataset-1.4.2.tar.gz
- Upload date:
- Size: 17.0 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c106e29050956312b4e2cfa8fc5e953af0c3e5c35a925ad55801ae8937bcad07 |
| MD5 | 96c5d61dc36c9e9ca35192a40b9d5c86 |
| BLAKE2b-256 | 7e6240518fc8d19dd69acb23731d665fb98544df768f93b356f18fa69170f3e3 |
File details
Details for the file maze_dataset-1.4.2-py3-none-any.whl.
File metadata
- Download URL: maze_dataset-1.4.2-py3-none-any.whl
- Upload date:
- Size: 145.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 565b37d6eb0a0d487ecff3ab3bdbc80d2b13d329a59dd40b77b3d504f9bd4296 |
| MD5 | df24f7328028d36f8da696b5b60dc7c1 |
| BLAKE2b-256 | b32765205ad9d3aa02a4a040a93907cd5e12377545ee6b372c289ca9939e491f |