generating and working with datasets of mazes

maze-dataset

This package provides utilities for generating, filtering, solving, visualizing, and processing mazes for training or evaluating ML systems. It was primarily built for the maze-transformer interpretability project. Our paper on it is available at http://arxiv.org/abs/2309.10498

This package includes a variety of maze generation algorithms, including randomized depth-first search, Wilson's algorithm for uniform spanning trees, and percolation. Datasets can be filtered to select mazes of a certain length or complexity, remove duplicates, and satisfy custom properties. A variety of output formats are provided, both for visualization and for training ML models.
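To make the generation and solving steps concrete, here is a standalone sketch of randomized depth-first search on a square lattice, plus a breadth-first-search solver and a simple filter on solution length. It assumes a plain adjacency-map representation (`{cell: set_of_connected_neighbours}`); `random_dfs_maze` and `bfs_solve` are hypothetical names for illustration, not maze-dataset's internal API (the library's own generator is used via `LatticeMazeGenerators.gen_dfs`).

```python
from __future__ import annotations

import random
from collections import deque

Coord = tuple[int, int]
Adjacency = dict[Coord, set[Coord]]

STEPS = ((1, 0), (-1, 0), (0, 1), (0, -1))  # down, up, right, left


def random_dfs_maze(grid_n: int, seed: int | None = None) -> Adjacency:
    """Carve a perfect maze (a spanning tree of the grid) with randomized DFS."""
    rng = random.Random(seed)
    adj: Adjacency = {(r, c): set() for r in range(grid_n) for c in range(grid_n)}
    start = (rng.randrange(grid_n), rng.randrange(grid_n))
    visited = {start}
    stack = [start]
    while stack:
        r, c = stack[-1]
        # in-bounds neighbours of the current cell that have not been visited yet
        options = [
            (r + dr, c + dc)
            for dr, dc in STEPS
            if 0 <= r + dr < grid_n and 0 <= c + dc < grid_n
            and (r + dr, c + dc) not in visited
        ]
        if not options:
            stack.pop()  # dead end: backtrack
            continue
        nxt = rng.choice(options)
        # remove the wall between the current cell and the chosen neighbour
        adj[(r, c)].add(nxt)
        adj[nxt].add((r, c))
        visited.add(nxt)
        stack.append(nxt)
    return adj


def bfs_solve(adj: Adjacency, origin: Coord, target: Coord) -> list[Coord]:
    """Return a shortest origin -> target path, or [] if target is unreachable."""
    parent: dict[Coord, Coord | None] = {origin: None}
    queue = deque([origin])
    while queue:
        cur = queue.popleft()
        if cur == target:
            break
        for nb in adj[cur]:
            if nb not in parent:
                parent[nb] = cur
                queue.append(nb)
    if target not in parent:
        return []
    path: list[Coord] = []
    node: Coord | None = target
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1]  # reverse so the path runs origin -> target


# Usage: generate ten 5x5 mazes and keep only those with a long solution
mazes = [random_dfs_maze(5, seed=s) for s in range(10)]
long_mazes = [m for m in mazes if len(bfs_solve(m, (0, 0), (4, 4))) >= 12]
```

Because every DFS maze is a spanning tree, there is exactly one simple path between any two cells, so BFS here simply recovers that unique path.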

(Example images: a maze generated via percolation; a maze generated via constrained randomized depth-first search; a maze with a random heatmap; a MazePlot with a solution.)
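Wilson's algorithm, listed among the generators above, samples uniformly among all spanning trees of the lattice by repeatedly running loop-erased random walks until they hit the growing tree. The following is a standalone sketch under the assumption of a plain adjacency-map representation; `wilson_maze` is a hypothetical name, not part of maze-dataset's API.

```python
from __future__ import annotations

import random

Coord = tuple[int, int]


def wilson_maze(grid_n: int, seed: int | None = None) -> dict[Coord, set[Coord]]:
    """Sample a uniform spanning tree of a grid_n x grid_n lattice (Wilson's algorithm)."""
    rng = random.Random(seed)
    cells = [(r, c) for r in range(grid_n) for c in range(grid_n)]
    adj: dict[Coord, set[Coord]] = {cell: set() for cell in cells}
    in_tree = {rng.choice(cells)}  # the tree starts from one arbitrary root cell
    for start in cells:
        if start in in_tree:
            continue
        # random walk from `start` until it hits the tree; remembering only the
        # most recent exit from each cell erases any loops the walk made
        next_step: dict[Coord, Coord] = {}
        cur = start
        while cur not in in_tree:
            r, c = cur
            neighbours = [
                (r + dr, c + dc)
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
                if 0 <= r + dr < grid_n and 0 <= c + dc < grid_n
            ]
            next_step[cur] = rng.choice(neighbours)
            cur = next_step[cur]
        # follow the loop-erased path from `start` and graft it onto the tree
        cur = start
        while cur not in in_tree:
            nxt = next_step[cur]
            adj[cur].add(nxt)
            adj[nxt].add(cur)
            in_tree.add(cur)
            cur = nxt
    return adj


maze = wilson_maze(5, seed=0)
```

Compared with randomized DFS, which tends to produce long winding corridors with few branches, Wilson's algorithm gives every spanning tree equal probability, at the cost of random walks that can be slow to hit the tree early on.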

You can view and search through a wide variety of example mazes here: understanding-search.github.io/maze-dataset/examples/maze_examples

Citing

If you use this code in your research, please cite our paper:

@misc{maze-dataset,
    title={A Configurable Library for Generating and Manipulating Maze Datasets}, 
    author={Michael Igorevich Ivanitskiy and Rusheb Shah and Alex F. Spies and Tilman Räuker and Dan Valentine and Can Rager and Lucia Quirke and Chris Mathwin and Guillaume Corlouer and Cecilia Diniz Behn and Samy Wu Fung},
    year={2023},
    eprint={2309.10498},
    archivePrefix={arXiv},
    primaryClass={cs.LG},
    url={http://arxiv.org/abs/2309.10498}
}

Installation

This package is available on PyPI and can be installed via:

pip install maze-dataset

Docs

The full hosted documentation is available at https://understanding-search.github.io/maze-dataset/.

Additionally, our notebooks serve as a good starting point for understanding the package.

Usage

Creating a dataset

To create a MazeDataset, which inherits from torch.utils.data.Dataset, you first create a MazeDatasetConfig:

from maze_dataset import MazeDataset, MazeDatasetConfig
from maze_dataset.generation import LatticeMazeGenerators
cfg: MazeDatasetConfig = MazeDatasetConfig(
    name="test", # name is only for you to keep track of things
    grid_n=5, # number of rows/columns in the lattice
    n_mazes=4, # number of mazes to generate
    maze_ctor=LatticeMazeGenerators.gen_dfs, # algorithm to generate the maze
    maze_ctor_kwargs=dict(do_forks=False), # additional parameters to pass to the maze generation algorithm
)

and then pass this config to the MazeDataset.from_config method:

dataset: MazeDataset = MazeDataset.from_config(cfg)

This method checks whether a dataset with a matching config hash already exists on your filesystem in the expected location and loads it if so; otherwise, it can generate the dataset on the fly.
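The caching behaviour described above follows a common pattern: serialize the config deterministically, hash it, and use the hash as part of a filename. Below is a minimal sketch of that pattern; `ToyConfig`, `config_hash`, `load_or_generate`, and `fake_generate` are hypothetical names, and the real library's hashing scheme, file format, and directory layout may well differ.

```python
from __future__ import annotations

import hashlib
import json
import tempfile
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import Callable


@dataclass(frozen=True)
class ToyConfig:
    """Hypothetical stand-in for MazeDatasetConfig; not the real class."""

    name: str
    grid_n: int
    n_mazes: int
    maze_ctor: str  # generator stored by name so the config is JSON-serializable


def config_hash(cfg: ToyConfig) -> str:
    """Hash a deterministic serialization of the config."""
    # sort_keys makes the JSON (and hence the hash) independent of field order
    payload = json.dumps(asdict(cfg), sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:16]


def load_or_generate(
    cfg: ToyConfig, base_dir: Path, generate: Callable[[ToyConfig], list]
) -> list:
    """Load a cached dataset keyed by the config hash, or generate and cache it."""
    path = base_dir / f"{cfg.name}-{config_hash(cfg)}.json"
    if path.exists():
        return json.loads(path.read_text())
    data = generate(cfg)
    base_dir.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(data))
    return data


# Usage: the second call finds the cached file instead of regenerating
calls: list[ToyConfig] = []


def fake_generate(cfg: ToyConfig) -> list:
    calls.append(cfg)
    return [f"maze-{i}" for i in range(cfg.n_mazes)]


with tempfile.TemporaryDirectory() as tmp:
    cfg = ToyConfig(name="test", grid_n=5, n_mazes=4, maze_ctor="gen_dfs")
    first = load_or_generate(cfg, Path(tmp), fake_generate)
    second = load_or_generate(cfg, Path(tmp), fake_generate)
```

Keying the file on a hash of the whole config, rather than on the `name` alone, means that changing any parameter (e.g. `grid_n`) produces a different file instead of silently reusing a stale dataset.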

Conversions to useful formats

The elements of the dataset are SolvedMaze objects:

>>> m = dataset[0]
>>> type(m)
maze_dataset.maze.lattice_maze.SolvedMaze

These can be converted to a variety of formats:

# visual representation as ascii art
m.as_ascii()
# RGB image, optionally without solution or endpoints, suitable for CNNs
m.as_pixels()
# text format for autoregressive transformers
from maze_dataset.tokenization import MazeTokenizerModular, TokenizationMode
m.as_tokens(maze_tokenizer=MazeTokenizerModular(
    tokenization_mode=TokenizationMode.AOTP_UT_rasterized, max_grid_size=100,
))
# advanced visualization with many features
from maze_dataset.plotting import MazePlot
MazePlot(m).plot()
(Figure: textual and visual output formats.)
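To see what the ASCII and token representations contain, they can be approximated without the library. This is a minimal sketch assuming a plain adjacency map `{cell: set_of_connected_neighbours}`; `to_ascii` and `to_tokens` are hypothetical helpers, and the characters, special-token names (such as `<ADJLIST_START>`), and ordering are illustrative assumptions, not the exact output of `as_ascii()` or `as_tokens()`. The token layout follows an adjacency-list / origin / target / path order, which is what the `AOTP` prefix in `TokenizationMode.AOTP_UT_rasterized` appears to refer to.

```python
from __future__ import annotations

from collections.abc import Sequence

Coord = tuple[int, int]


def to_ascii(adj: dict[Coord, set[Coord]], grid_n: int, path: Sequence[Coord] = ()) -> str:
    """Render walls as '#', open space as ' ', path cells as 'X', endpoints as 'S'/'E'."""
    size = 2 * grid_n + 1  # cells sit at odd indices, walls/passages between them
    grid = [["#"] * size for _ in range(size)]
    for (r, c), neighbours in adj.items():
        grid[2 * r + 1][2 * c + 1] = " "  # open cell
        for nr, nc in neighbours:
            # the wall between two connected cells sits at their midpoint
            grid[r + nr + 1][c + nc + 1] = " "
    for r, c in path:  # marks path cells only, not the passages between them
        grid[2 * r + 1][2 * c + 1] = "X"
    if path:
        (sr, sc), (er, ec) = path[0], path[-1]
        grid[2 * sr + 1][2 * sc + 1] = "S"
        grid[2 * er + 1][2 * ec + 1] = "E"
    return "\n".join("".join(row) for row in grid)


def to_tokens(
    adj: dict[Coord, set[Coord]], origin: Coord, target: Coord, path: Sequence[Coord]
) -> list[str]:
    """Adjacency list, then origin, target, and path, as a flat token list."""
    def tok(rc: Coord) -> str:
        return f"({rc[0]},{rc[1]})"

    tokens = ["<ADJLIST_START>"]
    # each undirected edge once, sorted so the output is deterministic
    edges = sorted({tuple(sorted((a, b))) for a, nbs in adj.items() for b in nbs})
    for a, b in edges:
        tokens += [tok(a), "<-->", tok(b), ";"]
    tokens += ["<ADJLIST_END>", "<ORIGIN_START>", tok(origin), "<ORIGIN_END>"]
    tokens += ["<TARGET_START>", tok(target), "<TARGET_END>", "<PATH_START>"]
    tokens += [tok(p) for p in path]
    tokens.append("<PATH_END>")
    return tokens


# Usage on a hand-written 2x2 maze
maze = {(0, 0): {(0, 1), (1, 0)}, (0, 1): {(0, 0)}, (1, 0): {(0, 0), (1, 1)}, (1, 1): {(1, 0)}}
solution = [(0, 1), (0, 0), (1, 0), (1, 1)]
print(to_ascii(maze, 2, solution))
# #####
# #X S#
# # ###
# #X E#
# #####
print(" ".join(to_tokens(maze, solution[0], solution[-1], solution)))
```

Rendering each cell and each potential wall as its own character is why an n x n maze becomes a (2n+1) x (2n+1) grid; a pixel format can be built the same way by mapping each character to an RGB colour.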

Development

We use a makefile template, with slight modifications, for our development workflow:

  • clone with git clone https://github.com/understanding-search/maze-dataset
  • make dep to install all dependencies
  • make help will print all available commands
  • make test will run basic tests to ensure the package is working
  • make format will run ruff to format and check the code

Download files

Source Distribution

maze_dataset-1.3.2.tar.gz (18.1 MB)

Uploaded Source

Built Distribution

maze_dataset-1.3.2-py3-none-any.whl (145.7 kB)

Uploaded Python 3

File details

Details for the file maze_dataset-1.3.2.tar.gz.

File metadata

  • Download URL: maze_dataset-1.3.2.tar.gz
  • Upload date:
  • Size: 18.1 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.7

File hashes

Hashes for maze_dataset-1.3.2.tar.gz
Algorithm Hash digest
SHA256 576d50ac4151cc72578c7cf2547c24e010902a59ae195f7ed25f4a8d2d30ced4
MD5 44905f83d90e2c9d29dfda84f32de285
BLAKE2b-256 30e885115498f87e7c078da0f86e106a0d772f3ecb6e684ee9c85d26489d7a49

File details

Details for the file maze_dataset-1.3.2-py3-none-any.whl.

File metadata

  • Download URL: maze_dataset-1.3.2-py3-none-any.whl
  • Upload date:
  • Size: 145.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.7

File hashes

Hashes for maze_dataset-1.3.2-py3-none-any.whl
Algorithm Hash digest
SHA256 159087a5e3d2d07160c78a5183e0fd761fcc2c1ab44a69dcedd97c6619240316
MD5 97c2eea4b8ee66d5ba87f5d618a9c3ec
BLAKE2b-256 57e02d463c551d941d6c376dc04a7aa6833b1bb955e285520b5c838d23fee7c2
