`maze-dataset`

This package provides utilities for generating, filtering, solving, visualizing, and processing mazes for training ML systems. It was built primarily for the maze-transformer interpretability project. You can find our paper on it here: http://arxiv.org/abs/2309.10498

This package implements a variety of maze generation algorithms, including randomized depth-first search, Wilson's algorithm for uniform spanning trees, and percolation. Datasets can be filtered to select mazes of a certain length or complexity, remove duplicates, and satisfy custom properties. A variety of output formats for visualization and training ML models are provided.
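
To make the first of these algorithms concrete, here is a minimal, self-contained sketch of randomized depth-first search on a square lattice. It does not use this package: the function name gen_dfs_sketch and the representation of a maze as a set of open passages are assumptions made for illustration, not the library's own data structures.

```python
import random
from typing import FrozenSet, Optional, Set, Tuple

Coord = Tuple[int, int]


def gen_dfs_sketch(grid_n: int, seed: Optional[int] = None) -> Set[FrozenSet[Coord]]:
    """Return the open passages of a grid_n x grid_n maze (hypothetical helper).

    Each passage is a frozenset of two adjacent cells. Every cell is visited
    exactly once, so the passages form a spanning tree of the lattice.
    """
    rng = random.Random(seed)
    start = (rng.randrange(grid_n), rng.randrange(grid_n))
    visited = {start}
    passages: Set[FrozenSet[Coord]] = set()
    stack = [start]
    while stack:
        row, col = stack[-1]
        # unvisited neighbors of the current cell that lie inside the grid
        neighbors = [
            (r, c)
            for r, c in ((row - 1, col), (row + 1, col), (row, col - 1), (row, col + 1))
            if 0 <= r < grid_n and 0 <= c < grid_n and (r, c) not in visited
        ]
        if not neighbors:
            stack.pop()  # dead end: backtrack to the previous cell
            continue
        nxt = rng.choice(neighbors)
        passages.add(frozenset(((row, col), nxt)))  # open the wall between the two cells
        visited.add(nxt)
        stack.append(nxt)
    return passages


maze = gen_dfs_sketch(grid_n=5, seed=0)
print(len(maze))  # a spanning tree on 25 cells has 24 passages
```

Because the result is a spanning tree, any two cells are connected by exactly one path, which is why mazes of this kind have a unique solution.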

Usage

Most of the functionality is demonstrated in the ipython notebooks in the notebooks/ folder.

- demo_dataset.ipynb: how to easily create a dataset of mazes, utilities for filtering the generated mazes via properties, and basic visualization. View this one first.
- demo_tokenization.ipynb: converting mazes to and from textual representations, as well as utilities for working with them.
- demo_latticemaze.ipynb: internals of the LatticeMaze and SolvedMaze objects, and advanced visualization.

Creating a dataset

To create a MazeDataset, which inherits from torch.utils.data.Dataset, you first create a MazeDatasetConfig:

from maze_dataset import MazeDataset, MazeDatasetConfig
from maze_dataset.generation import LatticeMazeGenerators
cfg: MazeDatasetConfig = MazeDatasetConfig(
    name="test",  # name is only for you to keep track of things
    grid_n=5,  # number of rows/columns in the lattice
    n_mazes=4,  # number of mazes to generate
    maze_ctor=LatticeMazeGenerators.gen_dfs,  # algorithm to generate the maze
    maze_ctor_kwargs=dict(do_forks=False),  # additional parameters to pass to the maze generation algorithm
)

and then pass this config to the MazeDataset.from_config factory method:

dataset: MazeDataset = MazeDataset.from_config(
    # your config
    cfg,
    # and all this below is completely optional
    # do_download=False,
    # load_local=False,
    # do_generate=True,
    # save_local=True,
    # gen_parallel=False,
)

This method can check whether a dataset with a matching config hash already exists on your filesystem in the expected location, and load it if so. It can also generate a dataset on the fly if needed.
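
The load-or-generate pattern described here can be sketched in a few lines of standard-library Python. This is not how MazeDataset.from_config is implemented: the helpers config_fingerprint and load_or_generate, the file naming scheme, and the JSON storage format are all assumptions made for illustration.

```python
import hashlib
import json
import tempfile
from pathlib import Path
from typing import Callable, Tuple


def config_fingerprint(cfg: dict) -> str:
    """Hash a config deterministically, independent of key order (hypothetical helper)."""
    canonical = json.dumps(cfg, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


def load_or_generate(
    cfg: dict, cache_dir: Path, generate: Callable[[dict], list]
) -> Tuple[list, bool]:
    """Return (data, loaded_from_cache) for the given config (hypothetical helper)."""
    path = cache_dir / f"{cfg['name']}-{config_fingerprint(cfg)}.json"
    if path.exists():
        # a dataset with a matching config hash is already on disk
        return json.loads(path.read_text()), True
    data = generate(cfg)
    path.write_text(json.dumps(data))  # save for the next call
    return data, False


def fake_generate(cfg: dict) -> list:
    """Stand-in for real maze generation: one placeholder entry per maze."""
    return [[i, cfg["grid_n"]] for i in range(cfg["n_mazes"])]


cfg = {"name": "test", "grid_n": 5, "n_mazes": 4}
with tempfile.TemporaryDirectory() as tmp:
    first, cached_first = load_or_generate(cfg, Path(tmp), fake_generate)
    second, cached_second = load_or_generate(cfg, Path(tmp), fake_generate)

print(cached_first, cached_second)  # False True
```

Keying the cache on a hash of the whole config means that changing any parameter, such as grid_n, produces a new file rather than silently reusing a stale dataset.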

Conversions to useful formats

The elements of the dataset are SolvedMaze objects:

>>> m = dataset[0]
>>> type(m)
maze_dataset.maze.lattice_maze.SolvedMaze

These can be converted to a variety of formats:

# visual representation as ascii art
m.as_ascii()
# RGB image, optionally without solution or endpoints, suitable for CNNs
m.as_pixels()
# text format for autoregressive transformers
from maze_dataset.tokenization import MazeTokenizer, TokenizationMode
m.as_tokens(maze_tokenizer=MazeTokenizer(
    tokenization_mode=TokenizationMode.AOTP_UT_rasterized, max_grid_size=100,
))
# advanced visualization with many features
from maze_dataset.plotting import MazePlot
MazePlot(m).plot()

Figure: textual and visual output formats
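
As a rough idea of what an ASCII rendering of a lattice maze involves, the sketch below draws a maze given as a set of open passages between adjacent cells. The function render_ascii and the character layout are assumptions made for illustration, and the output is not necessarily identical to what as_ascii() produces.

```python
from typing import FrozenSet, Iterable, Tuple

Coord = Tuple[int, int]


def render_ascii(grid_n: int, passages: Iterable[FrozenSet[Coord]]) -> str:
    """Draw a grid_n x grid_n maze with '#' for walls and ' ' for open space."""
    size = 2 * grid_n + 1
    canvas = [["#"] * size for _ in range(size)]
    # each cell (row, col) sits at canvas position (2*row + 1, 2*col + 1)
    for row in range(grid_n):
        for col in range(grid_n):
            canvas[2 * row + 1][2 * col + 1] = " "
    # a passage opens the canvas square halfway between its two cells
    for passage in passages:
        (r1, c1), (r2, c2) = tuple(passage)
        canvas[r1 + r2 + 1][c1 + c2 + 1] = " "
    return "\n".join("".join(line) for line in canvas)


# a hand-built 2x2 maze: (0,0) - (0,1) - (1,1) - (1,0)
passages = [
    frozenset({(0, 0), (0, 1)}),
    frozenset({(0, 1), (1, 1)}),
    frozenset({(1, 1), (1, 0)}),
]
print(render_ascii(2, passages))
# #####
# #   #
# ### #
# #   #
# #####
```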

Installation

This package is available on PyPI, and can be installed via

pip install maze-dataset

Development

This project uses Poetry for development. To install with dev requirements, run

poetry install --with dev

A makefile is included to simplify common development tasks:

- make help will print all available commands
- all tests via make test
  - unit tests via make unit
  - notebook tests via make test_notebooks
- formatter (black, pycln, and isort) via make format
  - formatter in check-only mode via make check-format

Citing

If you use this code in your research, please cite our paper:

@misc{maze-dataset,
    title={A Configurable Library for Generating and Manipulating Maze Datasets}, 
    author={Michael Igorevich Ivanitskiy and Rusheb Shah and Alex F. Spies and Tilman Räuker and Dan Valentine and Can Rager and Lucia Quirke and Chris Mathwin and Guillaume Corlouer and Cecilia Diniz Behn and Samy Wu Fung},
    year={2023},
    eprint={2309.10498},
    archivePrefix={arXiv},
    primaryClass={cs.LG},
    url={http://arxiv.org/abs/2309.10498}
}

Release history

Version	Release date
0.5.3 (this version)	May 14, 2024
0.5.2	Mar 19, 2024
0.5.1	Mar 5, 2024
0.5.0	Feb 23, 2024
0.4.5	Dec 5, 2023
0.4.4	Nov 14, 2023
0.4.3	Oct 5, 2023
0.4.2	Oct 4, 2023
0.4.1	Sep 29, 2023
0.4.0	Sep 19, 2023
0.3.6	Sep 19, 2023
0.3.5	Sep 5, 2023
0.3.4	Sep 5, 2023
0.3.3	Sep 4, 2023

Download files

Source Distribution

maze_dataset-0.5.3.tar.gz (48.4 kB), uploaded May 14, 2024, Source

Built Distribution

maze_dataset-0.5.3-py3-none-any.whl (57.1 kB), uploaded May 14, 2024, Python 3

Hashes for maze_dataset-0.5.3.tar.gz

Algorithm	Hash digest
SHA256	`750054dc4720030680d078d26311138f267cde3b8ed89e145bb0ae4366d532c0`
MD5	`2592eca4426240d282330c83baf54997`
BLAKE2b-256	`2380b898041a20a0679578543e7a9cc5b2bebd233e74e9331de61e3e1eefbcd7`

Hashes for maze_dataset-0.5.3-py3-none-any.whl

Algorithm	Hash digest
SHA256	`29e3c150e70186d6b9a0823a8914c884d96f6b236bb577b6920983a65d4bfa1d`
MD5	`c68611e35b36f8337b7819245eaf7a17`
BLAKE2b-256	`10f54376f6250db7a048edd5850d5118043534d2eb2e70999909dfab93929ed0`
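
If you download one of these files by hand, you can compare its SHA256 digest against the value listed above. The following is a minimal sketch using only the standard library. The helper names sha256_of_file and matches_published_hash are made up for illustration, and the script assumes the source distribution was saved to the current directory under its original file name.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 16) -> str:
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: Path, expected_hex: str) -> bool:
    """Compare a file's digest with a published one, ignoring case and whitespace."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# SHA256 digest listed above for the 0.5.3 source distribution
EXPECTED_SDIST_SHA256 = "750054dc4720030680d078d26311138f267cde3b8ed89e145bb0ae4366d532c0"

sdist = Path("maze_dataset-0.5.3.tar.gz")  # assumed download location
if sdist.exists():
    print("hash matches:", matches_published_hash(sdist, EXPECTED_SDIST_SHA256))
else:
    print(f"{sdist} not found; download it first")
```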