No project description provided
Project description
AC-Solver
- Overview
- Andrews-Curtis Conjecture
- Abstract and paper
- Installation
- Usage
- Notebooks
- Contributing
- Contact info
- Citation
- Acknowledgments
Overview
This repository accompanies the paper "What Makes Math Problems Hard for Reinforcement Learning: A Case Study." It includes an implementation of the Andrews-Curtis (AC) Environment in Gymnasium, two classical search algorithms (BFS and Greedy Search), and a PPO agent that works within this environment. Additionally, the repository contains Jupyter notebooks for reproducing the analyses and figures presented in the paper.
Abstract and paper
Using a long-standing conjecture from combinatorial group theory, we explore, from multiple angles, the challenges of finding rare instances carrying disproportionately high rewards. Based on lessons learned in the mathematical context defined by the Andrews--Curtis conjecture, we propose algorithmic improvements that can be relevant in other domains with ultra-sparse reward problems. Although our case study can be formulated as a game, its shortest winning sequences are potentially $10^6$ or $10^9$ times longer than those encountered in chess. In the process of our study, we demonstrate that one of the potential counterexamples due to Akbulut and Kirby, whose status escaped direct mathematical methods for 39 years, is stably AC-trivial.
Read full paper on arxiv [TODO: add link to arxiv]
Andrews-Curtis Conjecture
Andrews-Curtis conjecture is a long-standing open problem in combinatorial group theory and low-dimensional topology. It states that every balanced presentation of the trivial group could be transformed into the trivial presentation using actions on relators: inverses, conjugation and concatenation. More precisely, given presentation of trivial group of form $\langle x_{1}, x_{2}, \ldots, x_{n} | r_{1}, r_{2}, \ldots, r_{n} \rangle$ can be transformed into $\langle x_{1}, x_{2}, \ldots, x_{n} |x_{1}, x_{2}, \ldots, x_{n} \rangle$ using following set of moves:
- inverse: changing some $r_{i}$ with $r_{i}^{-1}$,
- conjugation: changing some $r_{i}$ with $qr_{i}q^{-1}$ for some $q$,
- concatenation: changing some $r_{i}$ with $r_{i}r_{j}$.
While many counterexamples to this conjecture were proposed over the years, finding trivializing sequences is notoriously hard. Many aspects of this math problem make it a perfect setup for studying how Reinforcement Learning can identify rare and long sequences of moves that close the desired goal.
Installation
To work with the AC Environment or build upon it, you can simply install the package using pip:
pip install ac_solver
If you wish to reproduce the plots and analyses in the paper, you will need to clone the repository locally. Here is the recommended process:
-
Clone the repository:
git clone https://github.com/shehper/AC-Solver.git cd AC-Solver
-
Make a virtual environment and install the package locally using pip:
python -m venv ./env source ./env/bin/activate pip install .
Usage
After installation, you can start using the environment and agents as follows:
Initializing the AC Environment
The ACEnv
class is initialized using the ACEnvConfig
class. By default, ACEnv()
initializes the environment with the trivial presentation $\langle x, y | x, y \rangle$, represented in code as [1, 0, 2, 0]
.
If you want to specify your own presentation, you can do so using a list. For example, to initialize the environment with the simplest presentation from the Akbulut-Kirby series, $\langle x, y | x^2 = y^3, xyx = yxy \rangle$, you can follow these steps:
-
Import the classes.
from ac_solver import ACEnvConfig, ACEnv
-
Define the presentation as a list:
presentation = [1, 1, -2, -2, -2, 0, 0, 1, 2, 1, -2, -1, -2, 0]
-
Initialize the
ACEnvConfig
with this presentation:config = ACEnvConfig(initial_state=presentation)
-
Create the environment with the custom configuration:
env = ACEnv(config)
This allows you to explore different presentations within the AC Environment.
Performing Classical Search
To perform greedy search in the neighborhood of a specified presentation, do:
from ac_solver import greedy_search
greedy_search(presentation)
Similarly, for breadth-first-search:
from ac_solver import bfs
bfs(presentation)
You can specify the number of nodes to explore through max_nodes_to_explore
argument of these functions. By default, search is done over 10000 nodes. If the search is successful in reaching a trivial state, a path of AC moves is returned to the user.
Solving the Environment with PPO
To train a PPO agent on the AC environment, run the following command in your terminal:
python ac_solver/agents/ppo.py
By default, this command trains the PPO agent on an AC graph with initial states drawn from approximately 1200 presentations of the Miller-Schupp series, as listed in this file. You can customize your run by passing any hyperparameters listed in args.py via the command line.
Notebooks
The notebooks/
directory contains Jupyter notebooks that reproduce the figures and results discussed in the paper:
Classical-Search-and-PPO-in-AC-Environment.ipynb
: Reproduces figures in Sections 1, 3, and 4 of the paper.Scaling-PPO-in-AC-Environment.ipynb
: Reproduces figures in Section 5 of the paper.Stable-AK3.ipynb
: Provides code demonstrating that AK(3) is a stably AC-trivial presentation, a major result of the paper.
To run these notebooks, you must clone the repository locally as described above.
Contributing
Contributions to this project are welcome! If you'd like to contribute, please follow these steps:
-
Fork the repository.
-
Create a new branch for your feature or bugfix.
-
Run tests with
pytest
and ensure your code is formatted withblack
:poetry run pytest poetry run black .
-
Submit a pull request with a clear description of your changes.
Contact info
[TODO]
Citation
@article{
}
[TODO]
Acknowledgments
This project’s PPO implementation is based on the CleanRL library.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Hashes for ac_solver-0.1.0-py3-none-any.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 09dc0f65f4f2072b4f8e0ef86d1a248ae5e21b39ea05cb7cd1d92c1f82c1febf |
|
MD5 | c2e58539c4f9caaafe12b84ae2de451b |
|
BLAKE2b-256 | a5eb489ee9905219ead532411e67294c54d5e8ebeb71b56c72eb47193fd39de4 |