Awesome myugpt created by Cardinal-Robo-Taxi
MyuGPT
MyuZero Paper: https://arxiv.org/abs/1911.08265
MyuZero uses AI-guided Monte Carlo Tree Search to make good decisions and hence play games such as Atari, Go, Chess and Shogi at a superhuman level. Tesla has shown that it recently applied a similar approach, AI-guided tree search, to path planning. The difference is that, at the moment, Tesla likely uses its hard-coded simulator for training (along with its large dataset of user data). An LLM can take a programming problem statement as input, along with the current code and its output, and produce new code as output.
There is potential to build a superhuman coding agent using LLMs and MyuZero.
Inspiration
To summarise the MyuZero Paper, there are three neural networks:
- h(img) -> S : Environment Encoder takes an image as input and provides a latent space representation as output
- f(S) -> P,V : Policy-Value Function takes the environment state as input and produces a distribution P over the actions to take, and the corresponding expected future reward V.
- g(Si, Ai) -> Ri, Si+1 : Dynamics Model takes a state-action pair (Si, Ai) for a given frame i as input and produces the next state Si+1 along with the reward Ri for the action Ai.
The Environment Encoder converts the sensor reading to a latent space. The Policy-Value Function produces good candidate branches to explore further in the Monte Carlo Tree Search. The Dynamics Model lets the system look into the future. Together, the networks and the Monte Carlo Tree Search framework are able to make an informed decision by looking down branches with potential and picking the one with the highest reward.
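To make the interplay of h, f and g concrete, here is a minimal sketch of a MyuZero-style planning loop. It is not the paper's implementation: the three networks are replaced by hand-written toy functions on a number line, and every name in it (`ACTIONS`, `TARGET`, `DISCOUNT`, `Node`, `expand`, `select_child`, `plan`) is a hypothetical choice made for illustration only.

```python
import math
from dataclasses import dataclass, field

# Toy setting (an assumption for illustration): walk along the integers to reach TARGET.
ACTIONS = (-1, +1)
TARGET = 3
DISCOUNT = 0.9

def h(observation: int) -> int:
    """Environment encoder stand-in: the 'latent state' is the integer itself."""
    return observation

def f(state: int) -> tuple[dict[int, float], float]:
    """Policy-value stand-in: uniform prior, value = negative distance to TARGET."""
    priors = {action: 1.0 / len(ACTIONS) for action in ACTIONS}
    return priors, -float(abs(TARGET - state))

def g(state: int, action: int) -> tuple[float, int]:
    """Dynamics stand-in: returns (reward, next state)."""
    next_state = state + action
    return (1.0 if next_state == TARGET else 0.0), next_state

@dataclass
class Node:
    state: int
    prior: float
    reward: float = 0.0
    visits: int = 0
    value_sum: float = 0.0
    children: dict[int, "Node"] = field(default_factory=dict)

    def q(self) -> float:
        return self.value_sum / self.visits if self.visits else 0.0

def expand(node: Node) -> float:
    """Create one child per action using f and g; return the value estimate of the node."""
    priors, value = f(node.state)
    for action, prior in priors.items():
        reward, next_state = g(node.state, action)
        node.children[action] = Node(next_state, prior, reward)
    return value

def select_child(node: Node, c: float = 1.25) -> Node:
    """Pick the child with the best reward + discounted value + prior-weighted exploration bonus."""
    def score(child: Node) -> float:
        explore = c * child.prior * math.sqrt(node.visits) / (1 + child.visits)
        return child.reward + DISCOUNT * child.q() + explore
    return max(node.children.values(), key=score)

def plan(observation: int, simulations: int = 50) -> int:
    """Run the tree search from an observation and return the most visited root action."""
    root = Node(h(observation), prior=1.0)
    for _ in range(simulations):
        node, path = root, [root]
        while node.children:              # descend to a leaf
            node = select_child(node)
            path.append(node)
        value = expand(node)              # evaluate and expand the leaf
        for visited in reversed(path):    # back the value up along the path
            visited.value_sum += value
            visited.visits += 1
            value = visited.reward + DISCOUNT * value
    return max(root.children, key=lambda a: root.children[a].visits)
```

Starting from 0, `plan(0)` favours the +1 action, since that branch leads towards the rewarded state. In the real system the three toy functions would be replaced by learned networks.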
In the context of LLMs as coding agents, this is how it would translate:
- h(env) -> S : Environment Encoder takes the problem statement, current code written and the output of the compiled code and wraps it all up into a text prompt for GPT
- f(S) -> P,V : Policy-Value Function is an LLM. We would have to prompt it to produce a value as well (ask it to score itself). By varying the temperature of the model, we can sample multiple possible chains of thought and follow the most likely path
- g(Si, Ai) -> Ri, Si+1 : Dynamics Model is the code interpreter, which takes the code requested by GPT as input, runs it and updates the environment (new code and output).
Monte Carlo Tree Search, guided by these three components, will be used to explore the potential trajectories the agent can take over the next few steps, and the trajectory with the highest reward is picked.
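As a rough sketch of how the encoder and the dynamics model might look for a coding agent, the snippet below builds a prompt from the environment and runs a candidate program with the standard library's `subprocess`. The `CodeState` class, the prompt wording, the 5 second timeout and the "exit code 0 means reward 1.0" rule are all assumptions made for illustration, not part of myugpt. The LLM call for f is left out on purpose. Running generated code like this should only be done inside a sandbox.

```python
import subprocess
import sys
import tempfile
from dataclasses import dataclass
from pathlib import Path

@dataclass(frozen=True)
class CodeState:
    """Hypothetical environment state: problem statement, current code, last output."""
    problem: str
    code: str = ""
    output: str = ""

def h(state: CodeState) -> str:
    """Environment encoder: wrap problem, code and output into one text prompt."""
    return (
        f"Problem statement:\n{state.problem}\n\n"
        f"Current code:\n{state.code or '(none yet)'}\n\n"
        f"Output of the current code:\n{state.output or '(not run yet)'}\n\n"
        "Propose an improved version of the code."
    )

def g(state: CodeState, new_code: str, timeout: float = 5.0) -> tuple[float, CodeState]:
    """Dynamics model: run the proposed code, return (reward, next state).

    The reward here is a placeholder (1.0 on a clean exit, else 0.0).
    """
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "candidate.py"
        script.write_text(new_code)
        try:
            proc = subprocess.run(
                [sys.executable, str(script)],
                capture_output=True,
                text=True,
                timeout=timeout,
            )
            output = proc.stdout + proc.stderr
            reward = 1.0 if proc.returncode == 0 else 0.0
        except subprocess.TimeoutExpired:
            output, reward = "timed out", 0.0
    return reward, CodeState(state.problem, new_code, output)
```

A single step would then be: build the prompt with `h(state)`, send it to the LLM to get candidate code, and pass that code to `g(state, candidate)` to obtain the next state for the search tree.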
What would we have to look into:
- Prompt engineering to translate the environment to a prompt
- How the reward is calculated
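One possible starting point for the reward, offered only as an assumption and not as a decided design, is the fraction of test cases a candidate solution passes. In the sketch below, `reward_from_tests` and its arguments are hypothetical names; `solve` stands for any callable that maps a test input string to the program's output string.

```python
from typing import Callable, Sequence

def reward_from_tests(
    solve: Callable[[str], str],
    tests: Sequence[tuple[str, str]],
) -> float:
    """Return the fraction of (input, expected output) pairs that solve gets right."""
    if not tests:
        return 0.0
    passed = 0
    for stdin_text, expected in tests:
        try:
            actual = solve(stdin_text)
        except Exception:
            continue  # a crash counts as a failed test
        # Compare token by token so trailing whitespace does not matter.
        if actual.split() == expected.split():
            passed += 1
    return passed / len(tests)
```

A reward in [0, 1] like this could be blended with the self-reported score from the LLM, but how to weight the two is an open question.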
Datasets
AlphaCode's Code Contests Dataset
CodeForces Dataset
LeetCode Dataset
- https://www.kaggle.com/datasets/gzipchrist/leetcode-problem-dataset
- Contains 1,825 LeetCode problems; last updated in April 2021
Usage
$ python -m myugpt
#or
$ myugpt
Development
Read the CONTRIBUTING.md file.