
Awesome myugpt created by Cardinal-Robo-Taxi


MyuGPT


MuZero paper: https://arxiv.org/abs/1911.08265

MuZero uses AI-guided Monte Carlo tree search to make good decisions and hence play Atari games, Go, Chess, and Shogi at a super-human level. Tesla has recently shown that it applies a similar approach of AI-guided tree search for path planning, the difference being that Tesla likely trains against its hand-coded simulator (along with its large dataset of user data). An LLM can take a programming problem statement as input, along with the current code and its output, and produce new code to run as output.

There is potential to build a super-human coding agent by combining LLMs with MuZero-style search.

Inspiration

MuZero

To summarise the MuZero paper, there are three neural networks:

  • h(img) -> S : Environment Encoder takes an image as input and provides a latent space representation as output
  • f(S) -> P,V : Policy-Value Function takes the environment state as input and produces a policy distribution over actions P, along with the corresponding predicted future reward value V.
  • g(Si, Ai) -> Ri, Si+1 : Dynamics Model takes a state-action pair (Si, Ai) for a given frame i as input and produces the next state Si+1 along with the reward Ri for the action Ai.

The Environment Encoder converts the sensor reading into the latent space. The Policy-Value Function proposes promising branches to explore further in the Monte Carlo Tree Search. The Dynamics Model lets the system look into the future. Together, the three networks and the Monte Carlo Tree Search framework make an informed decision by looking down promising branches and picking the one with the highest reward. A minimal sketch of this loop is shown below.
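
To make the three roles concrete, here is a toy Python sketch of how h, f, and g could plug into planning. The stand-in functions below just return pseudo-random values, and the search is a greedy depth-limited lookahead rather than the paper's full MCTS with UCB selection and value backup, so treat it as an illustration of the interfaces, not an implementation of MuZero.

```python
# Toy sketch of MuZero-style planning with the three networks. h, f and g are
# pseudo-random stand-ins; real MuZero learns them and runs full MCTS with
# UCB selection and value backup instead of this greedy lookahead.

import random

ACTIONS = [0, 1, 2, 3]   # e.g. four joystick directions in an Atari game
DISCOUNT = 0.997

def h(observation):
    """Environment Encoder: raw observation -> latent state S."""
    random.seed(hash(observation))
    return tuple(random.random() for _ in range(4))

def f(state):
    """Policy-Value Function: latent state -> (policy over actions P, value V)."""
    random.seed(hash(state) + 1)
    logits = [random.random() for _ in ACTIONS]
    policy = [x / sum(logits) for x in logits]
    return policy, random.random()

def g(state, action):
    """Dynamics Model: (Si, Ai) -> (Ri, Si+1), entirely in latent space."""
    random.seed(hash((state, action)) + 2)
    return random.random(), tuple(random.random() for _ in range(4))

def plan(observation, depth=3):
    """Score each root action with a short imagined rollout and pick the best."""
    root = h(observation)

    def rollout(state, remaining):
        if remaining == 0:
            return f(state)[1]                          # bootstrap with V
        policy, _ = f(state)
        action = max(ACTIONS, key=lambda a: policy[a])  # follow the prior greedily
        reward, nxt = g(state, action)
        return reward + DISCOUNT * rollout(nxt, remaining - 1)

    returns = {}
    for action in ACTIONS:
        reward, nxt = g(root, action)
        returns[action] = reward + DISCOUNT * rollout(nxt, depth - 1)
    return max(returns, key=returns.get)

print("chosen action:", plan("frame_0042"))
```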

In the context of LLMs as coding agents, this is how it would translate:

  • h(env) -> S : Environment Encoder takes the problem statement, the current code, and the output of running that code, and wraps it all up into a text prompt for GPT
  • f(S) -> P,V : Policy-Value Function is an LLM. We would have to prompt it to produce a value as well (ask it to score itself). By varying the temperature of the model, we can sample multiple possible chains of thought and follow the most likely path
  • g(Si, Ai) -> Ri, Si+1 : Dynamics Model is the code interpreter, which takes the code produced by GPT, runs it, and updates the environment (new code and output). Monte Carlo Tree Search, guided by these three components, is then used to explore potential solution trajectories a few steps into the future, and the trajectory with the highest reward is picked (see the sketch after this list).
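
The sketch below shows one way this mapping could be wired together, assuming a hypothetical `llm_complete` helper standing in for whatever chat/completions API is used; the prompt template and the crude "did it run" reward are illustrative choices, not the actual MyuGPT implementation.

```python
# Sketch of the MuZero-to-coding-agent mapping described above.
# `llm_complete` is a hypothetical placeholder for a real LLM API call;
# the prompt template and reward are illustrative, not MyuGPT's actual design.

import subprocess
import sys
import tempfile

def h(problem, code, output):
    """Environment Encoder: wrap problem, current code, and its output into a prompt."""
    return (
        f"Problem statement:\n{problem}\n\n"
        f"Current code:\n{code or '(none yet)'}\n\n"
        f"Output of last run:\n{output or '(not run yet)'}\n\n"
        "Propose improved code, then rate its chance of passing (0-100)."
    )

def llm_complete(prompt, temperature):
    """Placeholder for the real LLM call. A higher temperature yields more
    diverse candidate completions to explore in the search."""
    return "print(sum(map(int, input().split())))", 50   # (code, self-score)

def f(state, n_samples=3):
    """Policy-Value Function: sample several candidate edits plus self-scores."""
    return [llm_complete(state, temperature=0.8) for _ in range(n_samples)]

def g(candidate_code, stdin=""):
    """Dynamics Model: the interpreter. Run the code and return (reward, new output)."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as handle:
        handle.write(candidate_code)
        path = handle.name
    proc = subprocess.run([sys.executable, path], input=stdin,
                          capture_output=True, text=True, timeout=10)
    reward = 1.0 if proc.returncode == 0 else -1.0   # crude: did it run at all?
    return reward, proc.stdout + proc.stderr

# One step of the search loop: encode, sample candidates, simulate, keep the best.
problem = "Read two integers from stdin and print their sum."
state = h(problem, code=None, output=None)
candidates = f(state)
best_code, _ = max(candidates,
                   key=lambda c: g(c[0], stdin="2 3")[0] + c[1] / 100)
print("Chosen candidate:\n" + best_code)
```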

What we would have to look into:

  1. Prompt engineering to translate the environment to a prompt
  2. How the reward for a candidate program is calculated (one possible scheme is sketched below)
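
For point 2, one plausible reward is the fraction of example test cases a candidate program passes. The snippet below is only an illustrative sketch of that idea, not a settled design.

```python
# One possible reward for point 2: the fraction of example test cases that a
# candidate program passes. Illustrative sketch only, not a settled design.

import subprocess
import sys

def run_candidate(code, stdin):
    """Run candidate code with the given stdin; return stdout, or None on failure."""
    try:
        proc = subprocess.run([sys.executable, "-c", code], input=stdin,
                              capture_output=True, text=True, timeout=5)
    except subprocess.TimeoutExpired:
        return None
    return proc.stdout.strip() if proc.returncode == 0 else None

def reward(code, test_cases):
    """Fraction of (input, expected_output) pairs the candidate gets right."""
    passed = sum(run_candidate(code, stdin) == expected.strip()
                 for stdin, expected in test_cases)
    return passed / len(test_cases)

tests = [("2 3\n", "5"), ("10 -4\n", "6")]
print(reward("print(sum(map(int, input().split())))", tests))   # -> 1.0
```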

Datasets

AlphaCode's Code Contests Dataset

CodeForces Dataset

LeetCode Dataset
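
As a starting point for the data side, the sketch below loads AlphaCode's CodeContests dataset, assuming the Hugging Face Hub mirror `deepmind/code_contests` and its `name`/`description` fields; the other datasets would be loaded similarly.

```python
# Load AlphaCode's CodeContests dataset via the Hugging Face Hub mirror.
# The dataset id and field names below are assumptions; adjust if they differ.
from datasets import load_dataset

ds = load_dataset("deepmind/code_contests", split="train", streaming=True)
problem = next(iter(ds))
print(problem["name"])                 # problem title
print(problem["description"][:300])   # start of the problem statement
```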

Usage

$ python -m myugpt
# or
$ myugpt

Development

Read the CONTRIBUTING.md file.

