PILA — PolyTrack Imitation Learning Agent
PILA (PolyTrack Imitation Learning Agent) is an imitation learning agent trained to play PolyTrack by learning directly from recorded human gameplay rather than from hard-coded rules.
Instead of manually programming behavior, PILA uses supervised learning to map game states to player actions, allowing it to imitate realistic human driving.
What Is PILA?
PILA learns how to play PolyTrack by:
- Observing gameplay data (states + actions)
- Training a neural network on this data
- Reproducing player behavior in real time inside the game
This approach eliminates hand-written logic and relies entirely on learning by example.
How It Works
Data Collection
- Gameplay is recorded as observations (the model's inputs) and actions (its target outputs)
- Outputs represent the player controls:
  - Steering
  - Throttle
  - Brake
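For illustration, each captured frame becomes one row pairing the frame image with the recorded control values. The column names below come from the customization guide later in this document; the filename format and the numbers themselves are made up:

```csv
frame,w_s,a_d,q_e,space,shift_ctrl,mouse_dx,mouse_dy,left_click,right_click
frame_000123.png,1.0,-0.5,0.0,0.0,1.0,0.02,-0.01,0.0,0.0
```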
Training
- A neural network is trained using imitation learning
- The model minimizes the loss between:
  - Predicted actions
  - Recorded player actions
- Trained models are saved as checkpoints (every 2 epochs by default) and as a complete final save for later use
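As a minimal sketch of that objective, assuming a plain mean-squared-error loss (the actual train.py also applies the per-frame attention weights described under the training settings below):

```python
import torch.nn.functional as F

def training_step(model, optimizer, frames, actions):
    """One imitation step: predict actions from frames, regress to the recording."""
    optimizer.zero_grad()
    pred = model(frames)              # predicted actions
    loss = F.mse_loss(pred, actions)  # gap to the recorded player actions
    loss.backward()
    optimizer.step()
    return loss.item()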
Inference / Playing
- The trained model reads live game frames
- It predicts the next actions using the current frame
- Actions are sent to the game as keyboard inputs
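A hedged sketch of that loop, using the dxcam capture mentioned later in this document and the keyboard library; here model is your loaded network, while preprocess() and apply_actions() are hypothetical stand-ins for play.py's frame preparation and key-mapping code:

```python
import dxcam
import keyboard
import torch

camera = dxcam.create()
camera.start()
model.eval()
with torch.no_grad():
    while not keyboard.is_pressed("esc"):  # "esc" is the emergency stop
        frame = camera.get_latest_frame()  # latest live game frame
        out = model(preprocess(frame))[0]  # one vector of predicted actions
        apply_actions(out)                 # translate into key presses
camera.stop()
```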
Requirements
- Modern CPU or GPU
- Python 3.11.9
How to use PILA
For macOS, visit the macOS branch.
For Windows and Linux, follow this guide⬇
Start by cloning the repo:

```bash
git clone https://github.com/tryfonaskam/pila.git
cd pila
```
After you have successfully cloned the repo, install the requirements so the code works correctly:

```bash
pip install -r requirements.txt
```

or install them manually. Note that all requirements must be installed. The installation is now complete.
Dataset creation and training
Start with this command ⬇for automatic recapture⬇

```bash
python3 loop_datamaker.py
```
After running the script, switch to your game window. The game window should be in windowed fullscreen for the capture region to be detected automatically; otherwise you NEED to set REGION yourself (None = full monitor, or an explicit (left, top, right, bottom) region).
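For example, near the top of loop_datamaker.py (exact placement is illustrative):

```python
REGION = None                  # None = capture the full monitor
# REGION = (0, 0, 1920, 1080)  # or an explicit (left, top, right, bottom)
```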
Press F1 to start capturing, then play the game and press F10 to stop and save.
NOTE 1. You need to play the game smoothly and correctly for the model to be good.
NOTE 2. You can simply press F1 to start capturing again.
The inputs that get captured are just Shift, Ctrl, w, a, s, d, q, e, and mouse movement.
> [!TIP]
> Increase the FPS for better capturing.

> [!CAUTION]
> This will greatly increase the size of each run.
Training
For this step you will need a modern GPU or CPU. To start training, run this command⬇

```bash
python3 train.py
```

This starts training the model on the datasets/ directory. A checkpoint is saved to the checkpoints/ directory every SAVE_EVERY = N epochs (the default is 2). You can stop training at any point and re-run the script to automatically resume.
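Resuming works by reloading the newest checkpoint before training continues. A hedged sketch of that pattern (the checkpoint file names and contents are assumptions, not the script's exact format):

```python
import glob
import os
import torch

SAVE_EVERY = 2  # epochs between checkpoint saves (the default)

# load the most recent checkpoint, if any, to resume training
ckpts = sorted(glob.glob("checkpoints/*.pt"), key=os.path.getmtime)
if ckpts:
    state = torch.load(ckpts[-1])          # newest checkpoint
    model.load_state_dict(state["model"])
    optimizer.load_state_dict(state["optimizer"])
    start_epoch = state["epoch"] + 1       # continue where we stopped
else:
    start_epoch = 0                        # fresh run
```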
To see a graphical representation of the loss, mag_key, and mag_mouse, run this command⬇
For Linux⬇

```bash
tensorboard --bind_all --logdir runs/
```

For Windows, run the tensorboard.bat file or use the same command⬆
Training settings

```python
BASE_WEIGHT = 0.4     # minimum weight for zero frames
DYNAMIC_WEIGHT = 0.1  # how much nonzero frames add (0–1)
ATTN_SCALE = 0.1      # sensitivity
```
NOTE: a nonzero frame is a frame that has a nonzero value inside actions.csv.
- BASE_WEIGHT: the minimum importance of every frame. Even if a frame is all zeros, it still contributes 40% of a normal loss.
- DYNAMIC_WEIGHT: extra weight added when something happens. The more action, the more weight (up to a limit).
- ATTN_SCALE: controls how fast attention grows.
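A hedged sketch of how these three settings could combine into a per-frame loss weight; the exact formula lives in train.py and may differ:

```python
import numpy as np

def frame_weight(actions, base=0.4, dynamic=0.1, attn_scale=0.1):
    """Per-frame loss weight: zero frames keep `base`, busy frames earn extra."""
    mag = float(np.abs(np.asarray(actions, dtype=np.float32)).sum())
    attn = np.tanh(attn_scale * mag)  # grows with action magnitude, capped at 1
    return base + dynamic * attn
```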
Playing
To get the model to play the game, run this command⬇

```bash
python3 play.py
```

After that you have ~5 seconds to switch to your game window.
NOTE: the window needs to be the same size and resolution as in training.
Now the model will play the game.
> [!IMPORTANT]
> If anything goes wrong, press "esc" to stop the model.
Advantages
One big advantage is the user-friendly and robust workflow:
1. Automatic Data Organization: The data collection scripts automatically create and manage datasets in clearly labeled folders (run01, run02, etc.), keeping your experiments organized.
2. Resumable Training: You can stop and resume the training process at any time without losing progress, which is perfect for long training sessions and serious projects.
3. Integrated Visualization: With TensorBoard support, you can easily graph the model's learning progress to see how its performance improves over time.
4. Flexible and Configurable: Key parameters for data collection, training, and inference are clearly defined, making it easy to experiment and tune the agent's behavior.
5. Clear and Modular Codebase: The project is well-structured and serves as an excellent learning resource for anyone interested in imitation learning, computer vision, or game AI.
KEY ADVANTAGES
1. Reusable, Modular Design: the code's modular, flexible design allows it to be reused across different games and applications. This adaptability makes it easy to integrate into new projects or expand its functionality without rewriting the core logic.
2. End-to-End Imitation Learning: the project provides a complete, self-contained pipeline for building a game-playing AI. It handles everything from data collection and training to real-time inference, making it a comprehensive solution.
3. Learns by Example, Not by Rules: PILA learns directly from observing human gameplay. This "show, don't tell" approach is powerful because it requires no hard-coded logic or manual programming of behaviors, allowing it to learn complex and nuanced driving styles.
4. Real-Time Performance: The agent operates in real-time, using efficient screen capturing (dxcam) and a streamlined model to react to live gameplay without significant lag.
5. Intelligent Training with Attention: The training process uses a custom attention mechanism that gives more weight to frames with significant player actions. This helps the model focus on the most important moments of gameplay, leading to more efficient and effective learning.
Extra stuff
If you would like more or fewer inputs and outputs, you can change that.
loop_datamaker
Step 1. Add or remove keys inside loop_datamaker.py:
```python
# Control keys
KEY_W = 'w'
KEY_S = 's'
KEY_A = 'a'
KEY_D = 'd'
KEY_Q = 'q'
KEY_E = 'e'
KEY_SPACE = 'space'
KEY_SHIFT = 'shift'
KEY_CTRL = 'ctrl'

# Mouse buttons
MOUSE_LEFT = "left"
MOUSE_RIGHT = "right"
```
Step 2. Then update the code to use the new keys:

```python
# Keyboard controls
w_s = axis(KEY_S, KEY_W)
a_d = axis(KEY_A, KEY_D)
q_e = axis(KEY_Q, KEY_E)
space = 1.0 if keyboard.is_pressed(KEY_SPACE) else 0.0
shift_ctrl_val = shift_ctrl()

# Mouse controls
left_click = 1.0 if mouse.is_pressed(MOUSE_LEFT) else 0.0
right_click = 1.0 if mouse.is_pressed(MOUSE_RIGHT) else 0.0
```
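The snippet assumes two helpers defined elsewhere in loop_datamaker.py: axis() and shift_ctrl() (which presumably does the same for the Shift/Ctrl pair). A rough sketch of what axis() likely does, collapsing an opposing key pair into one value in [-1, 1] (the real implementation may differ):

```python
import keyboard  # same library the capture script uses

def axis(neg_key, pos_key):
    """Map an opposing key pair to a single value in [-1, 1]."""
    value = 0.0
    if keyboard.is_pressed(pos_key):
        value += 1.0
    if keyboard.is_pressed(neg_key):
        value -= 1.0
    return value
```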
Step 3. Update this part:

```python
# Save record
records.append([
    frame_name,
    w_s,
    a_d,
    q_e,
    space,
    shift_ctrl_val,
    mouse_dx_scaled,
    mouse_dy_scaled,
    left_click,
    right_click
])
```
Final step 4. Update this:

```python
# Save data after stopping
df = pd.DataFrame(records, columns=[
    "frame", "w_s", "a_d", "q_e", "space", "shift_ctrl",
    "mouse_dx", "mouse_dy", "left_click", "right_click"
])
```
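The column list here must match the record layout from step 3 exactly. For reference, a hedged sketch of how such a frame is typically written out per run (run_dir is an illustrative name; the script numbers folders run01, run02, ...):

```python
import os

# each capture run gets its own numbered dataset folder
df.to_csv(os.path.join(run_dir, "actions.csv"), index=False)
```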
train
Step 1. Update this part by adding or removing keys (it must match loop_datamaker.py):
```python
y = np.array([
    float(rows[i]["w_s"]),
    float(rows[i]["a_d"]),
    float(rows[i]["q_e"]),
    float(rows[i]["space"]),
    float(rows[i]["shift_ctrl"]),
    float(rows[i]["mouse_dx"]),
    float(rows[i]["mouse_dy"]),
    float(rows[i]["left_click"]),
    float(rows[i]["right_click"])
], dtype=np.float32)
```
Final step 2. Change the last number so it equals the number of outputs from step 1⬆:
```python
import torch.nn as nn

class ControlNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(12, 32, 5, stride=2),
            nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2),
            nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d((1, 1))
        )
        self.fc = nn.Linear(128, 9)  # the last number is the number of outputs

    def forward(self, x):
        # pool to a 128-dim feature vector, then map to the action outputs
        return self.fc(self.cnn(x).flatten(1))
```
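A quick, illustrative sanity check that the output width matches your action vector (the input size here is arbitrary; any reasonably sized 12-channel frame works):

```python
import torch

net = ControlNet()
dummy = torch.randn(1, 12, 120, 160)  # batch of one 12-channel input
print(net(dummy).shape)               # expect torch.Size([1, 9])
```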
play
Step 1. Copy and paste the class from step 2⬆ into play.py:
```python
import torch.nn as nn

class ControlNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(12, 32, 5, stride=2),
            nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2),
            nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d((1, 1))
        )
        self.fc = nn.Linear(128, 9)

    def forward(self, x):
        return self.fc(self.cnn(x).flatten(1))
```
Step 2. Update this line with your new inputs/outputs:

```python
w_s, a_d, q_e, space, shift_ctrl, mouse_dx, mouse_dy, left_click, right_click = out
```
step 3. update these accordingly
# W and S
if w_s > WASD_THRESH:
keyboard.press("w")
keyboard.release("s")
elif w_s < -WASD_THRESH:
keyboard.press("s")
keyboard.release("w")
else:
keyboard.release("w")
keyboard.release("s")
# A and D
if a_d > WASD_THRESH:
keyboard.press("d")
keyboard.release("a")
elif a_d < -WASD_THRESH:
keyboard.press("a")
keyboard.release("d")
else:
keyboard.release("a")
keyboard.release("d")
# Q and E
if q_e > QE_THRESH:
keyboard.press("e")
keyboard.release("q")
elif q_e < -QE_THRESH:
keyboard.press("q")
keyboard.release("e")
else:
keyboard.release("q")
keyboard.release("e")
# space and mouse clicks
if space > SPACE_THRESH:
keyboard.press("space")
else:
keyboard.release("space")
if left_click > CLICK_THRESH:
mouse.press(button='left')
mouse.release(button='right')
elif right_click < -CLICK_THRESH:
mouse.press(button='right')
mouse.release(button='left')
else:
mouse.release(button='left')
mouse.release(button='right')
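Step 3 assumes threshold constants defined near the top of play.py. Illustrative values only; check the script for the real defaults:

```python
WASD_THRESH = 0.5   # deadzone for the W/S and A/D axes
QE_THRESH = 0.5     # deadzone for the Q/E axis
SPACE_THRESH = 0.5  # minimum activation to hold space
CLICK_THRESH = 0.5  # minimum activation to hold a mouse button
```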
Final step. Update this:

```python
# Stop: release all keys
camera.stop()
keyboard.release("shift")
keyboard.release("ctrl")
keyboard.release("q")
keyboard.release("e")
keyboard.release("w")
keyboard.release("a")
keyboard.release("s")
keyboard.release("d")
keyboard.release("space")
mouse.release(button='left')
mouse.release(button='right')
```
Extra
You can run:

```bash
python3 cluster.py
```

to visualize patterns in the training data.
Credits
tryfonaskam - Project author and lead developer.
Designed and implemented the full imitation learning pipeline, including data capture, dataset organization, neural network architecture, training workflow, checkpointing, and real-time inference; also a contributor to PILA macOS. Responsible for model training, experimentation, documentation, and overall project execution.
sahusaurya - Project ideation and environment contribution.
Helped shape the initial project direction, identified PolyTrack as an appropriate environment for imitation learning, and contributed the custom-designed test track used for evaluation.
polytrack - the game used for training and playing the model.