Hill Climb Racing -- Gymnasium RL Environment

Python 3.10+ · Gymnasium · GPL v3 License

A reinforcement learning environment for Hill Climb Racing, built on Farama Gymnasium with Box2D physics and Pygame rendering. Train agents using Stable-Baselines3 or any Gymnasium-compatible RL library.

Originally developed for a bachelor's thesis at Leiden University (LIACS), supervised by Matthias Muller-Brockhausen and Evert van Nieuwenburg. The thesis explores how different action spaces, reward functions, and reward shaping strategies affect PPO agent performance in an HCR-like environment. The best agent -- using a continuous action space with an aggressive wheel-speed-based reward -- achieved a mean score of 773 (out of 1000) in evaluation, and consistently reached the maximum score of 1000 in an environment with increasing difficulty after only 200k training steps. The original experimentation code, training scripts, and result graphs can be found on the thesis branch.

The game is a Python rewrite of Code Bullet's Hill Climb Racing AI (JavaScript), with added Gymnasium integration, multiple reward/action configurations, and procedural terrain generation using Perlin noise.

Demo of the environment

Features

  • Gymnasium environment -- standard reset() / step() / render() API
  • Two action spaces -- 3-action discrete (idle / gas / reverse) or continuous motor speed
  • Five reward functions -- distance-based, action-based, wheel-speed-based, and two airtime variants
  • Two reward intensities -- aggressive or soft penalty shaping
  • Procedurally generated terrain -- Perlin noise with configurable difficulty
  • Box2D physics -- realistic car, suspension, ragdoll driver, and collision detection
  • Human play mode -- play with keyboard via the hill-climb-play command
  • Pre-trained baselines -- 13 PPO models included for comparison

Installation

This project uses uv for dependency management. Install it first if you don't have it:

# Linux / macOS
curl -LsSf https://astral.sh/uv/install.sh | sh

# Windows
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"

Prerequisites

Box2D requires the SWIG build tool:

# Ubuntu / Debian
sudo apt-get install swig

# macOS
brew install swig

# Windows (via conda)
conda install swig

Install from source

git clone https://github.com/alexzh3/hillclimbracing.git
cd hillclimbracing
uv sync

To also install Stable-Baselines3 for training:

uv sync --extra train

Alternatively, if you prefer plain pip:

pip install .
pip install ".[train]"   # with Stable-Baselines3

Quick Start

The snippet below opens a Pygame window, creates the environment with a random agent, and runs it for 2000 steps. The agent picks a random action (idle, gas, or reverse) each frame, so it will drive erratically and crash quickly -- but it's a good way to verify the installation works and see the environment in action.

uv run python -c "
import gymnasium as gym
import hill_racing_env
env = gym.make('hill_racing_env/HillRacing-v0', render_mode='human')
obs, _ = env.reset(seed=42)
for _ in range(2000):
    obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
    env.render()
    if terminated or truncated:
        obs, _ = env.reset()
env.close()
"

Or equivalently in a Python script:

import gymnasium as gym
import hill_racing_env  # registers the environment

env = gym.make("hill_racing_env/HillRacing-v0", render_mode="human")
obs, info = env.reset(seed=42)

for _ in range(2000):
    action = env.action_space.sample()
    obs, reward, terminated, truncated, info = env.step(action)
    env.render()
    if terminated or truncated:
        obs, info = env.reset()

env.close()

Environment Configuration

Pass these keyword arguments to gym.make():

| Parameter | Type | Default | Description |
|---|---|---|---|
| action_space | str | "discrete_3" | "discrete_3" (3 actions) or "continuous" (motor speed) |
| reward_function | str | "distance" | Reward function to use (see below) |
| reward_type | str | "aggressive" | "aggressive" or "soft" penalty shaping |
| max_steps | int | 1200 (20 s at 60 FPS) | Steps without progress before truncation |
| original_noise | bool | False | Switch terrain noise algorithm (see below) |

Terrain noise

The original_noise parameter selects between two Perlin noise implementations for terrain generation:

  • False (default, experiment noise) -- uses noise.pnoise1() from the noise library (output range -1 to 1, passed through abs()). This produces steeper, more challenging terrain with sharper elevation changes. The terrain starts with a 500-pixel flat section so the car can build speed before hitting the hills. This is the noise used to train all the included baseline models.
  • True (original noise) -- uses the custom Perlin noise ported from Processing (output range 0 to 1), matching the original JavaScript implementation by Code Bullet. This generates smoother, more gradual terrain that is easier to traverse. There is no flat starting section, so terrain begins immediately from spawn.
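To make the two modes concrete, here is a minimal, self-contained sketch of the terrain logic described above. The `smooth_noise` function is a stand-in value-noise generator, not the package's Perlin implementation, and the names `terrain_height`, `flat_px`, `scale`, and `amp` are illustrative assumptions; only the flat run-up, the abs() transform, and the two output ranges come from the text.

```python
import math
import random

def smooth_noise(x, seed=0):
    """Stand-in 1-D value noise in [-1, 1]: cosine interpolation between
    seeded random lattice values. NOT the real pnoise1 or Processing port."""
    def lattice(i):
        return random.Random(seed * 1_000_003 + i).uniform(-1.0, 1.0)
    i = math.floor(x)
    frac = x - i
    t = (1 - math.cos(frac * math.pi)) / 2  # cosine ease between lattice points
    return lattice(i) * (1 - t) + lattice(i + 1) * t

def terrain_height(px, original_noise=False, flat_px=500, scale=0.01, amp=200.0):
    """Illustrative height profile following the two modes described above."""
    if original_noise:
        # original mode: noise already in [0, 1], hills begin at spawn
        return amp * (smooth_noise(px * scale) + 1.0) / 2.0
    # experiment mode: 500 px flat run-up, then abs() of [-1, 1] noise,
    # which folds valleys upward and produces sharper elevation changes
    if px < flat_px:
        return 0.0
    return amp * abs(smooth_noise((px - flat_px) * scale))
```

The abs() fold is what makes the experiment terrain harsher: crossings through zero become sharp creases instead of smooth valleys.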

Example with custom configuration:

env = gym.make(
    "hill_racing_env/HillRacing-v0",
    render_mode="human",
    action_space="continuous",
    reward_function="wheel_speed",
    reward_type="soft",
    max_steps=1800,
)

Observation Space

The observation is a Dict with four keys:

| Key | Space | Shape | Description |
|---|---|---|---|
| chassis_position | Box | (2,) | Car (x, y) position in meters |
| chassis_angle | Box | (1,) | Car rotation in degrees [0, 360] |
| wheels_speed | Box | (2,) | Angular speed of the back and front wheels |
| on_ground | MultiBinary | (2,) | Whether each wheel is touching the ground |
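As a quick sanity check, the documented contract can be expressed as a small validator. The helper name `check_observation` is hypothetical (not part of the package); the keys, shapes, and ranges are taken from the table above.

```python
def check_observation(obs):
    """Assert a raw observation dict matches the documented space."""
    assert set(obs) == {"chassis_position", "chassis_angle",
                        "wheels_speed", "on_ground"}
    assert len(obs["chassis_position"]) == 2         # (x, y) in meters
    assert len(obs["chassis_angle"]) == 1
    assert 0.0 <= obs["chassis_angle"][0] <= 360.0   # degrees
    assert len(obs["wheels_speed"]) == 2             # back wheel, front wheel
    assert all(v in (0, 1) for v in obs["on_ground"])  # MultiBinary(2)
    return True
```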

Action Space

Discrete (discrete_3)

| Action | Meaning |
|---|---|
| 0 | Idle (motor off) |
| 1 | Gas (drive forward) |
| 2 | Reverse |

Continuous (continuous)

A single Box(low=-13, high=13, shape=(1,)) value controlling motor wheel speed directly. Negative values drive forward, positive values reverse.
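A minimal sketch of how such an action might be mapped to a motor command, assuming the documented bounds. The helper name `action_to_motor_speed` is an illustration, not the package API; samples from the Box are already in range, so the clamp only matters for hand-crafted actions.

```python
def action_to_motor_speed(action, low=-13.0, high=13.0):
    """Clamp a raw continuous action into the documented Box(-13, 13) range.
    Per the text above, negative values drive forward, positive reverse."""
    return max(low, min(high, float(action)))
```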

Reward Functions

Five reward functions are available, each with two intensity variants:

| Function | Description |
|---|---|
| distance | Reward based on forward progress relative to previous max distance |
| action | Fixed reward per action type (gas = +1, idle/reverse = penalty) |
| wheel_speed | Reward based on wheel angular velocities |
| airtime_distance | Distance reward with an airtime bonus/penalty |
| airtime_wheel_speed | Wheel speed reward with an airtime bonus/penalty |

Reward type controls penalty magnitude:

  • aggressive: idle = -0.5, reverse = -1.0
  • soft: idle = -0.1, reverse = -0.2

Death or getting stuck always gives a reward of -100. Reaching the maximum score terminates the episode.
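Putting the documented numbers together, the action-based reward with its two intensity variants can be sketched as follows. The function and constant names are assumptions; the values (+1 for gas, the idle/reverse penalties, and the -100 terminal penalty) come from the text above.

```python
# Penalty values per reward type, as documented above.
PENALTIES = {
    "aggressive": {"idle": -0.5, "reverse": -1.0},
    "soft":       {"idle": -0.1, "reverse": -0.2},
}
GAS_REWARD = 1.0
DEATH_PENALTY = -100.0   # death or getting stuck

def action_reward(action, reward_type="aggressive", dead=False):
    """Illustrative action-based reward (not the package's implementation)."""
    if dead:
        return DEATH_PENALTY
    if action == "gas":
        return GAS_REWARD
    return PENALTIES[reward_type][action]
```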

Human Play Mode

Play the game yourself using keyboard controls:

uv run hill-climb-play

| Key | Action |
|---|---|
| D / Right Arrow | Gas (forward) |
| A / Left Arrow | Reverse |
| Escape | Quit |

Pre-trained Baseline Models

The package includes 13 pre-trained PPO models in hill_racing_env/envs/baseline_models/. All models were trained for the thesis experiments using Stable-Baselines3's PPO implementation with default hyperparameters.

Model filenames encode their configuration:

ppo_{action_space}_{reward_function}_{reward_type}_{timesteps}_{seed}.zip

In the filename, base denotes the discrete_3 action space and cont the continuous one.
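The scheme can be decoded mechanically. The sketch below is a hypothetical parser, not part of the package; the fallbacks for omitted segments ("distance" as the default reward function, "default" as the reward type) are inferred from the model list in this section and may not match the package's own naming logic.

```python
def parse_model_name(filename):
    """Split ppo_{action_space}_{reward_function}_{reward_type}_{timesteps}_{seed}.zip
    into its parts, tolerating omitted reward-function/type segments."""
    stem = filename.removesuffix(".zip").removeprefix("ppo_")
    parts = stem.split("_")
    action_space = parts[0]                    # "base" (discrete_3) or "cont"
    timesteps, seed = int(parts[-2]), int(parts[-1])
    middle = parts[1:-2]                       # reward function + type, if present
    if middle and middle[-1] in ("aggressive", "soft", "default"):
        reward_type = middle[-1]
        middle = middle[:-1]
    else:
        reward_type = "default"
    reward_function = "_".join(middle) or "distance"
    return action_space, reward_function, reward_type, timesteps, seed
```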

Loading a model with Stable-Baselines3:

from stable_baselines3 import PPO
from pathlib import Path
import hill_racing_env

model_dir = Path(hill_racing_env.__file__).parent / "envs" / "baseline_models"
model = PPO.load(model_dir / "ppo_cont_wheel_speed_aggressive_1000_0.zip")

Evaluation results

The best models from each configuration were evaluated over 1000 episodes (from thesis Table 1). Score is the distance travelled (max 1000). Speed = score / episode length in timesteps.

| Model | Action Space | Reward Function | Reward Type | Mean Score | Mean Length | Speed |
|---|---|---|---|---|---|---|
| ppo_cont_wheel_speed_aggressive_1000_0 | continuous | wheel_speed | aggressive | 773 | 13185 | 0.059 |
| ppo_cont_wheel_speed_soft_1000_0 | continuous | wheel_speed | soft | 765 | 13316 | 0.057 |
| ppo_base_soft_1000_0 | discrete | distance | soft | 574 | 2299 | 0.250 |
| ppo_cont_soft_1000_0 | continuous | distance | soft | 528 | 4833 | 0.109 |
| ppo_base_action_soft_1000_0 | discrete | action | soft | 396 | 1349 | 0.294 |

The continuous wheel-speed agent achieves the highest score but is the slowest driver (5x slower than the discrete action-based agent). The discrete distance-based agent offers the best balance of score and speed.
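The Speed column follows directly from the other two: a quick recomputation from the table's own numbers confirms the reported values and the roughly 5x gap between the slowest and fastest agents.

```python
# (mean score, mean episode length) for three rows of the evaluation table
rows = {
    "ppo_cont_wheel_speed_aggressive_1000_0": (773, 13185),
    "ppo_base_soft_1000_0": (574, 2299),
    "ppo_base_action_soft_1000_0": (396, 1349),
}
# Speed = score / episode length, rounded to 3 decimals as in the table
speeds = {name: round(score / length, 3) for name, (score, length) in rows.items()}
```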

All available models

| Model | Action Space | Reward Function | Reward Type | Timesteps |
|---|---|---|---|---|
| ppo_base_aggressive_1000_0 | discrete | distance | aggressive | 1000k |
| ppo_base_soft_1000_0 | discrete | distance | soft | 1000k |
| ppo_base_action_aggressive_1000_0 | discrete | action | aggressive | 1000k |
| ppo_base_action_soft_1000_0 | discrete | action | soft | 1000k |
| ppo_base_action_soft_300_0 | discrete | action | soft | 300k |
| ppo_base_wheel_speed_aggressive_1000_0 | discrete | wheel_speed | aggressive | 1000k |
| ppo_base_wheel_speed_soft_1000_0 | discrete | wheel_speed | soft | 1000k |
| ppo_base_wheel_speed_soft_300_0 | discrete | wheel_speed | soft | 300k |
| ppo_cont_1000_0 | continuous | distance | default | 1000k |
| ppo_cont_aggressive_1000_0 | continuous | distance | aggressive | 1000k |
| ppo_cont_soft_1000_0 | continuous | distance | soft | 1000k |
| ppo_cont_wheel_speed_aggressive_1000_0 | continuous | wheel_speed | aggressive | 1000k |
| ppo_cont_wheel_speed_soft_1000_0 | continuous | wheel_speed | soft | 1000k |

Key findings from the thesis

  • Best overall agent: Continuous action space + aggressive wheel-speed reward (mean score 773). In an environment with difficulty increasing until the end, this agent consistently reached the max score of 1000 after only 200k training timesteps.
  • Reward function and action space are coupled: Distance-based rewards work better with discrete actions, while wheel-speed rewards work better with continuous actions (since both the reward and action operate on the same variable).
  • Aggressive vs soft: The reward type (penalty intensity) made little difference for wheel-speed rewards in continuous action space, but aggressive penalties hurt action-based rewards in discrete action space.
  • Airtime rewards did not improve agent airtime or score -- the ground-contact penalty outweighed any benefit.
  • Speed trade-off: The highest-scoring agents are also the slowest. The discrete action-based soft agent is 5x faster than the best wheel-speed agent despite scoring lower.

Testing

Run the test suite:

uv run pytest

Tests cover package imports, environment creation with all configuration combinations, the reset/step loop, observation and action space contracts, and the Perlin noise module.

Project Structure

hillclimbracing/
├── pyproject.toml
├── README.md
├── LICENSE
├── tests/
│   ├── conftest.py                  # Shared fixtures (headless pygame setup)
│   ├── test_env.py                  # Environment creation, reset, step tests
│   ├── test_spaces.py               # Observation & action space contract tests
│   └── test_perlin.py               # Perlin noise unit tests
└── hill_racing_env/
    ├── __init__.py                  # Registers hill_racing_env/HillRacing-v0
    └── envs/
        ├── __init__.py              # Public API exports
        ├── hill_racing.py           # HillRacingEnv (Gymnasium environment)
        ├── hill_racing_human.py     # Human-playable standalone mode
        ├── car.py                   # Car chassis, suspension, motor controls
        ├── wheels.py                # Wheel bodies with Box2D joints
        ├── person.py                # Ragdoll driver (head + torso)
        ├── agent.py                 # Agent wrapper (score, state, lifecycle)
        ├── ground.py                # Procedural terrain generation
        ├── perlin.py                # Perlin noise (ported from Processing)
        ├── pictures/                # Sprite assets
        └── baseline_models/         # Pre-trained PPO model checkpoints

Acknowledgements

License

This project is licensed under the GNU General Public License v3.0.

