Standalone reusable-booster landing environment for reinforcement learning.

Platform Lander

A standalone reusable-booster landing environment based on Gymnasium LunarLander v3 physics, implemented without importing Gymnasium. The task is to land a SpaceX-style booster upright on a moving, floating platform. The episode terminates as a failure if the booster misses the platform and falls into the ocean, or if it touches the platform while not vertical.

Install

After the package has been published to PyPI:

pip install platform_lander

Before the PyPI release is available, install the same package directly from the book repository subdirectory:

pip install "platform_lander @ git+https://github.com/aburkov/theDRLbook.git#subdirectory=test_environments/platform_lander"

For local development from this folder:

pip install -e .

Google Colab

Use the same install command in the first notebook cell. Colab usually needs swig installed before Box2D will build:

!apt-get -qq install swig
!pip install -q platform_lander

Then import normally:

from platform_lander import PlatformLander

env = PlatformLander(render_mode="rgb_array", enable_wind=True, wind_power=5.0)
obs, info = env.reset(seed=0)
obs, reward, terminated, truncated, info = env.step(2)
frame = env.render()

Display a rendered frame in Colab:

import matplotlib.pyplot as plt

plt.imshow(frame)
plt.axis("off")
plt.show()

Local Script

To watch the booster in a local Pygame window, install the package in editable mode and run the demo:

pip install -e .
python examples/demo.py

The test file is headless, so running pytest or python tests/test_platform_lander.py will not open an animation window.

To train a discrete policy with the textbook single-trajectory REINFORCE algorithm and then show three animated runs:

pip install -e ".[train]"
python vanilla_reinforce.py

The repository also includes incremental REINFORCE variants:

python rtg_reinforce.py                                  # vanilla + per-timestep reward-to-go
python average_reinforcement_baseline_reinforce.py       # reward-to-go + running scalar RTG baseline
python value_function_baseline_reinforce.py              # reward-to-go + learned value-function baseline
python batch_reinforce.py                                # vanilla + trajectory batches
python full_reinforce.py                                 # batches + reward-to-go + selectable scalar baseline
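
The variant names refer to standard policy-gradient return estimators: vanilla REINFORCE weights every action by the full trajectory return, while reward-to-go weights the action at step t only by the rewards from t onward. As a rough illustration, and not the repository's code, the sketch below computes discounted reward-to-go for one trajectory and subtracts a scalar baseline. The function names `reward_to_go` and `subtract_baseline` and the default `gamma` are assumptions made for this example.

```python
from typing import List, Sequence


def reward_to_go(rewards: Sequence[float], gamma: float = 0.99) -> List[float]:
    """Return G_t = sum over k >= t of gamma**(k - t) * r_k for every timestep t."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step reuses the return of the step after it.
    for t in range(len(rewards) - 1, -1, -1):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def subtract_baseline(returns: Sequence[float], baseline: float) -> List[float]:
    """Center returns around a scalar baseline to reduce gradient variance."""
    return [g - baseline for g in returns]


if __name__ == "__main__":
    rtg = reward_to_go([1.0, 1.0, 1.0], gamma=0.5)
    print(rtg)  # [1.75, 1.5, 1.0]
    # Hypothetical choice: use the mean return of this trajectory as the baseline.
    print(subtract_baseline(rtg, baseline=sum(rtg) / len(rtg)))
```

Setting `gamma=1.0` gives the undiscounted sum of future rewards.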

Each training script writes a log, per-episode CSV data, and a checkpoint under runs/ by default, for example runs/full_reinforce.log, runs/full_reinforce.csv, and runs/full_reinforce.pt. Override those paths with --log-file, --csv-file, and --model-file.

To load the hardcoded runs/full_reinforce.pt checkpoint and watch several animated policy rollouts:

python watch_trained_policy.py

To generate one side-by-side results graph per variant from the saved CSV files:

python plot_reinforce_results.py

For a quick smoke test without opening the animation window:

python vanilla_reinforce.py --episodes 3 --max-steps 20 --no-animation

Training scripts also expose reward-scale controls for experiments:

python gamma_dropped_rtg_reinforce.py --success-reward 500 --failure-reward -500 --shaping-factor 0.5
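
How these flags reach the environment is not documented here. One plausible wiring is sketched below, assuming the flags map onto the success_reward, failure_reward, and shaping_factor settings listed under API Notes and that those are constructor keyword arguments. `build_parser` and `env_kwargs_from_args` are hypothetical helpers, not part of the package.

```python
import argparse
from typing import Dict, List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Declare the three reward-scale flags with the documented defaults."""
    parser = argparse.ArgumentParser(description="Reward-scale options (sketch)")
    parser.add_argument("--success-reward", type=float, default=100.0)
    parser.add_argument("--failure-reward", type=float, default=-100.0)
    parser.add_argument("--shaping-factor", type=float, default=1.0)
    return parser


def env_kwargs_from_args(argv: Optional[List[str]] = None) -> Dict[str, float]:
    """Parse argv and return keyword arguments for the environment."""
    args = build_parser().parse_args(argv)
    # argparse turns "--success-reward" into the attribute "success_reward".
    return {
        "success_reward": args.success_reward,
        "failure_reward": args.failure_reward,
        "shaping_factor": args.shaping_factor,
    }


if __name__ == "__main__":
    kwargs = env_kwargs_from_args(
        ["--success-reward", "500", "--failure-reward", "-500", "--shaping-factor", "0.5"]
    )
    print(kwargs)
    # Assumed usage, if the constructor accepts these names: PlatformLander(**kwargs)
```

Calling `env_kwargs_from_args([])` returns the documented defaults.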

To run a random-action rollout from Python with wind enabled:

from platform_lander import PlatformLander

env = PlatformLander(enable_wind=True, wind_direction=(1, 0.2), wind_power=5.0)
obs, info = env.reset(seed=0)

for _ in range(1000):
    action = env.action_space.sample()
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        print(info)
        break

env.close()
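
The same loop generalizes to scoring a policy over several episodes. The sketch below assumes only the reset/step interface shown above, with a five-value step result. `run_episode` and `evaluate` are hypothetical helper names, and the policy is any callable that maps an observation to an action.

```python
from typing import Any, Callable, List, Tuple


def run_episode(
    env: Any, policy: Callable[[Any], Any], seed: int, max_steps: int = 1000
) -> Tuple[float, int]:
    """Roll out one episode and return (total reward, number of steps taken)."""
    obs, _info = env.reset(seed=seed)
    total = 0.0
    for step in range(1, max_steps + 1):
        obs, reward, terminated, truncated, _info = env.step(policy(obs))
        total += float(reward)
        if terminated or truncated:
            return total, step
    # The step cap was reached without the environment ending the episode.
    return total, max_steps


def evaluate(env: Any, policy: Callable[[Any], Any], episodes: int = 5) -> List[float]:
    """Return per-episode total rewards, seeding episode i with seed i."""
    return [run_episode(env, policy, seed=i)[0] for i in range(episodes)]
```

With the package installed, `evaluate(PlatformLander(), lambda obs: 0)` should score the policy that always picks the no-op action.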

API Notes

PlatformLander(continuous=False) uses Discrete(4) actions.
Actions: 0 no-op, 1 upper-left attitude jet, 2 bottom engine, 3 upper-right attitude jet.
continuous=True uses a two-value Box(-1, 1, shape=(2,)) action. The first value controls bottom-engine throttle from off at -1 to full power at +1; the second value controls the left/right attitude jets with its sign and throttle magnitude.
Wind is controlled with enable_wind, wind_power, wind_direction, and set_wind(...).
The platform moves horizontally at 1.15 / 3.0 (about 0.38) world units per second by default.
Episodes start with the platform at a random x position and travel direction, and with the booster directly above it, tilted 20 degrees (leaning like a backslash, \) with zero initial velocity.
Terminal rewards default to success_reward=100.0 and failure_reward=-100.0.
Dense shaping is multiplied by shaping_factor, which defaults to 1.0.
Dense shaping rewards lateral alignment, low relative horizontal speed, low vertical speed, vertical attitude, low angular velocity, and foot contact. It also rewards vertical closeness to the platform, but only while the booster's bottom is horizontally above the platform.
The booster has 50 units of engine budget by default. Discrete actions consume one unit per engine fire; continuous actions consume fractional budget according to throttle use. After the budget is exhausted, engine commands have no effect and the booster continues ballistically.
The observation includes the fraction of engine budget remaining.
The package provides local Box and Discrete spaces and does not import Gymnasium.
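
To make the engine-budget rule concrete, here is a small stand-alone model of it. It illustrates the behavior described above and is not the package's implementation: the class name `EngineBudget`, the linear throttle mapping in `main_throttle`, and the choice to charge a continuous action its throttle fraction are all assumptions.

```python
from dataclasses import dataclass


def main_throttle(value: float) -> float:
    """Map the first continuous action value from [-1, 1] to a throttle in [0, 1].

    Assumption: the mapping is linear, with -1 meaning off and +1 full power.
    """
    clipped = max(-1.0, min(1.0, value))
    return (clipped + 1.0) / 2.0


@dataclass
class EngineBudget:
    """Toy model of a finite engine budget (50 units by default)."""

    capacity: float = 50.0
    used: float = 0.0

    @property
    def fraction_remaining(self) -> float:
        """Fraction of the budget still available, in [0, 1]."""
        return max(0.0, self.capacity - self.used) / self.capacity

    def fire(self, amount: float = 1.0) -> bool:
        """Spend `amount` units; return False (no thrust) once the budget is gone."""
        if amount <= 0.0 or self.used >= self.capacity:
            return False
        self.used = min(self.capacity, self.used + amount)
        return True


if __name__ == "__main__":
    budget = EngineBudget()
    # Discrete case: each engine fire costs one unit, so only 50 of 60 succeed.
    fired = sum(budget.fire() for _ in range(60))
    print(fired, budget.fraction_remaining)  # 50 0.0
```

Under these assumptions a continuous action at half throttle would be charged as `budget.fire(main_throttle(0.0))`, which spends 0.5 units.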

Publishing

Build the package from this directory:

python -m build

Upload the generated dist/platform_lander-*.tar.gz and dist/platform_lander-*.whl files to PyPI with a PyPI account that owns the platform_lander project name:

python -m twine upload dist/*

Release history

0.1.13 (this version): Jun 25, 2026
0.1.12: Jun 10, 2026
0.1.11: Jun 6, 2026
0.1.10: Jun 6, 2026
0.1.9: Jun 5, 2026
0.1.8: Jun 5, 2026
0.1.7: Jun 5, 2026
0.1.6: Jun 5, 2026
0.1.5: Jun 5, 2026
0.1.4: Jun 4, 2026
0.1.3: Jun 4, 2026
0.1.1: Jun 4, 2026
0.1.0: Jun 4, 2026

Download files

Source Distribution

platform_lander-0.1.13.tar.gz (25.8 kB)

Uploaded Jun 25, 2026 Source

Built Distribution

platform_lander-0.1.13-py3-none-any.whl (18.7 kB)

Uploaded Jun 25, 2026 Python 3

File details

Details for the file platform_lander-0.1.13.tar.gz.

File metadata

Download URL: platform_lander-0.1.13.tar.gz
Upload date: Jun 25, 2026
Size: 25.8 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.10.6

File hashes

Hashes for platform_lander-0.1.13.tar.gz
Algorithm	Hash digest
SHA256	`259d75cff2cc24df7e09ef20a04affdffc57c5d8ab2fef045390043f021bbf50`
MD5	`635ea606c47090665a59accbf767c018`
BLAKE2b-256	`71beb1abb991159e973da972c7d009208a5bcb6d0dfc5034e5f906d99ec59a29`

File details

Details for the file platform_lander-0.1.13-py3-none-any.whl.

File metadata

Download URL: platform_lander-0.1.13-py3-none-any.whl
Upload date: Jun 25, 2026
Size: 18.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.10.6

File hashes

Hashes for platform_lander-0.1.13-py3-none-any.whl
Algorithm	Hash digest
SHA256	`1e263793529adf11e1e18d8f1958521e88471a0b4a86bb149afc1bbe9628462f`
MD5	`878437fab6732f196645b4812c7b7284`
BLAKE2b-256	`1be62bc8fbd327516dea5c852a9a0607c77e8c9e5247650fa721061f7b6ac638`
