A fully modular framework to make Reinforcement Learning quick and easy

Project description

HelloRL

Getting started with Reinforcement Learning is usually confusing, but HelloRL makes it quick and easy. It’s fully modular, so you can upgrade your training once you’re ready for more, and even invent your own RL algorithms. Please ⭐ this repo if you find it useful.

Why is RL usually so hard?

Reinforcement Learning covers a family of ML algorithms (such as Actor Critic, A2C and PPO) that let agents learn from experience. These algorithms are all similar, but each has its own implementation details and subtle differences. Most RL frameworks implement each algorithm from scratch, repeating many of the same steps across hundreds of lines of code, with minor implementation differences along the way.

Trying to swap between them while keeping your code working can be a nightmare. If you want to experiment with a new idea on top of Actor Critic and then try it on a PPO implementation, you would have to spend hours integrating it, and hope you didn’t make a mistake. It's a minefield: it's easy to trip yourself up and get something wrong without realising.

Introducing HelloRL

HelloRL flips this on its head: a single train function and a set of swappable modules let you build and mix together any RL algorithm easily.

HelloRL is:

- A modular library for Reinforcement Learning.
- Built around a single train function that covers the popular algorithms, from discrete on-policy algorithms like Actor Critic to continuous off-policy algorithms like TD3.
- Swap modules in and out to mix algorithms together. Go from on-policy to off-policy learning with just a few small changes, and follow along with the provided notebooks to make sure you got it right.
- Build your own custom modules and validate your ideas quickly.

Features

- Over 20 swappable modules:
  - Actors: Discrete, Stochastic, Deterministic
  - Critics: Critic, QCritic
  - Agents: Agent, AgentWithTargets
  - RewardTransform: None, Scale
  - RolloutMethod: MonteCarlo, A2C
  - AdvantageMethod: Standard, GAE
  - AdvantageTransform: None, Normalize
  - DataLoadMethod: Single, Epochs, Replay
  - GradientTransform: None, ClipNorm
  - LearningRateSchedule: Constant, LinearAnneal
  - Critic Loss: Standard, Clipped, Q
  - PolicyObjective: Standard, Clipped
- Plus other configurable hyperparameters: policy delay, gamma, entropy coefficient, exploration std, tau
- Supports discrete and continuous action spaces
- Supports on-policy and off-policy training within the same loop
- Supports major RL algorithms, with sample implementations in notebooks:
  - Actor Critic
  - A2C
  - PPO
  - DDPG
  - TD3
- Supports OpenAI/Farama Gymnasium environments. It has been tested in particular on CartPole and Lunar Lander, which the example notebooks demonstrate.
- Supports PyTorch.
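
Two of the modules listed above are GAE (AdvantageMethod) and Normalize (AdvantageTransform). The following is a rough illustration of those two techniques using their standard textbook definitions. It is a minimal plain-Python sketch, not HelloRL's source. compute_gae and normalize are hypothetical helper names, and dones[t] is assumed to mean that the episode ended after step t.

```python
from typing import List


def compute_gae(rewards: List[float], values: List[float], last_value: float,
                dones: List[bool], gamma: float = 0.99, lam: float = 0.95) -> List[float]:
    """Generalised Advantage Estimation over one rollout (hypothetical helper)."""
    advantages = [0.0] * len(rewards)
    gae = 0.0
    next_value = last_value  # V(s) for the state after the final step
    for t in reversed(range(len(rewards))):
        not_done = 0.0 if dones[t] else 1.0
        # One-step TD error: r_t + gamma * V(s_{t+1}) - V(s_t), with no bootstrap past an episode end
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        # Exponentially weighted (gamma * lam) sum of future TD errors
        gae = delta + gamma * lam * not_done * gae
        advantages[t] = gae
        next_value = values[t]
    return advantages


def normalize(xs: List[float], eps: float = 1e-8) -> List[float]:
    """Rescale advantages to zero mean and (roughly) unit standard deviation."""
    mean = sum(xs) / len(xs)
    std = (sum((x - mean) ** 2 for x in xs) / len(xs)) ** 0.5
    return [(x - mean) / (std + eps) for x in xs]
```

With lam=0 this reduces to the one-step TD error. With lam=1 (and no episode ends) it becomes the full discounted return minus the value estimate, which is why GAE is often described as a bias/variance dial.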

Extras

Progress

The Progress utility shows in-line training progress with a trend graph, which is a little more informative than printing a line every 1000 timesteps. It appears automatically when you run trainer.train().

Modal

You might run training and see good results, then run it again and see worse ones. Is that because of something you changed, or just down to randomness? To find out, you could run your training loop 100 times and compare the results, but that would be 100x slower and could take hours!

Instead, you can use Modal, which lets you run hundreds of sessions at once, remotely on their machines. I’ve been using the free tier, which gives $30 of credit every month, and it has covered all my CPU training.

They also have a dashboard so you can track everything, and HelloRL's Progress module also supports Modal runs, giving live updates while the training runs happen remotely.

Modal only needs authentication to be set up once: run modal setup (or python -m modal setup) on the command line, within the project. Here is more info.

(I’m not affiliated with Modal. I am also not responsible if you run through your Modal credits or rack up a bill.)
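
However you run the sessions, locally or on Modal, the comparison itself comes down to summarising final scores across many seeds. The following is a minimal sketch that assumes each run can be reduced to a single final score. fake_train is a hypothetical stand-in for a full trainer.train run, and summarize_seeds is an illustrative helper that is not part of HelloRL or Modal.

```python
import random
import statistics
from typing import Callable, List, Tuple


def summarize_seeds(train_once: Callable[[int], float], seeds: List[int]) -> Tuple[float, float]:
    """Run one training session per seed; return (mean, standard error) of the final scores."""
    scores = [train_once(seed) for seed in seeds]
    mean = statistics.mean(scores)
    # Standard error of the mean: sample standard deviation / sqrt(n)
    sem = statistics.stdev(scores) / len(scores) ** 0.5
    return mean, sem


def fake_train(seed: int) -> float:
    # Hypothetical stand-in for a full training run: a noisy, seed-dependent "final return"
    rng = random.Random(seed)
    return 200.0 + rng.gauss(0.0, 30.0)


mean, sem = summarize_seeds(fake_train, list(range(100)))
print(f"final return: {mean:.1f} ± {1.96 * sem:.1f} (approx. 95% interval)")
```

If the intervals of two configurations overlap heavily, the difference you saw between single runs may just be noise.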

Lunar Lander Upgraded

Lunar Lander was released 10 years ago, and it's a great testing environment for RL, but the graphics are quite primitive and uninspiring, so I upgraded them, as you can see in the gif further up. To get the shiny new version:

import gymnasium as gym
import helloRL.utils.sim  # makes 'LunarLanderUpgraded-v1' available to gym.make

env = gym.make('LunarLanderUpgraded-v1', continuous=True, render_mode='rgb_array')

The extra import isn't required if you already import helloRL.

Getting started with HelloRL

Install with uv or pip:

uv add helloRL

pip install helloRL

Example of training Actor Critic:

import gymnasium as gym
import numpy as np
import torch
from helloRL import *

env_name = 'LunarLander-v3'
continuous = True
n_timesteps = 100000

# Create one environment here just to read the observation and action dimensions
env = gym.make(env_name, continuous=continuous)
state_dim = env.observation_space.shape[0]
action_dim = env.action_space.shape[0]
# Shape (2, action_dim): row 0 holds the lower action bounds, row 1 the upper bounds
action_range = torch.tensor(np.stack([env.action_space.low, env.action_space.high]))
env.close()

actor = StochasticActor(state_dim=state_dim, action_dim=action_dim, action_range=action_range)
critic = Critic(state_dim=state_dim)
agent = Agent(actor=actor, critics=[critic])
params = Params()  # default hyperparameters

returns, lengths = trainer.train(agent, env_name, continuous, params, n_timesteps)
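
The next example upgrades from MonteCarlo, so the default Params() presumably uses Monte Carlo rollouts. These wait for an episode to finish and then compute the discounted return G_t = r_t + γ·G_{t+1} for every step. The following is a minimal sketch of that calculation. monte_carlo_returns is a hypothetical helper, not HelloRL's internal code.

```python
from typing import List


def monte_carlo_returns(rewards: List[float], gamma: float = 0.99) -> List[float]:
    """Discounted return for every step of one finished episode (hypothetical helper)."""
    returns = [0.0] * len(rewards)
    running = 0.0  # nothing follows the final step of a finished episode
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running  # G_t = r_t + gamma * G_{t+1}
        returns[t] = running
    return returns


print(monte_carlo_returns([1.0, 1.0, 1.0], gamma=0.5))  # [1.75, 1.5, 1.0]
```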

To upgrade from MonteCarlo to A2C rollout:

params = Params(
    # 16-step rollouts from each of 4 parallel environments
    rollout_method=RolloutMethodA2C(n_steps=16, n_envs=4),
)

returns, lengths = trainer.train(agent, env_name, continuous, params, n_timesteps)
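
Judging by its parameters, RolloutMethodA2C(n_steps=16, n_envs=4) collects short fixed-length segments from several environments instead of waiting for whole episodes. Returns are therefore bootstrapped from the critic's value estimate at the end of each segment. The following is a minimal sketch of those n-step returns for a single environment, assuming standard A2C behaviour. n_step_returns is a hypothetical helper. With n_envs=4, it would be applied to each environment's segment separately.

```python
from typing import List


def n_step_returns(rewards: List[float], dones: List[bool], bootstrap_value: float,
                   gamma: float = 0.99) -> List[float]:
    """Bootstrapped returns for one environment's n-step segment (hypothetical helper)."""
    returns = [0.0] * len(rewards)
    # Critic's estimate V(s_{t+n}) for the state just after the segment
    running = bootstrap_value
    for t in reversed(range(len(rewards))):
        if dones[t]:
            running = 0.0  # episode ended after step t: don't bootstrap across the boundary
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns
```

The difference from Monte Carlo is the starting value: instead of 0 at the end of an episode, the backward pass starts from the critic's estimate. This lets updates happen every 16 steps rather than once per episode.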

To add target networks, as used in DDPG (an off-policy algorithm with many other differences):

agent_params = AgentWithTargetsParams(
    tau=0.005  # soft-update rate for the target networks
)
agent = AgentWithTargets(actor=actor, critics=[critic], params=agent_params)
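
In DDPG and TD3, tau is the rate of the soft (Polyak) update. After each learning step, this update moves each target network a small fraction of the way towards its online network. The following is a minimal sketch on plain Python lists, assuming HelloRL's AgentWithTargets follows this standard rule. soft_update is a hypothetical helper.

```python
from typing import List


def soft_update(target: List[float], source: List[float], tau: float) -> None:
    """Polyak averaging, in place: target <- tau * source + (1 - tau) * target."""
    for i, (t, s) in enumerate(zip(target, source)):
        target[i] = tau * s + (1.0 - tau) * t
```

With tau=0.005, a target closes about 99.3% of the gap to a fixed online value after 1,000 updates (1 − 0.995^1000 ≈ 0.993), so targets change slowly and stabilise the critic's learning targets. In PyTorch, the same rule is commonly written over zip(target_net.parameters(), net.parameters()) inside torch.no_grad(), as tp.mul_(1 - tau).add_(p, alpha=tau).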

And so on...

You can see examples of how to implement each major algorithm within the /notebooks directory.

HelloRL is the first public release from i10e, a robot intelligence research lab based in London.

Built by Andrew Hart (website / X).

Please star this repo ⭐ if you find it useful.

Release history

2.0.1 (this version): Feb 10, 2026
2.0: Feb 10, 2026
1.0: Feb 5, 2026

Download files

Source Distribution

hellorl-2.0.1.tar.gz (1.2 MB)

Uploaded Feb 10, 2026 Source

Built Distribution

hellorl-2.0.1-py3-none-any.whl (1.2 MB)

Uploaded Feb 10, 2026 Python 3

File details

Details for the file hellorl-2.0.1.tar.gz.

File metadata

Download URL: hellorl-2.0.1.tar.gz
Upload date: Feb 10, 2026
Size: 1.2 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.0

File hashes

Hashes for hellorl-2.0.1.tar.gz
Algorithm	Hash digest
SHA256	`3111f80ffe3b19ef9b39775f7245fb584dbb0bec2eee3cef5ff5bf9fb944d652`
MD5	`77aa87e98956868165fadac29d307fde`
BLAKE2b-256	`63fc706d7167b826b71c6715559284bd025d3cd5ad3831e6fa13d6b5bb243d37`

File details

Details for the file hellorl-2.0.1-py3-none-any.whl.

File metadata

Download URL: hellorl-2.0.1-py3-none-any.whl
Upload date: Feb 10, 2026
Size: 1.2 MB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.0

File hashes

Hashes for hellorl-2.0.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`d8c80ea47defed93bced6035604d1e4ca90c17ca136d16b6cea5c33e36018218`
MD5	`bc5bea822951fdcb7be081e076d02310`
BLAKE2b-256	`5db43b5d853a572873a59f0cafb519c2cf7fc60e2bfcc9e15a5df5ddf49a52b2`
