An implementation of the DQN algorithm and all Rainbow DQN improvements

Project description

Modular-DQN

A fully modular implementation of Rainbow DQN in which each feature can be toggled individually.

Authors

Laurenz Levi Spielmann

Pascal Makossa

Julian Bohnenkämper

This project was developed as part of the Fachprojekt "Applied Deep Reinforcement Learning" at TU Dortmund.

Install

To install, run `pip install modular-dqn`.

Requirements

torch
prettytable
gymnasium
opencv-python
wandb (optional)
rich (optional)

Usage

To start training with our implementation, run the `modular-dqn` command. Sensible default values are provided for most hyperparameters, but each of them can be adjusted individually.

Required Arguments

| Argument | Type | Description |
| --- | --- | --- |
| `--env` \| `--environment` | string | Gymnasium environment to learn |
| `-s` \| `--steps` | int | Number of training steps |
| `-π` \| `--policy` | string | Network to use (MLP |
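
As a minimal sketch of a training run, the snippet below assembles a `modular-dqn` command line from the three required arguments plus a few feature flags from the optional arguments. The helper `build_command` is hypothetical (not part of the package), and the environment `CartPole-v1` and the policy value `MLP` are illustrative assumptions.

```python
import shlex


def build_command(env, steps, policy, *flags):
    """Hypothetical helper (not part of modular-dqn): build an argv list for the CLI."""
    # The three required arguments, using their long option names
    return ["modular-dqn", "--env", env, "--steps", str(steps), "--policy", policy, *flags]


# Assumed example values: a standard Gymnasium environment and an MLP policy,
# with double DQN, prioritized replay and 3-step transitions enabled.
cmd = build_command("CartPole-v1", 100_000, "MLP", "--ddqn", "--per", "--n_step", "3")
print(shlex.join(cmd))

# To actually start training (requires the package to be installed):
# import subprocess; subprocess.run(cmd, check=True)
```

The printed line is the shell command to run; passing the list to `subprocess.run` avoids any shell quoting issues.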

Optional Arguments

| Argument | Type | Default | Description |
| --- | --- | --- | --- |
| `--device` | string | None | Device used by PyTorch (cpu |
| `--lr` \| `--learning_rate` | float | 1e-3 | Learning rate |
| `-ε` \| `--epsilon` | float | 1 | Initial epsilon for the epsilon-greedy policy |
| `--edi` \| `--epsilon_decay_interval` | int | 1e3 | Number of steps between epsilon decay steps |
| `--eds` \| `--epsilon_decay_step` | float | 0.1 | Size of each epsilon decay step |
| `--e_min` \| `--epsilon_min` | float | 0.1 | Minimum epsilon value |
| `-𝛾` \| `--gamma` | float | 0.9 | Discount factor for future rewards |
| `-𝜏` \| `--tau` | float | 0.95 | Polyak update factor for the target network |
| `--bs` \| `--batch_size` | int | 32 | Batch size used to update the Q-function |
| `--seed` | int | None | Seed for the environment |
| `--rm_size` \| `--replay_memory_size` | int | 1e7 | Maximum replay memory capacity |
| `--rec_trigger` | int | None | If provided, records every `rec_trigger` episodes |
| `--wandb` | boolean | False | Whether to log progress to wandb |
| `--tags` | List[string] | None | Tags to add to the run on wandb |
| `--li` \| `--log_interval` | int | 10 | Number of episodes between logs |
| `--load_file` | string | None | If provided, relative path from which the network is loaded |
| `--optimizer` | string | SGD | Name of optimizer to be used (e.g. SGD |
| `--skip_frames` \| `--skp` | int | 1 | Number of frames to skip each step |
| `--clip` \| `--reward_clipping` | int | None | Set to 0 for hard clipping, or to any other value to use it as the scale for soft clipping (the reward is divided by this scale) |
| `-α` \| `--alpha` | float | 0.5 | Alpha for prioritized replay |
| `-β` \| `--beta` | float | 0.5 | Beta for prioritized replay |
| `--store_model` | Flag | - | If set, stores the model every `rec_trigger` interval |
| `--ddqn` | Flag | - | Enables double deep Q-learning |
| `--per` | Flag | - | Enables prioritized replay memory |
| `--n_step` | int | 1 | Sets the n-step transition length |
| `--noisy` | Flag | - | Enables noisy linear layers for exploration |
| `--dueling` | Flag | - | Uses the dueling network architecture |
| `--cat` \| `categorical` | Flag | - | Uses the categorical DQN loss |
| `--rainbow` | Flag | - | Enables all improvements (`--n_step` should still be set) |
| `--kwargs` | Dict | None | Additional kwargs passed to the environment on creation (usage |
| `--progress` | Flag | - | Displays a progress bar in the terminal (requires rich) |
| `--loss` | string | SmoothL1 | Name of the loss function to be used (see Available Loss Functions) |
| `--obs_size` | Tuple | None | Rescales image observations to the given size (usage |
| `--heatmaps` | float | 0.0 | Heatmap opacity in videos, or 0.0 for no heatmaps (only works with CNN and image observations) |
| `--graphs` | Flag | - | Generates a Q-value graph in videos (currently only works on Linux) |
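
Several of these options (`--n_step`, `--ddqn` and `-𝛾` / `--gamma`) combine into the bootstrap target of the Q-update. The sketch below shows the textbook n-step Double DQN target for a single transition, $y = \sum_{k=0}^{n-1} \gamma^k r_{t+k} + \gamma^n Q_{\text{target}}(s_{t+n}, \arg\max_a Q_{\text{online}}(s_{t+n}, a))$, in plain Python. It is an illustration of the standard formulation under that assumption, not necessarily how modular-dqn computes its targets internally; the function name is hypothetical.

```python
def n_step_double_dqn_target(rewards, gamma, done, q_online_next, q_target_next):
    """Hypothetical helper: n-step Double DQN target for a single transition.

    rewards:       rewards r_t, ..., r_{t+n-1} collected over the n steps
    gamma:         discount factor (the --gamma option)
    done:          True if the episode ended within these n steps
    q_online_next: online-network Q-values of state s_{t+n}, one per action
    q_target_next: target-network Q-values of state s_{t+n}, one per action
    """
    n = len(rewards)
    # Discounted sum of the n intermediate rewards: sum_k gamma^k * r_{t+k}
    n_step_return = sum(gamma ** k * r for k, r in enumerate(rewards))
    if done:
        # No bootstrapping past a terminal state
        return n_step_return
    # Double DQN: the online network selects the action ...
    best_action = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    # ... and the target network evaluates it
    return n_step_return + gamma ** n * q_target_next[best_action]


target = n_step_double_dqn_target(
    rewards=[1.0, 1.0, 1.0], gamma=0.9, done=False,
    q_online_next=[0.5, 2.0], q_target_next=[3.0, 1.0],
)
print(round(target, 3))  # 2.71 + 0.729 * 1.0 = 3.439
```

Plain DQN would instead bootstrap from `max(q_target_next) = 3.0`, giving 4.897; letting the online network choose the action and the target network score it is what Double DQN uses to reduce overestimation.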

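The `-α` / `--alpha` and `-β` / `--beta` options parameterize prioritized replay (`--per`). A minimal sketch of the proportional variant from the prioritized experience replay paper, assuming priorities are already stored per transition: sampling probabilities $P(i) = p_i^\alpha / \sum_k p_k^\alpha$ and importance-sampling weights $w_i = (N \cdot P(i))^{-\beta}$, normalized by the largest weight. The function names are hypothetical, and the package may implement this differently (for example with a sum tree, or by annealing β).

```python
import random


def per_probabilities(priorities, alpha):
    """Sampling probabilities P(i) = p_i^alpha / sum_k p_k^alpha (proportional PER)."""
    scaled = [p ** alpha for p in priorities]
    total = sum(scaled)
    return [s / total for s in scaled]


def importance_weights(probs, beta):
    """Importance-sampling weights w_i = (N * P(i))^(-beta), divided by the largest weight."""
    n = len(probs)
    weights = [(n * p) ** (-beta) for p in probs]
    max_weight = max(weights)
    return [w / max_weight for w in weights]


# Hypothetical priorities (e.g. absolute TD errors) for a replay memory of 2 transitions
probs = per_probabilities([1.0, 4.0], alpha=0.5)  # [1/3, 2/3]
weights = importance_weights(probs, beta=0.5)     # [1.0, ~0.707]
# Draw a mini-batch of indices according to the priorities
batch = random.choices(range(len(probs)), weights=probs, k=8)
print(probs, weights, batch)
```

With α = 0 sampling becomes uniform, and β = 1 fully compensates for the non-uniform sampling; both options default to 0.5 here.
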
Available Optimizers

| Name | Optimizer |
| --- | --- |
| `Adadelta` | torch.optim.Adadelta |
| `Adagrad` | torch.optim.Adagrad |
| `Adam` | torch.optim.Adam |
| `AdamW` | torch.optim.AdamW |
| `SparseAdam` | torch.optim.SparseAdam |
| `Adamax` | torch.optim.Adamax |
| `ASGD` | torch.optim.ASGD |
| `LBFGS` | torch.optim.LBFGS |
| `NAdam` | torch.optim.NAdam |
| `RAdam` | torch.optim.RAdam |
| `RMSProp` | torch.optim.RMSprop |
| `Rprop` | torch.optim.Rprop |
| `SGD` | torch.optim.SGD |
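
The optimizer table is a name lookup into `torch.optim`. A minimal sketch of how such a lookup could work, assuming nothing about modular-dqn's internals: `OPTIMIZER_NAMES` and `resolve_optimizer` are hypothetical names, and `torch` is only imported when an optimizer is actually resolved.

```python
import importlib

# Hypothetical lookup table: CLI name -> class name in torch.optim (from the table above).
# Note that the CLI name "RMSProp" maps to torch.optim.RMSprop.
OPTIMIZER_NAMES = {
    "Adadelta": "Adadelta",
    "Adagrad": "Adagrad",
    "Adam": "Adam",
    "AdamW": "AdamW",
    "SparseAdam": "SparseAdam",
    "Adamax": "Adamax",
    "ASGD": "ASGD",
    "LBFGS": "LBFGS",
    "NAdam": "NAdam",
    "RAdam": "RAdam",
    "RMSProp": "RMSprop",
    "Rprop": "Rprop",
    "SGD": "SGD",
}


def resolve_optimizer(name):
    """Return the torch.optim class for a CLI optimizer name; torch is imported lazily."""
    if name not in OPTIMIZER_NAMES:
        raise ValueError(f"Unknown optimizer {name!r}, expected one of {sorted(OPTIMIZER_NAMES)}")
    torch_optim = importlib.import_module("torch.optim")
    return getattr(torch_optim, OPTIMIZER_NAMES[name])


# Usage (requires torch and a model):
# optimizer = resolve_optimizer("Adam")(model.parameters(), lr=1e-3)
```

Not every entry is a drop-in replacement in a standard training loop: `torch.optim.LBFGS.step()` requires a closure that re-evaluates the loss, and `SparseAdam` only supports sparse gradients.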

Available Loss Functions

| Name | Loss |
| --- | --- |
| `L1` | torch.nn.functional.l1_loss |
| `MSE` | torch.nn.functional.mse_loss |
| `CrossEntropy` | torch.nn.functional.cross_entropy |
| `CTC` | torch.nn.functional.ctc_loss |
| `NLL` | torch.nn.functional.nll_loss |
| `PoissonNLL` | torch.nn.functional.poisson_nll_loss |
| `GaussianNLL` | torch.nn.functional.gaussian_nll_loss |
| `KLDiv` | torch.nn.functional.kl_div |
| `BCE` | torch.nn.functional.binary_cross_entropy |
| `Huber` | torch.nn.functional.huber_loss |
| `SmoothL1` | torch.nn.functional.smooth_l1_loss |
| `SoftMargin` | torch.nn.functional.soft_margin_loss |
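
The default `SmoothL1` loss is quadratic for small errors and linear for large ones, which bounds the gradient contributed by outlier TD errors. The pure-Python sketch below follows the formula documented for `torch.nn.functional.smooth_l1_loss` with its default `beta=1.0` and mean reduction; with these defaults it coincides with `Huber` (`huber_loss` with its default `delta=1.0`). It is an illustration of the formula only, not the package's code.

```python
def smooth_l1(pred, target, beta=1.0):
    """Mean smooth L1 loss, following the formula documented for
    torch.nn.functional.smooth_l1_loss (default beta=1.0, reduction='mean')."""
    total = 0.0
    for p, t in zip(pred, target):
        diff = abs(p - t)
        if diff < beta:
            total += 0.5 * diff ** 2 / beta  # quadratic region for small errors
        else:
            total += diff - 0.5 * beta       # linear region for large errors
    return total / len(pred)


def mse(pred, target):
    """Mean squared error, for comparison."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


pred, target = [0.5, 3.0], [0.0, 0.0]
print(smooth_l1(pred, target))  # (0.125 + 2.5) / 2 = 1.3125
print(mse(pred, target))        # (0.25 + 9.0) / 2 = 4.625
```

The large error of 3.0 contributes 2.5 under smooth L1 but 9.0 under MSE, which is why smooth L1 is a common choice for DQN-style updates.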

Project details

Release history

This version: 1.0.0 (Mar 21, 2024)

Download files


Source Distribution

modulardqn-1.0.0.tar.gz (32.4 kB)

Uploaded Mar 21, 2024 Source

Built Distribution

modulardqn-1.0.0-py3-none-any.whl (38.4 kB)

Uploaded Mar 21, 2024 Python 3

Hashes for modulardqn-1.0.0.tar.gz

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | `bc04035253634b746feaeec4cba3d4873bdba74bdf6966d04b1f7bc94f814cec` |
| MD5 | `65e3cc284d8ff45b87f8d0a29316390c` |
| BLAKE2b-256 | `594b2d9205ee93fd215a06996e3cea29f6c60e48c4cb1e538d1504d1b160cc10` |

Hashes for modulardqn-1.0.0-py3-none-any.whl

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | `99d2318c987eaacfc04d5437c8be2035880c428365d92011275efa1e604bd7ff` |
| MD5 | `7d3ca8237e2169255ed2eb01604837e4` |
| BLAKE2b-256 | `140079a1431921c07de3a3b3f730888a1f616b091f31f05653a0843b71d6d0c5` |