All Forms of Reinforcement Learning
Project description
AFRL - All Forms of Reinforcement Learning
The goal of this project is to provide a framework for reinforcement learning research. It is written in Python, built on top of PyTorch, and designed to be modular and easy to extend. The framework is still in its early stages and under active development.
Usage
The main components of the framework are the following (a minimal wiring sketch follows the list):
- Environments: the tasks that the agent is trying to solve.
- Agents: the algorithms that act in an environment and try to solve it.
- Trainers: the training procedures used to update the agents.
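The exact API is still evolving, so the snippet below is only a minimal sketch of how these three pieces could fit together; the class and function names (`GridWorldEnv`, `QAgent`, `train`) are illustrative placeholders, not the actual AFRL interface.

```python
# Hypothetical wiring of the three components; all names are illustrative, not the real AFRL API.
import torch
import torch.nn as nn


class GridWorldEnv:
    """Environment: the task the agent is trying to solve (a toy 1-D grid)."""

    def __init__(self, size: int = 5):
        self.size, self.pos = size, 0

    def reset(self) -> torch.Tensor:
        self.pos = 0
        return torch.tensor([float(self.pos)])

    def step(self, action: int):  # 0 = left, 1 = right
        self.pos = max(0, min(self.size - 1, self.pos + (1 if action == 1 else -1)))
        done = self.pos == self.size - 1
        return torch.tensor([float(self.pos)]), (1.0 if done else -0.1), done


class QAgent(nn.Module):
    """Agent: the model that maps observations to actions."""

    def __init__(self, obs_dim: int = 1, n_actions: int = 2):
        super().__init__()
        self.q_net = nn.Sequential(nn.Linear(obs_dim, 32), nn.ReLU(), nn.Linear(32, n_actions))

    def act(self, obs: torch.Tensor) -> int:
        with torch.no_grad():
            return int(self.q_net(obs).argmax())


def train(env, agent, episodes: int = 10, max_steps: int = 50):
    """Trainer: collects experience; a real trainer (e.g. DQN) would also update the agent."""
    for _ in range(episodes):
        obs = env.reset()
        for _ in range(max_steps):  # step cap so the untrained agent cannot loop forever
            obs, reward, done = env.step(agent.act(obs))
            if done:
                break


train(GridWorldEnv(), QAgent())
```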
Here is the list of trainers:
- DQN
- DDQN
- Dueling DQN
- Double Dueling DQN
- DDPG
- SDDPG
- Prioritized Experience Replay
- TRPO
- PPO
- TD-Lambda
- SARSA
- REINFORCE
- Actor-Critic
- A2C
- A3C
- SAC
To make it easier to choose the right trainer for a given environment, the following table shows which environment characteristics each trainer supports (a small selection sketch follows the table):
Method | Discrete Action Space | Continuous Action Space | Single-Agent | Multi-Agent | Low-Dimensional Obs | High-Dimensional Obs |
---|---|---|---|---|---|---|
DQN | ✔️ | ❌ | ✔️ | ❌ | ✔️ | ❌ |
DDQN | ✔️ | ❌ | ✔️ | ❌ | ✔️ | ❌ |
Dueling DQN | ✔️ | ❌ | ✔️ | ❌ | ✔️ | ❌ |
Double Dueling DQN | ✔️ | ❌ | ✔️ | ❌ | ✔️ | ❌ |
DDPG | ❌ | ✔️ | ✔️ | ❌ | ✔️ | ✔️ |
SDDPG | ❌ | ✔️ | ✔️ | ❌ | ✔️ | ✔️ |
Prioritized Experience Replay | ✔️ | ✔️ | ✔️ | ❌ | ✔️ | ❌ |
TRPO | ✔️ | ✔️ | ✔️ | ❌ | ✔️ | ✔️ |
PPO | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
TD-Lambda | ✔️ | ✔️ | ✔️ | ❌ | ✔️ | ❌ |
SARSA | ✔️ | ❌ | ✔️ | ❌ | ✔️ | ❌ |
REINFORCE | ✔️ | ✔️ | ✔️ | ❌ | ✔️ | ✔️ |
Actor-Critic | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
A2C | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
A3C | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
SAC | ❌ | ✔️ | ✔️ | ❌ | ✔️ | ✔️ |
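As a rough illustration of how the first two columns of this table could translate into code, here is a small selection helper that inspects an environment's action space and suggests a family of trainers. It assumes Gymnasium-style environments (not a stated AFRL dependency), and `suggest_trainers` is a hypothetical helper whose candidate lists simply mirror the table above.

```python
# Hypothetical helper; the candidate lists mirror the table above, not an AFRL API.
import gymnasium as gym
from gymnasium.spaces import Box, Discrete

DISCRETE_TRAINERS = ["DQN", "DDQN", "Dueling DQN", "PPO", "A2C", "SARSA"]
CONTINUOUS_TRAINERS = ["DDPG", "SAC", "PPO", "TRPO"]


def suggest_trainers(env: gym.Env) -> list[str]:
    """Suggest trainer families based on the environment's action-space type."""
    if isinstance(env.action_space, Discrete):
        return DISCRETE_TRAINERS
    if isinstance(env.action_space, Box):
        return CONTINUOUS_TRAINERS
    raise ValueError(f"Unsupported action space: {env.action_space}")


print(suggest_trainers(gym.make("CartPole-v1")))   # discrete actions -> DQN family, PPO, ...
print(suggest_trainers(gym.make("Pendulum-v1")))   # continuous actions -> DDPG, SAC, ...
```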
Characteristics of Different Environments for RL Algorithms
Here is a list of environment characteristics, along with example environments that exhibit each one and some general rules of thumb for choosing the right method for your environment:
1. Discrete vs Continuous Action Space
- Examples:
- Discrete: Tic-Tac-Toe, Grid Worlds
- Continuous: Robotic arm control, Portfolio management
- Rules of Thumb:
- For discrete action spaces, DQN variants and PPO are commonly used.
- For continuous action spaces, DDPG and SAC typically perform better.
2. Single-Agent vs Multi-Agent
- Examples:
- Single-Agent: Mountain Car, Cartpole
- Multi-Agent: Poker, Multi-robot coordination
- Rules of Thumb:
- Single-agent tasks often use DQN, PPO, or DDPG.
- Multi-agent tasks typically benefit from specialized algorithms such as MADDPG, or from generic methods like PPO adapted for multi-agent scenarios (a parameter-sharing sketch appears at the end of this section).
3. Low-Dimensional vs High-Dimensional Observation Space
- Examples:
- Low-Dimensional: Frozen Lake, Taxi-v3
- High-Dimensional: Atari games, Visual navigation
- Rules of Thumb:
- Low-dimensional problems can be tackled with simpler algorithms like SARSA or TD-Lambda.
- High-dimensional problems often require function approximators that can handle the input, such as the Convolutional Neural Networks (CNNs) used in DQN for image-based tasks (a convolutional sketch appears at the end of this section).
4. Partially Observable vs Fully Observable
- Examples:
- Partially Observable: Poker, Hide and Seek
- Fully Observable: Chess, Go
- Rules of Thumb:
- Fully observable environments can make use of simpler methods such as DQN or PPO.
- Partially observable environments may require memory, such as an LSTM or GRU incorporated into the algorithm as in DRQN (a recurrent sketch appears at the end of this section).
5. Sparse vs Dense Reward
- Examples:
- Sparse Reward: Maze navigation, robotic grasping
- Dense Reward: Cartpole, Lunar Lander
- Rules of Thumb:
- Dense reward problems can use most algorithms effectively.
- Sparse reward problems often benefit from algorithms designed for exploration, such as those using curiosity-driven intrinsic rewards or hierarchical methods (a count-based bonus sketch appears at the end of this section).
By considering these environment characteristics and rules of thumb, one can make a more informed decision when selecting an appropriate reinforcement learning algorithm for a specific task.
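To make some of these rules of thumb more concrete, a few minimal sketches follow. For rule of thumb 2, one lightweight way to adapt a single-agent method to several agents is parameter sharing: every agent acts with the same policy network while observations stay per-agent. This is only an illustration of that idea, with made-up sizes and agent ids, not a full multi-agent trainer such as MADDPG.

```python
import torch
import torch.nn as nn

# One policy network shared by all agents (parameter sharing); sizes are illustrative.
shared_policy = nn.Sequential(nn.Linear(6, 64), nn.ReLU(), nn.Linear(64, 4))


def act_all(observations: dict[str, torch.Tensor]) -> dict[str, int]:
    """Greedy action for each agent from the shared policy."""
    with torch.no_grad():
        return {aid: int(shared_policy(obs).argmax()) for aid, obs in observations.items()}


obs = {"agent_0": torch.zeros(6), "agent_1": torch.ones(6)}
actions = act_all(obs)  # e.g. {"agent_0": 1, "agent_1": 3} (values depend on the random init)
```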
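For rule of thumb 3, here is a minimal PyTorch sketch of a convolutional Q-network for image observations, in the spirit of the network used by DQN on Atari; the layer sizes assume 84x84 inputs and are not tuned.

```python
import torch
import torch.nn as nn


class ConvQNet(nn.Module):
    """Q-network for high-dimensional (image) observations, DQN-style."""

    def __init__(self, in_channels: int, n_actions: int):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
            nn.Flatten(),
        )
        # 64 * 7 * 7 is the flattened feature size for 84x84 inputs with the strides above.
        self.head = nn.Sequential(nn.Linear(64 * 7 * 7, 512), nn.ReLU(), nn.Linear(512, n_actions))

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.head(self.features(obs / 255.0))  # scale raw pixel values to [0, 1]


q_net = ConvQNet(in_channels=4, n_actions=6)   # e.g. 4 stacked 84x84 grayscale frames
q_values = q_net(torch.zeros(1, 4, 84, 84))    # -> shape (1, 6)
```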
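For rule of thumb 4, a recurrent layer lets the agent carry information across timesteps when a single observation does not reveal the full state. The sketch below is a DRQN-style Q-network with an LSTM; the dimensions are illustrative.

```python
import torch
import torch.nn as nn


class RecurrentQNet(nn.Module):
    """DRQN-style Q-network: an LSTM summarises the observation history."""

    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 128):
        super().__init__()
        self.encoder = nn.Linear(obs_dim, hidden)
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_actions)

    def forward(self, obs_seq: torch.Tensor, hidden_state=None):
        # obs_seq: (batch, time, obs_dim); hidden_state carries memory between calls.
        x = torch.relu(self.encoder(obs_seq))
        x, hidden_state = self.lstm(x, hidden_state)
        return self.head(x), hidden_state


net = RecurrentQNet(obs_dim=8, n_actions=4)
q_seq, h = net(torch.zeros(2, 10, 8))     # Q-values for every step of a length-10 sequence: (2, 10, 4)
q_next, h = net(torch.zeros(2, 1, 8), h)  # reuse the memory for the next step
```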
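For rule of thumb 5, a common trick is to add an intrinsic bonus to the environment reward so the agent still gets a learning signal between sparse successes. The sketch below uses a simple count-based bonus, r_total = r_ext + beta / sqrt(N(s)), as a stand-in for heavier curiosity-driven mechanisms; it assumes a discretised, hashable state.

```python
import math
from collections import defaultdict


class CountBonus:
    """Count-based exploration bonus: rarely visited states earn extra reward."""

    def __init__(self, beta: float = 0.1):
        self.beta = beta
        self.counts = defaultdict(int)

    def __call__(self, state, extrinsic_reward: float) -> float:
        key = tuple(state)                        # assumes a hashable, discretised state
        self.counts[key] += 1
        return extrinsic_reward + self.beta / math.sqrt(self.counts[key])


bonus = CountBonus(beta=0.1)
shaped = bonus(state=(2, 3), extrinsic_reward=0.0)  # 0.0 extrinsic reward + 0.1 exploration bonus
```
The bonus decays as a state is revisited, so the shaping fades once the agent has explored enough.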
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
File details
Details for the file afrl-0.0.1.4.tar.gz.
File metadata
- Download URL: afrl-0.0.1.4.tar.gz
- Upload date:
- Size: 4.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.8.0 colorama/0.4.4 importlib-metadata/4.6.4 keyring/23.5.0 pkginfo/1.8.2 readme-renderer/34.0 requests-toolbelt/0.9.1 requests/2.25.1 rfc3986/1.5.0 tqdm/4.57.0 urllib3/1.26.5 CPython/3.10.12
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | 9e9a30e0230129818ec14fecef74bb7b9634ee6dab5e7be4af790e0f2ded8279 |
MD5 | 471fd6f275ee6ac582be1a01ba1f6396 |
BLAKE2b-256 | 9482cba4cd9385363eadc3d34f246b566a4ac96b228ab194b9b5d0bd36c4d604 |
File details
Details for the file afrl-0.0.1.4-py3-none-any.whl.
File metadata
- Download URL: afrl-0.0.1.4-py3-none-any.whl
- Upload date:
- Size: 4.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.8.0 colorama/0.4.4 importlib-metadata/4.6.4 keyring/23.5.0 pkginfo/1.8.2 readme-renderer/34.0 requests-toolbelt/0.9.1 requests/2.25.1 rfc3986/1.5.0 tqdm/4.57.0 urllib3/1.26.5 CPython/3.10.12
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | 4f8431831b098c4784e2a9b518710c3120eeb04089d322e3fadf91a041813190 |
MD5 | 15bd07758c7c6e51967faf6c33ba6cdc |
BLAKE2b-256 | 046115c0cf74fa800ad90713e6456e8974a7f383935db812a7a7a61457ee28ff |