X-Transformer for RL
x-transformers-rl (wip)
Implementation of a transformer for reinforcement learning using x-transformers
Install
```bash
$ pip install x-transformers-rl
```
Usage
```python
import numpy as np

class Sim:
    def reset(self, seed = None):
        return np.random.randn(5) # state

    def step(self, actions):
        return np.random.randn(5), np.random.randn(1), False # state, reward, done

sim = Sim()

# learning

from x_transformers_rl import Learner

learner = Learner(
    state_dim = 5,
    num_actions = 2,
    reward_range = (-1., 1.),
    max_timesteps = 10,
    world_model = dict(
        attn_dim_head = 16,
        heads = 4,
        depth = 1,
    )
)

learner(sim, 100)
```
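The `Sim` class in the usage example only hints at the environment contract: `reset(seed)` returns a state and `step(actions)` returns `(state, reward, done)`. Its rewards come from `np.random.randn`, so they can fall outside the declared `reward_range`, and `done` is always `False`. The following is a minimal sketch, assuming the contract inferred from that example is all `Learner` needs. `Environment`, `EpisodicSim` and `random_rollout` are hypothetical names, not part of x-transformers-rl. `EpisodicSim` clips rewards into `reward_range` and ends episodes after `max_timesteps` steps; it borrows the parameter name from the `Learner` call but assumes nothing about how the library itself uses it.

```python
from typing import Optional, Protocol, Tuple

import numpy as np


class Environment(Protocol):
    """Hypothetical contract inferred from the Sim class in the usage example."""

    def reset(self, seed: Optional[int] = None) -> np.ndarray: ...

    def step(self, actions) -> Tuple[np.ndarray, np.ndarray, bool]: ...


class EpisodicSim:
    """Hypothetical toy environment: rewards stay inside reward_range and episodes end."""

    def __init__(self, state_dim: int = 5, reward_range=(-1.0, 1.0), max_timesteps: int = 10):
        self.state_dim = state_dim
        self.reward_low, self.reward_high = reward_range
        self.max_timesteps = max_timesteps
        self.rng = np.random.default_rng()
        self.t = 0

    def reset(self, seed: Optional[int] = None) -> np.ndarray:
        # reseed only when a seed is given, then restart the step counter
        if seed is not None:
            self.rng = np.random.default_rng(seed)
        self.t = 0
        return self.rng.standard_normal(self.state_dim)  # state

    def step(self, actions):
        self.t += 1
        state = self.rng.standard_normal(self.state_dim)
        # clip the random reward so it never leaves the declared reward_range
        reward = np.clip(self.rng.standard_normal(1), self.reward_low, self.reward_high)
        done = self.t >= self.max_timesteps  # episode ends at the step limit
        return state, reward, done  # state, reward, done


def random_rollout(env: Environment, num_actions: int, seed: int = 0, max_steps: int = 1000) -> float:
    """Hypothetical smoke test: play one episode of uniformly random discrete actions."""
    rng = np.random.default_rng(seed)
    env.reset(seed=seed)
    total = 0.0
    for _ in range(max_steps):  # guard against environments that never report done
        action = int(rng.integers(num_actions))
        _, reward, done = env.step(action)
        total += float(np.asarray(reward).reshape(-1)[0])
        if done:
            break
    return total
```

If the inferred interface matches what `Learner` expects, `EpisodicSim()` can stand in for `Sim()` in the `learner(sim, 100)` call, and running `random_rollout` first is a cheap way to catch shape or termination bugs before training.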
Example
Lunar Lander
```bash
$ pip install -r requirements.txt
```

Then run the training script:

```bash
$ python train_lander.py
```
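Lunar Lander is commonly loaded through Gymnasium, whose `reset` returns `(obs, info)` and whose `step` returns `(obs, reward, terminated, truncated, info)`, unlike the three-tuple `Sim` interface from the usage example. The internals of `train_lander.py` are not shown here, so the following is only a minimal adapter sketch, assuming a single discrete action per step and that the learner needs nothing beyond the `Sim`-style interface. `GymnasiumToSim` is a hypothetical name, and the example script may wire the environment up differently.

```python
from typing import Any, Optional, Tuple

import numpy as np


class GymnasiumToSim:
    """Hypothetical adapter: Gymnasium-style env -> (state, reward, done) Sim interface."""

    def __init__(self, env: Any):
        self.env = env

    def reset(self, seed: Optional[int] = None) -> np.ndarray:
        obs, _info = self.env.reset(seed=seed)  # Gymnasium returns (obs, info)
        return np.asarray(obs, dtype=np.float32)

    def step(self, actions) -> Tuple[np.ndarray, np.ndarray, bool]:
        # assumes one discrete action per step, as in the discrete Lunar Lander
        action = int(np.asarray(actions).reshape(-1)[0])
        # Gymnasium returns (obs, reward, terminated, truncated, info)
        obs, reward, terminated, truncated, _info = self.env.step(action)
        done = bool(terminated or truncated)  # collapse both end-of-episode signals
        state = np.asarray(obs, dtype=np.float32)
        return state, np.array([reward], dtype=np.float32), done
```

With `gymnasium[box2d]` installed, wrapping `gymnasium.make("LunarLander-v3")` (registered as `LunarLander-v2` in Gymnasium releases before 1.0) gives 8-dimensional states and 4 discrete actions, which would correspond to `state_dim = 8` and `num_actions = 4` in a `Learner` call.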
Citation
```bibtex
@inproceedings{Wang2025EvolutionaryPO,
    title = {Evolutionary Policy Optimization},
    author = {Jianren Wang and Yifan Su and Abhinav Gupta and Deepak Pathak},
    year = {2025},
    url = {https://api.semanticscholar.org/CorpusID:277313729}
}
```
```bibtex
@article{Schulman2017ProximalPO,
    title = {Proximal Policy Optimization Algorithms},
    author = {John Schulman and Filip Wolski and Prafulla Dhariwal and Alec Radford and Oleg Klimov},
    journal = {ArXiv},
    year = {2017},
    volume = {abs/1707.06347},
    url = {https://api.semanticscholar.org/CorpusID:28695052}
}
```
```bibtex
@article{Farebrother2024StopRT,
    title = {Stop Regressing: Training Value Functions via Classification for Scalable Deep RL},
    author = {Jesse Farebrother and Jordi Orbay and Quan Ho Vuong and Adrien Ali Taiga and Yevgen Chebotar and Ted Xiao and Alex Irpan and Sergey Levine and Pablo Samuel Castro and Aleksandra Faust and Aviral Kumar and Rishabh Agarwal},
    journal = {ArXiv},
    year = {2024},
    volume = {abs/2403.03950},
    url = {https://api.semanticscholar.org/CorpusID:268253088}
}
```
```bibtex
@article{Lee2025HypersphericalNF,
    title = {Hyperspherical Normalization for Scalable Deep Reinforcement Learning},
    author = {Hojoon Lee and Youngdo Lee and Takuma Seno and Donghu Kim and Peter Stone and Jaegul Choo},
    journal = {ArXiv},
    year = {2025},
    volume = {abs/2502.15280},
    url = {https://api.semanticscholar.org/CorpusID:276558261}
}
```
```bibtex
@misc{xie2025simplepolicyoptimization,
    title = {Simple Policy Optimization},
    author = {Zhengpeng Xie and Qiang Zhang and Fan Yang and Marco Hutter and Renjing Xu},
    year = {2025},
    eprint = {2401.16025},
    archivePrefix = {arXiv},
    primaryClass = {cs.LG},
    url = {https://arxiv.org/abs/2401.16025},
}
```
```bibtex
@misc{cheng2025reasoningexplorationentropyperspective,
    title = {Reasoning with Exploration: An Entropy Perspective on Reinforcement Learning for LLMs},
    author = {Daixuan Cheng and Shaohan Huang and Xuekai Zhu and Bo Dai and Wayne Xin Zhao and Zhenliang Zhang and Furu Wei},
    year = {2025},
    eprint = {2506.14758},
    archivePrefix = {arXiv},
    primaryClass = {cs.CL},
    url = {https://arxiv.org/abs/2506.14758},
}
```
Download files
Source Distribution
- x_transformers_rl-0.0.99.tar.gz (22.8 kB)
Built Distribution
- x_transformers_rl-0.0.99-py3-none-any.whl (21.5 kB)
File details
Details for the file x_transformers_rl-0.0.99.tar.gz.
File metadata
- Download URL: x_transformers_rl-0.0.99.tar.gz
- Upload date:
- Size: 22.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.9.23
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3e35ef989439e125a8b9981ea769b1c2297d92f8bd40d14f127c950aeda14dc1 |
| MD5 | c47115a76b5486e8d76c7fb70c59789d |
| BLAKE2b-256 | 2b0ae32753e249a393c3225082b0a558ef15a0ed843783f026fde0c32d6be304 |
File details
Details for the file x_transformers_rl-0.0.99-py3-none-any.whl.
File metadata
- Download URL: x_transformers_rl-0.0.99-py3-none-any.whl
- Upload date:
- Size: 21.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.9.23
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 15e1e189c6c324a17eeec4013a48f741c32b363ffdcd79e43f1319e2c4d58c32 |
| MD5 | d150e6ab15fcbcb30cb8b44df4d38205 |
| BLAKE2b-256 | 83c90973fc19fd45f27d3125091622b34562edc917cc04f07f39c23fdf2bfa24 |
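The published digests let a downloaded file be checked before installation. A small sketch using only the standard library's `hashlib`: the `EXPECTED_SHA256` mapping copies the SHA256 values listed for the two 0.0.99 files, while `sha256_of` and `verify_download` are hypothetical helper names chosen for illustration.

```python
import hashlib
from pathlib import Path

# SHA256 digests copied from the file-hash tables for version 0.0.99
EXPECTED_SHA256 = {
    "x_transformers_rl-0.0.99.tar.gz": "3e35ef989439e125a8b9981ea769b1c2297d92f8bd40d14f127c950aeda14dc1",
    "x_transformers_rl-0.0.99-py3-none-any.whl": "15e1e189c6c324a17eeec4013a48f741c32b363ffdcd79e43f1319e2c4d58c32",
}


def sha256_of(path, chunk_size: int = 1 << 16) -> str:
    """Hash the file in chunks so large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path) -> bool:
    """Compare a downloaded file against the digest listed for its file name."""
    path = Path(path)
    expected = EXPECTED_SHA256.get(path.name)
    if expected is None:
        raise KeyError(f"no published digest for {path.name}")
    return sha256_of(path) == expected
```

pip can enforce the same check without custom code through its hash-checking mode, by listing `x-transformers-rl==0.0.99 --hash=sha256:<digest>` in a requirements file.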