Unsupervised pre-training with PPG
Project description
Unsupervised On-Policy Reinforcement Learning
This work combines Active Pre-Training with an on-policy algorithm, Phasic Policy Gradient.
Active Pre-Training
Active Pre-Training is used to pre-train a model-free algorithm before a downstream task is defined. It computes the reward from an estimate of the particle-based entropy of the visited states. This reduces training time when several tasks are to be defined for the same agent, e.g. robots for a warehouse.
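A particle-based entropy reward can be sketched as follows. This is a minimal illustration and not this package's API: the function name `knn_entropy_rewards` and the parameters `k` and `c` are hypothetical, the reward form `log(c + mean distance to the k nearest neighbours)` is assumed from the commonly cited Active Pre-Training formulation, and the distances are taken on raw state vectors rather than on a learned representation.

```python
import math

def knn_entropy_rewards(states, k=3, c=1.0):
    """Intrinsic reward per state from a k-nearest-neighbour entropy estimate.

    Hypothetical helper: states far from their neighbours (rarely visited
    regions) receive a larger reward than states inside a dense cluster.
    """
    if len(states) <= k:
        raise ValueError("need more than k states in the batch")
    rewards = []
    for i, s in enumerate(states):
        # Euclidean distances from state i to every other state in the batch.
        dists = sorted(
            math.dist(s, other) for j, other in enumerate(states) if j != i
        )
        mean_knn = sum(dists[:k]) / k
        # c > 0 keeps the logarithm finite when neighbours coincide.
        rewards.append(math.log(c + mean_knn))
    return rewards

# Four states in a tight cluster and one far away from it.
states = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
rewards = knn_entropy_rewards(states, k=2)
print(rewards)
```

In this toy batch the isolated state gets the largest reward, which is the pressure that pushes the agent toward unexplored states during pre-training.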
Phasic Policy Gradient
An improved version of Proximal Policy Optimization that uses auxiliary epochs to train representations shared between the policy and a value network.
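The objective optimized during the auxiliary epochs can be sketched as follows. This is a minimal illustration and not this package's implementation: the names `auxiliary_phase_loss` and `beta_clone` are hypothetical, and the loss form (a value regression term plus a behavioural cloning term that keeps the policy close to its pre-auxiliary snapshot) is assumed from the published description of Phasic Policy Gradient. Discrete action distributions with strictly positive probabilities are assumed.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions given as probability lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def auxiliary_phase_loss(aux_values, value_targets, old_probs, new_probs,
                         beta_clone=1.0):
    """Joint loss for one auxiliary-phase batch (hypothetical helper).

    aux_values:    predictions of the auxiliary value head on the policy network
    value_targets: value targets collected during the policy phase
    old_probs:     action distributions of the policy before the auxiliary phase
    new_probs:     action distributions of the policy being updated
    """
    n = len(aux_values)
    # Value regression trains the shared representation.
    value_loss = 0.5 * sum((v - t) ** 2
                           for v, t in zip(aux_values, value_targets)) / n
    # Behavioural cloning penalizes drift of the policy output.
    clone_loss = sum(kl_divergence(p, q)
                     for p, q in zip(old_probs, new_probs)) / n
    return value_loss + beta_clone * clone_loss

old = [[0.5, 0.5], [0.9, 0.1]]
new = [[0.6, 0.4], [0.9, 0.1]]
print(auxiliary_phase_loss([1.0, 2.0], [0.0, 2.0], old, new))
```

The cloning term is what lets the value signal shape the shared features without changing the policy learned in the preceding policy phase.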
Download files
Source Distribution
active-pre-train-ppg-0.0.8.tar.gz (15.7 kB)
Built Distribution
active_pre_train_ppg-0.0.8-py3-none-any.whl
Hashes for active-pre-train-ppg-0.0.8.tar.gz
Algorithm | Hash digest
---|---
SHA256 | 93457c1c3f49ea2805642163507288ecfdbdc94c9a706b49801902b374bcaaf4
MD5 | 0cec392ad9e334b06782ccbad3961836
BLAKE2b-256 | 63697185e02d407c3d8f4a1e2298d82bbbbd485d53654a39c0a2680811338fe6
Hashes for active_pre_train_ppg-0.0.8-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 0448ec3ffb617a79f1ca82de2f4fa66dbc56ab27cf04e5789d74859edc2f6190
MD5 | 27ac9b5a63722df0211df059aedf3c6d
BLAKE2b-256 | abbcbccccf41a2db157fd2aabf1fcc47b6c7267e6a4897dcd026b7e2e7715c72