Unsupervised pre-training with PPG
Project description
Unsupervised On-Policy Reinforcement Learning
This work combines Active Pre-Training with an On-Policy algorithm, Phasic Policy Gradient.
Active Pre-Training
Active Pre-Training pre-trains a model-free algorithm before a downstream task is defined. The intrinsic reward is computed from a particle-based estimate of the entropy of visited states. This reduces training time when several downstream tasks are defined later, e.g. robots for a warehouse.
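As a rough illustration of the particle-based entropy reward described above, here is a minimal NumPy sketch: each state's intrinsic reward grows with the distance to its k nearest neighbours, so rarely visited regions of state space are rewarded more. The function name `apt_reward` and all sizes are hypothetical; this is not the package's actual API.

```python
import numpy as np

def apt_reward(states, k=3):
    """Hypothetical sketch of a particle-based entropy reward.

    Each state's reward increases with the mean distance to its k
    nearest neighbours in the batch, so states in sparsely visited
    regions receive higher intrinsic reward.
    """
    # pairwise Euclidean distances, shape (n, n)
    dists = np.linalg.norm(states[:, None, :] - states[None, :, :], axis=-1)
    # sort each row; column 0 is the zero distance to itself
    knn = np.sort(dists, axis=1)[:, 1:k + 1]
    # log of the mean k-NN distance as an entropy-style reward
    return np.log(1.0 + knn.mean(axis=1))
```

In practice the distances are usually computed in a learned representation space rather than on raw states, but the k-NN structure of the estimator is the same.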
Phasic Policy Gradient
Phasic Policy Gradient is an improved version of Proximal Policy Optimization that adds auxiliary epochs to train representations shared between the policy and the value network.
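The auxiliary epochs can be sketched as follows: a value-distillation loss trains the shared trunk while a KL term keeps the policy close to its pre-auxiliary behaviour. This is a minimal PyTorch sketch with illustrative sizes; the class and function names are hypothetical and do not reflect the package's API.

```python
import torch
import torch.nn as nn

class PPGNet(nn.Module):
    """Shared trunk with separate policy and value heads (sizes illustrative)."""
    def __init__(self, obs_dim=4, act_dim=2):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(obs_dim, 32), nn.Tanh())
        self.pi = nn.Linear(32, act_dim)   # policy head
        self.v = nn.Linear(32, 1)          # value head used in the policy phase
        self.aux_v = nn.Linear(32, 1)      # auxiliary value head

def aux_phase_loss(net, obs, returns, old_logits, beta=1.0):
    """Auxiliary-epoch loss: distill value targets into the shared trunk
    while a KL penalty keeps the policy near its pre-auxiliary outputs."""
    h = net.trunk(obs)
    value_loss = ((net.aux_v(h).squeeze(-1) - returns) ** 2).mean()
    kl = torch.distributions.kl_divergence(
        torch.distributions.Categorical(logits=old_logits),
        torch.distributions.Categorical(logits=net.pi(h))).mean()
    return value_loss + beta * kl
```

Keeping the auxiliary value head separate from the policy-phase value head is what lets PPG train the shared trunk aggressively on value targets without destabilizing the policy updates.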
Download files
Download the file for your platform.
Source Distribution
active-pre-train-ppg-0.0.4.tar.gz (15.7 kB)
Built Distribution
active_pre_train_ppg-0.0.4-py3-none-any.whl
Hashes for active-pre-train-ppg-0.0.4.tar.gz

Algorithm | Hash digest
---|---
SHA256 | d2f5b0f9e8c93b748fbc45a7967d0ae840450b95bad90d769ab17486ecc89e21
MD5 | 6847ae504690170972e467ee2f4447e9
BLAKE2b-256 | 9b3a8e7a4d8375338b19bf43e4328bf4acacd123733fe2e59cde39ec253c5c30
Hashes for active_pre_train_ppg-0.0.4-py3-none-any.whl

Algorithm | Hash digest
---|---
SHA256 | 1f0005a886fefe084d086ec1227d8bc864ce1624c883c40aee77e5d08725bdfc
MD5 | 26054373c94f16e536f98b724e6f7b2e
BLAKE2b-256 | cd24e21719e9650d7fdf0bf13fd8f7b27af7590759df21682708818ca2b1431b