Unsupervised pre-training with PPG
Project description
Unsupervised On-Policy Reinforcement Learning
This work combines Active Pre-Training (APT) with an on-policy algorithm, Phasic Policy Gradient (PPG).
Active Pre-Training
Active Pre-Training is used to pre-train a model-free algorithm before a downstream task is defined. It computes the reward from a particle-based estimate of the entropy of visited states. This reduces training time when several downstream tasks will later be defined on the same environment - e.g. robots for a warehouse.
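The particle-based entropy estimate behind APT can be sketched with a k-nearest-neighbour distance measure: states in sparsely visited regions of representation space receive higher intrinsic reward. This is a minimal NumPy sketch, assuming Euclidean distances over state representations; the function and parameter names are illustrative and not the package API.

```python
import numpy as np

def apt_intrinsic_reward(states: np.ndarray, k: int = 12) -> np.ndarray:
    """Particle-based entropy reward (APT-style sketch).

    Each state is treated as a particle; its reward is the log of
    (1 + mean distance to its k nearest neighbours), so novel states
    in sparsely visited regions earn higher intrinsic reward.
    """
    # Pairwise Euclidean distances between all state representations.
    dists = np.linalg.norm(states[:, None, :] - states[None, :, :], axis=-1)
    # Exclude each particle's zero distance to itself.
    np.fill_diagonal(dists, np.inf)
    # Mean distance to the k nearest neighbours of each particle.
    knn = np.sort(dists, axis=1)[:, :k]
    return np.log(1.0 + knn.mean(axis=1))
```

In practice the distances are computed in a learned representation space (e.g. a contrastive encoder) rather than on raw observations, and the batch of particles comes from a replay of recent states.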
Phasic Policy Gradient
PPG is an improved version of Proximal Policy Optimization (PPO) that adds auxiliary epochs to train representations shared between the policy and the value network.
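The auxiliary phase can be sketched as a joint objective: value regression through the shared trunk plus a behavioural-cloning KL term that keeps the policy close to its pre-phase behaviour. This is a minimal PyTorch sketch under those assumptions; class, function, and parameter names (including `beta_clone`) are illustrative, not the package API.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedActorCritic(nn.Module):
    """Shared trunk with a policy head and an auxiliary value head,
    as used during PPG's auxiliary phase (illustrative sketch)."""
    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(obs_dim, hidden), nn.Tanh())
        self.policy_head = nn.Linear(hidden, n_actions)
        self.aux_value_head = nn.Linear(hidden, 1)

    def forward(self, obs):
        h = self.trunk(obs)
        return self.policy_head(h), self.aux_value_head(h).squeeze(-1)

def auxiliary_loss(model, obs, value_targets, old_logits, beta_clone=1.0):
    """Auxiliary-phase joint loss: value regression on the shared trunk
    plus KL(old policy || current policy) to preserve the policy."""
    logits, aux_values = model(obs)
    value_loss = F.mse_loss(aux_values, value_targets)
    kl = F.kl_div(F.log_softmax(logits, dim=-1),
                  F.log_softmax(old_logits, dim=-1),
                  log_target=True, reduction="batchmean")
    return value_loss + beta_clone * kl
```

During the policy phase the policy head is trained with the usual PPO clipped objective; only in the (less frequent) auxiliary phase does the value signal flow through the shared trunk, with the KL term preventing it from distorting the policy.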
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
active-pre-train-ppg-0.0.7.tar.gz (15.7 kB)
Built Distribution
active_pre_train_ppg-0.0.7-py3-none-any.whl
Hashes for active-pre-train-ppg-0.0.7.tar.gz
Algorithm | Hash digest
---|---
SHA256 | `e9e39561cb4f2ad5049d64369624bf224b0383208c50e87dafefb86f7f81601d`
MD5 | `b1a81649cbb9ab5f7a6a4a89fc487467`
BLAKE2b-256 | `d1c8a014ea2420f6af026043ce0c604f69aaf91de725f1170f4de4890ec240a8`
Hashes for active_pre_train_ppg-0.0.7-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | `f5dfa76f5b2994f63f0a39e736f2d249cf16bdd8c858e944d087843b25e236ec`
MD5 | `a281e869761ad65e65d616e37bed19a8`
BLAKE2b-256 | `e57b227bba5899c960d70234f7546d3a70596175cb96c19653bd2b4e207f2dab`